Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-05-2021 09:10 AM
I have a very slow search I need to tune up.
I have this schema:
This is a typical email box for a user who might have multiple email accounts.
I want to allow users to search their email, returning all Threads which match
a keyword in any of its own properties, or its participants or messages.
The problem is that my query is very slow at the moment, even for a relatively small data set.
CALL db.index.fulltext.createNodeIndex(
"emailThreadsIndex",
[
"Thread",
"Message",
"Participant",
],
[
"snippet",
"body",
"subject",
"name",
"email",
"description"
]
)
This returns 3 different nodes, but I only want to return 1 to the user.
Therefore I need to
In a small dataset this takes 20 seconds to return no results,
or over 20 minutes before I killed the process to query for data for which I
know there should be results
:param { user_id: "292d41ba-d5b7-4f89-ab1a-8bb4382eeaa5" }
CALL db.index.fulltext.queryNodes(
"emailThreadsIndex",
'chloe'
) YIELD node as root_item, score
WITH root_item
MATCH (user)<-[:emails]-(:Email)-[:account]->(:Account)<-[:account_threads]-(threads:Thread)
WHERE user.uuid = {user_id}
WITH threads, root_item
MATCH (threads)
WHERE
(
threads.uuid = root_item.uuid
OR (threads)-[:messages]->(:Message { uuid: root_item.uuid })
OR (threads)-[:participants]->(:Participant { uuid: root_item.uuid })
)
RETURN threads
The following is surprisingly slightly better, but it takes 2.5 seconds simply to return a count,
which on a small data set is far too long - imagine when I'm searching other
data types in an 'all' site search, or when this user's data grows larger than
6 months worth of email.
I'm also unsure if this is additive, i.e. whether each optional match adds onto the next one.
MATCH (user:User)
WHERE user.uuid = {user_id}
WITH user
CALL db.index.fulltext.queryNodes(
"emailThreadsIndex",
'
joe
'
) YIELD node as root_item, score
OPTIONAL MATCH (user)<-[:emails]-(emails:Email)-[:account]->(account:Account)<-[:account_threads]-(threads:Thread)
WHERE threads.uuid = root_item.uuid
OPTIONAL MATCH (user)<-[:emails]-(emails:Email)-[:account]->(account:Account)<-[:account_threads]-(threads:Thread)
WHERE (threads)-[:messages]->(:Message { uuid: root_item.uuid })
OPTIONAL MATCH (user)<-[:emails]-(emails:Email)-[:account]->(account:Account)<-[:account_threads]-(threads:Thread)
WHERE (threads)-[:thread_participants]->(:Participant { uuid: root_item.uuid })
OPTIONAL MATCH (user)<-[:emails]-(emails:Email)-[:account]->(account:Account)<-[:account_threads]-(threads:Thread)
WHERE (threads)-[:thread_tasks]->(:Task { uuid: root_item.uuid })
WITH threads
WHERE threads IS NOT NULL
RETURN count(DISTINCT (threads)) as count
I need to get this down to sub 100ms to be acceptable speed for modern users, desirably faster even, since there's post-processing to do as well.
I view the problem in two parts:
I happened upon a talk on Youtube where Christophe Willemsen advocated for this method:
ids
to this?Is there a more efficient way to do this? Am I missing something?
It seems my queries above aren't efficient, is there something obvious I'm doing wrong?
Later, I want to be able to search across my entire site, which also has the same kind of rules - I want to be able to search relations, scoped by user. This was working OK with optional matches and Ors, but as soon as I introduced a substantial amount of data the searches are now taking 6+ seconds to execute.
For now this seems complicated enough, so let's leave the 'all' search alone for now 🙂
REALLY appreciate any help / pointers people can give me with this! Thanks!
01-09-2021 03:21 PM
Best to run your query with PROFILE to see where the time is spent.
My suggestion
e.g. how long takes this
CALL db.index.fulltext.queryNodes(
"emailThreadsIndex",
'chloe'
) YIELD node as root_item, score
RETURN count(*)
or this
MATCH (user)<-[:emails]-(:Email)-[:account]->(:Account)<-[:account_threads]-(threads:Thread)
WHERE user.uuid = {user_id}
RETURN *
If the threat is the root item, then use it as such and don't do an UUID lookup but use it directly
try a USING JOIN ON account
to trigger a two sided index lookup that's joined in the middle
01-14-2021 09:03 AM
Thanks very much for your response, Michael - will give your suggestions a shot and get back to you.
01-14-2021 11:59 PM
Definitely run your queries with PROFILE to see where the big DB hits are.
I think subqueries
can help here by winnowing down a lot of the DB before expanding the search.
I suggest something like this (I might not have the syntax exactly right):
CALL{
MATCH(user {uuid:user_id}) // having a Label for user could help
RETURN user // returns only one user.
}
WITH user
CALL db.index.fulltext.queryNodes(
"emailThreadsIndex",
'chloe'
) YIELD node as root_item, score
WITH root_item, user
MATCH (user)<-[:emails]-(:Email)-[:account]->(:Account)<-[:account_threads]-(threads:Thread)
WITH threads, root_item
MATCH (threads)
WHERE
(
threads.uuid = root_item.uuid
OR (threads)-[:messages]->(:Message { uuid: root_item.uuid })
OR (threads)-[:participants]->(:Participant { uuid: root_item.uuid })
)
RETURN threads
(It goes without saying, that you should have indexes on uuid.)
I hope this helps.
02-04-2021 11:34 PM
Thanks for your responses guys, appreciate you taking the time out of your busy days to reply.
At this point I think we'll just move to elasticsearch instead.
Thanks though!
All the sessions of the conference are now available online