04-19-2021 11:31 PM
I will be grateful for your help.
Side note: the graph is large, with 350 million nodes and 450 million relationships.
Also, anything longer than [:CAUSES*1..2] runs indefinitely and returns no result.
04-20-2021 02:42 AM
A few questions to help analyze your issue:
Another topic: your query builds a Cartesian product on its first line, meaning that if it matches n Tuberculosis nodes and m ChestPain nodes, you are passing n*m rows along to the rest of the execution.
Consider writing it this way:
MATCH path=(n:Concept)-[:CAUSES*]->(m:Concept)
WHERE n.name CONTAINS "xxx" AND m.name CONTAINS "yyy" AND all(a in relationships(path) where a.freq>2)
04-20-2021 09:33 AM
Thank you for your response.
No, there are no indexes.
If I use
(n)-[:causes]->(n1)-[:causes]->(n3)
I get a count back,
but if I use
(n)-[:causes]->(n1)-[:causes]->(n3)-[:causes]->(n4)
it runs indefinitely.
As for STARTS WITH, it still has the same issue.
Writing it the way you suggested returns zero, even for positive examples that I know for sure are not zero. I think the names are a mix of upper and lower case, which is why I used toUpper to eliminate that issue.
Keep in mind that the node label is Concept and the property key is name.
04-21-2021 01:35 AM
You have several potential issues here.
These two issues are not related to the problem you have with paths of length 2 or more, but you should still consider fixing them, as they will become a problem sooner or later.
Now, for your issue with paths: could you execute the following query and post the execution plan (see the example picture below the query)? This will help us understand what is happening.
Please note that I left the toUpper(m.name) even though you should change that to benefit from indexing. I'll just leave that for now so you don't have to make changes to your property and indexes right now.
EXPLAIN
MATCH path=(n:Concept)-[:CAUSES*]->(m:Concept)
WHERE toUpper(n.name) CONTAINS "TUBERCULOSIS" AND toUpper(m.name) CONTAINS "CHEST PAIN" AND all(a in relationships(path) where a.freq>2)
RETURN count(distinct (path)) as distinct_paths
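EXPLAIN only shows the planned operators without running the query. As a possible follow-up, PROFILE runs the query and reports rows and db hits per operator, which shows where the work actually goes. The sketch below assumes an upper bound of 2 on the path length, chosen only because that length was reported to complete in this thread.

```cypher
// Assumption: the bound *1..2 is used so that the profiled query finishes
PROFILE
MATCH path = (n:Concept)-[:CAUSES*1..2]->(m:Concept)
WHERE toUpper(n.name) CONTAINS "TUBERCULOSIS"
  AND toUpper(m.name) CONTAINS "CHEST PAIN"
  AND all(a IN relationships(path) WHERE a.freq > 2)
RETURN count(DISTINCT path) AS distinct_paths
```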
04-21-2021 05:05 AM
This is the plan
04-22-2021 02:44 AM
You should make the query able to use the indexes, i.e. store your indexed names in uppercase so that you don't need to call the function; the query can then use the index and start from both ends of the variable-length path.
It's also better to match on exact values, which is faster than CONTAINS (or use medical identifiers instead of names).
Adding a maximum length to the path will also help.
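The three suggestions above could be sketched roughly as follows. This is only an illustration under stated assumptions: the property name name_upper, the index name, the batch size, and the upper bound of 3 are all hypothetical, and the batched update assumes the APOC plugin is installed. On newer Neo4j versions, CONTAINS may need a TEXT index instead of the default index type to be served efficiently.

```cypher
// 1. Store an uppercase copy of the name (hypothetical property: name_upper).
//    Batched via APOC because a single transaction over 350M nodes is unlikely to fit in memory.
CALL apoc.periodic.iterate(
  "MATCH (c:Concept) WHERE c.name IS NOT NULL RETURN c",
  "SET c.name_upper = toUpper(c.name)",
  {batchSize: 10000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;

// 2. Index the new property (hypothetical index name).
CREATE INDEX concept_name_upper IF NOT EXISTS
FOR (c:Concept) ON (c.name_upper);

// 3. Exact match on both ends, with a bounded path length (bound of 3 is an assumption).
MATCH path = (n:Concept)-[:CAUSES*1..3]->(m:Concept)
WHERE n.name_upper = "TUBERCULOSIS"
  AND m.name_upper = "CHEST PAIN"
  AND all(r IN relationships(path) WHERE r.freq > 2)
RETURN count(path) AS paths;
```

Running the last statement with EXPLAIN first is a cheap way to check whether the planner actually picks the index for both end nodes.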
Unfortunately accessing and checking rel-properties on a lot of relationships takes time.
One thing you can do in your model is to materialize the filtered relationships as a separate relationship type, so that the path query no longer needs to check the property:
MATCH (a:Concept)-[r:CAUSES]->(b:Concept) WHERE r.freq > 2
CREATE (a)-[:FREQUENT]->(b)
and then use that FREQUENT relationship in your path query.
match path=(n:Concept)-[:FREQUENT*]->(m:Concept)
where n.name contains "TUBERCULOSIS" AND m.name contains "CHEST PAIN"
RETURN count(*) as distinct_paths
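With 450 million relationships, creating all the FREQUENT relationships in one transaction may run out of memory. A minimal sketch of a batched variant, assuming the APOC plugin is available (the batch size is an arbitrary choice, and MERGE is used here instead of CREATE so that a rerun does not duplicate relationships):

```cypher
// Assumption: APOC is installed; batchSize of 10000 is illustrative only
CALL apoc.periodic.iterate(
  "MATCH (a:Concept)-[r:CAUSES]->(b:Concept) WHERE r.freq > 2 RETURN a, b",
  "MERGE (a)-[:FREQUENT]->(b)",
  {batchSize: 10000, parallel: false}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Keeping parallel set to false avoids lock contention when several batches touch the same nodes.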
05-20-2021 04:05 PM
06-23-2021 04:19 PM
With Neo4j 4.3 you can create a relationship property index, which should speed up your query quite a bit.
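As a rough sketch of what that could look like (the index name is hypothetical, and whether the planner uses the index inside a variable-length pattern is not guaranteed, so it is worth checking with EXPLAIN):

```cypher
// Hypothetical index name; this syntax requires Neo4j 4.3 or later
CREATE INDEX causes_freq IF NOT EXISTS
FOR ()-[r:CAUSES]-()
ON (r.freq);

// A simple single-hop pattern where the planner can consider the index
MATCH (a:Concept)-[r:CAUSES]->(b:Concept)
WHERE r.freq > 2
RETURN count(r) AS frequent_relationships;
```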