02-01-2023 09:01 PM
Hi everyone!
I'm trying to query the following pattern:
(a)-[r1]->(b)-[r1|r2*]->(c)-[r1]->(d)
where a, b, c, and d may belong to different categories but all have the same label.
The Cypher query I'm running is (cat means category):
MATCH p=(a)-[:r1]->(b)-[:r1|r2*1..]->(c)-[:r1]->(d)
WHERE a.cat <> b.cat AND b.cat = c.cat AND c.cat <> d.cat AND a.cat <> d.cat
RETURN p
The equivalent APOC query I'm using is:
MATCH (a)
CALL apoc.path.expandConfig(a, {
  relationshipFilter: 'r1>',
  minLevel: 1,
  maxLevel: 1
})
YIELD path
WITH path AS atob, last(nodes(path)) AS b, a
WHERE a.cat <> b.cat
CALL apoc.path.expandConfig(b, {
  relationshipFilter: 'r1>|r2>',
  minLevel: 1
})
YIELD path
WITH path AS btoc, last(nodes(path)) AS c, a, b, atob
WHERE b.cat = c.cat
CALL apoc.path.expandConfig(c, {
  relationshipFilter: 'r1>',
  minLevel: 1,
  maxLevel: 1
})
YIELD path
WITH path AS ctod, last(nodes(path)) AS d, a, b, c, atob, btoc
WHERE c.cat <> d.cat AND a.cat <> d.cat
WITH atob, [btoc, ctod] AS btod
WITH reduce(acc = atob, x IN btod | apoc.path.combine(acc, x)) AS atod
RETURN atod
The main goal is to compare the performance of these two queries (Cypher and APOC) for research purposes.
The Cypher query is fine, but the APOC query runs forever and returns 0 results (I use apoc.export.csv.query to save the output to disk, and the file contains 0 records).
I'm running this query on a database with 3087 nodes and 9885 relationships, which I assume is not a huge database.
Is there anything wrong with my query?
My guess is that the problem is the transitive closure [r1|r2*]: if I change it to [r1*] or [r2*], I do get results.
Is there anything I could do to improve my APOC query?
Thank you so much!
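One way to check the guess about the combined [r1|r2*] closure, offered as a sketch rather than a diagnosis: cap the middle expansion with maxLevel and count the paths found at each hop. The label `label` and the relationship types are the placeholders used in this thread; the cap of 5 and the `LIMIT 1` start node are arbitrary assumptions.

```cypher
// Sketch only: 'label', 'r1' and 'r2' are the thread's placeholder names;
// maxLevel: 5 and LIMIT 1 are arbitrary choices for a quick probe.
MATCH (b:label)
WITH b LIMIT 1
CALL apoc.path.expandConfig(b, {
  relationshipFilter: 'r1>|r2>',
  minLevel: 1,
  maxLevel: 5
})
YIELD path
// Count how many distinct paths exist at each depth from this start node
RETURN length(path) AS hops, count(*) AS paths
ORDER BY hops
```

If the counts grow quickly from hop to hop, an unbounded expansion over both relationship types may have to enumerate far more paths than the node count suggests.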
02-02-2023 05:02 AM
This doesn't directly answer your initial question, but your Cypher statement doesn't use labels. So, for example, the initial query
MATCH p=(a)-[:r1]->(b)-[:r1|r2*1..]->(c)-[:r1]->(d)
WHERE a.cat <> b.cat AND b.cat = c.cat AND c.cat <> d.cat AND a.cat <> d.cat
RETURN p
is going to behave such that MATCH (a) does an all-nodes scan. If your graph has 100 million nodes, then the first step is to iterate over all 100 million of them. The same applies to the other ( ) references.
Do your nodes have labels? For example, if your graph has 100 million nodes and 10 million of them have the label :Person, then MATCH (a:Person) only requires scanning/iterating over the 10 million nodes with label :Person, as opposed to MATCH (a), which scans/iterates over all 100 million.
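The difference between the two scans can be seen in the query plan. A minimal sketch, assuming the hypothetical :Person label from the example above and the cat property from the question; EXPLAIN only shows the plan and does not run the query.

```cypher
// No label: the plan typically starts with an AllNodesScan operator
EXPLAIN MATCH (a) RETURN a.cat;

// With the (hypothetical) :Person label: the plan typically starts with
// a NodeByLabelScan, which only visits nodes carrying that label
EXPLAIN MATCH (a:Person) RETURN a.cat;
```

Replacing EXPLAIN with PROFILE runs the query and reports the rows and database hits per operator, which makes the cost difference visible on real data.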
02-02-2023 08:36 AM - edited 02-02-2023 08:37 AM
Hi @dana_canzano , thank you so much for your response!
I changed my queries:
MATCH p=(a:label)-[:r1]->(b:label)-[:r1|r2*1..]->(c:label)-[:r1]->(d:label)
WHERE a.cat <> b.cat AND b.cat = c.cat AND c.cat <> d.cat AND a.cat <> d.cat
RETURN p
and the APOC query (added labelFilter):
MATCH (a)
CALL apoc.path.expandConfig(a, {
  relationshipFilter: 'r1>',
  labelFilter: '>label',
  minLevel: 1,
  maxLevel: 1
})
YIELD path
WITH path AS atob, last(nodes(path)) AS b, a
WHERE a.cat <> b.cat
CALL apoc.path.expandConfig(b, {
  relationshipFilter: 'r1>|r2>',
  labelFilter: '>label',
  minLevel: 1
})
YIELD path
WITH path AS btoc, last(nodes(path)) AS c, a, b, atob
WHERE b.cat = c.cat
CALL apoc.path.expandConfig(c, {
  relationshipFilter: 'r1>',
  labelFilter: '>label',
  minLevel: 1,
  maxLevel: 1
})
YIELD path
WITH path AS ctod, last(nodes(path)) AS d, a, b, c, atob, btoc
WHERE c.cat <> d.cat AND a.cat <> d.cat
WITH atob, [btoc, ctod] AS btod
WITH reduce(acc = atob, x IN btod | apoc.path.combine(acc, x)) AS atod
RETURN atod
The APOC query still runs forever and returns nothing.
Is there anything else I should change?
02-02-2023 11:53 AM
Do your nodes have a label named 'label'? Is that correct? And do all the nodes involved in the query have this label named 'label'?
02-02-2023 11:59 AM
@dana_canzano Yes, I just use "label" here to show that a, b, c, and d all have the same label. I cannot share the actual label because the data is sensitive. 😅
I modified the query but still get nothing, so I'm wondering whether I did anything wrong.
02-02-2023 05:59 PM
Your query worked with a simple single path.
CREATE (a:label {cat: 123})-[:r1]->(:label {cat: 543})-[:r1]->(:label {cat: 543})-[:r1]->(:label {cat: 234324})-[:r1]->(:label {cat: 12456})
Does your real query start with 'match(a:label)', or does it have a condition that uniquely identifies one node? If not, it will execute this for every node with label 'label'.
Just a note: this is not a typical use of the apoc.path.expandConfig procedure. You are basically expanding the path piecewise yourself from the 'a' node, which is what Cypher will do anyway. Maybe you should pick a scenario that is a more typical use of the APOC procedure, such as determining all the nodes of a complex subgraph.
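As an illustration of that more typical use, here is a minimal sketch with apoc.path.subgraphNodes, which collects each reachable node once instead of returning every path. The label, relationship types, and the cat value 123 are placeholders taken from this thread, not real data.

```cypher
// Sketch only: 'label', 'r1', 'r2' and cat: 123 are placeholders from this thread.
MATCH (a:label {cat: 123})
CALL apoc.path.subgraphNodes(a, {
  relationshipFilter: 'r1>|r2>',  // follow outgoing r1 or r2 relationships
  labelFilter: '+label'           // only traverse nodes that have this label
})
YIELD node
RETURN count(node) AS reachable
```

Because each node is visited at most once, the work is bounded by the size of the subgraph rather than by the number of distinct paths through it.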
02-02-2023 06:20 PM - edited 02-02-2023 07:31 PM
@glilienfield Thank you! But I think you missed the r2 relationship in the middle transitive closure.
I use the apoc.path.expandConfig procedure this way because it does not support a transitive closure in relationshipFilter, for example:
relationshipFilter: 'r1>, (r1>|r2>)*, r1>'
This is the pattern I want, but it is not supported by APOC; that's why I split it into three pieces and then combine them at the end.
The purpose is to compare a pure APOC query with a pure Cypher query, but the APOC query is not working.
That's why I'm asking whether there is anything wrong with my APOC query, or anything I can do to improve it.
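One adjustment that might be worth trying on the middle step, offered as a sketch and not as a confirmed fix: apoc.path.expandConfig accepts a uniqueness setting, and its default (RELATIONSHIP_PATH) allows the same node to be reached over many different paths. With NODE_GLOBAL each node is visited at most once. Note the assumption this carries: it returns a single path per reachable c rather than every path, so the result is no longer equivalent to the Cypher pattern.

```cypher
// Sketch of the middle (b)-[r1|r2*]->(c) step only.
// 'label', 'r1' and 'r2' are the thread's placeholders.
MATCH (b:label)
CALL apoc.path.expandConfig(b, {
  relationshipFilter: 'r1>|r2>',
  labelFilter: '>label',
  minLevel: 1,
  uniqueness: 'NODE_GLOBAL'  // assumption: one path per reachable node is acceptable
})
YIELD path
WITH b, last(nodes(path)) AS c, path AS btoc
WHERE b.cat = c.cat
RETURN b, c, length(btoc) AS hops
```

If this version finishes quickly while the default setting does not, that would point to path enumeration, not graph size, as the cost.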