Neo4j

echo_xiangchen · 02-01-2023

Hi everyone!

I'm trying to query the following pattern:

(a)-[:r1]->(b)-[:r1|r2*]->(c)-[:r1]->(d)

Here a, b, c and d may belong to different categories, and they all have the same label.

The Cypher query I'm running is (cat means category):

MATCH p=(a)-[:r1]->(b)-[:r1|r2*1..]->(c)-[:r1]->(d) 
WHERE a.cat <> b.cat AND b.cat = c.cat AND c.cat <> d.cat AND a.cat <> d.cat
RETURN p

The equivalent APOC query I'm using is:

MATCH (a)
CALL apoc.path.expandConfig(a, {
    relationshipFilter: 'r1>',
    minLevel: 1,
    maxLevel:1
})
YIELD path WITH path AS atob, 
last(nodes(path)) AS b, a
WHERE a.cat <> b.cat 

CALL apoc.path.expandConfig(b, {
    relationshipFilter: 'r1>|r2>',
    minLevel: 1
})
YIELD path WITH path AS btoc, 
last(nodes(path)) AS c, a, b, atob
WHERE b.cat = c.cat 

CALL apoc.path.expandConfig(c, {
    relationshipFilter: 'r1>',
    minLevel: 1,
    maxLevel:1
})
YIELD path WITH path AS ctod, 
last(nodes(path)) AS d, a, b, c, atob, btoc
WHERE c.cat <> d.cat AND a.cat <> d.cat 

WITH atob, [btoc, ctod] AS btod

WITH reduce(acc = atob, x IN btod | apoc.path.combine(acc, x)) AS atod

RETURN atod

The main goal is to compare the performance of these two queries (Cypher and APOC) for research purposes.
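For a like-for-like timing comparison, one option (a sketch; `label`, `r1`, `r2` and `cat` are the placeholder names used in this thread) is to prefix each query with PROFILE, which runs the query and reports rows and db hits per operator. Returning count(p) instead of the paths themselves keeps result transfer out of the measurement. The work done inside an APOC procedure is likely reported as a single procedure-call operator, so wall-clock time may be the fairer metric for the APOC variant.

```cypher
// PROFILE executes the query and annotates each plan operator
// with the rows it produced and the db hits it caused.
// `label`, `r1`, `r2` and `cat` are the thread's placeholder names.
PROFILE
MATCH p = (a:label)-[:r1]->(b:label)-[:r1|r2*1..]->(c:label)-[:r1]->(d:label)
WHERE a.cat <> b.cat AND b.cat = c.cat AND c.cat <> d.cat AND a.cat <> d.cat
// Count instead of returning paths, so only the matching work is measured.
RETURN count(p) AS paths
```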

The Cypher query is fine, but the APOC query runs forever and returns no results (I use apoc.export.csv.query to save the output to disk, and the file contains no records).

I'm running this query on a database with 3087 nodes and 9885 relationships, which I don't think counts as a large database.

Is there anything wrong with my query?

My guess is that the problem is the transitive closure [:r1|r2*]: if I change it to [:r1*] or [:r2*], I do get results.
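One way to test that guess (a sketch, using the same placeholder names) is to count how many paths the middle expansion yields from a single start node as maxLevel grows. With expandConfig's default RELATIONSHIP_PATH uniqueness, every distinct sequence of relationships is reported as a separate path, so in a graph where nodes have several outgoing r1 and r2 relationships the count can grow exponentially with depth. Running the same query with 'r1>' instead of 'r1>|r2>' gives a baseline.

```cypher
// Pick one arbitrary node with the placeholder label as the start node.
MATCH (b:label)
WITH b LIMIT 1
// Try increasing depth caps; each row counts all paths of length 1..depth.
UNWIND range(1, 6) AS depth
CALL apoc.path.expandConfig(b, {
    relationshipFilter: 'r1>|r2>',
    minLevel: 1,
    maxLevel: depth
})
YIELD path
RETURN depth, count(path) AS paths
ORDER BY depth
```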

Is there anything I could do to improve my APOC query?

Thank you so much!

dana_canzano · 02-02-2023

@echo_xiangchen

This doesn't directly answer your initial question, but your Cypher statement doesn't use any labels. For example, the initial query

MATCH p=(a)-[:r1]->(b)-[:r1|r2*1..]->(c)-[:r1]->(d) 
WHERE a.cat <> b.cat AND b.cat = c.cat AND c.cat <> d.cat AND a.cat <> d.cat
RETURN p

will behave such that MATCH (a) does an all-nodes scan. If your graph has 100 million nodes, the first step is to iterate over all 100 million of them, and the other ( ) references are likewise not constrained by any label.

Do your nodes have labels? For example, if your graph has 100 million nodes and 10 million of them have the label :Person, then MATCH (a:Person) only needs to scan/iterate over the 10 million nodes with label :Person, as opposed to MATCH (a), which scans/iterates over all 100 million.
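The difference shows up in the query plan. A minimal sketch, using EXPLAIN (which shows the plan without running the query), the thread's placeholder names, and 543 purely as an example value: the first query typically starts with AllNodesScan, while the second can start with NodeByLabelScan, or with an index seek if an index on :label(cat) exists.

```cypher
// No label: the planner typically has to start from AllNodesScan + Filter.
EXPLAIN
MATCH (a)
WHERE a.cat = 543
RETURN a;

// With a label: the plan can start from NodeByLabelScan instead
// (or NodeIndexSeek if an index on :label(cat) exists).
EXPLAIN
MATCH (a:label)
WHERE a.cat = 543
RETURN a;
```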

echo_xiangchen · 02-02-2023

Hi @dana_canzano , thank you so much for your response!

I changed my queries to use labels:

MATCH p=(a:label)-[:r1]->(b:label)-[:r1|r2*1..]->(c:label)-[:r1]->(d:label) 
WHERE a.cat <> b.cat AND b.cat = c.cat AND c.cat <> d.cat AND a.cat <> d.cat
RETURN p

and the APOC query (with labelFilter added):

MATCH (a)
CALL apoc.path.expandConfig(a, {
    relationshipFilter: 'r1>',
    labelFilter: '>label',
    minLevel: 1,
    maxLevel:1
})
YIELD path WITH path AS atob, 
last(nodes(path)) AS b, a
WHERE a.cat <> b.cat 

CALL apoc.path.expandConfig(b, {
    relationshipFilter: 'r1>|r2>',
    labelFilter: '>label',
    minLevel: 1
})
YIELD path WITH path AS btoc, 
last(nodes(path)) AS c, a, b, atob
WHERE b.cat = c.cat 

CALL apoc.path.expandConfig(c, {
    relationshipFilter: 'r1>',
    labelFilter: '>label',
    minLevel: 1,
    maxLevel:1
})
YIELD path WITH path AS ctod, 
last(nodes(path)) AS d, a, b, c, atob, btoc
WHERE c.cat <> d.cat AND a.cat <> d.cat 

WITH atob, [btoc, ctod] AS btod

WITH reduce(acc = atob, x IN btod | apoc.path.combine(acc, x)) AS atod

RETURN atod

The APOC query still runs forever and returns nothing.

Is there anything else I should change?
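Two details in the revised APOC query may be worth checking. First, it still starts with an unlabelled MATCH (a), unlike the Cypher version. Second, in APOC's labelFilter the '>' prefix is an end-node filter: only the last node of a returned path must carry the label, and the traversal still continues through other nodes, whereas '+' whitelists the label for every node on the path. A sketch with both changes, using the thread's placeholder names, also replaces the reduce with nested apoc.path.combine calls. If every node in the graph already has this label, these changes will not reduce the work much, and the unbounded middle expansion remains the likely bottleneck. Note also that the three expansions are independent: unlike a single Cypher MATCH, which never reuses a relationship within one pattern, a relationship could appear in more than one segment here, so the two result sets may not be identical.

```cypher
// Sketch only: `label`, `r1`, `r2` and `cat` are the thread's placeholder names.
// Step 1: anchor `a` on the label, as the Cypher version does.
// '+label' is a whitelist: every node on the path must carry the label.
MATCH (a:label)
CALL apoc.path.expandConfig(a, {
    relationshipFilter: 'r1>',
    labelFilter: '+label',
    minLevel: 1,
    maxLevel: 1
})
YIELD path AS atob
WITH a, atob, last(nodes(atob)) AS b
WHERE a.cat <> b.cat

// Step 2: the unbounded r1/r2 closure, still the likely expensive step.
CALL apoc.path.expandConfig(b, {
    relationshipFilter: 'r1>|r2>',
    labelFilter: '+label',
    minLevel: 1
})
YIELD path AS btoc
WITH a, b, atob, btoc, last(nodes(btoc)) AS c
WHERE b.cat = c.cat

// Step 3: the final single r1 hop.
CALL apoc.path.expandConfig(c, {
    relationshipFilter: 'r1>',
    labelFilter: '+label',
    minLevel: 1,
    maxLevel: 1
})
YIELD path AS ctod
WITH a, atob, btoc, c, ctod, last(nodes(ctod)) AS d
WHERE c.cat <> d.cat AND a.cat <> d.cat

// Stitch the segments: each one starts where the previous one ends.
RETURN apoc.path.combine(apoc.path.combine(atob, btoc), ctod) AS atod
```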

dana_canzano · 02-02-2023

@echo_xiangchen

Do your nodes have a label literally named 'label'? Is that correct? And do all the nodes involved in the query have this label?

echo_xiangchen · 02-02-2023

@dana_canzano Yes, I just use "label" here to show that a, b, c and d all have the same label. I can't share the actual label because it is sensitive data. 😅

I modified the query but still get nothing, so is there anything I did wrong?

glilienfield · 02-02-2023

Your query worked on a simple single-path graph:

create(a:label{cat:123})-[:r1]->(:label{cat:543})-[:r1]->(:label{cat:543})-[:r1]->(:label{cat:234324})-[:r1]->(:label{cat:12456})

Does your real query start with MATCH (a:label), or does it have a condition that uniquely identifies one start node? If not, it will execute the whole expansion for every node with the label 'label'.

Just a note: this is not a typical use of the apoc.path.expandConfig procedure. You are basically expanding the path piecewise yourself from the 'a' node, which is what Cypher does anyway. Maybe pick a scenario that is a more typical use of the APOC procedures, such as determining all the nodes of a complex subgraph.
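As an illustration of that more typical use (a sketch, with the thread's placeholder names), apoc.path.subgraphNodes returns each node reachable from a start node once, because it uses NODE_GLOBAL uniqueness. It therefore does not enumerate every distinct path the way the middle expandConfig call does.

```cypher
// Pick one arbitrary start node carrying the placeholder label.
MATCH (start:label)
WITH start LIMIT 1
// Visit every node reachable via outgoing r1/r2 relationships, each node once.
CALL apoc.path.subgraphNodes(start, {
    relationshipFilter: 'r1>|r2>',
    labelFilter: '+label'
})
YIELD node
// By default the start node itself is included in the result.
RETURN count(node) AS reachableNodes
```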

echo_xiangchen · 02-02-2023

@glilienfield Thank you! But I think you missed the r2 relationship in the middle transitive closure.
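A variant of the test graph that exercises the r2 branch (a sketch; the node variable m and the cat values are arbitrary). The only match is a to b via r1, then b-[:r2]->m-[:r1]->c inside the closure, then c to d via r1, so on this data both the Cypher query and the APOC query should return a single path.

```cypher
// Fresh test data: the middle segment is an r2 hop followed by an r1 hop,
// and b, m and c share a category while a and d differ from it and each other.
CREATE (a:label {cat: 123})-[:r1]->(b:label {cat: 543}),
       (b)-[:r2]->(m:label {cat: 543}),
       (m)-[:r1]->(c:label {cat: 543}),
       (c)-[:r1]->(d:label {cat: 234324})
```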

I use the apoc.path.expandConfig procedure this way because its relationshipFilter does not support a transitive closure in the middle of a pattern, for example:

relationshipFilter: 'r1>, (r1>|r2>)*, r1>'

This is the pattern I want, but APOC doesn't support it, which is why I split it into three pieces and combine them at the end.

The purpose is to compare a pure APOC query with a pure Cypher query, but the APOC query is not working.

That's why I'm asking whether anything is wrong with my APOC query, or whether there is anything I can do to improve it.


Running forever when using APOC with transitive closures and combining the paths