Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
08-31-2020 04:42 PM
Hi!
I have a special requirement where I need to set up a single relationship out of three different node labels using 2 properties. I can achieve this requirement by running 3 different queries like below but I am failing when I try to achieve the same using a single query.
Query 1.
MATCH (A:NODE_1) WHERE A.p1 = ''
WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (B)-[:REL]->(B)
Query 2.
MATCH (A:NODE_1) WHERE A.p2 = ''
WITH A
MATCH (C:NODE_3) WHERE A.p1 = C.p1
MERGE (A)-[:REL]->(C)
Query 3.
MATCH (A:NODE_1) WHERE A.p1 <> '' AND A.p2 <> ''
WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B)
A
contains ~ 1M nodes and B
and C
contains 10K and 500 respectively. Can these 3 queries be merged together and run in a single query? If so, would someone please help? thanks!!!!
09-01-2020 12:33 AM
Hi,
I created test data.
CREATE (:NODE_1 {p1: '', p2: 'aa'}),
(:NODE_1 {p1: 'bb', p2: ''}),
(:NODE_1 {p1: 'cc', p2: 'dd'}),
(:NODE_2 {p2: 'aa'}),
(:NODE_2 {p2: 'bb'}),
(:NODE_2 {p2: 'cc'}),
(:NODE_2 {p2: 'dd'}),
(:NODE_3 {p1: 'aa'}),
(:NODE_3 {p1: 'bb'});
And I edited your query to a single query for Neo4j 3.5.
By the way, I thought the 3rd line was wrong, so I fixed it
from MERGE (B)-[:REL]->(B) to MERGE (A)-[:REL]->(B)
MATCH (A:NODE_1) WHERE A.p1 = ''
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B)
WITH 1 as dummy
MATCH (A:NODE_1) WHERE A.p2 = ''
MATCH (C:NODE_3) WHERE A.p1 = C.p1
MERGE (A)-[:REL]->(C)
WITH 1 as dummy
MATCH (A:NODE_1) WHERE A.p1 <> '' AND A.p2 <> ''
MATCH (B:NODE_2) WHERE A.p2 = B.p2
MERGE (A)-[:REL]->(B);
It's not good code, but it works.
09-01-2020 08:24 AM
Thanks so much @koji!
I ran this query and it's been running for a while. Is there way I could use apoc.periodic.iterate
to speed up the process with batches?
Thanks!
09-01-2020 11:51 AM
Hi,
I think it's a good idea to add an index first, then use apoc.periodic.iterate.
for 3.x
CREATE INDEX ON :NODE_1(p1);
CREATE INDEX ON :NODE_1(p2);
CREATE INDEX ON :NODE_2(p2);
CREATE INDEX ON :NODE_3(p1);
for 4.x
CREATE INDEX node1p1 FOR (n:NODE_1) ON (n.p1);
CREATE INDEX node1p2 FOR (n:NODE_1) ON (n.p2);
CREATE INDEX node2p2 FOR (n:NODE_2) ON (n.p2);
CREATE INDEX node3p1 FOR (n:NODE_3) ON (n.p1);
09-01-2020 12:09 PM
Yes! I did set up 4 indexes this morning but still, it was running for long and I had to kill it. I think the reason is my Node_1
which has almost 1M nodes. That's why it's taking time to load everything to memory?
I used apoc.periodic.iterate
like below previously for individual queries. I am just lost on how to use apoc.periodic.iterate
for the single query you helped me put together.
call apoc.periodic.iterate("MATCH (A:NODE_1) WHERE A.p1 = '' WITH A
MATCH (B:NODE_2) WHERE A.p2 = B.p2
RETURN A,B",
"MERGE (B)-[:REL]->(B)",
{batchSize:10000,parallel:false})
09-01-2020 04:38 PM
How about this one.
CALL apoc.periodic.iterate("MATCH (X:NODE_1) WHERE X.p1 = ''
MATCH (B:NODE_2) WHERE X.p2 = B.p2
MATCH (Y:NODE_1) WHERE Y.p2 = ''
MATCH (C:NODE_3) WHERE Y.p1 = C.p1
MATCH (Z:NODE_1) WHERE Z.p1 <> '' AND Z.p2 <> ''
MATCH (D:NODE_2) WHERE Z.p2 = D.p2
RETURN X,Y,Z,B,C,D",
"MERGE (X)-[:REL]->(B)
MERGE (Y)-[:REL]->(C)
MERGE (Z)-[:REL]->(D)",
{batchSize:10000,parallel:false})
09-01-2020 04:49 PM
A note on this one, I see what you're intending (using the dummy to reset between each), but the problem arising from this is cardinality. The WITH 1 as dummy
does not reset the cardinality. You would need to either use an aggregation or use WITH DISTINCT 1 as dummy
to reset it properly in between.
Here's a knowledge base article on this:
Another approach you might consider...
You could use UNION ALL between the subqueries here (returning maybe a count(*) from each of them). This makes it easier to ensure the queries are isolated from each other, with no cardinality issues between them.
09-01-2020 05:35 PM
@andrew.bowman
Thanks for your comment, I was initially going with this 🙂
WITH count(*) AS dummy
In this case, I thought your comment "UNION ALL" is good and easy to read.
All the sessions of the conference are now available online