Neo4j

jtylerdadh · ‎06-21-2020

Hi,
I have a fairly small database: 5,773 nodes, 14,756 relationships. I have a number of queries I am running at once, combined with UNION ALL. I noticed that the individual queries (i.e. without any union statements) run fast, but when I combine them they run much slower - i.e. if I ran them all individually and added the times, I would get a cumulative time much lower than the time it takes to run them all as a single query using UNION ALL statements.

I am warming the cache with apoc.warmup.run(true, true, true) but it has no effect on the time it takes to run the queries. I have indexed node IDs and dates, which are the primary properties used in the queries.

The queries I am running are all pretty similar to the one below, which takes 223 ms to run the first time I run it (after warming the cache), and 107 ms to run the second time I run it. I am combining 250 of these short queries into a single query using UNION ALL - which takes about 10 minutes to run. If I group them into 22 separate queries, it takes them 4 minutes to run (cumulatively).

Example Query:

MATCH (di3:apple)-[:R1]->(:pear {Action: "Start"})
OPTIONAL MATCH (:banana)-[:R2]->(di3)-[:R3]->(l3:cherry)
WITH COLLECT(l3.ID) AS lids
MATCH (g:banana)-[:R2]->(di:apple)-[:R3]->(l:cherry)-[:R4]->(c:grape), (d:peach)<-[:R5]-(di)-[:R1]->(dt:pear)
WHERE NOT l.ID in lids
RETURN DISTINCT c.ID AS aa, dt.ID AS bb, dt.prop AS cc, g.ID AS dd, dt.Name AS ee, etc.

Is there anything I can do to speed up the combined query? My understanding is that UNION ALL just combines the results of each query - it's not trying to eliminate duplicates or anything. Why then does it take so much longer to run than it should?

Thank you

Cobra · ‎06-23-2020

Hello @jtylerdadh and welcome to the Neo4j community

Can you show us the query with the UNION?

Regards,
Cobra

jtylerdadh · ‎06-23-2020

Sure - first a couple of side notes. heap initial size = 10g, heap max size = 10g, pagecache size = 2g (apoc warmup does not run out of room in the cache), using version 4.0.6. Every once in a while I get a message that an error occurred during update, which tells me to check the logs for details, but there's never anything in the logs about the error.

Below are two sections of the query. The full query has about 250 of these sections, and all are structured very similarly to these.

If I just take the two sections below and run them once, it takes 6 ms to finish. If I copy these 16 times and combine them into a single query that is 32 lines long (using UNION ALL MATCH throughout) I would think it should take about 96 ms to complete. Instead, it takes 24006 ms - which is 250 times longer than expected.

MATCH (di3:Apple)-[:R1]->(:Pear {Action: "Start"}) OPTIONAL MATCH (:Banana)-[:R2]->(di3)-[:R3]->(l3:Cherry) WITH COLLECT(l3.ID) AS lids MATCH (g:Banana)-[:R2]->(di)-[:R3]->(l:Cherry)-[:R4]->(c:Grape), (d:Peach)<-[:R5]-(di:Apple)-[:R1]->(dt:Pear) WHERE NOT l.ID in lids RETURN DISTINCT c.ID AS aa, dt.ID AS bb, dt.FirstText AS cc, g.ID AS dd, dt.Name AS ee, dt.SecondText AS ff, l.ID AS gg, dt.AfterSecondText AS hh, g.ID + "." + l.ID + "." + c.ID AS ii, c.ID AS jj, d.ID AS kk, l.Name AS ll, l.ID AS mm, 0 AS nn, "" AS oo, 90019 AS pp, 900227 AS qq

UNION ALL MATCH (d3:Peach)<-[:R5]-(di3:Apple)-[:R1]->(:Pear {Action: "Start"}), (:Banana)-[:R2]->(di3)-[:R3]->(l:Cherry), (g:Banana)-[:R2]->(di:Apple)-[:R3]->(l)-[:R4]->(c:Grape), (d:Peach)<-[:R5]-(di)-[:R1]->(dt:Pear) WHERE d.Date < d3.Date RETURN DISTINCT c.ID AS aa, dt.ID AS bb, dt.FirstText AS cc, g.ID AS dd, dt.Name AS ee, dt.SecondText AS ff, l.ID AS gg, dt.AfterSecondText AS hh, g.ID + "." + l.ID + "." + c.ID AS ii, c.ID AS jj, d.ID AS kk, l.Name AS ll, l.ID AS mm, 0 AS nn, "" AS oo, 90019 AS pp, 900227 AS qq

Thanks for your help!

Neo4j

Union all query takes much longer than it should based on time to run each individual query