Neo4j

tom_russell · 01-18-2021

We see very slow performance for some of our queries that return lots of results (hundreds of thousands to millions of rows). Analyzing the PROFILE, it seems that consuming the results is what eats up the time, not the query itself. We tried reducing the overhead of the returned data by collecting the property into a single list returned in one row, instead of returning millions of rows with one property each. However, this approach has limitations, and I am wondering if there is any insight on what else can be done. The following is a general representation of the query. The pattern match does not seem to be the bottleneck, and the time until results are available only increases slightly in version 2, where we wrapped the property in a collect() aggregation.

Neo4j Version: 3.5
Is there any improvement in the transfer/consumption of results in 4.x, or is that primarily a bandwidth and driver issue?

Basic Return Version:
Query: MATCH (a:Label1)-[:relation*]->(b:Label2) RETURN b.Property as result
Available: 13s
Consumed: 189.5s
[Screenshot of the actual PROFILE output]

Collection Return Version:
Query: MATCH (a:Label1)-[:relation*]->(b:Label2) RETURN collect(b.Property) as result
Available: 17s
Consumed: 0.15s
[Screenshot of the actual PROFILE output]
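A possible middle ground between the two versions above is to return the values in fixed-size chunks, so each row carries a list but no single row holds everything. This is a minimal sketch, not tested against this data: it reuses the placeholder names from the post, and the chunk size of 10000 is an arbitrary assumption. The full list is still built on the server by collect(), as in the collection version, so server-side memory use is similar.

```cypher
// Collect all values once, then emit them in slices of 10000 (an arbitrary chunk size).
MATCH (a:Label1)-[:relation*]->(b:Label2)
WITH collect(b.Property) AS values
// range(start, end, step) yields the starting offset of each chunk.
UNWIND range(0, size(values) - 1, 10000) AS offset
// Slices are end-exclusive, and a slice running past the end of the list is truncated.
RETURN values[offset..offset + 10000] AS chunk
```

A variable-length pattern produces one row per matched path, so the same b can appear many times. If duplicates are not needed, collect(DISTINCT b.Property) (or RETURN DISTINCT in the basic version) may shrink the result considerably.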

clem · 01-18-2021

One obvious (sorry) question: do you really need [:relation*] (unbounded depth), or can you make the search depth smaller? E.g. [:relation*1..5]

As mentioned here:
https://neo4j.com/docs/graph-data-science/current/alpha-algorithms/overlap/

(nodeA)-[:RELTYPE*]->(nodeB)

Retrieve all paths of any length with the relationship :RELTYPE from nodeA to nodeB or from nodeB to nodeA and beyond. This is usually a very expensive query, so you should place limits on how many nodes are retrieved:
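Applied to the query from the original post, a bounded version could look like the sketch below. The upper bound of 5 hops and the LIMIT of 100000 rows are arbitrary example values, not recommendations.

```cypher
// Traverse between 1 and 5 hops only (example bound) instead of unbounded depth.
MATCH (a:Label1)-[:relation*1..5]->(b:Label2)
RETURN b.Property AS result
// Cap the number of rows sent to the client (example value).
LIMIT 100000
```

Note that in Cypher [:relation*5] matches paths of exactly five hops; an upper bound is written [:relation*..5] or [:relation*1..5], with the lower bound defaulting to 1.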

tom_russell · 01-18-2021

We do have constructors/parameterized queries that introduce depth limits (which I omitted originally, as I did not expect them to be a factor). Given what I have researched, I do not expect changes in the pattern-match part of the query to have an impact, and a quick test bore this out: introducing the depth limit gave similar result-availability times and did not affect consumption time at all for either version of the query.
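As far as I know, Cypher in 3.5 and 4.x does not accept parameters for variable-length bounds (something like [:relation*1..$depth] is rejected), so a query constructor has to write the depth into the query text while other values can stay parameters. A sketch, where the id property, the $startId parameter and the depth of 3 are all hypothetical:

```cypher
// The depth bound (3 here, a hypothetical value) is written into the query text
// by the constructor; the start node is still selected through a parameter.
MATCH (a:Label1 {id: $startId})-[:relation*1..3]->(b:Label2)
RETURN b.Property AS result
```

Because the query text changes with the depth, each distinct depth gets its own cached plan.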


Poor Performance on Consuming/Returning Millions of Rows