10-20-2020 10:16 AM
I have a Cypher query that runs a bit long. It takes 3 hours to go through a few thousand nodes. But my technical folks tell me I am using 1 processor and only about 3G of memory. I have 8 processors and 16G of memory, with 12G swap.
What are the config settings to open this up a bit more? Thanks
10-20-2020 12:56 PM
Here's some documentation on tuning, which includes sections for memory tuning (heap and pagecache)
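As a rough sketch of how to see what memory settings the database is currently running with, something like the following can be run from Neo4j Browser or cypher-shell. It assumes a Neo4j 3.x/4.x server and a user with admin rights; on those versions the heap and page cache settings live under the dbms.memory prefix (dbms.memory.heap.initial_size, dbms.memory.heap.max_size, dbms.memory.pagecache.size).

```cypher
// List the memory-related settings currently in effect.
// Assumption: Neo4j 3.x/4.x setting names, which start with "dbms.memory".
CALL dbms.listConfig('dbms.memory')
YIELD name, value
RETURN name, value
ORDER BY name
```

The values themselves are changed in neo4j.conf and take effect after a restart.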
However, 3 hours for processing a few thousand nodes isn't normal; we would expect to see at most a couple of seconds.
You can run an EXPLAIN of the query to check its query plan. You don't want to see an AllNodesScan in there, and (if at all possible) you should avoid NodeByLabelScan too. If you're looking up nodes by label + property, then you should create indexes on those for fast lookup.
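For illustration, this is roughly what checking a plan looks like for one label + property lookup of the kind discussed here. The pattern and the literal 'some value' are placeholders made up for the example, not taken from anyone's data.

```cypher
// EXPLAIN returns the planned operators without executing the query.
// 'some value' is a placeholder for a real text value.
EXPLAIN
MATCH (g1:Compliant {text: 'some value'})<--(g2:Compliant)
RETURN count(g2) AS edgeCount
```

With an index on :Compliant(text) the plan would be expected to start from a NodeIndexSeek; without one it will typically show a NodeByLabelScan followed by a Filter. Swapping EXPLAIN for PROFILE actually runs the query and reports rows and db hits per operator.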
10-20-2020 03:45 PM
The slowdown is mostly due to the lack of indexes on the properties used in the MATCH clauses. Depending on the data, I don't think memory is the problem.
10-21-2020 07:28 AM
This was the query
// Compare structure
MATCH (g:Compliant)
WITH collect(g.text) AS g1Names
MATCH (g:NonCompliant)
WITH apoc.coll.union(g1Names, collect(g.text)) AS uniqueNames
UNWIND uniqueNames AS dim1
UNWIND uniqueNames AS dim2
OPTIONAL MATCH p = (g1:Compliant {text: dim1})<--(g2:Compliant {text: dim2})
WITH uniqueNames, dim1, dim2, CASE WHEN p is null THEN 0 ELSE count(p) END AS edgeCount
ORDER BY dim1, dim2
WITH uniqueNames, dim1 AS g1DimNames, collect(edgeCount) AS g1Matrix
ORDER BY g1DimNames
WITH uniqueNames, g1DimNames, g1Matrix
UNWIND uniqueNames AS dim1
UNWIND uniqueNames AS dim2
OPTIONAL MATCH p = (g1:NonCompliant {text: dim1})<--(g2:NonCompliant {text: dim2})
WITH g1DimNames, g1Matrix, dim1, dim2, CASE WHEN p is NULL THEN 0 ELSE count(p) END AS edges
ORDER BY dim1, dim2
WITH g1DimNames, g1Matrix, dim1 AS g2DimNames, collect(edges) as g2Matrix
ORDER BY g1DimNames, g2DimNames
WHERE g1DimNames = g2DimNames AND g1Matrix <> g2Matrix
RETURN g1DimNames, g1Matrix, g2DimNames, g2Matrix
// Compare content
MATCH (a:Compliant)
WITH collect({text: a.text, KDM: a.KDM, node: a.node, inode: a.inode, isu: coalesce(a.isu, 0)}) AS compliantContent
MATCH (b:NonCompliant)
RETURN compliantContent, collect({text: b.text, KDM: b.KDM, node: b.node, inode: b.inode, isu: coalesce(b.isu, 0)}) AS NonCompliantContent
10-21-2020 07:29 AM
Max can answer specific questions on it.
10-21-2020 08:05 AM
The objective of the first query is to compare the structure of two graphs:
The solution is from stackoverflow.
The second one was done to compare content.
The problem is that there are a lot of graphs to compare, and comparing just one takes too long. So the questions are:
Regards,
Cobra
10-21-2020 08:14 AM
10-21-2020 02:31 PM
Love the graph though
10-22-2020 07:17 AM
Still looking for some help. Not even sure how to read this.
10-22-2020 02:50 PM
There is a lot that can be improved in this query.
For starters, you're lacking indexes on :Compliant(text) and :NonCompliant(text). Without these, the query is going to have terrible performance with those OPTIONAL MATCH patterns.
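A minimal sketch of creating those two indexes, assuming Neo4j 4.x syntax. The index names compliant_text and noncompliant_text are invented for the example; any names will do.

```cypher
// Neo4j 4.x syntax; index names are hypothetical.
CREATE INDEX compliant_text FOR (n:Compliant) ON (n.text);
CREATE INDEX noncompliant_text FOR (n:NonCompliant) ON (n.text);
```

On Neo4j 3.x the older form would be CREATE INDEX ON :Compliant(text). Each statement has to run in its own transaction, so in Neo4j Browser they may need to be run one at a time unless the multi-statement editor is enabled.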
You can probably replace the top part (matching and collecting and unioning the text values) with this:
MATCH (g)
WHERE g:Compliant OR g:NonCompliant
WITH collect(DISTINCT g.text) AS uniqueNames
...
That said, the approach for getting the number of edges between these nodes isn't optimal. The repeated UNWINDs blow up your cardinality. We can probably make this much easier.
If you're using Neo4j 4.1.x we can probably use UNION within a subquery so we can post-process results:
CALL {
MATCH (g:Compliant)<--(g2:Compliant)
WITH g, count(g2) as compliantEdges
RETURN g.text as text, compliantEdges, 0 as nonCompliantEdges
UNION ALL
MATCH (g:NonCompliant)<--(g2:NonCompliant)
WITH g, count(g2) as nonCompliantEdges
RETURN g.text as text, 0 as compliantEdges, nonCompliantEdges
}
RETURN text, sum(compliantEdges) as compliantEdges, sum(nonCompliantEdges) as nonCompliantEdges
If you're using an earlier version, you could leverage apoc.cypher.run() from APOC Procedures instead:
CALL apoc.cypher.run("
MATCH (g:Compliant)<--(g2:Compliant)
WITH g, count(g2) as compliantEdges
RETURN g.text as text, compliantEdges, 0 as nonCompliantEdges
UNION ALL
MATCH (g:NonCompliant)<--(g2:NonCompliant)
WITH g, count(g2) as nonCompliantEdges
RETURN g.text as text, 0 as compliantEdges, nonCompliantEdges", {}) YIELD value
WITH value.text as text, value.compliantEdges as compliantEdges, value.nonCompliantEdges as nonCompliantEdges
RETURN text, sum(compliantEdges) as compliantEdges, sum(nonCompliantEdges) as nonCompliantEdges
10-22-2020 04:05 PM
Thank you. I will point Max at it for some help rewriting. Thank you again.
10-23-2020 04:12 PM
Created the index on Compliant/NonCompliant
Swapped out the first paragraph
I am using 4.1.x so I swapped that out. Damn this thing is fast
MATCH (g)
WHERE g:Compliant OR g:NonCompliant
WITH collect(DISTINCT g.text) AS uniqueNames
CALL {
MATCH (g:Compliant)<--(g2:Compliant)
WITH g, count(g2) as compliantEdges
RETURN g.text as text, compliantEdges, 0 as nonCompliantEdges
UNION ALL
MATCH (g:NonCompliant)<--(g2:NonCompliant)
WITH g, count(g2) as nonCompliantEdges
RETURN g.text as text, 0 as compliantEdges, nonCompliantEdges
}
RETURN text, sum(compliantEdges) as compliantEdges, sum(nonCompliantEdges) as nonCompliantEdges
10-23-2020 04:13 PM
Need to check results but it's quick.
10-26-2020 03:09 PM
Just to note, if this is your current query, then you can probably get rid of the first three lines and just start from the CALL, since it doesn't look like you're using uniqueNames anywhere.
10-26-2020 04:09 PM
You are right. It must have been left over from an earlier try. Thank you! We also swapped around some of the variables to use it. It's a very useful query now. Thanks again
10-24-2020 07:15 AM
Update - Thank you - yes, that solves a problem we have been facing for 3 months. Not only is it blistering fast, it's the results we need for a few different problems.
I will have a follow on but now that this doesn't take hours to run, I need some time to do my own homework. Again, thank you