
Scaling up a Cypher query on Neo4j: optimisation and memory requirements?

Hello community! I am new to Neo4j and Cypher. I am trying to find circular money flows using an approach similar to the one mentioned in the following blog.

The approach works fine when my graph has 20M nodes and 21M relationships (I get an output within an hour), but when I scale it up to 90M nodes and 120M relationships I get no result even after waiting for 6 hours. I wonder whether the queries below are well optimised.

For creating HOP relationships

MATCH (t1:txn)-[:OUT]->(:customer)-[:IN]->(t2:txn)
WHERE t1.txn_date < t2.txn_date
AND t2.txn_amount < t1.txn_amount
AND (t1.txn_amount - t2.txn_amount) / t1.txn_amount < 0.50
MERGE (t1)-[:HOP]->(t2)
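
For the larger graph I run this step through the APOC library (as mentioned further down), roughly along the lines of the sketch below; the batchSize of 10000 is just a placeholder value I have been tuning:

// Sketch only: commit the HOP MERGE in batches instead of one huge transaction
CALL apoc.periodic.iterate(
  "MATCH (t1:txn)-[:OUT]->(:customer)-[:IN]->(t2:txn)
   WHERE t1.txn_date < t2.txn_date
     AND t2.txn_amount < t1.txn_amount
     AND (t1.txn_amount - t2.txn_amount) / t1.txn_amount < 0.50
   RETURN t1, t2",
  "MERGE (t1)-[:HOP]->(t2)",
  {batchSize: 10000, parallel: false}
)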

To query circular patterns

PROFILE MATCH path = (t1:txn)-[:HOP*3..10]->(t2:txn)
WHERE (t2)-[:OUT]->(:customer)-[:IN]->(t1)
  AND t1.txn_date < t2.txn_date
  AND t2.txn_amount < t1.txn_amount
  AND (t1.txn_amount - t2.txn_amount) / t1.txn_amount < 0.70
RETURN path
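
(To get a feel for the cost before running this over the full 90M-node graph, one option would be to profile it on a restricted starting set first; the LIMIT of 10000 below is just an arbitrary sample size.)

PROFILE MATCH (t1:txn)
WITH t1 LIMIT 10000
MATCH path = (t1)-[:HOP*3..10]->(t2:txn)
WHERE (t2)-[:OUT]->(:customer)-[:IN]->(t1)
  AND t1.txn_date < t2.txn_date
  AND t2.txn_amount < t1.txn_amount
  AND (t1.txn_amount - t2.txn_amount) / t1.txn_amount < 0.70
RETURN path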

My graph looks like this: [screenshot of the graph model]

Also, please let me know whether I am using the right memory configuration. My graph is 10 GB:

dbms.memory.heap.initial_size=31g
dbms.memory.heap.max_size=31g
dbms.memory.pagecache.size=63g

dbms.memory.transaction.global_max_size=96g
dbms.tx_state.memory_allocation=ON_HEAP

Will further increasing dbms.memory.transaction.global_max_size and dbms.memory.pagecache.size help speed up the process?

I have read in Neo4j blogs not to set a heap size greater than 31GB, but when I use the APOC library to create the HOP relationships (first query) with that heap I get an out-of-memory error, which does not happen with a 64GB heap. Should I continue using a 64GB heap size?

2 REPLIES

What version of Neo4j is installed?

If in fact your graph is 10GB, then your settings of

dbms.memory.heap.initial_size=31g
dbms.memory.heap.max_size=31g
dbms.memory.pagecache.size=63g

appear incorrect. A heap 3x the size of the graph and a pagecache 6x its size do not make sense.
Memory configuration - Operations Manual describes how these parameters should be defined. Specifically, as it states:

Page cache
The page cache is used to cache the Neo4j data stored on disk. The caching of graph data and indexes into memory helps avoid costly disk access and results in optimal performance.

If your graph is 10GB, then setting dbms.memory.pagecache.size to anything more than 10GB is overkill, unless you are doing so because today the graph is 10GB and tomorrow it will be 20GB, for example.
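
For illustration only, and assuming the store really is about 10GB, something closer to the following would be in line with that guidance. The heap value is just a placeholder; the right size depends on your query workload and available RAM.

# Illustrative values only, not a recommendation
# pagecache sized to the ~10GB store plus a little headroom for growth
dbms.memory.pagecache.size=12g
# heap sized to the query/transaction workload, not to the store size
dbms.memory.heap.initial_size=16g
dbms.memory.heap.max_size=16g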

Hello @dana.canzano, thanks for replying. I am using Neo4j Community 4.2.0. I am new to Neo4j and the memory configuration is something I am still not clear on. The neo4j-admin memrec command suggested I set

dbms.memory.heap.max_size=31g
dbms.memory.pagecache.size=459500m

but when I used apoc.periodic.iterate I was getting a heap out-of-memory error. This was resolved when I changed the max heap size to 64g. So I thought that a bigger heap size and page cache size would improve performance. Please let me know if doing this is actually worsening performance, and what the best memory configuration would be. It would also be great if you could tell me whether the query I am using is optimised:

PROFILE MATCH path = (t1:txn)-[:HOP*3..10]->(t2:txn)
WHERE (t2)-[:OUT]->(:customer)-[:IN]->(t1)
  AND t1.txn_date < t2.txn_date
  AND t2.txn_amount < t1.txn_amount
  AND (t1.txn_amount - t2.txn_amount) / t1.txn_amount < 0.70
RETURN path

Thank you.