Newbie question: unexpected cypher-shell memory/heap usage

I have a reasonably large AWS Ubuntu instance with 60+ GB of RAM.

I have configured Neo4j Community Edition (CE) with:

dbms.memory.heap.initial_size=20g
dbms.memory.heap.max_size=20g
dbms.memory.pagecache.size=20g

I am running a complex/large query under cypher-shell --format plain --fail-fast, and I am surprised that the cypher-shell process is running out of memory. This makes me wonder whether the query is running partially or wholly within the cypher-shell client process rather than on the server.

Behaviour, according to top:

  • neo4j database process = 43 GB (consistent with the configuration above: 20 GB heap + 20 GB page cache, plus JVM overhead), eventually using minimal CPU
  • cypher-shell process (running as my user) = 20 GB (a surprise), consuming 1200% CPU (also a surprise), and tending to run out of heap unless I tweak the query to be conservative

So... am I correct that cypher-shell is doing part or all of the query work, and is therefore hitting resource constraints that I had presumed would be handled by the (more generously resourced) main daemon?

5 REPLIES

Cypher-shell does not stream; it materializes all results in client memory.

I have a PR open against cypher-shell (not yet merged) that adds streaming.

Use your own code to consume the results in a streaming manner, in any language you want.
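For example, here is a minimal sketch using the official Neo4j Python driver; the URI, credentials, and query are placeholders to substitute with your own. Iterating the result consumes records one at a time rather than materializing them all:

from neo4j import GraphDatabase

# Placeholder connection details and query - substitute your own.
URI = "bolt://localhost:7687"
AUTH = ("neo4j", "password")
QUERY = "MATCH (n:Item) RETURN n.name AS name"

driver = GraphDatabase.driver(URI, auth=AUTH)
with driver.session() as session:
    # The driver fetches records lazily as the loop advances, so
    # client-side memory stays bounded even for very large results.
    for record in session.run(QUERY):
        print(record["name"])
driver.close()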

Did it work out with your own code?

Hi Michael,

There's an issue with the "use your own code to consume the results in a streaming manner" approach: I am literally not using any code other than Cypher. There is no embedding language around any of the CSV loading that I am doing - it is all run directly in cypher-shell.

Hello, we added better streaming support to cypher-shell in 3.5.5. If you can, please upgrade and give that a try; it should work better for your use cases.

Hi Andrew! There does appear to have been a stability improvement recently, though I can't honestly say whether to attribute it to a specific release or, alternatively, to the considerable effort I've put into grouping and sharding the uploads to run in different processes (sketched below).
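For reference, the sharding idea amounts to something like the sketch below - here expressed with the Python driver and a process pool rather than the separate cypher-shell invocations I actually use, and with hypothetical shard file names, credentials, and load query:

from concurrent.futures import ProcessPoolExecutor
from neo4j import GraphDatabase

# Hypothetical shard files and connection details, for illustration.
# Each shard file must live in the server's import directory.
URI = "bolt://localhost:7687"
AUTH = ("neo4j", "password")
SHARDS = ["items_00.csv", "items_01.csv", "items_02.csv", "items_03.csv"]

LOAD_QUERY = """
LOAD CSV WITH HEADERS FROM $url AS row
CREATE (:Item {name: row.name})
"""

def load_shard(filename):
    # Each worker opens its own driver, so every shard is loaded by an
    # independent OS process, much like running several cypher-shell
    # instances side by side.
    driver = GraphDatabase.driver(URI, auth=AUTH)
    with driver.session() as session:
        session.run(LOAD_QUERY, url="file:///" + filename).consume()
    driver.close()
    return filename

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        for done in pool.map(load_shard, SHARDS):
            print("loaded", done)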

I am experiencing something new, though; that deserves a fresh post, which I will submit shortly.