Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Confused about performance

kaptenh
Node Clone

Hi!
I have written a Java plugin to handle inserting data into Neo4j. I have tried calling it from Python in two different ways: one where I supply the data as a query parameter from the Python code, and one where the query itself calls apoc.load.json.
The first is:

# Triple quotes let the Cypher query span several lines
c = """
WITH $nodes AS nodes
UNWIND nodes AS n
CALL mymodule.mergeNodes(n.labels, n.identity, n.properties) YIELD node RETURN node
"""

with driver.session() as session:
    res = session.run(c, nodes=nodes)

and the second one is:
# Triple quotes also keep the inner double quotes from ending the string early
c = """
CALL apoc.load.json("some internal webserver") YIELD value AS n
CALL mymodule.mergeNodes(n.labels, n.identity, n.properties) YIELD node RETURN node
"""

with driver.session() as session:
    res = session.run(c)

The weird thing, to me, is that the second approach is a lot faster: my tests went from about 10s to 4s, so the first and most obvious (?) approach takes about 2.5 times as long as the second.

Any ideas as to why this is?
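
For anyone reproducing the comparison, here is a minimal timing sketch. It assumes the official neo4j Python driver, where session.run returns a lazy result, so the records are consumed inside the timed region to avoid undercounting the work. The helper names timed and run_and_consume are hypothetical and not part of the original post.

```python
import time


def timed(fn):
    """Call fn() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


def run_and_consume(session, query, **params):
    """Run a query and iterate over every record, returning the record count.

    Iterating forces the whole result to be fetched, so the timing
    covers the full round trip rather than only sending the query.
    """
    result = session.run(query, **params)
    return sum(1 for _ in result)
```

A call such as timed(lambda: run_and_consume(session, c, nodes=nodes)) would then give a number that can be compared across the two variants. Since the procedure merges nodes, a second run against the same database may behave differently from the first, so each variant is best timed against a comparable starting state.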

2 Replies

MuddyBootsCode
Graph Steward

APOC procedures are often faster because they're optimized under the hood. With UNWIND you're looping over things, which is bound to increase your load time a bit.

mdfrenchman
Graph Voyager

With WITH $nodes AS nodes, if you pass a very large dataset as the $nodes param, all of it has to go into memory, which, depending on size, can cripple the run or cause a delay at its start. I've caused an OOM exception this way before.
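
One common way to keep the $nodes parameter small is to send it in batches. The following is a rough sketch, not something tested against the original setup: it assumes the neo4j Python driver and the mymodule.mergeNodes procedure from the question, and the helper names chunked and merge_in_batches as well as the default batch size of 1000 are hypothetical.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Same query as in the question; only the size of $nodes changes per call.
QUERY = """
UNWIND $nodes AS n
CALL mymodule.mergeNodes(n.labels, n.identity, n.properties) YIELD node
RETURN node
"""


def merge_in_batches(session, nodes, batch_size=1000):
    """Send `nodes` in batches and return the total number of records returned."""
    total = 0
    for batch in chunked(nodes, batch_size):
        result = session.run(QUERY, nodes=batch)
        # Iterate to consume the result before sending the next batch.
        total += sum(1 for _ in result)
    return total
```

Whether this helps, and which batch size works best, would have to be measured; smaller batches lower peak memory per call but add more round trips.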

apoc.load.json is more similar to LOAD CSV performance-wise, which would be better for large sets.

Also, I suppose it'd depend on how the Python driver implements params.

-Mike