
How to gauge how many nodes an UNWIND can handle?

Rogie
Node Link

I currently need to execute large queries of the form:

UNWIND $data AS row
MERGE (n:Node {id: row.id, name: row.name, ...})

I have millions of data rows I need to do this for, but I can't do it with a single UNWIND query because Neo4j crashes with a memory error. If I make batches so that $data contains around 20,000 rows at a time, it seems to be OK. But is there a way to increase this? Are there any tricks for dealing with these kinds of situations?

5 REPLIES

md7
Node Link

There could be many more efficient solutions.
This scenario is like a data migration.

Here one can think of it in a simple way:
at the source level, maintain an attribute such as migrated = false.
Then, at each stage, select only the records where migrated is false, with LIMIT 20000,
and MERGE them.
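Something like this hypothetical sketch, assuming the unmigrated records already live in the graph as :Staging nodes (an assumed label) with the same id and name properties; run it repeatedly until it returns 0:

// Sketch only: process up to 20,000 unmigrated records per run,
// copying them into :Node and flagging the source as done.
MATCH (s:Staging)
WHERE s.migrated = false
WITH s LIMIT 20000
MERGE (n:Node {id: s.id})
SET n.name = s.name,
    s.migrated = true
RETURN count(s)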

Rogie
Node Link

I don't really understand what you're suggesting. Are you telling me to try to avoid merging any nodes that are already present in the graph?

I wonder if there is some other underlying problem here, since it's taking 20 minutes to merge around 100,000 nodes. Each node has around 10 attributes, one of which is name_id, for which I have set:

CREATE CONSTRAINT ON (n:Node)
ASSERT n.name_id IS UNIQUE

I see posts about people merging millions of nodes in a few minutes, so I am wondering what I'm doing wrong.

Hey Rogie,
Since name_id has a unique constraint, you can merge on name_id alone and set the other properties with a SET clause. Merging on the full property map means the index backing the constraint can't be used for the lookup; merging on just name_id lets it be used.

UNWIND $data AS row
MERGE (n:Node {name_id: row.name_id})
SET n.id = row.id

and so on.
Can you try this approach?
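By the way, if every key in each row map should become a property, Cypher's += operator for maps can replace the property-by-property SET. A sketch, assuming the row keys match the property names you want:

UNWIND $data AS row
MERGE (n:Node {name_id: row.name_id})
// += copies every key/value pair from the row map onto the node
SET n += row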

Rogie
Node Link

I just tried that and it took 1 minute to merge 20,000 nodes. Is that normal?

Umm... it can be made faster. What syntax are you using to create the batches? Maybe try playing with the batch size a bit, say 10,000.
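If you have APOC installed, you could also let the server do the batching with apoc.periodic.iterate, which commits each batch in its own transaction. A rough sketch, assuming the same $data parameter and the merge-on-name_id pattern from above:

CALL apoc.periodic.iterate(
  // Outer statement streams the rows to process
  'UNWIND $data AS row RETURN row',
  // Inner statement runs once per row, committed in batches
  'MERGE (n:Node {name_id: row.name_id}) SET n += row',
  {batchSize: 10000, params: {data: $data}}
)

Note that the full $data parameter still has to be sent to the server in one request, so this mainly bounds the transaction state rather than the parameter size.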