
How to gauge how many nodes an UNWIND can handle?

Rogie
Node Link

I currently need to execute large queries of the form:

UNWIND $data AS row
MERGE (n:Node {id: row.id, name: row.name, ...})

I have millions of data rows I need to do this for, but I can't do it with a single UNWIND query because Neo4j crashes with a memory error. If I make batches so that $data contains around 20,000 rows at a time, it seems to be OK. But is there a way to increase this? Are there any tricks for dealing with these kinds of situations?

5 REPLIES

md7
Node Link

There could be many more efficient solutions.
This scenario is like a data migration.

Here one can think of it in a simple way:
at the source level, maintain an attribute such as migrated = false.
Then, at each stage, select only the records where migrated is false, with LIMIT 20000,
and MERGE them.
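Something like this hypothetical sketch, assuming the unmigrated records already live in the graph as :Staging nodes (an assumed label) with the same id and name properties; run it repeatedly until it returns 0:

// Sketch only: process up to 20,000 unmigrated records per run,
// copying them into :Node and flagging the source as done.
MATCH (s:Staging)
WHERE s.migrated = false
WITH s LIMIT 20000
MERGE (n:Node {id: s.id})
SET n.name = s.name,
    s.migrated = true
RETURN count(s)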

Rogie
Node Link

I don't really understand what you're suggesting. Are you telling me to try to avoid merging any nodes that are already present in the graph?

I wonder if there is some other underlying problem here, since it's taking 20 minutes to merge around 100,000 nodes. Each node has around 10 attributes, one of which is name_id, for which I have set:

CREATE CONSTRAINT ON (n:Node)
ASSERT n.name_id IS UNIQUE

I see posts about people merging millions of nodes in a few minutes, so I am wondering what I'm doing wrong.

Hey Rogie,
Since name_id has a unique constraint, you can merge on name_id alone and set the other properties with a SET clause. Merging on the full property map means the index backing the constraint can't be used for the lookup; merging on just name_id lets it be used.

UNWIND $data AS row
MERGE (n:Node {name_id: row.name_id})
SET n.id = row.id

and so on.
Can you try this approach?
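By the way, if every key in each row map should become a property, Cypher's += operator for maps can replace the property-by-property SET. A sketch, assuming the row keys match the property names you want:

UNWIND $data AS row
MERGE (n:Node {name_id: row.name_id})
// += copies every key/value pair from the row map onto the node
SET n += row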

Rogie
Node Link

I just tried that and it took 1 minute to merge 20,000 nodes. Is that normal?

Umm... it can be made faster. What syntax are you using to create the batches? Maybe try playing with the batch size a bit, say 10,000.
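If you have APOC installed, you could also let the server do the batching with apoc.periodic.iterate, which commits each batch in its own transaction. A rough sketch, assuming the same $data parameter and the merge-on-name_id pattern from above:

CALL apoc.periodic.iterate(
  // Outer statement streams the rows to process
  'UNWIND $data AS row RETURN row',
  // Inner statement runs once per row, committed in batches
  'MERGE (n:Node {name_id: row.name_id}) SET n += row',
  {batchSize: 10000, params: {data: $data}}
)

Note that the full $data parameter still has to be sent to the server in one request, so this mainly bounds the transaction state rather than the parameter size.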