
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Merging two nodes running endlessly

Venu

I am new to Neo4j and I am trying to merge two nodes as shown below:

MATCH (n:node2)
MERGE (p:node1 {id: n.id})
ON CREATE SET p.column1 = n.column1, p.column2 = n.column2, p.column3 = n.column3,
              p.column4 = n.column4, p.column5 = n.column5, p.column6 = n.column6,
              p.column7 = n.column7
ON MATCH SET p.column1 = n.column1, p.column2 = n.column2, p.column3 = n.column3,
             p.column4 = n.column4, p.column5 = n.column5, p.column6 = n.column6,
             p.column7 = n.column7;

The node1 label has 2 million nodes and node2 has 184,000 nodes, each with 8 properties.

I am trying to merge the node2 records into node1, but the merge runs endlessly. Is there any way to make it run faster?


The way the query is written, you are effectively getting a Cartesian product between the two node sets, because every node2 row has to be checked against the node1 nodes. There are 184,000 nodes with label 'node2', so you will have 184,000 rows of data to process, each one looked up among 2 million node1 nodes. Is there only one node1 node for each node2 node, i.e. is 'id' unique?
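One way to answer that question is to look for repeated 'id' values. This is a sketch that only assumes the label and property names from the question; an empty result means 'id' is unique within node1, and running the same query with node2 checks the other side.

```cypher
// List id values that occur on more than one node1 node
MATCH (p:node1)
WITH p.id AS id, count(*) AS copies
WHERE copies > 1
RETURN id, copies
ORDER BY copies DESC
LIMIT 10;
```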

Besides the large number of rows to process, each MERGE has to find the matching node. Do you have an index on the property 'id' for label node1? If not, the database does a full scan of all node1 nodes for every row to find the one whose 'id' matches, which is on the order of 184,000 × 2,000,000 (about 368 billion) property checks.
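A minimal sketch of creating that index, assuming Neo4j 4.1 or later for this syntax (on 3.x the equivalent is CREATE INDEX ON :node1(id)). The index name node1_id is a hypothetical choice. If 'id' is unique, a uniqueness constraint is an alternative, since it creates a backing index of its own; the REQUIRE form shown assumes Neo4j 4.4 or later.

```cypher
// Index the property that MERGE looks up on (index name is hypothetical)
CREATE INDEX node1_id IF NOT EXISTS FOR (p:node1) ON (p.id);

// Alternative if id is unique (Neo4j 4.4+ syntax). Use this OR the index above,
// not both on the same label/property:
// CREATE CONSTRAINT node1_id_unique IF NOT EXISTS FOR (p:node1) REQUIRE p.id IS UNIQUE;

// Wait up to 300 seconds for the index to come online before running the merge
CALL db.awaitIndexes(300);
```

To confirm the index is picked up, prefix the merge with EXPLAIN and check that the plan shows a NodeIndexSeek on node1 rather than a NodeByLabelScan followed by a Filter.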

Next, it looks like you are just copying each node2 node into an equivalent node1 node, as the properties and values are identical. Could you instead add the label node1 to the existing node2 nodes (or replace node2 with node1), so you don't create another node?
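If relabeling fits your model, a minimal sketch is below. It assumes no existing node1 node already has the same 'id' as a node2 node; otherwise you end up with two node1 nodes for that id, and a uniqueness constraint on node1.id would make the statement fail. For 184,000 nodes it can be batched the same way as the MERGE.

```cypher
// Add the node1 label to every node2 node, keeping its properties as they are
MATCH (n:node2)
SET n:node1;

// Optionally drop the old label afterwards, if nothing else depends on it
// MATCH (n:node2) REMOVE n:node2;
```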

If this is still what you need, I suggest using apoc.periodic.iterate to split the updates into small batches of around 1,000 to 10,000 nodes at a time. You can try something like this:

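CALL apoc.periodic.iterate(
  // Outer statement: stream every node2 node
  "
    MATCH (n:node2)
    RETURN n
  ",
  // Inner statement: applied to each streamed row
  "
    MERGE (p:node1 {id: n.id})
    SET p.column1 = n.column1, p.column2 = n.column2, p.column3 = n.column3,
        p.column4 = n.column4, p.column5 = n.column5, p.column6 = n.column6,
        p.column7 = n.column7
  ",
  // parallel: false is the default; keeping it avoids batches racing on the same MERGE
  {batchSize: 10000, parallel: false})
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;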

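The way I understand it, the first statement runs once and streams its results, and the second statement is run for each of those rows, with a separate transaction committed for every batchSize rows. With 184,000 node2 nodes and a batchSize of 10000, that is 19 transactions instead of one huge one.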
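If APOC is not installed and you are on Neo4j 4.4 or later, Cypher's own CALL { ... } IN TRANSACTIONS gives similar batching. This is a sketch under that version assumption; it has to run as an auto-commit (implicit) transaction, which in Neo4j Browser means prefixing the query with :auto.

```cypher
MATCH (n:node2)
CALL {
  // Import n into the subquery; every 10000 rows are committed separately
  WITH n
  MERGE (p:node1 {id: n.id})
  SET p.column1 = n.column1, p.column2 = n.column2, p.column3 = n.column3,
      p.column4 = n.column4, p.column5 = n.column5, p.column6 = n.column6,
      p.column7 = n.column7
} IN TRANSACTIONS OF 10000 ROWS;
```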

BTW, if you are setting all the properties of 'p' to the identical properties of 'n', you can replace your SET statement with 'SET p = n'.
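For illustration, here is the batched call from this thread with the SET written that way. The sketch uses properties(n), which is the same property map that SET p = n copies. Note that '=' replaces every property on p, so a property that exists on an existing node1 node but not on n is removed; '+=' adds and overwrites properties without removing the others.

```cypher
CALL apoc.periodic.iterate(
  "MATCH (n:node2) RETURN n",
  // Replace all of p's properties with n's (id included, which is already equal)
  "MERGE (p:node1 {id: n.id}) SET p = properties(n)",
  {batchSize: 10000, parallel: false});

// Variant that keeps properties p has but n does not:
// "MERGE (p:node1 {id: n.id}) SET p += properties(n)"
```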