
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Why can't parallel:true be used with apoc.load.csv in apoc.periodic.iterate?

lingvisa
Graph Fellow

CALL apoc.periodic.iterate("
CALL apoc.load.csv('/Users/martin/test/test.csv', {nullValues:['','na','NAN',false], sep:' ' })
yield map as row",
"MERGE (m:Generics {nid: row.nid})
ON CREATE SET m += row
ON MATCH SET m += row
RETURN count(m) as mcount", {batchSize:1000, iterateList:true, parallel:true})

This used to work, but in neo4j-community-4.4.12 it reports the error below when I use parallel:true. If I change it to parallel:false, it works fine. Why is that? I got this message when I copied the command into the Browser to test it.

{
  "ForsetiClient[transactionId=1121, clientId=2] can't acquire ExclusiveLock{owner=ForsetiClient[transactionId=1120, clientId=12]} on NODE(1062), because holders of that lock are waiting for ForsetiClient[transactionId=1121, clientId=2].\n Wait list:ExclusiveLock[\nClient[1120] waits for [ForsetiClient[transactionId=1121, clientId=2]]]": 1
}

Accepted solution

Hi @lingvisa,

The error shows a lock conflict, which can happen when you run a MERGE with parallelization. With parallel:true, batches run in separate transactions on different threads. Here transaction 1121 needs an exclusive lock on node 1062 that transaction 1120 holds, while 1120 is waiting for 1121, so Neo4j reports a deadlock. That happens when the nid column in your CSV file is not unique: rows with the same nid end up in different batches, and each batch's MERGE tries to lock the same node.
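To make the duplicate-key explanation concrete, here is a small Python sketch (not APOC code) that models how rows are cut into batches and lists the nid values that land in more than one batch. It assumes batches are consecutive slices of the input, which is a simplification of how batchSize:1000 groups rows; the function name and sample column are hypothetical.

```python
from collections import defaultdict


def nids_shared_across_batches(nids, batch_size):
    """Map each nid that lands in more than one batch to its batch indexes.

    Simplified model (an assumption, not APOC internals): rows are cut into
    consecutive slices of batch_size, like batchSize:1000 in the query.
    Repeats inside one batch share a transaction; repeats across batches
    are the ones that parallel transactions may both try to lock.
    """
    batches_by_nid = defaultdict(set)
    for position, nid in enumerate(nids):
        # Integer division gives the index of the batch this row falls into.
        batches_by_nid[nid].add(position // batch_size)
    return {
        nid: sorted(batches)
        for nid, batches in batches_by_nid.items()
        if len(batches) > 1
    }


if __name__ == "__main__":
    # Hypothetical nid column, with batch_size=2 instead of 1000 to keep it short.
    column = ["a", "b", "c", "a", "d", "e", "b", "b"]
    print(nids_shared_across_batches(column, 2))  # {'a': [0, 1], 'b': [0, 3]}
```

Here "a" sits in batches 0 and 1 and "b" in batches 0 and 3, so with parallel:true those batches could contend for the same node. The two "b" rows inside batch 3 run in one transaction and do not conflict with each other.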

A batch that fails this way is rolled back and counted in the errorMessages output, which is the map you got (the 1 is the number of times that message occurred). apoc.periodic.iterate can retry failed batches through its retries option (0 by default); batches that still fail are not written.

In short, if the column you MERGE on contains duplicate values, it is safer not to use parallelization, so that these lock conflicts are avoided.
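Before choosing parallel:true, you can check the file for repeated nid values. A minimal Python sketch, assuming the file has a header row with an nid column and uses the same single-space separator as the query; the function name and sample rows are hypothetical, and Python's csv module may not parse every edge case exactly like apoc.load.csv does.

```python
import csv
import io
from collections import Counter


def duplicate_nids(csv_file, sep=" "):
    """Return {nid: count} for nid values that appear on more than one row.

    Assumes a header row containing an 'nid' column, the column the
    query reads as row.nid (the header layout is an assumption).
    """
    reader = csv.DictReader(csv_file, delimiter=sep)
    counts = Counter(row["nid"] for row in reader)
    return {nid: n for nid, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical sample; for the real file you could use:
    # with open("/Users/martin/test/test.csv", newline="") as f:
    #     print(duplicate_nids(f))
    sample = io.StringIO("nid name\n1 a\n2 b\n1 c\n")
    print(duplicate_nids(sample))  # {'1': 2}
```

An empty result means no nid repeats in the file; any entries are values that rows in different batches could MERGE at the same time.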

Regards,
