07-12-2020 06:28 AM
I've managed to solve my issue, but in a way that doesn't seem particularly efficient.
CALL apoc.periodic.iterate('UNWIND $batch AS row RETURN row',
'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
{batchSize:10000, parallel:false, iterateList:true, params:{batch:$edge_list}})
This query works for around 50K relationships between STORY nodes and ISSUE nodes. I use the Python driver to pass in a list of dicts as the $edge_list parameter. However, if I set parallel:true, the procedure only writes what is probably the first batch, i.e. I only get 10,000 relationships created.
Is this just a quirk of apoc.periodic.iterate, or can I change the query to ensure parallel works as expected?
Many thanks,
07-14-2020 08:00 AM
Do you see any errors when you're using the parallel version? I'm wondering if you're getting a deadlock exception because it's trying to write two relationships to the same node in parallel...
07-15-2020 12:36 AM
Yes, after running a toy version in the browser rather than through Python, I saw the errors regarding the lock. Is there a way to rewrite the query that works around that, or is it just the nature of Neo4j?
Many thanks for your reply!
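For anyone hitting the same thing from Python: as far as I understand it, apoc.periodic.iterate reports failed batches in its result row (columns such as failedBatches and errorMessages) rather than raising an exception, which would explain why nothing showed up through the driver. Below is a minimal sketch for checking that row. The helper name summarize_iterate_result is made up for illustration, and it assumes the record exposes those column names via .get(), as a plain dict or a driver record does.

```python
def summarize_iterate_result(record):
    """Summarise one result row returned by apoc.periodic.iterate.

    `record` is assumed to be a mapping with the procedure's output
    columns (a plain dict or a driver record that supports .get()).
    """
    failed_batches = record.get("failedBatches", 0)
    # errorMessages is assumed to be a map of message -> occurrence count
    error_messages = record.get("errorMessages") or {}
    return {
        "ok": failed_batches == 0,
        "batches": record.get("batches", 0),
        "total": record.get("total", 0),
        "failed_batches": failed_batches,
        "errors": dict(error_messages),
    }


# Hypothetical usage with the Python driver (not run here):
# with driver.session() as session:
#     record = session.run(query, edge_list=edges).single()
#     print(summarize_iterate_result(record))
```

If "ok" comes back False, the "errors" entry should show the lock messages that were otherwise only visible in the browser.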
07-28-2020 03:37 AM
I don't think there's a way to work around it by rewriting the query, but you can set the retries parameter, which will retry up to a specified number of times if it runs into problems.
See https://neo4j.com/docs/labs/apoc/current/graph-updates/periodic-execution/#commit-batching for more details.
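A rough sketch of what that could look like from the Python side, reusing the query from the original post with retries added to the config map. The function name build_iterate_query and the default of 3 retries are illustrative assumptions, not recommended values; check the linked docs for how retries behaves in your APOC version.

```python
def build_iterate_query(batch_size=10000, parallel=True, retries=3):
    """Return the Cypher text of the relationship load with a retries setting.

    The edge list itself is still passed as the $edge_list query parameter.
    """
    config = (
        "{"
        f"batchSize: {int(batch_size)}, "
        f"parallel: {str(bool(parallel)).lower()}, "
        "iterateList: true, "
        f"retries: {int(retries)}, "  # illustrative value, tune as needed
        "params: {batch: $edge_list}"
        "}"
    )
    return (
        "CALL apoc.periodic.iterate(\n"
        "  'UNWIND $batch AS row RETURN row',\n"
        "  'MATCH (s:STORY), (t:ISSUE) "
        "WHERE s.id = row.id AND t.id = row.cat_id "
        "CREATE (s)-[:IS_TAGGED_WITH]->(t)',\n"
        f"  {config}\n"
        ")"
    )


# Hypothetical usage with the Python driver (not run here):
# session.run(build_iterate_query(retries=5), edge_list=edges)
```

Retries do not remove the lock contention itself, so it is still worth checking the result row for failed batches after the call.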
07-29-2020 04:55 AM
Thanks Mark! I'll keep retries in mind.