Neo4j

Minyall · ‎07-12-2020

I've managed to solve my issue but in a way that seems not particularly efficient.

CALL apoc.periodic.iterate(
  'UNWIND $batch AS row RETURN row',
  'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
   CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
  {batchSize:10000, parallel:false, iterateList:true, params:{batch:$edge_list}})

This query works for around 50K relationships between STORY nodes and ISSUE nodes. I use the Python driver to pass in a list of dicts as the $edge_list parameter. However, if I set parallel:true, the procedure only writes what is probably the first batch, i.e. I only get 10,000 relationships created.

Is this just a quirk of apoc.periodic.iterate, or can I change the query to ensure parallel works as expected?

Many thanks,

mark_needham · ‎07-14-2020

Do you see any errors when you're using the parallel version? I'm wondering if you're getting a deadlock exception because it's trying to write two relationships to the same node in parallel...
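
If the errors aren't showing up in Python, one thing to try is reading the procedure's own result row, since as far as I know apoc.periodic.iterate reports failed batches in its output columns instead of raising an exception. A rough sketch, assuming an already-open driver session is passed in; the function name run_edge_import is made up for illustration, and the labels and properties are the ones from the query above:

```python
# Sketch only: `session` is assumed to be an open neo4j driver session,
# and run_edge_import is a hypothetical helper name, not part of any library.
QUERY = """
CALL apoc.periodic.iterate(
  'UNWIND $batch AS row RETURN row',
  'MATCH (s:STORY), (t:ISSUE) WHERE s.id = row.id AND t.id = row.cat_id
   CREATE (s)-[r:IS_TAGGED_WITH]->(t)',
  {batchSize: 10000, parallel: true, iterateList: true, params: {batch: $edge_list}})
YIELD batches, total, failedBatches, errorMessages
RETURN batches, total, failedBatches, errorMessages
"""


def run_edge_import(session, edge_list):
    """Run the batched import and return the procedure's summary as a dict."""
    # Keyword arguments to session.run are sent as query parameters.
    record = session.run(QUERY, edge_list=edge_list).single()
    keys = ("batches", "total", "failedBatches", "errorMessages")
    return {key: record[key] for key in keys}
```

A non-zero failedBatches, together with whatever is in errorMessages, would tell you whether the missing relationships come from lock errors.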

Minyall · ‎07-15-2020

Yes, after running a toy version in the browser rather than through Python, I saw the errors regarding the lock. Is there a way to rewrite the query that works around that, or is it just the nature of Neo4j?

Many thanks for your reply!

mark_needham · ‎07-28-2020

I don't think there's a way to work around it by rewriting the query, but you can set the retries parameter, which will retry up to a specified number of times if it runs into problems.

See https://neo4j.com/docs/labs/apoc/current/graph-updates/periodic-execution/#commit-batching for more details.
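
For example, here is a small sketch of how the config map might look with retries added, built as a string on the Python side. The helper name build_iterate_query and the default of 3 retries are assumptions for illustration, not recommendations; the rest is the query from the original post:

```python
# Sketch only: build_iterate_query is a hypothetical helper, and the default
# values are arbitrary. `retries` is the apoc.periodic.iterate config key.
def build_iterate_query(retries=3, batch_size=10000, parallel=True):
    """Return the Cypher text for the batched import with a retries setting."""
    config = (
        f"{{batchSize: {batch_size}, parallel: {str(parallel).lower()}, "
        f"iterateList: true, retries: {retries}, "
        f"params: {{batch: $edge_list}}}}"
    )
    return (
        "CALL apoc.periodic.iterate(\n"
        "  'UNWIND $batch AS row RETURN row',\n"
        "  'MATCH (s:STORY), (t:ISSUE) "
        "WHERE s.id = row.id AND t.id = row.cat_id "
        "CREATE (s)-[r:IS_TAGGED_WITH]->(t)',\n"
        f"  {config})\n"
        "YIELD batches, total, failedBatches, errorMessages\n"
        "RETURN batches, total, failedBatches, errorMessages"
    )
```

The returned text can be passed to session.run with edge_list as a parameter, the same way as the original query. Retries may not remove every lock failure, so it is still worth checking failedBatches afterwards.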

Minyall · ‎07-29-2020

Thanks Mark! I'll keep retries in mind.

Apoc.periodic.iterate only writing one batch with parallel