12-16-2021 01:56 AM
The following query can't run on a dataset with ~2M nodes. What should I do to make it run faster?
MATCH (cc:ConComp)-[r1:IN_CONCOMP]-(p1:Person)-[r2:SAME_CLUSTER]-(p2:Person)
WHERE cc.cluster_type = "household"
MERGE (cluster:Cluster {CLUSTER_TMP_ID:cc.CONCOMP_ID + '|' + r2.root_id, cluster_type:cc.cluster_type })
MERGE (cluster)-[r3:IN_CLUSTER]-(p1)
12-16-2021 03:24 PM
Do you have any indexes defined? For example, on :ConComp(cluster_type)?
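For reference, such indexes could be created roughly as follows. This is a minimal sketch assuming Neo4j 4.x index syntax (3.x uses the older `CREATE INDEX ON :Label(property)` form). The labels and property names come from the query above; the index names are made up for illustration, and the :Cluster index is only a suggestion to speed up the MERGE lookup, not something from the original question.

```cypher
// Index for the WHERE cc.cluster_type = "household" filter.
// Index names below are hypothetical; pick your own.
CREATE INDEX concomp_cluster_type IF NOT EXISTS
FOR (cc:ConComp) ON (cc.cluster_type);

// Assumption: MERGE looks up :Cluster nodes by CLUSTER_TMP_ID,
// so an index on that property may help as well.
CREATE INDEX cluster_tmp_id IF NOT EXISTS
FOR (c:Cluster) ON (c.CLUSTER_TMP_ID);
```

Running `SHOW INDEXES` (Neo4j 4.2+) afterwards should list them once they are online.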
12-17-2021 01:01 AM
I would start by putting the MERGE commands into the apoc.periodic.iterate procedure with parallel execution enabled.
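A rough sketch of what that could look like for the query in the question, assuming the APOC library is installed. The batch size of 10000 is an arbitrary starting point, not a tuned value. Note that with `parallel: true`, concurrent batches that MERGE the same :Cluster node or attach relationships to the same nodes can run into lock contention or deadlocks, so this may need to fall back to `parallel: false`.

```cypher
CALL apoc.periodic.iterate(
  // First statement: stream the rows to process.
  'MATCH (cc:ConComp)-[:IN_CONCOMP]-(p1:Person)-[r2:SAME_CLUSTER]-(:Person)
   WHERE cc.cluster_type = "household"
   RETURN cc, r2, p1',
  // Second statement: runs once per row, committed in batches.
  'MERGE (cluster:Cluster {CLUSTER_TMP_ID: cc.CONCOMP_ID + "|" + r2.root_id, cluster_type: cc.cluster_type})
   MERGE (cluster)-[:IN_CLUSTER]-(p1)',
  // batchSize is an assumed value; parallel: true may deadlock on shared nodes.
  {batchSize: 10000, parallel: true}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages;
```

Checking `errorMessages` in the result is a quick way to see whether any batches failed.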
12-20-2021 03:32 AM
I finally found a solution by using the following query (and by indexing cc.cluster_type and cc.CONCOMP_ID):
CALL apoc.periodic.iterate(
  'MATCH (cc:ConComp)<-[r1:IN_CONCOMP]-(p1:Person)-[r2:SAME_CLUSTER]-(p2:Person)
   WHERE cc.cluster_type = "household"
   WITH DISTINCT cc.CONCOMP_ID + "|" + r2.root_id AS id_name, cc.cluster_type AS cluster_type_name, p1
   RETURN id_name, cluster_type_name, p1',
  'MERGE (cluster:Cluster {CLUSTER_TMP_ID: id_name, cluster_type: cluster_type_name})
   MERGE (cluster)-[r3:IN_CLUSTER]->(p1)',
  {batchSize: 10000, parallel: false})
To be precise, I had previously run the query from my initial question with apoc.periodic.iterate, without success.