Neo4j

yunzhoun · ‎07-28-2020

Hi guys,

I need some hints from experts on how to bulk-update properties on relationships.

I'm using the Java API to pass the data I want to update as a parameter. One record of my data looks like:
{"from":"241919837","to":"330970321","label":"ACT_IN","properties":{"AUTHORITY":1.0,"GENERAL_RELATION_WEIGHT":0.0,"SCORE":0.0,"IS_CORE_RELATION":0.0}}

The Cypher I'm using is:


CALL apoc.periodic.iterate(
                            "unwind $update1 as row match (n:Entity{eid:row.from})-[r]-> (m:Entity{eid:row.to}) where type(r) = row.label return r, row.properties as properties", 
                            "CALL apoc.create.setRelProperties(r, keys(properties), [k in keys(properties)|properties[k]]) yield rel return 'n'",{batchSize:1, parallel:true,params:{update1:$update1}})

This query is relatively slow. My guesses at the reasons:

- Plain Cypher doesn't support matching on a dynamic relationship type, so I have to use a clause like type(r) = row.label to find the relationship.
- Something strange is happening in apoc.create.setRelProperties.

By the way, I'm using Enterprise Edition 3.2.x. In this version apoc.merge.relationship doesn't seem to work as expected: it does not update the properties, so the following doesn't work for me.

CALL apoc.periodic.iterate(
                            "unwind $update1 as row match (from:Entity{eid:row.from}), (to:Entity{eid:row.to}) return from, to, row.label as label, row.properties as properties", 
                            "CALL apoc.merge.relationship(from, label, {}, properties, to) yield rel return 'n'",{batchSize:1, parallel:true,params:{update1:$update1}})

Is there any way to accelerate my update? Any hint will be well appreciated.

stefan_armbrust · ‎07-28-2020

You can use apoc.cypher.run with string concatenation to build the first query; this avoids the expensive filtering.
I assume you just want to overwrite all relationship properties, so you shouldn't need apoc.create.setRelProperties.
Also, a batchSize of 1 is unreasonable; try something like 10k.

CALL apoc.periodic.iterate(
                            "unwind $update1 as row call apoc.cypher.run('match (n:Entity{eid:row.from})-[r:' + row.label + ']-> (m:Entity{eid:row.to}) return r', {row:row}) yield value return value.r as r, row.properties as properties", 
                            "SET r = properties",{batchSize:10000, parallel:true,params:{update1:$update1}})

I haven't actually tested the statement, but the idea should be clear.
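
One detail about the second statement is worth spelling out: SET r = properties replaces the whole property map, so any key missing from properties is removed, while SET r += properties only adds or overwrites the listed keys. A minimal sketch of the difference follows; the existing property values are made up for illustration.

```cypher
// Hypothetical starting state: r has {AUTHORITY: 1.0, SCORE: 0.5}.

// Variant 1: replace the whole property map.
MATCH (:Entity {eid: '241919837'})-[r:ACT_IN]->(:Entity {eid: '330970321'})
SET r = {SCORE: 0.0};
// r now holds only SCORE; AUTHORITY has been removed.

// Variant 2 (run instead of variant 1): merge into the existing map.
MATCH (:Entity {eid: '241919837'})-[r:ACT_IN]->(:Entity {eid: '330970321'})
SET r += {SCORE: 0.0};
// SCORE is overwritten; AUTHORITY is left untouched.
```

If each row always carries the full set of properties, as in the sample record, both variants end up with the same result.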

yunzhoun · ‎07-28-2020

Hi Stefan,

Thank you so much for your response. I tried your solution and ran into new problems:

First, I tried to run

CALL apoc.periodic.iterate(
"unwind $update1 as row call apoc.cypher.run('match (n:Entity{eid:row.from})-[r:' + row.label + ']-> (m:Entity{eid:row.to}) return r', {row:row}) yield value return value.r as r, row.properties as properties"
,
"SET r = properties",{batchSize:10000, parallel:true,params:{update1:$update1}})

However, it gave me a NullPointerException. After some debugging I found that the apoc.cypher.run procedure returned the relationship as a map rather than as the relationship itself, so we cannot set properties on it.

As a result, I had to pass the relationship id and match it again in the second clause:

CALL apoc.periodic.iterate(
"unwind $update1 as row call apoc.cypher.run('match (n:Entity{eid:row.from})-[r:' + row.label + ']-> (m:Entity{eid:row.to}) return id(r) as x', {row:row}) yield value return value.x as id, row.properties as properties"
,
"Match ()-[r]->() where id(r) = id SET r = properties",{batchSize:10000, parallel:true,params:{update1:$update1}})

But this solution is only slightly faster than my original query (because we match twice).
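
A possible way around the double match, sketched here untested: if the set of relationship types is small, the rows can be grouped by type on the client side and one statement run per type, with the type written literally into the query. The parameter name $rows and the assumption that it contains only ACT_IN rows are hypothetical.

```cypher
// Assumption: $rows holds only the rows whose label is ACT_IN
// (grouped on the client before the call). Repeat per relationship type.
CALL apoc.periodic.iterate(
  "UNWIND $rows AS row RETURN row",
  "MATCH (n:Entity {eid: row.from})-[r:ACT_IN]->(m:Entity {eid: row.to})
   SET r += row.properties",
  {batchSize: 10000, parallel: true, params: {rows: $rows}})
```

With a literal type there is no type(r) filter, no apoc.cypher.run call per row, and the relationship is matched only once.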

Btw, using apoc.cypher.run is a good idea.

Is there any way to improve this further?

Another follow-up question: what if I just want to delete the relationship? It's very slow using:

CALL apoc.periodic.iterate(
"unwind $props as row call apoc.cypher.run('match (n:Entity{eid:row.from})-[r:' + row.label + ']-> (m:Entity{eid:row.to}) return id(r) as x', {row:row}) yield value return value.x as id"
,
 "Match ()-[r]->() where id(r) = id delete r"
,{batchSize:1000, parallel:true, params:{props:$props}})
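
For the delete case, one option that skips the id round-trip is to let the dynamically built statement do the delete itself. This is an untested sketch that assumes apoc.cypher.doIt (the write counterpart of apoc.cypher.run) is available in the installed APOC version. Setting parallel to false is also an assumption: deleting a relationship takes write locks on both of its nodes, so parallel batches that touch the same nodes can block each other.

```cypher
// The first statement only streams the rows; the second builds and runs
// the delete per row, so no relationship id has to be passed back.
CALL apoc.periodic.iterate(
  "UNWIND $props AS row RETURN row",
  "CALL apoc.cypher.doIt('MATCH (:Entity {eid: row.from})-[r:' + row.label + ']->(:Entity {eid: row.to}) DELETE r', {row: row}) YIELD value RETURN count(*)",
  {batchSize: 1000, parallel: false, params: {props: $props}})
```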

Batch update properties on relationships too slow