Neo4j

guilherme_junqu · ‎10-08-2018

Guys, how are you?

I am updating my graph to get rid of my Year nodes. I set up the following query, and it has been running for roughly 8 hours but has completed only 15% of the job so far:

call apoc.periodic.commit("
match
  (afe:Measurement)-[:taken_on]->(y:Year)
where
  afe.date is null
WITH
  afe, y
LIMIT
  {limit}
SET
  afe.date = date({year: y.value})
return
   count(*)",
{limit:1500})

My feeling is that the query starts to slow down because it gets increasingly harder to find the nodes with null values as the nodes get updated.

Question: would it be better to build an index on that property beforehand? Or would that make it worse, since the index would also have to be updated during the execution?

Is there any other way to speed things up?

Thanks in advance,

andrew_bowman · ‎10-08-2018

I believe you're correct. Since the query is executed repeatedly, it will also be matching to and evaluating the properties from the same nodes over and over each iteration.

You may want to try apoc.periodic.iterate() instead. It's designed to match only once, then stream the results and process them in batches:

CALL apoc.periodic.iterate("
MATCH
  (afe:Measurement)-[:taken_on]->(y:Year)
WHERE
  afe.date is null
RETURN afe, y",
"SET
  afe.date = date({year: y.value})", {batchSize:1500}) YIELD total, batches, errorMessages
RETURN total, batches, errorMessages

The batching is handled for you here, so there's no need for an explicit LIMIT.
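
If you want to watch progress while the job runs, a count along these lines may help. This is only a sketch that reuses the labels and property from the query above; the `remaining` alias is just an illustrative name. Because each batch is committed in its own transaction, running it from a separate session should show the number shrinking as batches complete.

```cypher
// Count the Measurement nodes that still need a date.
// Run from a separate session while the batch job is in progress.
MATCH (afe:Measurement)-[:taken_on]->(:Year)
WHERE afe.date IS NULL
RETURN count(afe) AS remaining
```

Note that this count is itself a full scan of the Measurement nodes, so on a large graph it is better run occasionally than in a tight loop.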

guilherme_junqu · ‎10-08-2018

Hey Andrew,

Thank you for your answer. I will give apoc.periodic.iterate a go.

Could you clarify one thing? Should I build an index beforehand or not?

Bests,

guilherme_junqu · ‎10-08-2018

Wow...

I tried the suggested APOC and it is running way faster!

I did not create the index, as I wanted to have a fair comparison with the previous one.

Thanks!

andrew_bowman · ‎10-08-2018

Glad to hear it!

With your query, an index wouldn't have helped, as there's no property lookup here (null checks on properties of nodes don't use index lookup).
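
One way to check this on your own data is to look at the query plan. The sketch below uses a simplified version of the match and assumes nothing beyond the Measurement label and date property from the thread. Since a property index only contains nodes that actually have the property, I'd expect the plan to show a label scan followed by a filter rather than an index seek, though the exact operator names depend on the Neo4j version.

```cypher
// EXPLAIN shows the plan without executing the query.
// Look at which operator is used to find the Measurement nodes.
EXPLAIN
MATCH (afe:Measurement)
WHERE afe.date IS NULL
RETURN count(afe)
```

Running the same EXPLAIN before and after creating an index on :Measurement(date) should make it easy to see whether the planner picks it up for this predicate.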

Speeding up my graph update