Speeding up my graph update

Guys, how are you?

I am updating my graph to get rid of my Year nodes. I set up the following query, and it has been running for roughly 8 hours but has completed only about 15% of the job so far:

call apoc.periodic.commit("
match
  (afe:Measurement)-[:taken_on]->(y:Year)
where
  afe.date is null
WITH
  afe, y
LIMIT
  {limit}
SET
  afe.date = date({year: y.value})
return
   count(*)",
{limit:1500})

My feeling is that the query slows down over time because, as more nodes get updated, it becomes harder and harder to find the ones that still have a null date.

Question: would it be better to build an index on that property beforehand? Or would that make things worse, since the index would also have to be updated during the execution?

Is there any other way to speed things up?

Thanks in advance,

1 ACCEPTED SOLUTION

I believe you're correct. Since the query is executed repeatedly, each iteration matches the same nodes and evaluates their properties all over again, including the ones that were already updated.

You may want to try apoc.periodic.iterate() instead. It's designed to match only once, then stream the results and process them in batches:

CALL apoc.periodic.iterate("
MATCH
  (afe:Measurement)-[:taken_on]->(y:Year)
WHERE
  afe.date is null
RETURN afe, y",
"SET
  afe.date = date({year: y.value})", {batchSize:1500}) YIELD total, batches, errorMessages
RETURN total, batches, errorMessages

The batching is handled for you here, so there's no need for an explicit LIMIT.
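
Since each batch is committed in its own transaction, you should be able to watch progress from a separate session while the job runs. A minimal sketch, assuming the same labels and property names as in your query (the alias "remaining" is just for illustration):

```cypher
// Count the Measurement nodes that still have no date set.
// Run this in a separate session while the batched update is in progress.
MATCH (afe:Measurement)-[:taken_on]->(:Year)
WHERE afe.date IS NULL
RETURN count(DISTINCT afe) AS remaining
```

If the number keeps dropping at a steady rate between runs, the batches are committing as expected.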

Hey Andrew,

Thank you for your answer. I will give apoc.periodic.iterate a go.

Could you clarify one thing? Should I build an index beforehand or not?

Bests,

Wow...

I tried the suggested APOC and it is running way faster!

I did not create the index, as I wanted to have a fair comparison with the previous one.

Thanks!

Glad to hear it!

With your query, an index wouldn't have helped, since there's no property lookup here (null checks on node properties don't use an index lookup).
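
If you want to check this yourself, you can look at the query plan. A rough sketch, assuming the same labels and property as in your query; EXPLAIN only shows the plan and does not execute the query, so it is safe on a large graph:

```cypher
// Show the planned operators without running the query.
EXPLAIN
MATCH (afe:Measurement)-[:taken_on]->(y:Year)
WHERE afe.date IS NULL
RETURN count(*)
```

I'd expect the plan to start from a label scan and apply the null check in a Filter step, rather than using an index seek on date, whether or not an index on that property exists.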