10-08-2018 02:04 PM
Guys, how are you?
I am updating my graph to get rid of my Year nodes. I have set up the following query, and it has been running for roughly 8 hours but has completed only about 15% of the job so far:
CALL apoc.periodic.commit("
  MATCH (afe:Measurement)-[:taken_on]->(y:Year)
  WHERE afe.date IS NULL
  WITH afe, y
  LIMIT {limit}
  SET afe.date = date({year: y.value})
  RETURN count(*)",
  {limit: 1500})
My feeling is that the query slows down because it gets increasingly harder to find the nodes with null values as more nodes get updated.
Question: would it be better to build an index on that property beforehand? Or would that make it worse, since the index would also have to be updated during the execution?
Is there any other way to speed things up?
Thanks in advance,
10-08-2018 02:28 PM
I believe you're correct. Since the query is executed repeatedly, each iteration matches the same nodes and evaluates their properties all over again.
You may want to try apoc.periodic.iterate() instead. It's designed to match only once, then stream the results and process them in batches:
CALL apoc.periodic.iterate("
  MATCH (afe:Measurement)-[:taken_on]->(y:Year)
  WHERE afe.date IS NULL
  RETURN afe, y",
  "SET afe.date = date({year: y.value})",
  {batchSize: 1500})
YIELD total, batches, errorMessages
RETURN total, batches, errorMessages
The batching is handled for you here, so there's no need for an explicit LIMIT.
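If you want to keep an eye on progress while the job runs, a small count query along these lines might help. This is only a sketch: it assumes the same labels and relationship type as in the query above, and the alias remaining is just an illustrative name.

```cypher
// Count the Measurement nodes that still need a date.
// DISTINCT guards against a Measurement linked to more than one Year
// being counted several times (an assumption about the data model).
MATCH (afe:Measurement)-[:taken_on]->(:Year)
WHERE afe.date IS NULL
RETURN count(DISTINCT afe) AS remaining
```

Running it from a separate session every so often should show the number dropping as batches are committed. Once it reaches 0, every matched Measurement has its date set.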
10-08-2018 02:41 PM
Hey Andrew,
Thank you for your answer. I will give apoc.periodic.iterate a go.
Could you clarify one thing? Should I build an index beforehand or not?
Best,
10-08-2018 02:54 PM
Wow...
I tried the suggested APOC procedure and it is running way faster!
I did not create the index, as I wanted a fair comparison with the previous run.
Thanks!
10-08-2018 03:59 PM
Glad to hear it!
With your query, an index wouldn't have helped, as there's no property lookup here (null checks on properties of nodes don't use index lookup).
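If you'd like to check this yourself, prefixing the matching part of the query with EXPLAIN shows the plan without executing anything. A rough sketch, assuming the same labels and relationship type as in your query:

```cypher
// EXPLAIN only produces the execution plan; it does not run the query
// and does not modify any data.
EXPLAIN
MATCH (afe:Measurement)-[:taken_on]->(y:Year)
WHERE afe.date IS NULL
RETURN afe, y
```

I'd expect the plan to show a label scan followed by an expand and a filter on afe.date rather than an index seek, with or without an index on :Measurement(date), though the exact operators depend on your Neo4j version and data.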