11-09-2020 07:54 PM
Hi there,
It is taking my server about 3 days to delete just 20,000 nodes in my graph, and they have at most 50 relationships each.
I am using Neo4j Python driver 1.7.6 to do this operation on an Ubuntu 14.04 LTS server with 32GB RAM running Neo4j Enterprise 3.5.19.
Here's my query:
MATCH (:AWSAccount{id: '1234'})-[:RESOURCE]->(:AWSVpc)<-[:MEMBER_OF_AWS_VPC]-(:EC2Subnet)<-[:PART_OF_SUBNET]-(n:NetworkInterface)
WHERE n.lastupdated <> {UPDATE_TAG}
WITH n LIMIT {LIMIT_SIZE}
DETACH DELETE (n)
RETURN COUNT(*) as TotalCompleted
UPDATE_TAG is a timestamp like 1604973243, and LIMIT_SIZE is set to 100. My code deletes nodes that don't have a given timestamp value, and as per this guide, I run the above query to delete in small batches until TotalCompleted returns 0. My problem is that this process takes days. Any ideas on why this happens and how I can fix it? The actual Python code that calls this query is here, and although it could clearly be written much more safely, I don't see this slow behavior on any of my other dozens of deletion jobs.
Before running my delete job, I have 21,209 nodes with a timestamp of 1604836804.
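A count like that can be checked with something along these lines (a sketch; it assumes the stale nodes carry the same label and property used in the delete query):
MATCH (n:NetworkInterface)
WHERE n.lastupdated = 1604836804
RETURN COUNT(n) AS total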
I run the delete job in my Python script as described above and check the Neo4j web interface after a couple of minutes while it continues.
It seems to have deleted about 17,000 nodes.
The count has been stuck at around 4,300 for the past hour and I am very confused. My system memory usage stays around 31% of the total 32GB, and CPU hovers around 30%. When I manually run the delete query in the web interface, it works.
Here's my PROFILE:
I have read these links searching for an answer but have not found one yet:
Neo4j.log and query.log don't show anything. I'd like to avoid APOC if possible because I don't want to force users of my software library to need to install anything other than a fairly vanilla Neo4j setup. Any ideas? Thanks very much in advance.
11-09-2020 08:45 PM
What is your heap and max heap and page cache in Neo4j.conf?
Is there anything in particular about those 4300? They all have at most 50 rels?
You could delete the relationships separately first; that might look like:
MATCH (:AWSAccount{id: '1234'})-[:RESOURCE]->(:AWSVpc)<-[:MEMBER_OF_AWS_VPC]-(:EC2Subnet)<-[r:PART_OF_SUBNET]-(n:NetworkInterface)
WHERE n.lastupdated <> {UPDATE_TAG}
WITH n, r LIMIT {LIMIT_SIZE}
DELETE r SET n:DeleteMe
RETURN COUNT(*) as TotalCompleted
Then run something like
MATCH (n:DeleteMe)
WITH n LIMIT {LIMIT_SIZE}
DETACH DELETE n
RETURN COUNT(*) as TotalCompleted
11-09-2020 08:53 PM
What is your heap and max heap and page cache in Neo4j.conf?
My neo4j.conf has heap size:
dbms.memory.heap.max_size=5g
and the page cache setting is still at the commented-out default:
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the max Java heap size.
#dbms.memory.pagecache.size=10g
Admittedly I've never tuned these values. Any recommendations?
Is there anything in particular about those 4300? They all have at most 50 rels?
Nothing in particular, they have about 12 fields each and they all have at most 50 rels.
11-09-2020 08:58 PM
I would change those to explicit settings of maybe 10g each for heap and page cache. There are two settings for heap; change them both.
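In neo4j.conf that would look something like this (the 10g sizes are just the suggestion above, not a universal recommendation):
dbms.memory.heap.initial_size=10g
dbms.memory.heap.max_size=10g
dbms.memory.pagecache.size=10g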
11-10-2020 12:53 AM
I upped the heap values to 10g but accidentally forgot to set the page cache - it still hung for half an hour with 8,000 nodes left. I'm doing another run with the page cache set correctly and will report back.
11-10-2020 08:16 AM
I re-ran it with the new cache and heap settings and it's still stuck after 8 hours. Before I rewrite some code and give APOC a try, any other ideas?
11-10-2020 12:40 AM
Hello @alexchantavy
You can use APOC:
CALL apoc.periodic.iterate('MATCH (n) RETURN n', 'DETACH DELETE n', {batchSize: 1000})
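Adapted to the query from your first post, that might look like this (an untested sketch; apoc.periodic.iterate takes a params map in its config, and 1604973243 stands in for your real UPDATE_TAG):
CALL apoc.periodic.iterate(
  "MATCH (:AWSAccount{id: '1234'})-[:RESOURCE]->(:AWSVpc)<-[:MEMBER_OF_AWS_VPC]-(:EC2Subnet)<-[:PART_OF_SUBNET]-(n:NetworkInterface) WHERE n.lastupdated <> $UPDATE_TAG RETURN n",
  "DETACH DELETE n",
  {batchSize: 1000, params: {UPDATE_TAG: 1604973243}})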
Regards,
Cobra
11-10-2020 12:55 AM
You can use APOC
Hi, as I said in my post, I'd like to avoid APOC if possible. I'll give it a try if nothing else works, though.
11-10-2020 08:43 AM
Do you have indexes on AWSAccount.id and NetworkInterface.lastupdated? Without indexes, a linear scan would need to be done, which would be painfully slow (depending on how many nodes there are in the DB).
Is there a reason why the AWSAccount.id values are strings instead of numbers? Numbers would be faster.
NOTE: indexes are not a magic bullet. Don't index everything! Only index properties that you search on (e.g. in WHERE conditions) or where there aren't many different values (e.g. booleans).
See: https://neo4j.com/docs/cypher-manual/current/administration/indexes-for-search-performance/
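In Neo4j 3.5 Cypher, adding the suggested index would look like:
CREATE INDEX ON :NetworkInterface(lastupdated)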
11-10-2020 09:05 AM
Do you have indexes on AWSAccount.id and NetworkInterface.lastupdated?
I have indexes on AWSAccount.id but not on {any label}.lastupdated. It's still odd how none of my other labels have this problem, but thanks for the idea: I'll do some testing with an index on lastupdated for just the NetworkInterface.
Is there a reason why the AWSAccount.id values are strings instead of numbers? Numbers would be faster.
Not really other than "someone else decided this a long time ago and now that's how it is".
11-10-2020 09:12 AM
I think you can change them all to integers. (Not too hard.)
You might experiment with a clone of the DB using ints for ids instead of strings. Obviously, comparing two integers can be done in a few instructions, whereas a string comparison is a loop. (To make a hash out of a string, you still need a loop.)
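A one-off conversion could be done in Cypher along these lines (a sketch that assumes every id is a purely numeric string; on a large graph you would want to batch it):
MATCH (a:AWSAccount)
WHERE a.id =~ '[0-9]+'
SET a.id = toInteger(a.id)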
I don't know what sorts of Java profiling tools can be used on Neo4j, but maybe those could be used to find out where the bottlenecks are.
11-10-2020 10:01 AM
No luck, it still hangs short of finishing the last 4000 nodes. I looked closer and it appears 5 of them have 50 rels, and the rest have fewer than 5 rels each. All of them have about 12 properties each.
I'm going to go stare at code some more
11-11-2020 10:49 AM
At this point I don't think APOC will help you, as the issue seems to be fundamentally about the particular nodes you are deleting, not the deletion speed. You could try deleting a few of them manually, one at a time.
Did you try deleting the rels independently of the Nodes?
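For example, something like this deletes just one of the stale nodes at a time, which would let you see whether a specific node is the problem (illustrative only; 1604973243 stands in for the real UPDATE_TAG):
MATCH (n:NetworkInterface)
WHERE n.lastupdated <> 1604973243
WITH n LIMIT 1
DETACH DELETE n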
11-13-2020 08:48 PM
Have you tried PROFILE? I haven't tried it but it looks like it could help: