01-02-2019 10:09 PM
I have been having trouble doing a mass delete of data observed on the first day. I am trying to delete around 20-30 million nodes each day to cycle my data.
I have 3 proposed methods and they are all too slow.
method 1:
step 1 - mark all the nodes to be deleted in parallel using apoc.periodic.iterate (a rough sketch follows below)
step 2 - detach delete the marked nodes using apoc.periodic.iterate (non-parallel)
method 2:
step 1 - mark all the nodes to be deleted in parallel using apoc.periodic.iterate
step 2 - delete the marked nodes using apoc.periodic.iterate + apoc.nodes.delete (non-parallel)
method 3:
step 1 - mark all the nodes to be deleted in parallel using apoc.periodic.iterate
step 2 - delete all the relationships using apoc.periodic.iterate (non-parallel)
step 3 - delete all the disconnected marked nodes in parallel using apoc.periodic.iterate
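Roughly, method 1 looks like this (simplified; the real marking step also labels the connected nodes, as shown further down the thread, and the batch sizes are just the ones I have been testing):
Call apoc.periodic.iterate("Match (u:User) where u.last_obs < datetime('2018-01-03') return u","Set u:MarkDel",{batchSize:10000,parallel:true,retries:3});
Call apoc.periodic.iterate("Match (n:MarkDel) return n","Detach delete n",{batchSize:1000});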
Right now my estimate is that deleting 20 million nodes using method 1 will take 55 hours.
Method 2 is even slower.
Using method 3 it will take 6 hours to delete all the relationships, and I am now testing how long it takes to delete all the nodes in parallel once the relationships have been removed.
Looking for thoughts and ideas on improving this. Seems like Neo4j struggles at scale.
01-03-2019 04:01 AM
Hi @benjamin.squire ,
There is a trick to deleting a large dataset from Neo4j:
simply use LIMIT with each delete and perform the operation 100k records at a time.
It takes some manual effort, but it is much faster.
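For example, something like this (assuming the nodes to delete can be matched by a label, e.g. a hypothetical :MarkDel), repeated until no more rows are deleted:
Match (n:MarkDel)
With n limit 100000
Detach delete n;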
01-03-2019 07:39 AM
Just curious to know why you have to delete 20M nodes and relationships every day.
[1] If so, I believe the data model needs to be revisited.
[2] If cost and time factors allow, could you create one more graph database, take only the needed data and relationships, and import them into the new database instead of messing with the existing one (assuming the data is not sensitive and storage cost is not a big constraint)?
[3] I think this kind of operation would take time in any database.
01-03-2019 10:12 AM
@12kunal34 I have deleted in this fashion using LIMIT 1000 and 10000; it takes 8-10 seconds for 1k and 80-90 seconds for 10k deletions and detaches, hence the 55-hour estimate. Further, apoc.periodic.commit or apoc.periodic.iterate should batch the same way LIMIT does, so it should be considered equivalent.
@senthilc78 the reason for deleting 20M records per day is the amount of data I am dealing with and the length of time I have to keep it. I need to maintain at least 1 year of all clickstream records for 150 websites. Right now, at 10 months, the data is 7.5 billion records and 15 billion relationships, and when I hit 12 months it will be higher. The fact is I am using Community and have a limit of 34 billion nodes and relationships; in a previous data model I already hit that amount at 10 months. Since then I have revised the model down to the bare minimum. The 20M records per day relates to deleting the first day of logs a year ago. Since I am adding about 20M records per day and will be getting closer to the 34 billion limit (plus hardware constraints of my machine), I need to be able to recycle that space and delete the old logs which we identify as no longer required.
01-03-2019 01:31 PM
It's not a technical solution, but it does get around the size limit of Community: purchase an Enterprise license. Then you won't have a size limit and won't need to prune your data. It would also give you more flexibility, such as clustering, as your data grows. Otherwise, 2 hrs to prune that much data isn't that bad; you could consider moving to an hourly pruning schedule if daily is too large.
01-04-2019 11:56 PM
Got it, Benjamin. Thanks for the details. You got me thinking about time-to-live (TTL) nodes. I can see some third-party plugins for that, but I believe that using a trigger and a common counter, with a little change to the data model, we could achieve this in Community Edition as an alternative. Let me check that out in detail and update. Thanks!
01-07-2019 01:50 PM
What are your batch sizes for the periodic.iterate calls?
If you observe the machine, is it bound on IO or CPU?
What are the degrees of the nodes you are deleting?
Are those nodes related/connected to each other or unrelated?
I will try to reproduce your use-case and see how it behaves on our test machine.
01-07-2019 01:58 PM
Also what is your heap config and your page-cache size?
And how do you invoke the procedures?
01-07-2019 02:34 PM
@michael.hunger the batch sizes I tested were 1k and 10k. Using method 1, detach delete node in series batches took 8-10 seconds for 1k and 80-100 seconds for 10k. When I tested apoc.nodes.delete (method 2) it was even slower: 15-17 seconds for 1k, and 10k took almost 2.5 mins per batch. Method 3, where I delete rels in batches, was much faster: 1k took 5 to 6 seconds and 10k took about 30 seconds. The estimated time for method 1 is around 50 hours ((20 MM / 10000 node batches) * 100 sec = 50 hrs). Method 2 was even slower, and method 3 finished deleting all rels in 7 hours and finished deleting the remaining floating nodes in 9 hours.
I don't have the exact degree of each node, but there should be an average of 4 nodes attached to each user, given there were 5 MM users on the first day and 20 MM connected nodes; I will have to do some sampling to give a direct answer. They do have some connecting third-party nodes.
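For instance, a quick sampling query along these lines should give the average out-degree (a rough sketch over a 10k-user sample):
Match (u:User) where u.last_obs < datetime('2018-01-03')
With u limit 10000
Return avg(size((u)-->())) as avg_out_degree;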
dbms.memory.heap.initial_size=25g
dbms.memory.heap.max_size=25g
dbms.memory.pagecache.size=165g
total machine memory is 244 GB and all of it is dedicated to the neo4j server
This is method 3 (deleting rels in series, then deleting the remaining nodes in parallel):
Call apoc.periodic.iterate("Match (u:User) where u.last_obs < datetime('2018-01-03') return u","Match (u) with apoc.coll.toSet([(u)-[:OBSERVED_WITH]->(a) | a]) + collect(distinct u) as nodes unwind nodes as n Set n:MarkDel",{batchSize:100,iteratelist:true,parallel:true,retries:3});
Call apoc.periodic.iterate("Match (u:MarkDel:User)-[r]->() return r", "delete r",{batchSize:10000});
Call apoc.periodic.iterate("Match (u:MarkDel) where size((u)--()) = 0 return u", "detach delete u",{batchSize:1000,iteratelist:true,parallel:true,retries:3});
The reason I mark with :MarkDel but only delete rels from Users is: 1) users are the central connector in the graph, hence everything that is connected has a rel from a user, so it looks like (User)-[r]->(id);
2) I don't want to delete every (id) connected to marked users, because some (id) nodes also touch users who are more recent, i.e. an (id) that connects to a user from the first day may also have been observed connected to a user yesterday, and I don't want those deleted. So the third iterate looks for orphaned nodes after I delete all the users which connect them. Note: there is no race condition once all of a node's rels are deleted, because they all have degree 0.
01-07-2019 11:27 PM
Did you observe the machine during the deletion process to see where the bottleneck was?
Also check with iotop how much write throughput is happening.
For single-threaded deletes you can also increase the batch size to 100k-10M.
One other thing that could help is to order the to-be-deleted rels by id, but I have to try that first.
One other thing one could do is to run a clustering algo on the :MarkDel nodes and delete the clusters (nodes and rels) in parallel.
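Combining the first two suggestions, the single-threaded rel delete could look roughly like this (just a sketch; as mentioned, the id ordering is untested):
Call apoc.periodic.iterate("Match (:MarkDel)-[r]->() return r order by id(r)","delete r",{batchSize:100000});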
01-10-2019 11:26 AM
I checked iotop and these are the results. I upped the rel-delete batch size from 1,000 to 1,000,000, run in series:
Total DISK READ : 23.07 M/s | Total DISK WRITE : 1703.66 K/s
Actual DISK READ: 23.01 M/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
50759 be/4 neo4j 10.91 M/s 46.25 K/s 0.00 % 73.48 % java -cp /var/lib/neo4j/plugins:/etc/ne~=/var/lib/neo4j --config-dir=/etc/neo4j
46510 be/4 neo4j 11.14 M/s 34.69 K/s 0.00 % 70.67 % java -cp /var/lib/neo4j/plugins:/etc/ne~=/var/lib/neo4j --config-dir=/etc/neo4j
42817 be/4 neo4j 1040.70 K/s 1622.72 K/s 0.00 % 15.63 % java -cp /var/lib/neo4j/plugins:/etc/ne~=/var/lib/neo4j --config-dir=/etc/neo4j
43008 be/4 neo4j 0.00 B/s 0.00 B/s 0.00 % 0.00 % java -cp /var/lib/neo4j/plugins:/etc/ne~=/var/lib/neo4j --config-dir=/etc/neo4j
Are these expected results? Do I need a higher-IOPS SSD? Currently it is a 16,000 IOPS EBS volume.
Tasks: 344 total, 2 running, 182 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 97.3 id, 2.4 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 25174608+total, 1702016 free, 20220547+used, 47838596 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 47205400 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
42623 neo4j 20 0 210.2g 192.0g 3656 S 25.9 80.0 1961:36 java
01-10-2019 02:29 PM
Those numbers are not really high.
I think we should see at least 10x that (250 MB/s) on IO.
CPU utilization is near zero; it is only IO wait.
I need to test this on our machine to see how it behaves there.
01-10-2019 03:20 PM
I have reviewed my EC2 instance and everything appears to be correct; Neo4j is heavily underutilizing the available hardware.
ec2 - 32 vCPUs - 244 GB Ram - 8 TB 16000 IOPS Provisioned SSD - r4.8xlarge
Instance type | EBS-optimized by default | Maximum bandwidth (Mbps) | Maximum throughput (MB/s, 128 KB I/O) | Maximum IOPS (16 KB I/O)
r4.8xlarge | Yes | 7,000 | 875 | 37,500
These instance types can support maximum performance for 30 minutes at least once every 24 hours. For example, r4.8xlarge instances can deliver 875 MB/s for 30 minutes at least once every 24 hours.
01-10-2019 03:35 PM
I have reached out to AWS support team to see what their opinion on this is.
01-10-2019 05:19 PM
I forgot to mention one thing. Due to the way the compiled runtime works, it doesn't behave well in periodic.iterate; that's why it's best to prefix the driving statement with cypher runtime=slotted.
Here I tried to delete 1M rels, which took about 90 seconds:
Call apoc.periodic.iterate("cypher runtime=slotted Match (:MarkDel)-[r]->() return r limit 1000000", "delete r",{batchSize:100000});
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| batches | total | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch | operations | wasTerminated |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 10 | 1000000 | 91 | 1000000 | 0 | 0 | 0 | {} | {total -> 10, committed -> 10, failed -> 0, errors -> {}} | {total -> 1000000, committed -> 1000000, failed -> 0, errors -> {}} | false |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
92048 ms
I'm recording my experiments here:
01-11-2019 03:25 AM
I have now deleted 82,291,607 rels on our machine and it took 2 hrs 44 minutes.
With disk I/O going up to these levels.
Total DISK READ : 0.00 B/s | Total DISK WRITE : 772.07 M/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 1790.73 M/s
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| batches | total | timeTaken | committedOperations | failedOperations | failedBatches | retries | errorMessages | batch | operations | wasTerminated |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 823 | 82291607 | 8677 | 82291607 | 0 | 0 | 0 | {} | {total -> 823, committed -> 823, failed -> 0, errors -> {}} | {total -> 82291607, committed -> 82291607, failed -> 0, errors -> {}} | false |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row
8677673 ms
Now off to delete the nodes.
02-12-2019 11:07 PM
We did it with the following code and it took a lot of time; then we tried Michael's guide, but it took a little longer than our method below:
MATCH (n:News)-[r]-(w:Word2)
WITH r LIMIT 10000
DELETE r;
#!/bin/bash
counter=1
while [ $counter -le 50772 ]
do
echo $counter
time cat cypher/delete.cypher | bin/cypher-shell -a bolt://192.168.1.:* -u neo4j -p "********"
((counter++))
done
02-24-2019 11:29 PM
Btw, I realized that I had too little page cache configured on that machine; that's why it took much longer than expected. If you can increase your page cache, it works better. Also, you can delete more per batch, e.g. 100k per batch.
I would have used APOC's periodic.iterate instead so you don't start a new driver/connection for every query.
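For example, something roughly like this instead of the shell loop (the batch size is just a starting point):
Call apoc.periodic.iterate("cypher runtime=slotted Match (:News)-[r]-(:Word2) return r","delete r",{batchSize:100000});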
In graph-algos we're currently trying to use clustering (e.g. unionFind) to find independent clusters whose relationships can then be created/deleted concurrently. That might also be an option for you.
04-11-2019 12:24 PM
In graph-algos we're currently trying to use clustering (e.g. unionFind) to find independent clusters whose relationships can then be created/deleted concurrently.
@michael.hunger does this imply that relationships should be able to be deleted in parallel, if neither of the relationship's nodes is also being deleted? I tried to run a parallel delete of relationships via apoc.path.subgraphAll to extract fully disjoint subgraphs (which I had thought would be similar to your unionFind suggestion above), but got NullPointerExceptions in the logs.
e.g.
CALL apoc.periodic.iterate(
"MATCH (metadata:METADATA)
WHERE metadata.uri STARTS WITH $sourceUri
RETURN metadata",
"CALL apoc.path.subgraphAll(metadata, {relationshipFilter: 'METADATA>,RELATIONSHIP'}) YIELD relationships
UNWIND relationships as r
DELETE r",
{ batchSize: 1000, iterateList: true, parallel: true, concurrency: 50, params: { sourceUri: $sourceUri } }
)
04-12-2019 05:04 PM
@jameslaneconkling What @michael.hunger was talking about is this: when you try to delete in parallel, the reason you get NullPointerException errors is basically that there is a race condition if you can't guarantee segregation of the nodes and relationships per thread. When he mentioned unionFind, he is saying that if you can cluster all the data first (e.g. by running unionFind over all of it) and then use periodic.iterate so that each thread only deletes rels within a given community, you can avoid the race condition.
What you are doing is gathering all the rels for each of the nodes from your first query, but on successive iterations some of those nodes may already have been deleted because they were reached by subgraphAll in one of the other batches. To delete in parallel, you want to make sure that for each DELETE r unwind, no other thread touches the data being deleted in a different thread.
Hence, run unionFind with write:true on all the data so it writes a community/partition property (or stream the clusters with unionFind.stream()), then feed those clusters to a periodic.iterate and delete them in parallel, so that each batch contains data exclusive to one thread and can't be accessed by another.
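A rough sketch of that flow, assuming the Graph Algorithms library's algo.unionFind procedure is installed and that the :MarkDel nodes only connect to other :MarkDel nodes:
Call algo.unionFind('MarkDel', null, {write:true, partitionProperty:'partition'});
Call apoc.periodic.iterate("Match (n:MarkDel) return distinct n.partition as p","Match (m:MarkDel) where m.partition = p detach delete m",{batchSize:1,parallel:true,concurrency:8});
With batchSize:1, each batch is a single partition, so no two threads ever touch the same nodes or rels.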
Hope that helps
05-24-2019 01:30 AM
Hi Benjamin,
Did AWS get back to you on why the disk was being underutilized? We are seeing 1/10th the max throughput on our Neo4j box, even though iotop says 99% for most threads.
Any ideas?
05-24-2019 10:49 AM
Yes, get a different EC2 type. After working for a few months on various boxes, I found the only way Neo4j is performant is on i3-type boxes. EBS will not perform well under high IO loads, and it is throttled based on the size of the box, i.e. 16xlarge types get throttled less. Using local instance storage, although ephemeral and a bit of a pain to back up to EBS/S3 as a gzip, is 10-20x faster. If you don't want to use i3, then look at the Nitro-based systems. We did not go that route because the m5 and c5 instance types with Nitro didn't offer enough RAM and had too much CPU for us to use properly. i3 or i3en are your best bets. I get reads and writes exceeding 1000 M/s, whereas with EBS at 32,000 IOPS, which is very expensive I might add, we were only getting 100 K/s if that.
05-26-2019 09:21 PM
Thanks Benjamin, that helps a lot. We will be using equivalent machines on Azure (Ls_v2). Out of curiosity, how do you manage your database backups? If I understand correctly, having a non-EBS store means that your data is much more susceptible to loss.
05-27-2019 10:32 AM
Yes, if the EC2 instance freezes or bugs out, the db needs to be started from scratch. We are working on our own auto-deployment, but at a minimum, to keep data backed up, run pigz or zstd on a daily basis and upload the result to S3. I ran a compression test between zstd and gzip with a 170 GB tar file:
zstd - start 16:58, end 17:28, size 58G
gzip - start 18:01, end 20:38, size 56G
Note that as you hit the TB range it will inevitably get slower.
Alternatively, you could just upload the database directly to S3. As long as it is stored in the cloud somewhere, it is not too bad.
05-27-2019 07:42 PM
Ah, we used to do something similar, but with 2.3 TB of data it's painfully slow. Now we run neo4j-backup on another machine and snapshot that disk. We keep 3 snapshots around in case one of them is corrupt.
05-27-2019 10:58 PM
How does your neo4j-backup on another machine plus snapshot setup work? Could you describe the process? I'm worried about backups, and we have not scaled to the full db size yet, which makes me worried for when we reach the 2+ TB mark.
05-28-2019 03:30 AM
The backup tool can be run on an external machine, as long as the backup port of your production Neo4j is accessible to that machine. Most of the workflow is managed through a bunch of lambdas, but in a nutshell here is what happens:
neo4j-backup -from $EXTERNAL_IP
on the newly allocated box. You can choose to keep the backups lying around on your EBS volume rather than taking a snapshot; we mainly snapshot to save cost, as 4 TB of SSD is much more expensive than storing a 2.3 TB snapshot. If you have a large database, you also want to have decent bandwidth on your backup machine.
05-25-2019 12:40 AM
Thanks Ben,
We have a database that is 2.3TB in size so your suggestion will really help us manage the scale.