01-10-2023 11:36 PM - edited 01-10-2023 11:38 PM
Suppose the data are as listed below:
CREATE (buyer:Buyer {buyid: data.buyid, date: data.date})
CREATE (seller:Seller {sellid: data.sellid, date: data.date})
MATCH (n:Buyer) MATCH (p:Seller {sellid: n.buyid}) CREATE (p)-[:sell]->(n)
i.e., the database is a buyer-seller graph.
However, my data is collected over time: each day I import the new day's data into Neo4j and delete the data that has become 30 days old. For example, suppose today is 2023/01/31; then I would:
(1) Delete the 1/1 data (Neo4j already holds the 1/1~1/30 data)
(2) Import the 1/31 data
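As a rough illustration of the daily schedule above, here is a minimal Python sketch that derives the two date strings involved. The helper name `rolling_window_dates` and the constant `RETENTION_DAYS` are hypothetical, introduced only for this example. The YYYYMMDD format is assumed from the '20230131' values used in the queries.

```python
from datetime import date, timedelta

# Assumption: data is dropped once it is 30 days old, as described above.
RETENTION_DAYS = 30


def rolling_window_dates(today: date) -> tuple[str, str]:
    """Return (date_to_delete, date_to_import) as YYYYMMDD strings."""
    expired = today - timedelta(days=RETENTION_DAYS)
    return expired.strftime("%Y%m%d"), today.strftime("%Y%m%d")


to_delete, to_import = rolling_window_dates(date(2023, 1, 31))
print(to_delete, to_import)  # 20230101 20230131
```

The two strings could then be passed as parameters to the delete and import queries instead of hard-coding the dates.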
This is the Cypher I use to delete the 1/1 data:
CALL apoc.periodic.iterate(
'MATCH (n) WHERE n.date="20230101" RETURN id(n) AS id',
'MATCH (n) WHERE id(n)=id DETACH DELETE n',
{batchSize: 5000, parallel :true});
This is the Cypher I use to add the 1/31 data (in practice, all of the following runs inside apoc.periodic.iterate):
CALL {
CREATE (buyer:Buyer {buyid: data.buyid, date: data.date})
CREATE (seller:Seller {sellid: data.sellid, date: data.date})
}
CALL {
MATCH (n:Buyer {date: '20230131'}) MATCH (p:Seller {sellid: n.buyid, date: '20230131'})
CREATE (p)-[:sell]->(n)
}
Remarks:
(1) Neo4j already holds billions of records (about 200 million are added each day).
(2) There are no duplicate nodes. (The data list I presented has duplicates in the sellid column, but there are no duplicate Seller nodes in Neo4j, because I deduplicate the data in Python before importing it.)
Problem:
Although "CREATE (p)-[:sell]->(n)" avoids creating duplicates of relationships that already exist in Neo4j for 1/1~1/30, it takes a long time (5 hours) compared to creating the new nodes (only 30 minutes). I believe the difference arises because creating nodes does not need to traverse the data already in Neo4j (1/1~1/30), only my new data (1/31). However, to create the relationships, MATCH (n:Buyer {date: '20230131'}) must traverse the entire graph (including 1/1~1/30) to find the nodes with that property value, and that is what takes so long.
Question: How can I get the same result without traversing the entire graph (or otherwise improve performance, i.e., reduce the run time)?
Thanks.