02-22-2022 07:38 AM
I've been working on an import script (APOC based) to read the contents of a customer-built Excel spreadsheet and build the nodes & relationships. So far that has gone well for the initial import and creation of the graph. As this spreadsheet is updated monthly by the customer, I originally was going to use a modified version of the import script to update the graph. I should be able to check for the existence of the nodes and relationships just fine (MERGE with ON CREATE, ON MATCH, etc.), but how should I handle a node/relationship that existed before and no longer appears in the update (a likely possibility)? It almost sounds like the better (easier) option is just to rebuild the graph from each monthly update from the customer. Otherwise this is basically a reverse parsing situation (parse the graph and compare it with the spreadsheet).
Your thoughts?
02-28-2022 01:16 AM
If you don’t want to delete all nodes and relationships and then create all of them again, you could add a property lastUpdated which is set during the update-import. After the update-import you could then delete only the nodes which were not updated within the last 24 hours or so.
02-28-2022 08:40 AM
I do have a Last_Updated property on the nodes. Where things get interesting is information may not necessarily change for months/years so I don't want to induce the potential for false positives when doing a query due to old information still contained in the graph (e.g., a product no longer being tracked for obsolescence).
03-01-2022 08:54 AM
Maybe we had a misunderstanding. To make clear what I meant, let’s call the proposed property last_imported. As the last step of your import script you would then run something like
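// Final step of the import: delete every node the latest run did not touch.
// CALL { ... } IN TRANSACTIONS needs an auto-commit transaction (prefix with :auto in Neo4j Browser).
MATCH (n) WHERE n.last_imported < datetime() - duration({hours: 24})
CALL {
  WITH n
  DETACH DELETE n
} IN TRANSACTIONS OF 1000 ROWS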
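For the delete step to be safe, every node that appears in the current spreadsheet has to be stamped during the import, whether or not its data changed. A minimal sketch of that stamping, followed by a variation of the delete that compares against the run's own start time instead of a 24-hour window. The `Product` label, the `part_number` key, and the `$rows` and `$run_started` parameters are assumptions for illustration only; none of them come from this thread.

```cypher
// Assumptions: :Product, part_number, $rows (list of maps, one per spreadsheet row)
// and $run_started (a datetime captured once, before the import begins) are hypothetical.

// 1. Import: create or update each node and stamp it on every run.
UNWIND $rows AS row
MERGE (p:Product {part_number: row.part_number})
SET p += row,
    p.last_imported = $run_started;

// 2. Sweep: remove nodes the run did not touch.
// Nodes that were never stamped (last_imported IS NULL) are removed as well.
MATCH (p:Product)
WHERE p.last_imported IS NULL OR p.last_imported < $run_started
DETACH DELETE p;
```

DETACH DELETE only removes the relationships attached to deleted nodes. If a relationship between two surviving nodes can also disappear from the spreadsheet, the same stamp-and-sweep idea would probably need to be applied to a property on the relationships too.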
03-01-2022 10:05 AM
Hmm, I hadn't considered this as the last step. You might be onto something. Thanks for the suggestion!
03-02-2022 12:20 AM
You’re welcome 🙂 Don’t forget to put an index on last_imported if you go along with this solution. That will speed up the deletion step considerably, I think.
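A sketch of such an index, assuming Neo4j 4.4 or later and a hypothetical `Product` label (the label and index name are not from this thread). Property indexes in Neo4j are defined per label, so the delete query would have to match on that label, e.g. `MATCH (n:Product)`, rather than a bare `MATCH (n)`, for the planner to be able to use it.

```cypher
// Assumption: :Product and the index name product_last_imported are illustrative.
CREATE INDEX product_last_imported IF NOT EXISTS
FOR (n:Product) ON (n.last_imported);
```

With several node labels in the graph, one index per label would be needed under this approach.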