How to keep data up-to date in Neo4j?

Hello Team,

I am looking for a suggestion on keeping data up to date in Neo4j. Suppose data is loaded into a Neo4j label on day 1, using either CSV or a JDBC DB connection.
On day 2, some data is removed from the data source (either the CSV or the DB), but it remains in Neo4j as a stale record.
So I am looking for a solution that can handle this type of stale record and keep Neo4j in sync with the source data.

Thanks in Advance!

Best Regards
Akshat

4 REPLIES

Any suggestions on this?

Can anyone help with this?

If you are always applying your entire data source to Neo4j on some regular interval, then you need some means to track when the record was last touched.

If the presence of a node or relationship is the only thing you need to track, then whenever you MATCH or MERGE an existing graph structure you can update a property, such as lastUpdate, with a temporal value (date or dateTime) and index that property. You will need a general label shared by all nodes in the database in order to apply this index. And of course you would set this property initially when the nodes are first created.

At some periodic interval (whatever makes sense for you), you can MATCH all nodes that haven't been updated within some reasonable window. You can infer that these nodes weren't touched by the update and are likely gone from your source data, so they can be deleted.
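As a rough sketch of the touch-then-sweep idea above, the two Cypher statements could look something like the strings below, wrapped in Python so the cutoff can be computed. Only lastUpdate comes from the description; the :Synced label, the id key, and the $rows, $now, and $cutoff parameter names are assumptions made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed names: label :Synced, key property `id`, parameters $rows/$now/$cutoff.
# Run on every import: create or match each node and stamp it as seen.
TOUCH_QUERY = """
UNWIND $rows AS row
MERGE (n:Synced {id: row.id})
SET n += row, n.lastUpdate = datetime($now)
"""

# Run periodically: anything not stamped since the cutoff is treated as stale.
SWEEP_QUERY = """
MATCH (n:Synced)
WHERE n.lastUpdate < datetime($cutoff)
DETACH DELETE n
"""


def sweep_params(window, now=None):
    """Build the parameter map for SWEEP_QUERY.

    Nodes whose lastUpdate is older than `now - window` count as stale.
    """
    now = now or datetime.now(timezone.utc)
    return {"cutoff": (now - window).isoformat()}
```

The window should be comfortably longer than the import interval, otherwise a single late or failed import would delete nodes that still exist in the source.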

If there can be many nodes or relationships that need deletion in this way, then you can leverage apoc.periodic.iterate() from APOC Procedures to batch the deletes.
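A minimal sketch of what the batched version might look like, assuming APOC is installed. The :Synced label and the parameter names are again placeholders, and the batch size is an arbitrary starting point rather than a recommendation.

```python
# Assumed names: label :Synced, parameters $cutoff and $batchSize.
BATCHED_SWEEP_QUERY = """
CALL apoc.periodic.iterate(
  'MATCH (n:Synced) WHERE n.lastUpdate < datetime($cutoff) RETURN n',
  'DETACH DELETE n',
  {batchSize: $batchSize, params: {cutoff: $cutoff}}
)
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
"""


def batched_sweep(cutoff_iso, batch_size=10_000):
    """Return the query and parameter map for a batched delete of stale nodes."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return BATCHED_SWEEP_QUERY, {"cutoff": cutoff_iso, "batchSize": batch_size}
```

The first statement streams the stale nodes and the second is applied to them in separate transactions of batchSize nodes each, which keeps a large cleanup from building one huge transaction.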

Or if you need something more automatic, you can leverage GraphAware's Expire module to handle this.

Now if you have to manage more granular data, such as on the property level (a change in your source data removes a property, but the node remains) then the above approach won't be able to cover that.

In that case you either need to get a list of changes that were made on your data source so you can apply those changes to Neo4j, or since you need to recheck all nodes and properties anyway, just start over with a clean database each time and rebuild it all.
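If the source can't hand you a change list, one way to get one is to diff the previous and current snapshot of each record yourself. This is only a sketch of that idea in plain Python, and the function name and return shape are made up for illustration:

```python
_MISSING = object()  # sentinel so a stored None is distinguishable from an absent key


def diff_properties(old, new):
    """Compare two property maps for the same record.

    Returns (to_set, to_remove): properties to write to the node, and
    property names that disappeared from the source and should be removed.
    """
    to_set = {k: v for k, v in new.items() if old.get(k, _MISSING) != v}
    to_remove = sorted(k for k in old if k not in new)
    return to_set, to_remove
```

The to_set map could then be applied with SET n += $props and each name in to_remove with a REMOVE, so only records that actually changed generate writes.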

pdrangeid
Graph Voyager

I have quite a bit of data I query via JDBC (and CSV/XLS) and I will typically solve this with a few methods:

1: Store a node that timestamps the LAST time I ran an import procedure:

MATCH (dl:Dataloader {name:'mydatasource'})
WITH dl, dl.lastrun AS lastimporttimestamp
SET dl.lastrun = timestamp()
....

Then in my JDBC query (FROM clause omitted here):
SELECT property1,property2,property3 WHERE createdate > ''+lastimporttimestamp+''

So this gives you ONLY the newer values. (I'll perform a similar check using WHERE modifieddate >)
Obviously this only works if your datasource has create and modified datestamps stored that you can use to narrow down the dataset.
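To make the "only rows newer than the last run" filter concrete, here is a small sketch using Python's built-in sqlite3 as a stand-in for the real JDBC source. The tickets table and the id column are invented for the example; createdate and modifieddate follow the names used above. It also binds the timestamp as a query parameter rather than concatenating it into the SQL string.

```python
import sqlite3


def fetch_changed(conn, last_run):
    """Return rows created or modified strictly after the last import timestamp.

    `tickets` and `id` are hypothetical names; `last_run` plays the role of
    lastimporttimestamp read from the :Dataloader node.
    """
    cur = conn.execute(
        "SELECT id, createdate, modifieddate FROM tickets "
        "WHERE createdate > ? OR modifieddate > ? "
        "ORDER BY id",
        (last_run, last_run),
    )
    return cur.fetchall()
```

One thing to keep in mind is that a watermark like this only sees inserts and updates. Rows deleted from the source never show up in the result, so they still need one of the cleanup approaches discussed in this thread.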

Then I will have a more complex "modify" procedure that performs several OPTIONAL MATCH operations to find existing nodes, relationships, and properties, and then modifies/adds/removes nodes/relationships/properties using FOREACH:
FOREACH (ignoreMe in CASE WHEN coalesce(x.property,'isnull') <> coalesce(row.property,'isnull') THEN [1] ELSE [] END | SET x.property=row.property)
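For anyone unfamiliar with the FOREACH/CASE trick: the CASE yields a one-element list when the values differ and an empty list otherwise, so the SET runs either once or not at all. The comparison itself behaves roughly like this Python mirror (the function name is hypothetical, and this is an illustration of the logic, not something you would run against Neo4j):

```python
def needs_update(current, incoming, sentinel="isnull"):
    """Mirror of coalesce(x.property,'isnull') <> coalesce(row.property,'isnull').

    None stands in for a Cypher null (missing property or empty source value).
    """
    def coalesce(value):
        return sentinel if value is None else value

    return coalesce(current) != coalesce(incoming)
```

Because both sides fall back to the same sentinel, null versus null counts as unchanged, while null versus a value counts as a change in either direction. The flip side is that a real value equal to the sentinel string is indistinguishable from null, so the sentinel should be something that cannot occur in the data.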

This does add a LOT of complexity to the code, so I only do this if the datasets are larger, or I want to perform very efficient updates frequently.
I use this method to update my graph with ticket information from our CRM system so I can automate actions on tickets. I want to refresh my Neo4j DB every 5 minutes, and it would be impractical to re-import thousands of :Tickets and charges that often.

But if I import data nightly from a static CSV file, I may do something a bit more blunt. Prior to the import I'll mark all the nodes as unverified:
MATCH (n:Mynodelabel) set n.unverified=true

Then, as each CSV row is ingested and its node is found via MATCH or MERGE, I will perform:
REMOVE n.unverified
This indicates that the node is good: we found it again in the CSV.

Finally at the end of the import process I will just get rid of unverified nodes (as they must no longer exist):
MATCH (n:Mynodelabel {unverified:true}) DETACH DELETE n
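The three steps (mark everything, unmark what the CSV still contains, delete what is left) can be sketched against a plain dict standing in for the graph. The function name and the id key are assumptions for illustration; unverified is the flag used above.

```python
def mark_and_sweep(graph, csv_rows):
    """Simulate the unverified-flag import on an in-memory 'graph'.

    `graph` maps a node key to its property dict; `csv_rows` is a list of
    dicts that each carry an "id". Returns the keys of the deleted nodes.
    """
    # Step 1: MATCH (n:Mynodelabel) SET n.unverified = true
    for props in graph.values():
        props["unverified"] = True

    # Step 2: MERGE each row's node, update it, then REMOVE n.unverified
    for row in csv_rows:
        node = graph.setdefault(row["id"], {})
        node.update(row)
        node.pop("unverified", None)

    # Step 3: MATCH (n:Mynodelabel {unverified: true}) DETACH DELETE n
    stale = [key for key, props in graph.items() if props.get("unverified")]
    for key in stale:
        del graph[key]
    return stale
```

If the import fails partway through, the nodes that were not yet reached still carry the flag, so the final delete should only run after the whole file has been processed successfully.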