12-08-2020 05:23 AM
I apologize if this is a common question, but I haven't seen a clear and definitive answer, so I figured I'd ask.
Part of our structure requires that we track changes to a given entity, let's say a "person". This person may have a name, an ID, or a lot of other things it's "related to", and each of them has the potential to change. For example, in the event of a name change, the previous name must stay on record, not be "updated in place".
To do this, I had planned to create an "empty" node called "Person" with a relationship to a node called "Name", so that the relationship could store the time at which the name changed, keeping all records intact. The idea is also that each person will be linked to at least one unique identifier (ID form), let's say a driver's license number. As this may also change, expire, etc., it would be stored in its own "DLN" node. If I search by DLN, I do not want to find two people linked to that DLN, but rather one person with a current and a previous name. (If that makes sense.)
Can anyone either correct this logic or help me fill in the blanks on how to create one central node that has no properties of its own? (Maybe I'm still thinking too "relational"?)
Thanks in advance for your time/willingness to help!
12-08-2020 11:19 AM
It depends on your use case.
If your use case is fraud detection, where it's important to find how nodes share things like addresses, phone numbers, SSNs, driver's licenses, etc., then it makes a lot of sense to do that. (There are a number of videos on this use case. You might be especially interested in the Paradise Papers DB.) Here a node represents a possible fraud entity.
If your use case isn't fraud detection, then it probably makes more sense to keep these pieces of ID as properties on the node (unless you care about the connectivity). Here a node represents a real person.
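As a rough sketch of why the connected model helps in the fraud-detection case, the query below looks for identifier nodes that more than one person points to. The labels `Person` and `DL` and the relationship type `HAS` are assumptions chosen for illustration, not a fixed schema.

```cypher
// Assumed schema: (:Person)-[:HAS]->(:DL {number: ...})
// Collect everyone attached to each driver's license node.
MATCH (dl:DL)<-[:HAS]-(p:Person)
WITH dl, collect(p) AS people
// Keep only licenses shared by more than one person.
WHERE size(people) > 1
RETURN dl.number AS sharedLicense, people
```

With the identifier stored only as a property on each person, the same check would need a comparison of property values across all Person nodes instead of a single traversal.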
12-08-2020 07:05 PM
Indeed, fraud detection would be something we'd want to be able to do in the future. For right now, it's sort of a proof of concept. I guess I was hoping for a bit of Cypher syntax to show how one would do this, but I guess videos work too. Would you be able to recommend a specific one that might help?
Much appreciated!
12-09-2020 08:23 AM
For a new person you could do:
CREATE (p:Person)-[:HAS {since: "2020-12-09"}]->(:Name {first: "Joe"})
MERGE (p)-[:HAS {since: "2020-12-09"}]->(:DL {number: "12345"})
RETURN p
Repeat the MERGE statement for all other properties for that person.
Of course, you can't look up the (empty) Person node by properties of its own; you have to match on one of the property nodes you created:
MATCH (n)<-[:HAS]-(p:Person)-[:HAS]->(dl:DL {number: "12345"})
RETURN p, dl, n
will return all the nodes connected with that person.
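A name change under this model could look roughly like the sketch below: the person is found through the license number, a new Name node is attached, and the old one is left alone. The new name "Joseph" and the date "2020-12-10" are made-up values, and the sketch assumes the license number identifies exactly one person.

```cypher
// Find the person through a known identifier,
// since the Person node has no properties of its own.
MATCH (p:Person)-[:HAS]->(:DL {number: "12345"})
// Attach the new name; the old Name node and its relationship stay untouched.
CREATE (p)-[:HAS {since: "2020-12-10"}]->(:Name {first: "Joseph"})
RETURN p;

// List the name history for that license, newest first.
// ISO-formatted date strings sort correctly as plain strings.
MATCH (:DL {number: "12345"})<-[:HAS]-(p:Person)-[h:HAS]->(n:Name)
RETURN n.first AS name, h.since AS since
ORDER BY h.since DESC;
```

The first row of the second query is the current name, and the remaining rows are the previous ones, all belonging to the same Person node.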
12-09-2020 05:44 AM
You can certainly have an "empty" node and treat all properties as nodes. This is similar to the RDF model. But I'm not sure that is the best model for your auditing/history requirements. You should probably focus on the requirements (i.e. audit patterns, change history, etc.).
I haven't had a chance to try this, but it might help: https://graphaware.com/products/audit-module/
12-09-2020 08:40 AM
Much appreciated Klaus! That clears a lot up for me. Thank you 🙂
12-09-2020 08:46 AM
What I do in a somewhat similar case of longitudinal data:
I also create name, dln, etc. properties on the "empty" Person node
and SET them every time a property node changes, so the Person node always shows the current values.
12-09-2020 09:05 AM
Is duplication of data not seen as a caveat? I'd thought about a similar approach but had considered it to be carried over from my MySQL structuring with unique identifiers and foreign keys. If I understand you correctly, when an "update" comes, it would then place new properties in the property nodes, create a relationship between them, then update the person node with that same information - meaning that it could be obtained either via a direct person query or via traversing the relationship tree. Is this seen as "best practice"?
12-09-2020 09:15 AM
Yes, exactly like that.
Since this is done in the same transaction, the 'duplicate' data items will always be the same,
so there is no danger of inconsistencies. It also simplifies the search for current properties, because you don't have to go through all the HAS links of all the property nodes to find the most recent one.
I think it is good practice from a performance and clarity point of view. Purists' opinions may differ...
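A minimal sketch of that update, assuming the same `Person`/`Name`/`DL` layout and a hypothetical `name` property on the Person node. Because it is a single statement, the history node and the cached value are written in one transaction.

```cypher
// Find the person through a known identifier.
MATCH (p:Person)-[:HAS]->(:DL {number: "12345"})
// Keep the history: add a new Name node with the change date on the relationship.
CREATE (p)-[:HAS {since: "2020-12-10"}]->(n:Name {first: "Joseph"})
// Cache the current value on the Person node (hypothetical property name).
SET p.name = n.first
RETURN p.name AS currentName
```

Reading the current name is then just a property lookup on the Person node, while the HAS relationships still hold the full history.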