Neo4j

SvenBarbarian · ‎09-22-2022

I have designed a Cypher query that:

- checks if a parent node has a child node of certain relationship, if not creates child node
- checks if the parent node has a child node where property does not match new data, if so it creates a new child node
- returns the old child node, if existed
- returns the new node

My problem is when I invoke this when no prior child node existed, sometimes it returns the newly created node as the old child node object, sometimes it returns what is expected: null.

Looking at the query I cannot see how this could happen, unless somehow the query is executed twice, first time creating the new node, 2nd time seeing the newly created node as the old child node.

I wanted to ask if anyone can see a reason for this, perhaps there is an error in my query?

MATCH (parent:rateTable { rateTableId: "test" } )
OPTIONAL MATCH (parent)-[currentRel:  current_rateTableRates ]->(currentVersion)

FOREACH(ignoreMe IN CASE WHEN currentRel IS NULL THEN [1] ELSE [] END |
        MERGE (newVersion:rateTableRates { rate:  10, forMatchedId: "xx" } )
)      	
WITH currentVersion, currentRel

FOREACH(ignoreMe IN CASE WHEN currentRel IS NOT NULL AND currentVersion IS NOT NULL AND currentVersion.forMatchedId <> "xx" THEN [1] ELSE [] END |
        MERGE (newVersion:rateTableRates { rate:  10, forMatchedId: "xx" } )
)
WITH currentVersion

OPTIONAL MATCH (newVersion:rateTableRates { rates: 10, forMatchedId: "xx" } )

RETURN currentVersion, newVersion

glilienfield · ‎09-22-2022

Is there an error on line 14, where you have 'rates' instead of 'rate'?

The logic is a little odd. I am not sure what the intent is, but I think you can get the same result with following simplification. The reason I believe so, is that you are merging the new node whether the result is null or the result is null and the property is missing. As such, the node will get created in those two cases. In the case where the result is not null and the property exists, you already have the node your are returning, so it will be returned. So, in every case you will get back the following node: (:rateTableRates { rate: 10, forMatchedId: "xx" } ).

MATCH (parent:rateTable { rateTableId: "test" } )
OPTIONAL MATCH (parent)-[currentRel:  current_rateTableRates ]->(currentVersion)
MERGE (newVersion:rateTableRates { rate:  10, forMatchedId: "xx" } )
RETURN currentVersion, newVersion

I always got back a null value for currentVersion if it did not exists, regardless of the number of times I re-executed the query. I would not expect the behavior to be different each query based on the query's logic.

SvenBarbarian · ‎09-25-2022

Thanks for the response, I made a typo for "rates" when simplifying the query for posting.

My goal is to created an immutable record of changes ("version"'s), these are linked to the parent node by a "has" relationship (not important in this query), and the most current version is linked to the parent by a "current" relationship.

When client submits a change I want to create a new node when either:
- no version exists, ie: no "current" relationship exists for the parent
- a "current" relationship exists but points to a version that does not match the submitted version's properties
- old (non-current) versions might match the properties of the new version, but if the new version does not match the current version, then a new node should still be created

I want to return both the old version and the new version to test what properties, if any, changed.

My design at the moment runs one query to create the new node when above requirements are met, then I have a second query that sets up the relationships, eg the current relationship will point to the node with the most recent timestamp.

I considered using MERGE but I feel I cannot encompass all above requirements in an single query, eg I believe the below query would create the node if the current version did not match the new version:

MATCH (parent:rateTable { rateTableId: "test" } )
OPTIONAL MATCH (parent)-[currentRel: current_rateTableRates ]->(currentVersion)
MERGE (parent)-[currentRel]-(newVersion:rateTableRates { rate:  10, forMatchedId: "xx" } )
RETURN currentVersion, newVersion

But it would not create the node if currentRel does not yet exist?

The query I originally posted seems to meet my requirements, handling the two conditions (forMatchedId is a hash of the version properties and is used to test if existing version properties match the old ones), but we are experiencing the race condition-like error I originally posted, any idea why sometimes the newly created node is detected in the returned currentVersion?

It would be nice to simplify this into simple MERGE's but I can't yet figure it out..

glilienfield · ‎09-25-2022

Stepping back first, can we simplify your approach. Would your requirements be met if you created a linked list of nodes, where the head node is the current version and each element of the list represents the historical value of the elements. They are linked in time sequence and you stamp each with a create date when entering the new node and stamp each with a modified date when moving from being the current node to a historical node.

The process to consume an event from the client would be to find the head node corresponding to the new event and compare hash codes to detect a difference. Do nothing if there is no difference. Do the following if there is a difference: 1) add a label to head node to identify it as an audit or historical node, 2) add a new property like 'end date' with the current time/date, 3) create a new node for the event, and 4) relate the two nodes with a relationship like 'HAS_PAST' or 'PREVIOUS_EVENT'. The list is immutable if you don't change anything, but only keep adding to the head. You can determine the value of any property at any point in time by traversing the linked list and extracting the properties.

I would image you could experience race-conditions in a highly concurrent environment. Consider you have a scenario where you want to find a node and create a new one if it does not exist. What would happen if two separate events occur looking for the same non-existing node. If the timing is real tight, each would not find the node and create their own version of the node. This would result in duplicate nodes. The solution would be to have a unique constraint on the unique identifier, so two nodes can not create the same node when not found. Even so, the failing event will need to handle this exception and retry. this time the node will be found.

BTY, if you are using a language driver to implement this, you do not have to complete all of your operations in one query. You can have multiple queries within the same transaction function.

SvenBarbarian · ‎09-26-2022

A linked list is another way to implement idempotent versioning but I don't think it's a solution to what I am experiencing.

I initially considered a linked list but felt it was not needed, traversing a history of edits node by node did not seem important for my application, most important was an efficient link between parent and the current version.

I believe your suggestion does not solve creating a node when no version node exists, at least not in a single query?

I could probably chain queries in a transaction but thought something like my original query gets it done cleaner, however hit the unexpected return values.

In my case this is a test environment that receives a single Cypher query, no concurrency, but somehow it finds the node it created itself as already existing in line 2's OPTIONAL MATCH.

I am running on AWS Lambda using Node and neo4j-driver, there is a chance it is due to duplicate communication reaching neo4j in this environment, it happens every few invocations, I thought I would check if it was a query error before hacking my code to handle it..

glilienfield · ‎09-26-2022

Got it, so let's go back to your original query/approach. I also don't see how 'currentVersion' is not always null, if you don't have an existing relationship to a parent. Even if you execute the query multiple time, the result should be null. The main reason being that the new node you create is never related back to the parent in this query. Are you running your second query that establishes the relationship at anytime during this testing?

Sorry for all of my confusion.

SvenBarbarian · ‎09-27-2022

At the time of running this query it is unknown if an existing link exists, it might be null or it might point to an existing rel/node (which it then needs to check if forMatchedId matches).

Yes, there is a separate query that runs and sets the "current" to the node with the most recent timestamp.

I partly run this as a separate query because the timestamp I use is not the time the node was created, but the time the client sent the request, (an additional property stored in the node), so there is a chance that a newly created version was sent earlier than a version that already exists (race condition in our distributed async system), but my design will always link "current" to the version sent last by a client.

For my original query this should not be important, all that is important is a "current" version may or may not exist, if it doesn't we create it, if it does and forMatchedId does not match we create it.

Neo4j

Cypher returning a node object when node should not exist