cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Multiple Node chain query by validity date

I have a problem creating a query that will travers two parallell Node chains and return the first OR second node in chain, based on validity date. The nodes are updated the day before valid date and I wan't to retrieve the current valid node.

MATCH (c:Country {alpha2: 'SE'})-[:IS_USING_RISK_FREE_INTEREST*1..2|IS_USING_SWAP_RATE*1..2]->(n) WHERE datetime(n.validTo) > datetime() AND datetime(n.validFrom) < datetime() return n

I'll get the following error
Invalid input '|': expected '0'-'9', a property map or ']' (line 1, column 68 (offset: 67))
"MATCH (c:Country {alpha2: 'SE'})-[:IS_USING_RISK_FREE_INTEREST*1..2|IS_USING_SWAP_RATE*1..2]->(n) WHERE datetime(n.validTo) > datetime() AND datetime(n.validFrom) < datetime() return n"

Code that almost works:
MATCH (c:Country {alpha2: 'SE'})-[:IS_USING_RISK_FREE_INTEREST|IS_USING_SWAP_RATE*1..2]->(n) WHERE datetime(n.validTo) > datetime() AND datetime(n.validFrom) < datetime() return n

But it just takes the first node in IS_USING_RISK_FREE_INTEREST relation and don't bother about validFrom date on that node.

Is this possible to do in one query?

1 ACCEPTED SOLUTION

So if I understand correctly, the two queries that work when run individually are these? is that right?

MATCH (c:Country {alpha2: 'SE'})-[:IS_USING_RISK_FREE_INTEREST*1..2]->(n) WHERE datetime(n.validTo) > datetime() AND datetime(n.validFrom) < datetime() return n

and

MATCH (c:Country {alpha2: 'SE'})-[:IS_USING_SWAP_RATE*1..2]->(n) WHERE datetime(n.validTo) > datetime() AND datetime(n.validFrom) < datetime() return n

Compared side by side and run at the same time as the combined query, they give different results than the two queries?

Ok so I built what I think may be a similar query in my dataset,

match p=(g:Gene {name:'H3C12'})-[:GENE_GENE|INVOLVED_IN*1..2]->(g2)
where g2.degree <10
return p

the combined query returns what I expected, however I did observe one interesting aspect when I separated it into two queries.

When I query just for GENE_GENE I get this

2X_7_7e5f1092e63219677689ad51add61179c656dabd.png

This provided an interesting clarification of what the cypher query is doing. The WHERE clause applies to the last node in a path, in other words the query can return two hop paths even when a node in the middle does not meet that WHERE criteria. I checked the degree values, UBB.degree=2008 and TRIM24.degree=221, but Cypher returned paths/nodes beyond them because (with an added relationship type) it found leaf nodes that met the WHERE criteria which were still within the path length restriction.

View solution in original post

8 REPLIES 8

Hello @per.frykhammar

Your second query is right, you could order the nodes which are returned if you want to be sure.

Regards,
Cobra

Agree with @Cobra, as I understand the requirements that last cypher should do what you want. Maybe there are additional yet to be understood requirements or it is giving the correct answer even though it isn't what you expected to see.

Could you illustrate an example set of nodes/paths from the dataset showing what you expected to be returned (but isn't)?

Thank you for your replies! The thing is that I have two chains with nodes (Labels: SwapRate and InterestRate). These nodes is updated on daily (SwapRate) or weekly (InterestRate) basis. They are updated in the afternoon, but they are not valid until the next day to be used in calculation. To fetch them in single queries is no problem, but when I trie to bundle them (to limit number of db access) I don't have a query that works that take into account that it may be the second node that is valid in the chain right now (a couple of hours during the afternoon / evening)


Red arrows are illustration of the valid nodes, but could also be the second node in both chains.

So if I understand correctly, the two queries that work when run individually are these? is that right?

MATCH (c:Country {alpha2: 'SE'})-[:IS_USING_RISK_FREE_INTEREST*1..2]->(n) WHERE datetime(n.validTo) > datetime() AND datetime(n.validFrom) < datetime() return n

and

MATCH (c:Country {alpha2: 'SE'})-[:IS_USING_SWAP_RATE*1..2]->(n) WHERE datetime(n.validTo) > datetime() AND datetime(n.validFrom) < datetime() return n

Compared side by side and run at the same time as the combined query, they give different results than the two queries?

Ok so I built what I think may be a similar query in my dataset,

match p=(g:Gene {name:'H3C12'})-[:GENE_GENE|INVOLVED_IN*1..2]->(g2)
where g2.degree <10
return p

the combined query returns what I expected, however I did observe one interesting aspect when I separated it into two queries.

When I query just for GENE_GENE I get this

2X_7_7e5f1092e63219677689ad51add61179c656dabd.png

This provided an interesting clarification of what the cypher query is doing. The WHERE clause applies to the last node in a path, in other words the query can return two hop paths even when a node in the middle does not meet that WHERE criteria. I checked the degree values, UBB.degree=2008 and TRIM24.degree=221, but Cypher returned paths/nodes beyond them because (with an added relationship type) it found leaf nodes that met the WHERE criteria which were still within the path length restriction.

Thank you for this and that explains why a [:XXXXX1..2|YYYYY1..2] doesn't work with a general WHERE clouse that should be applied on both nodes with relations. Then I will use two MATCH clauses but in one transaction.

Applying the above advice for your query, this should work:

MATCH path = (c:Country {alpha2: 'SE'})-[:IS_USING_RISK_FREE_INTEREST|IS_USING_SWAP_RATE*1..2]->(n) 
WHERE all(node in nodes(path) WHERE node = c OR datetime(node.validTo) > datetime() AND datetime(node.validFrom) < datetime()) 
RETURN n

If you need to apply a restriction to all nodes in the chain, you can add a path variable to your MATCH pattern, and use the all(), any(), none(), or single() predicates to apply to nodes(path) or relationships(path).

For your query, to apply to all nodes in the path, you could use:

match p=(g:Gene {name:'H3C12'})-[:GENE_GENE|INVOLVED_IN*1..2]->(g2)
where all(node IN nodes(p) WHERE node.degree <10)
return p

Also, you don't have to set degree as a property on the node, you can use size((node)--()) as degree to get the degree of rels on the node (you can also include type and/or direction to the pattern).

Thank you Andrew for providing this solution. The WHERE all(loop) structure could come in handy in other situations too!

Regarding degree, yes in a Cypher query degree is readily available without the need for a property.

However, I work with graphs where high degree counts are common, and have found it very useful to store the degree on nodes just for the Neo4j browser experience. Having this property on the nodes can be very helpful for users manually exploring a graph with the browser, it provides a way to know how many edges are on a node prior to clicking on "Expand child relationships". The browser experience can degrade significantly when it hits the max node count on some client computers. Also the degree value reveals if the browser is not showing all the relationships for a node (due to the browser max limit settings).

Which reminds me, there is a browser feature request I've had in mind, but probably haven't filed. I wish the left click node menu would show a list of the available relationships and rel count for each (perhaps with total degree at the bottom) Maybe something like this?

GENE_GENE 4505
INVOLVED_IN 23
degree 4528

choosing one from the list would open up those relationships, but even just having the browser display the degree for the node in the left click menu would eliminate the need for me to store it.