cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Variable path ordering

I have a relationship that is basically a linear parentage chain. A node can only have a single incoming relationship of this type. This can be arbitrarily deep, but reasonably doesn't get deeper than ~10.

example data
(a:Asset {id:1})-[:PART]->(b:Asset {id:2})->[:PART]->(c:Asset {id:3})-[:PART]->(d:Asset {id:4})...

Given any node, I need to be able to return it's lineage all the way up to the top. However I need it to be ordered.

This query does the trick well:

MATCH (:Asset {id: '3'})<-[:PART*0..20]-(a:Asset)
RETURN a.id;

This returns me:
3
2
1

What I'm wondering is the order of this response guaranteed to be visit order?
Empirically I cannot find a result that is out of order, but I want to make sure that order is stable or find a different way to ensure the stable order.

Thanks in advance

1 ACCEPTED SOLUTION

Variable-length path matches should be using dfs, so for the examples provided your initial query should work just fine.

What may be throwing off your results for other kinds of queries is which node the planner uses to lookup as a starting node, since the results you would see would be depth-first from the starting node. Depending on the indexes available, it could be that you don't have an index that can be used for your desired starting node, or maybe in your pattern there are multiple places where an index lookup could be used, and the planner may not be choosing the one you want.

You can use EXPLAIN or PROFILE to see a plan of how the planner will execute the query, which you can use to see which node will be the starting node for the expansion.

If it's not what you expect, or if you need to use this in other scenarios and need to enforce which node is the starting node, you can use an index hint (provided you have created an index for the label/property) to force the planner to start there.

View solution in original post

3 REPLIES 3

I've now found scenarios where this does not come back ordered.

Any suggestions on how to get this ordered by traversal path?

The best I can come up with is this, which doesn't seem ideal:

MATCH p=(root:Asset {id: '3'})<-[:PART*0..20]-(a:Asset) 
WITH p ORDER BY length(p) DESC LIMIT 1
UNWIND nodes(p) as a
RETURN a;

Is there a better way?

Variable-length path matches should be using dfs, so for the examples provided your initial query should work just fine.

What may be throwing off your results for other kinds of queries is which node the planner uses to lookup as a starting node, since the results you would see would be depth-first from the starting node. Depending on the indexes available, it could be that you don't have an index that can be used for your desired starting node, or maybe in your pattern there are multiple places where an index lookup could be used, and the planner may not be choosing the one you want.

You can use EXPLAIN or PROFILE to see a plan of how the planner will execute the query, which you can use to see which node will be the starting node for the expansion.

If it's not what you expect, or if you need to use this in other scenarios and need to enforce which node is the starting node, you can use an index hint (provided you have created an index for the label/property) to force the planner to start there.

Just to double check if I understood your comment correctly Andrew. For simple queries it's sufficient to rely on the fact that variable path matching is done using DFS. So in our project we create paths of trains for example, which are sequences of "TrackingPoint"'s being connected with each other through a "NEXT" relation. So if we would want to query the list of Tracking Point nodes in order (any order, as long as it is consistent) in between 2 given Tracking Points, then following query would be sufficient?

MATCH p = (startTP:TrackingPoint{id: 1})-[:NEXT*]->(endTP:TrackingPoint{id: 2})
return nodes(p)

If I understood your explanation correctly it might only become a problem when we start combining different types of nodes, where different indexes on these node types might influence what is used by the query planner as a starting node? (so in our case this wouldn't present any issue?)

Many thanks for any additional explanation or confirmation. This type of querying is quite significant to the success of our project.

Thomas