Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
06-07-2022 10:13 AM
I have a use case where I have a specific node as a starting point, and I want to find the closest nodes (with a specific label), when only traversing specific relationships in specified direction. I've made some progress but would appreciate any suggestions on a more elegant approach.
The diagram below shows an example data set. This is just a subset of the overall model and there are many other relationships and nodes/labels.
The data model is basically document meta data that consists of a (:DocVersion), which [;CONTAINS] a hierarchy of (:Clause). The (:Applicability) is meta data that can be associated with (:DocVersion) or (:Clause)
Starting at any Clause, I want to return the closest Applicability node. The closest can be directly related to the starting Clause, or it could be found by traversing up the hierarchy like (c0:Clause)<-[:CONTAINS]-(c1:Clause)<-[:CONTAINS]-(:DocVersion) hierarchy.
Some examples of the starting point and expected output:
Starting Clause | Expected Output |
D1V1 C1 | D1V1 A1a, D1V1 A2b |
D1V1 C2 | D1V1 A2 |
D1V1 C1.1 | D1V1 A1a, D1V1 A2b |
D1V1 C1.2.1.1 | D1V1 A5a, D1V1A5b |
Sample Data
This script will create the sample data above.
Currently I have 2 approaches that get close to what I'm looking for. They both return a superset of what I'm wanting.
Method 1: plain cypher
This returns ALL the possible (:Applicability) and the number of relationships. What I want is just the ones that have the shortest distance.
Method 2: using apoc
This returns a set of paths and lengths which would then need to be parsed to see which path includes (:Applicability), and which are the closest.
Solved! Go to Solution.
06-07-2022 12:16 PM - edited 06-07-2022 12:17 PM
Try this modification to your cypher approach:
WITH "D1V1 C1.2.1" as Cname
MATCH (c:Clause {name:Cname})
MATCH p=(a:Applicability)<-[:APPLIES_TO]-(n)-[:CONTAINS*0..10]->(c)
WITH collect({a:a, length: length(p)}) as paths, min(length(p)) as minLength
unwind [i in paths where i.length = minLength] as result
return result.a as a, result.length as length
06-07-2022 12:16 PM - edited 06-07-2022 12:17 PM
Try this modification to your cypher approach:
WITH "D1V1 C1.2.1" as Cname
MATCH (c:Clause {name:Cname})
MATCH p=(a:Applicability)<-[:APPLIES_TO]-(n)-[:CONTAINS*0..10]->(c)
WITH collect({a:a, length: length(p)}) as paths, min(length(p)) as minLength
unwind [i in paths where i.length = minLength] as result
return result.a as a, result.length as length
06-07-2022 01:35 PM
Thanks! This does seem to provide the expected output, at least for a single starting point. I'll be curious to see how performant it is with a larger data set. My concern is the step that gets all the paths (which could be hundreds), then finds the minLength.
06-07-2022 02:22 PM
Give it a try to see. I am not sure it will be too bad. You have a defined anchor node to start with; make sure there is an index on the search property. Also, your graph looks like a tree that does not have crossovers and any looping, so the expansion should not be too bad. Let us know how it works.
06-07-2022 01:25 PM
Try this:
06-07-2022 01:53 PM
Thanks. I think the issue with this approach is the maxLevel will be different depending on the starting point.
For example in this slightly modified example, the maxLevel needs to be 3.
match (a:Clause) where a.name = "D1V1 C1.1"
CALL apoc.path.spanningTree(a,{labelFilter:'Clause|DocVersion|Applicability', relationshipFilter:'<CONTAINS|APPLIES_TO>',maxLevel:3}) YIELD path
return path
But using 3 for a different start node results in more Applicability nodes than expected.
06-07-2022 02:24 PM
06-07-2022 02:28 PM
Try without having any maxLevel. This should give your expected results.
All the sessions of the conference are now available online