Neo4j

F_Bastiat · ‎02-01-2021

Hi, I have a tiny database that contains dependencies between a SQL DB objects. I have three types of nodes: all nodes have the 'objects' label, then they are split into 'procedures' and 'tables' labels. We are only interested in the 'objects' label. Type of relationships also do not matter, only their direction.

All the nodes have a 'schema' property which we need for this query.

What I want Cypher to return are all the nodes from a certain schema and all their optional ancestors, regardless of the number of hops. It must include isolated nodes from the selected schema, as well.

In other words, starting from each node with the specified value for the 'schema' property, it must recursively traverse upwards against the relationships' direction, until it reaches nodes without inbound relationships (roots).

Seems to be quite a basic pattern, but for some reason I can't find similar posts with a good solution.

Per my example, I have 70 nodes in the selected schema, and the result set should contain 155 nodes.

I tried so far every pattern that seemed reasonable, the only one that returns at least somewhat close results to the correct ones was something like this:

MATCH p=(o:objects{schema:'val'})<-[*0..6]-(a:objects)
WITH *, RELATIONSHIPS(p) AS rels
RETURN o, rels, a;

This gives a graph with 150 nodes, instead of the expected 155. If I increase the depth, the number of nodes returned actually starts to decrease (how can it decrease at larger depths I do not understand).

While something like this:

MATCH p=(o:objects{schema:'val'})<-[*0..]-(a:objects)
WITH *, RELATIONSHIPS(p) AS rels
RETURN o, rels, a;

returns only 19 nodes. The only thing that I changed is relationship depth from [*0..6] to [*0..]. Seems that what it returns is just one path related to the first node from the subset, as per the input file (and id/index order, I guess). So how do I make it iterate over all 70ish paths and combine them into one ? Seems I need to use FOREACH or UNWIND somehow, but the docs are too basic and brief and do not cover any real life scenarios.

This variant returns the same result as above, with only one path.

MATCH (o:objects{schema:$param})
WITH COLLECT(o) as obj
MATCH p=(a:objects)-[*0..]->(c:objects)
WHERE c in obj
RETURN p

And another unsuccessful try that again returns just the first path as per first node of the input subset:

MATCH p=(o:objects{schema:$param})
WITH COLLECT(o) as obj
UNWIND obj as object
MATCH r=(object)<-[*]-()
RETURN r;

What I get is this:

What I need to get is this (obtained by starting with schema nodes and manually expanding relationships in Bloom), nodes colors are the various schemas, the pink one is my active schema in this case:

I use Neo4j Desktop 1.4.1 with a 4.2.3 database on a 64GB ThinkPad with performance configs as per memrec (about 25gb for heap and 28gb for pagecache). I run all of the above from Bloom search phrases as the Browser is not even able to return a result for most of these scripts, it just runs and runs forever (which seems ridiculous for a 239 nodes and 469 relationships database).

Btw, I had no issues implementing a similar pattern, when there is just one starting (anchor) node:

MATCH r=(p:objects {name: $param1, schema: $param2})-->(c)
WITH c 
MATCH v=(c)<-[*]-()
RETURN v;

Here I go one hop down to grab the children of my input node, then recursively up to grab all the ancestors. Of course, here I have at most one connected graph.

andrew_bowman · ‎02-01-2021

The queries look reasonable, you might try checking against the Neo4j Browser output if you suspect that Bloom isn't showing you all results. The shrinking result set is particularly troubling.

What exactly do you want from the output? Paths? Or just the nodes? If it's paths you want, you can't really combine them (unless you're just collecting them). If it's nodes you want, then we have some options for path finding while keeping only distinct nodes.

F_Bastiat · ‎02-02-2021

Yes, I would want the paths, that is a subset of the whole graph.

While drafting the question I realized that the small result set (the 19 nodes as per image above) is just one path as related to the first node that comes from the first input (MATCH), I checked against the input data and that is the first node in my table from that particular schema. So it kind of makes sense. The anomaly is the shrinking result set when increasing the depth, but that is just a corner case. I was actually able to find through trial and error appropriate depths for a couple of the schemas, so I can query them directly, but for the others there is no appropriate depth that would yield the correct results, so I don't have a universal solution yet.

Maybe this pattern is not that trivial or common in the graph world. I am coming from the SQL world and seemed strange not to be able to implement something like that.

I can live without a solution to this problem, was more of a proof of concept to test the possibilities of the technology, need a better understanding before I start promoting it internally and especially, externally.

Neo4j

Find all ancestors of a subset of nodes