cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Finding a variant on a reference pattern

First, kudos for calling out Maxime - He has been very helpful. He has helped me with this so far but he is out of pocket so I am calling for help.

2X_8_820f4dc4ec5560a213a5bb7e2ceb16b8e5f843b2.png

We have a series of paths in the db. In our example here, they are all a->b->c but there are a host of other PATHS that start with different letters. I don't care about those. in here, PATH1 (blue reference) SHOULD be identical to PATH2(red). They are both type 'a' patterns. But every once in a while, its not. The d->e pattern crops up. In this case, its attached to b, but it could have been attached to 'a' or 'c'. Having an isolated 'd' and/or an 'e' is not bad as long as they are not attached to each other.

My approach with Maxime help, to this problem was to grab the set of 'a' nodes, and walk the set to the end building a list. Then I used the APOC function to compare lists. Something like this

MATCH (s:node {name:"a"})-[*]->(n) WITH id(s) AS method_id, collect(n.name) AS names WITH collect(names) AS paths RETURN apoc.coll.disjunction(paths[0], paths[1]) AS output

But what I get back seems to be a simple list WITHOUT the connections. That works ok, but because d,e can occur as long as they are unconnected, its not quite what I want. What comes back must be usable in a second query.

The ask:

  1. I need to modify my cypher to pull out the whole pattern first
    (discovery of a variant based on a reference)
  2. Then, I need to
    write a cypher that uses the result of the first query to return PATHn,where the PATH is type 'a' and contains subgraph d->e

Background: We look at the PATHS that have anomalies (the d->e pattern being one such), do we need to be discover and research. In our case, PATH1 is the reference. The cypher above does an OK job of that. If they decide that the anomaly is not an issue (*f->g->h not shown was OK) then we have no further action. If it wasn't ok, then it needs to be turned around and all occurrences of that in each path has to be investigated.

If you want to try this yourself, this should produce the same graph

MERGE (:ProgNode:test {nodeSeq:0,name:'root'});
MERGE (:ProgNode:test {nodeSeq:1,name:'a'});
MERGE (:ProgNode:test {nodeSeq:2,name:'b'});
MERGE (:ProgNode:test {nodeSeq:3,name:'c'});

MERGE (:ProgNode:test {nodeSeq:4,name:'a'});
MERGE (:ProgNode:test {nodeSeq:5,name:'b'});
MERGE (:ProgNode:test {nodeSeq:6,name:'c'});
MERGE (:ProgNode:test {nodeSeq:7,name:'d'});
MERGE (:ProgNode:test {nodeSeq:8,name:'e'});

MERGE (:ProgNode:test {nodeSeq:9,name:'a'});
MERGE (:ProgNode:test {nodeSeq:10,name:'c'});
MERGE (:ProgNode:test {nodeSeq:11,name:'f'});
MERGE (:ProgNode:test {nodeSeq:12,name:'d'});
MERGE (:ProgNode:test {nodeSeq:13,name:'e'});

MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 0 AND b.nodeSeq = 1 MERGE (a)-[r:REFERENCE]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 1 AND b.nodeSeq = 2 MERGE (a)-[r:OWNS]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 2 AND b.nodeSeq = 3 MERGE (a)-[r:OWNS]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 0 AND b.nodeSeq = 4 MERGE (a)-[r:BAD]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 4 AND b.nodeSeq = 5 MERGE (a)-[r:OWNS]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 5 AND b.nodeSeq = 6 MERGE (a)-[r:OWNS]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 5 AND b.nodeSeq = 7 MERGE (a)-[r:OWNS]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 7 AND b.nodeSeq = 8 MERGE (a)-[r:OWNS]->(b);

MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 0 AND b.nodeSeq = 9 MERGE (a)-[r:UNINTERESTING]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 9 AND b.nodeSeq = 10 MERGE (a)-[r:OWNS]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 10 AND b.nodeSeq = 11 MERGE (a)-[r:OWNS]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 10 AND b.nodeSeq = 12 MERGE (a)-[r:OWNS]->(b);
MATCH (a:ProgNode),(b:ProgNode) WITH a,b WHERE a.nodeSeq = 11 AND b.nodeSeq = 13 MERGE (a)-[r:OWNS]->(b);

(and yes, I am aware that there is a faster way. I just grabbed the first one I found which had the where)

I included a "uninteresting" example just to reinforce the need. Apologies to the TV show "House"

Thanks

2 REPLIES 2

I think we'll need some clarification on what you want out of this.

Having an isolated 'd' and/or an 'e' is not bad as long as they are not attached to each other.

Do you mean that if red b was connected to green d, but green d wasn't connected to green e that it would be okay? Or did you mean that any path that deviates from the expected a, b, c isn't okay (the existence of some path forking off from any of the nodes)?

If d and e (or equivalent copies) were attached to the blue nodes at point blue b, such that the graphs of abc + de on both were isomorphic, would that be acceptable? Or are you only looking for isomorphic 2-d paths, and not more complex structures?

If you're only looking for isomorphic 2-d paths, would it be enough to check that all nodes in the path (excepting the last node) only have a single outgoing :OWNS relationship and no others?

So first, thank you for taking this on. I'd given up on this one. I do have an almost solution but an exact one would be good.

The d->e grouping is bad. Think of them as their own group, almost like a detour that took you to a route that didn't connect to your destination.

Do you mean that if red b was connected to green d , but green d wasn't connected to green e that it would be okay? Yes. The individual nodes in green are ok by themselves.

Or did you mean that any path that deviates from the expected a, b, c isn't okay (the existence of some path forking off from any of the nodes)? No, there may be LOTS of deviations from abc, as long as one of them isn't de, all is good.

If d and e (or equivalent copies) were attached to the blue nodes at point blue b , such that the graphs of abc + de on both were isomorphic, would that be acceptable? Or are you only looking for isomorphic 2-d paths, and not more complex structures? You are looking at an example and the real life version was more complex, just not helpful ! One 'de' equivalent pattern had 80 nodes. So in real life, we have 132 "detours" so far. But to your question, any combination of red:a->green:de, or red:b->green:de, or red:c-green:de are bad. Any OTHER combination (red:a->green:ed - Node pattern reversed) is good. (or at least not part of this run)

If you're only looking for isomorphic 2-d paths, would it be enough to check that all nodes in the path (excepting the last node) only have a single outgoing :OWNS relationship and no others? No. I wish it did but no.

It may help you to think of this as a set of stops on a route. We do this route every day. But if we see a situation where someone hops off the route and comes right back, we don't care. If they hop off and then go to 2-80 other stops, we have a concern. If they are finished delivery and come back, stopping at other places, we don't care. As long as the cargo gets where it was supposed to get in the order it was supposed to get there is "preserved" we are fine.