Neo4j

depire · ‎09-01-2020

Dear all,

I have a problem that I don't understand.

Code:
CREATE (a:LEU) SET a.LEID = "A" RETURN a
CREATE (b:LEU) SET b.LEID = "B" RETURN b
CREATE (c:LEU) SET c.LEID = "C" RETURN c
CREATE (d:LEU) SET d.LEID = "D" RETURN d
CREATE (e:LEU) SET e.LEID = "E" RETURN e

MATCH (parent:LEU {LEID:"A"})
MATCH (child:LEU {LEID:"B"})
MERGE (parent)-[l:REL {share:toFloat(100.0)}]->(child) RETURN parent.LEID, child.LEID, l.share
MATCH (parent:LEU {LEID:"A"})
MATCH (child:LEU {LEID:"E"})
MERGE (parent)-[l:REL {share:toFloat(100.0)}]->(child) RETURN parent.LEID, child.LEID, l.share
MATCH (parent:LEU {LEID:"A"})
MATCH (child:LEU {LEID:"C"})
MERGE (parent)-[l:REL {share:toFloat(100.0)}]->(child) RETURN parent.LEID, child.LEID, l.share
MATCH (parent:LEU {LEID:"B"})
MATCH (child:LEU {LEID:"C"})
MERGE (parent)-[l:REL {share:toFloat(100.0)}]->(child) RETURN parent.LEID, child.LEID, l.share
MATCH (parent:LEU {LEID:"B"})
MATCH (child:LEU {LEID:"D"})
MERGE (parent)-[l:REL {share:toFloat(100.0)}]->(child) RETURN parent.LEID, child.LEID, l.share

I want to find the common parent of B, C D and E

My code is
MATCH
p1=(l0:LEU)-[:REL0..]->(:LEU{LEID:"A"}),
p2=(l0:LEU)-[:REL0..]->(:LEU{LEID:"B"}),
p3=(l0:LEU)-[:REL0..]->(:LEU{LEID:"C"}),
p4=(l0:LEU)-[:REL0..]->(:LEU{LEID:"D"}),
p5=(l0:LEU)-[:REL*0..]->(:LEU{LEID:"E"})
RETURN DISTINCT l0.LEID

which returns nothing; it must return "A"

If I run the following code (I delete the clause on B); it works
MATCH
p1=(l0:LEU)-[:REL0..]->(:LEU{LEID:"A"}),
p3=(l0:LEU)-[:REL0..]->(:LEU{LEID:"C"}),
p4=(l0:LEU)-[:REL0..]->(:LEU{LEID:"D"}),
p5=(l0:LEU)-[:REL0..]->(:LEU{LEID:"E"})
RETURN DISTINCT l0.LEID

Could you help me ?
Best regards,
Alexandre

andrew_bowman · ‎09-01-2020

The reason why you're seeing this behavior is because of the relationship isomorphism behavior that applies to Cypher MATCH clauses (and for any pattern in Cypher).

https://neo4j.com/docs/cypher-manual/current/introduction/uniqueness/

Because you're using only a single MATCH in your query for l0, even though the patterns are comma-separated, relationship isomorphism applies: the same relationship cannot be used more than once to fulfill the paths that match the pattern.

When the line for p2 is removed, then for every node (A, C, D, E), there are paths to reach those nodes, and the relationships used for those paths are only each used once.

But once we need to match to B as well, we can't fulfill that. The paths to B and C must both use the relationship between A and B. Since relationships cannot be used more than once to fulfill the MATCH paths, no viable paths are found.

If instead of using a single MATCH we broke each one up into its own MATCH, then the relationship isomorphism behavior no longer applies between the paths:

MATCH p1=(l0:LEU)-[:REL*0..]->(:LEU{LEID:"A"})
MATCH p2=(l0:LEU)-[:REL*0..]->(:LEU{LEID:"B"})
MATCH p3=(l0:LEU)-[:REL*0..]->(:LEU{LEID:"C"})
MATCH p4=(l0:LEU)-[:REL*0..]->(:LEU{LEID:"D"})
MATCH p5=(l0:LEU)-[:REL*0..]->(:LEU{LEID:"E"})
RETURN DISTINCT l0.LEID

With that out of the way, we can address the issue of first common parent. Your test here isn't sufficient for it, because with this sample graph there is only a single common parent. Additionally you have nodes that have multiple parents, which complicates things. In a pure tree structure (where one node only has a single parent) this is an easy query: For each node in consideration, find and collect its parents...then intersect all collections to find common parents. Then from any node, MATCH backward until you hit the first common parent LIMIT 1.

But when nodes have multiple parents like this, then the approach isn't so simple.

If we were only looking at nodes D and C, their common parents are A and B. From C's perspective, both are 1-hop distant, but to D only B is 1-hop distant. It's not enough to gather common parents and start from any node and take the first encountered, you have to do more work...and then maybe take an average of hops distant to each common node or use some other formula.

Neo4j

First common parent