cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Writing out performant queries for n-relationship queries on a single node where n > 2

Not a bug, just couldn't find documentation on this.

Starting with a known reference point: 1 node with 2 connections to 2 different nodes:

MATCH (associated_node1)-[:REL1]-(start_node {name: "Start"})-[:REL2]-(associated_node2) RETURN *
Make sense, and it's easy.

Now let's say a node with 5 connections to 5 different nodes, each relationship with a different relationship type.

Do we:

MATCH (associated_node1)-[:REL1]-(start_node {name: "Start"})-[:REL2]-(associated_node2), (associated_node3)-[:REL3]-(start_node)-[:REL4]-(associated_node4), (associated_node5)-[:REL5]-(start_node)
RETURN *
MATCH (associated_node1)-[:REL1]-(start_node {name: "Start"})-[:REL2]-(associated_node2)
MATCH (associated_node3)-[:REL3]-(start_node)-[:REL4]-(associated_node4)
MATCH (associated_node5)-[:REL5]-(start_node)
RETURN *
  1. ?
20 REPLIES 20

I am interpreting the question to be "how do get query from a node and get all the nodes it has relationships to without defining each relationship". Created a model that I think is what you're looking for. It connects five different nodes to a single node with five different relationships. The Cypher query pattern here is to anchor off the start node and not define relationship(s). The model has all nodes with the label Node (you should use a label in a query when you can in almost all cases):

2X_0_0282383831b8ea2e215a19473d9cd1bb416174fd.jpeg

CREATE 
  (`0` :Node {name:'start'}) ,
  (`1` :Node {name:"associated_node1"}) ,
  (`2` :Node {name:"associated_node2"}) ,
  (`3` :Node {name:"associated_node3"}) ,
  (`4` :Node {name:"associated_node4"}) ,
  (`5` :Node {name:"associated_node5"}) ,
  (`0`)-[:`REL1` ]->(`1`),
  (`0`)-[:`REL2` ]->(`2`),
  (`0`)-[:`REL3` ]->(`3`),
  (`0`)-[:`REL4` ]->(`4`),
  (`0`)-[:`REL5` ]->(`5`)

The image and rudimentary create code was built using the "arrows" tool found at: http://www.apcjones.com/arrows/

The simplest query would be to start with the node that has the name property value 'start', and get all nodes via any relationship to any other node with the label :Node:

MATCH path=(:Node {name: 'start'})-->(:Node) RETURN path

A more explicit approach would be to define how many relationship 'hops' you want the query to traverse from the start node. In this case there's only one level of relationships:

MATCH path=(:Node {name: 'start'})-[*..1]->(:Node) RETURN path

Or, get anything to any depth related to the node with label :Node with the property name:'start'

MATCH path=(:Node {name: 'start'})-[*]->() RETURN path

As always, you should PROFILE or EXPLAIN your queries to make sure they're efficient, using indexes where it makes sense to have them, etc.

@dan.flavin I appreciate you putting in the work to flesh out the answer above, but that is not my question.

The question is about syntax, specifically about the “best” (aka most performant) way to write out the query.

Say we’re looking for all nodes connected via REL1, REL3, REL4, REL5

How should we write that out in cypher?
How about if we’re looking for all but 1 label?

ameyasoft
Graph Maven

Filter the relationship:

MATCH (n:Node {name:"start"})-[r]-(d)
WHERE type(r) <> "REL2"
RETURN n, d;

Result:
2X_2_2a70cd111de9800f72cebd3c132421b5fd853c78.png

To get all:

MATCH (n:Node {name:"start"})-[r]-(d)
//WHERE type(r) <> "REL2"
RETURN n, d;

Result:
2X_f_f5b3c5480173d9aee994fd90c4ac15b506c0eb78.png

@ameyasoft That does not seem performant or best-practice.

How would you structure that if there were 10 labels and you wanted 3 of them, or 8?
What if there were 50 connected nodes each of a different type? (yes, I know: reassess the graph structure. 50 is only there as a thought experiment)

As stated above, try something similar to:

MATCH (n)-[r]-(m) 
WHERE type(r) IN ['REL1', 'REL2',  'REL4']
return n,r,m

"Best practice" is in the details since Neo4j graph traversal is very efficient at its core. How any one query behaves can be quantified by looking at the PROFILE output. How it works in your environment is a matter of query complexity, graph structure, and graph size as a function of hardware capabilities. What might not be "best practice" may work just fine in "reality".

But since you brought it up, it is not generally a good idea to anchor a query based on relationships. For example, the original example query in the post had no node labels. That will force a complete scan through the node store because any node has the potential to be part of the result set. You will see a AllNodesScan in the PROFILE output. Add a node label and you'll see the first step being NodeByLabelScan. Much better. Add a property with an index and even better. Note that the main reason for indexes is to find the starting nodes for a query.

It might help to understand the context of the query pattern. It is often questioned if you're starting with a relationship as the anchor for a query pattern, then could it be better served as a property (e.g. think something that modifies the noun that defines a label)?

I apologize: I don’t seem to be communicating my request well, and may have clouded the request with specifics.

I suspect that the real answer is going to be found by testing each possible syntax. Maybe Neo4j already optimizes and my question is trival, I’m not sure.

I’ve have only seen official documentation about writing cypher for 3-node paths, and none about finding more complicated paths. Looking to fill in the gaps in my knowledge while avoiding bad habits.

See above:

Unbounded graph traversals are just part of Cypher. But always be mindful of your query pattern. Graphs are highly connected and you can find it easy to inadvertently return a large amount of data.

No, unbounded traversals are not what I’m asking about.

The answer will:

  • Be Cypher
  • Relate to at minimum 4 nodes
  • Have 4 node indicators “()”
  • Not be a simple chain ((a)->(b)->(c)->(d))
  • Instead have a way to indicate this query:
(a)->(b)<-(c)
      ^-(d)

or this query:

      v-(e)
(a)->(b)<-(c)
      ^-(d)

Hope this clarifies.
With reapect,
Josh

If I'm reading your question right, you have a starting node in the middle, and surrounding nodes connected by a single relationship each, with the type of that relationship being different on each.

If that's the case, you can use | in your relationship pattern to separate the relationships you're interested in. It works as an OR in this case, and is documented here.

An example query

MATCH (start_node:Node {name: "Start"})-[:REL1 | REL2 | REL3 | REL4]-(associatedNode)
RETURN associatedNode

If you only wanted associatedNodes of a certain subset of types, then you could use something like WHERE associatedNode:Person OR associatedNode:Place OR associatedNode:Thing after the match.

Ahhhh, excellent! Thank you @andrew.bowman; that is one piece of the puzzle.

Does this same syntax apply to nodes?
Like: MATCH (a:Person)-[]-(b:Location | c:Time)?

It does not. You'll need to handle that in your WHERE clause, either through the example I provided, where you know the labels ahead of time and can hardcode the check into the query, or with a dynamic check such as something like this:

...
WHERE any(rel in ['Person', 'Place', 'Thing'] WHERE rel in labels(associatedNode))
...

The list of course could be supplied as a parameter to the query. In most cases though you will likely know the label or labels of the node you want and can hardcode them into the query.

All those query patterns are not a problem with Cypher. I would offer that you create a simple test graph and queries and post your questions based on what you've observed with the cypher code. That will give us a common base to move forward with.

coolyrat
Node Clone

I have a similar problem that needs help with. I want all the related, distinct Users started from the uid=start User.
2X_4_4577c6a1fcd71379452760245990f2cce9423f34.png

I started with a query like:

MATCH (u:User {uid: $uid}),
(u)-[*1..3]-(t:User)
RETURN count(DISTINCT t)

It ends up a huge number that doesn't match.

So I narrow it down into 3 queries:

// a match the 1 hop, the two way FOLLOWING relationships
MATCH (u:User {uid: $uid})--(t:User)
return count(distinct t)
// b match the 2 hops, start User's to Post relationships
MATCH (u:User {uid: $uid})-->(:Post)<--(t:User)
return count(distinct t)
// c match the 3 hops, the chat post relationships
MATCH (u:User {uid: $uid})-->(:Chat)<--(:Post)<--(t:User)
return count(distinct t)

Then I don't know how to get distinct count with a,b,c's target users.

Your original query using count(DISTINCT t) looks correct to me. Are you certain this isn't what's in your graph? Are you certain you only have one node that has the given uid?

Can you get a PROFILE plan of the query, with all elements of the plan expanded?

Thank you for your reply! @andrew.bowman

I execute the query below:

MATCH (u:User {uid: $uid})--(t:User)
return count(distinct t)
UNION
MATCH (u:User {uid: $uid})-->(:Post)<--(t:User)
return count(distinct t)
UNION
MATCH (u:User {uid: $uid})-->(:Chat)<--(:Post)<--(t:User)
return count(distinct t)

Outputs:
╒═══════════════════╕
│"count(distinct t)"│
╞═══════════════════╡
│11                 │
├───────────────────┤
│807                │
├───────────────────┤
│0                  │
└───────────────────┘

By Adding three counts, it's a reasonable total related users counts. There might have maximum 818 related users if query a and query b don't have any duplicate users.

But I execute the query, i got a huge total.

MATCH (u:User {uid: $uid}), (u)-[*1..3]-(t:User) RETURN count(DISTINCT t)

Outputs:
╒═══════════════════╕
│"count(DISTINCT t)"│
╞═══════════════════╡
│35860              │
└───────────────────┘

And the Profile output

For your UNION query, try removing the directions of the relationships, as well as the labels on the middle nodes of the path, check the count results for that. More than likely the smaller counts you saw there were because the labels and relationship directions helped filter out quite a few paths to reachable :User nodes.

MATCH (u:User {uid: $uid})--(t:User)
return count(distinct t)
UNION
MATCH (u:User {uid: $uid})--(:Post)--(t:User)
return count(distinct t)
UNION
MATCH (u:User {uid: $uid})--(:Chat)--(:Post)--(t:User)
return count(distinct t)

I removed all the directions, and the result are the same.

╒═══════════════════╕
│"count(distinct t)"│
╞═══════════════════╡
│11                 │
├───────────────────┤
│807                │
├───────────────────┤
│0                  │
└───────────────────┘

Is it possible to combine 3 t:User from each query?

Can you try again removing the labels from the middle nodes of the path?

MATCH (u:User {uid: $uid})--(t:User)
return count(distinct t)
UNION
MATCH (u:User {uid: $uid})--()--(t:User)
return count(distinct t)
UNION
MATCH (u:User {uid: $uid})--()--()--(t:User)
return count(distinct t)

That should better match your variable-length query, since that didn't have any restrictions on the types of nodes present in the expanded path. My guess is the counts you see there should be much greater, and lend some validity to the count of distinct nodes you saw from the variable-length query.

yes, the counts are much greater. I am getting confuse, from the thrid query, I am not sure why the count get so huge. Does empty () match any labels? If that's the case, there might be a path like (u)--(:User)-(:User)-(t:User), since users are directional. And that's not I am looking for.

╒═══════════════════╕
│"count(distinct t)"│
╞═══════════════════╡
│11                 │
├───────────────────┤
│6573               │
├───────────────────┤
│52355              │
└───────────────────┘

So are you looking for a query that finds a user within 3 hops, but doesn't continue expansion once it encounters a user node? In that case you may need to add a predicate to ensure you filter out such results:

MATCH path = (u:User {uid: $uid})-[*1..3]-(t:User) 
WHERE single(node in tail(nodes(path)) WHERE node:User)
RETURN count(DISTINCT t)