Can't get neo4j to perform on undirected relationships

I've been working with neo4j 4.1 for a while now and whilst I feel that the graph structure should be a good fit for my problem, I can't get it to perform in any reasonable time.

I'll detail the model and problem below, but I'm wondering whether (a) graphs are just not a good fit or (b) I've modelled the problem incorrectly.

In my domain, I have two labels: Person and Skill. Each has an id attribute, and there is an index on this attribute.

Skills are related to one another in a parent-child relationship, implying that one or more child skills belong to the parent skill, as follows:

(s:Skill)-[r:IS_IN_CAT]->(s2:Skill)

A Person is related to a Skill as follows:

(p:Person)-[r:HAS_SKILL]->(s:Skill)

This is illustrated as below:

The question I want to ask is, given a Person who has a skill, find me all paths to all other people via that skill.

In the diagram above, if Person A was the person, I'd expect 2 paths:
(Person A) - [HAS_SKILL] - (Skill 1-1-1) - [IS_IN_CAT] - (Skill 1-1) - [IS_IN_CAT] - (Skill 1-1-2) - [HAS_SKILL] - (Person B)
And
(Person A) - [HAS_SKILL] - (Skill 1-1-1) - [IS_IN_CAT] - (Skill 1-1) - [IS_IN_CAT] - (Skill 1) - [IS_IN_CAT] - (Skill 1-2) - [HAS_SKILL] - (Person C)
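
For anyone who wants to reproduce this, here is a minimal sketch of the model as I read it from the two expected paths above. The index names, the name property, and every id other than 100 are made up for illustration, and the index syntax assumes Neo4j 4.x. Run the three statements separately, since schema changes and data writes can't share a transaction.

```cypher
// Indexes on the id property of each label (index names are hypothetical).
CREATE INDEX person_id FOR (p:Person) ON (p.id);
CREATE INDEX skill_id FOR (s:Skill) ON (s.id);

// Child skills point at their parent category: (child)-[:IS_IN_CAT]->(parent).
// Skill ids, names, and the ids of Person B and Person C are invented here.
CREATE (s1:Skill   {id: 1,   name: 'Skill 1'}),
       (s11:Skill  {id: 11,  name: 'Skill 1-1'}),
       (s12:Skill  {id: 12,  name: 'Skill 1-2'}),
       (s111:Skill {id: 111, name: 'Skill 1-1-1'}),
       (s112:Skill {id: 112, name: 'Skill 1-1-2'}),
       (s11)-[:IS_IN_CAT]->(s1),
       (s12)-[:IS_IN_CAT]->(s1),
       (s111)-[:IS_IN_CAT]->(s11),
       (s112)-[:IS_IN_CAT]->(s11),
       (a:Person {id: 100, name: 'Person A'}),
       (b:Person {id: 101, name: 'Person B'}),
       (c:Person {id: 102, name: 'Person C'}),
       (a)-[:HAS_SKILL]->(s111),
       (b)-[:HAS_SKILL]->(s112),
       (c)-[:HAS_SKILL]->(s12);
```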

The way I'm asking this query is as follows.

MATCH (p:Person {id: 100}) - [h:HAS_SKILL] -> (s:Skill) - [r:IS_IN_CAT*..] - (s2:Skill) <- [h2:HAS_SKILL] - (p2:Person)

For any moderately sized graph (10,000 skills, 1,000 people, 5 skills per person) this never returns.

I'm fairly sure it's the undirected nature of the [r:IS_IN_CAT*..] part of the query but I don't see how to re-model to make this perform any better.

Any help would be appreciated.

1 ACCEPTED SOLUTION

Thanks guys, a combo of both your answers got me to the solution, which uses directed queries and the idea of root skills, which seem to perform all around better.

The query now looks like:

MATCH (p:Person {id: 100}) - [h:HAS_SKILL] -> (s:Skill) - [r:IS_IN_CAT*..] -> (parent:Skill) <- [r2:IS_IN_CAT*..] - (s2:Skill) <- [h2:HAS_SKILL] - (p2:Person)

The introduction of the parent made the difference since it allows the queries to remain directed.

The queries are now generally sub-second for very large graphs.
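
For reference, a complete version of that directed query might look like the sketch below. The RETURN clause, the column aliases, and the p2 <> p filter are additions for illustration and are not part of the original post. The relationship variables are dropped because they aren't needed for the result.

```cypher
// Walk up from the person's skill to a shared ancestor, then back down
// to other skills and the people who hold them. Both variable-length
// hops stay directed, so the traversal never wanders sideways.
MATCH path = (p:Person {id: 100})-[:HAS_SKILL]->(s:Skill)
             -[:IS_IN_CAT*]->(parent:Skill)
             <-[:IS_IN_CAT*]-(s2:Skill)<-[:HAS_SKILL]-(p2:Person)
WHERE p2 <> p                      // assumption: the starting person is not wanted
RETURN p2.id        AS otherPerson,
       parent.id    AS sharedCategory,
       length(path) AS hops
ORDER BY hops;
```

Note that `*` requires at least one hop on each side, so this shape does not return people who hold exactly the same skill as the starting person, or a skill that is itself an ancestor of it. Using `*0..` on either side would be one way to include those cases if they matter.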


4 REPLIES

It looks like your Cypher has to traverse every node and every relationship in the entire graph in order to return a path to every Person, because the IS_IN_CAT relationship is specified as undirected and unbounded, so the path search runs across the entire (single?) skills tree.

I suspect you want something smaller. Are there filters missing?

clem
Graph Steward

To clarify: does your query work on small data but not on big data?

If so, there is a performance issue.

Try running the query with PROFILE and look at the plan. If the query doesn't filter out enough nodes early, you could have a combinatorial explosion, which will take a very long time and appear never to come back.
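
As a rough sketch of what that could look like for the query in the question: PROFILE actually executes the query, so the example caps the variable-length hop so that it can finish. The cap of 4 hops and the count(*) return are assumptions chosen only to make the plan inspectable, not values taken from the thread.

```cypher
// PROFILE runs the query and reports rows and db hits per operator.
// The upper bound of 4 hops is an arbitrary cap so the query finishes;
// raise it step by step and watch how the row counts grow.
PROFILE
MATCH (p:Person {id: 100})-[:HAS_SKILL]->(s:Skill)
      -[:IS_IN_CAT*..4]-(s2:Skill)<-[:HAS_SKILL]-(p2:Person)
RETURN count(*) AS pathCount;
```

If the row count on the variable-length expand operator grows sharply each time the bound is raised, that points to the undirected traversal as the source of the explosion.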

See:

And this specific example of using a subquery to speed things up.

I really like your project.

You know it better than I do, since it's your project, but:
First, what do you really want to know? Can you state it in plain English?
The graph model doesn't seem to fit the need, at least not directly.

--You could MATCH your root skills first and add a WITH clause here--

MATCH (p1:Person {id: 100})-[]->(s:Skill)<-[]-(p2:Person) WHERE s IN (LIST OF ROOT SKILL)
It's up to you to define what's a root skill, but I guess this query might help.
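
One way to read this suggestion, as a sketch only: treat a root skill as any Skill with no outgoing IS_IN_CAT relationship (that definition is an assumption, as are the column aliases), find the roots above the person's skills first, then walk back down from each root.

```cypher
// Step 1: climb from the person's skills to their root categories.
// A "root" is assumed to be a Skill with no parent category.
MATCH (p1:Person {id: 100})-[:HAS_SKILL]->(:Skill)-[:IS_IN_CAT*0..]->(root:Skill)
WHERE NOT (root)-[:IS_IN_CAT]->(:Skill)
WITH DISTINCT p1, root
// Step 2: descend from each root to every skill below it and its holders.
MATCH (root)<-[:IS_IN_CAT*0..]-(:Skill)<-[:HAS_SKILL]-(p2:Person)
WHERE p2 <> p1
RETURN DISTINCT root.id AS rootSkill, p2.id AS otherPerson;
```

The two MATCH clauses are deliberate. Cypher will not reuse a relationship within a single pattern, so a one-pattern version that climbs to the root and descends again would skip people whose skills share part of the same branch.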
