
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Max Number of relationships to a node - Best Modelling

We have a service that processes potential "matches" between a user (Profile) and a Project (a paid opportunity).

In the graph, we create a relationship with a score property.

The number of relationships created to a single :Project node could be 20k or more.

Some stats about the data:

  • 50k projects
  • 500k profiles (growing 1000 a day)

Right now we have the following model:

MATCH (p:Project{id:""})<-[r:MATCHES]-(pm:ProjectMatch)
MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch)
RETURN profile
ORDER BY r.score DESC

r contains the score between the project and the profile.
ProjectMatch is a node created per profile for each year and month, with properties such as:

year: 2019
month: 8
profileId: ""

We've experienced slow queries (e.g. getting all matches ordered by score), which made us rethink the model and consider simplifying it to just:

MATCH (p:Project{id:""})<-[r:MATCHES]-(profile:Profile)
RETURN profile
ORDER BY r.score DESC

I'm seeing the same, or a very similar, number of db hits for the two models. Any advice?

Which data model is "better", or expected to perform better? Will it scale in the long run?

Queries we run:

  • Get all profiles, ordered by score descending
  • Get the count of all matches
  • Get all profiles, ordered by score descending, that haven't received an email:
    WHERE NOT ((profile)-[:HAS_EMAILS]->(:Emails)-[:SENT]->(:Email{projectId: ""}))
2 replies

You may find that moving some of the properties of your ProjectMatch node into the relationships from Profile to ProjectMatch, and using them in your queries, helps speed those queries up.

For example, say we take the year and month properties from ProjectMatch and encode them in the relationship type:
(Profile)-[:MATCH_PROJECT_2019_11]->(ProjectMatch)
If a query uses that specific relationship type, it only has to traverse those relationships to reach ProjectMatch and Project, rather than every MATCH_PROJECT relationship. Of course, this only works if you can be specific about dates, but it shows how more fine-grained relationship types can speed up queries.

I would recommend you have a look at the following for some modelling tips and tricks:

Thanks @lju

That wouldn't work for us. For example:

What if it's November 1st and you need matches from last month (a day ago)? You would need to dynamically generate the two relationship type names:

MATCH_PROJECT_2019_10|MATCH_PROJECT_2019_11
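Generating those names is mostly date arithmetic in application code. A minimal sketch, assuming the MATCH_PROJECT_<year>_<month> naming from this thread (no zero-padding) and a hypothetical $projectId parameter, computes the last N months (rolling January back to the previous December) and builds the alternation into the query string, since in the Cypher versions of this era relationship types generally could not be passed as query parameters:

```python
from datetime import date
from typing import List, Tuple


def months_back(today: date, count: int) -> List[Tuple[int, int]]:
    """Return (year, month) pairs for the current month and the count - 1
    months before it, oldest first, rolling back across year boundaries."""
    pairs = []
    year, month = today.year, today.month
    for _ in range(count):
        pairs.append((year, month))
        month -= 1
        if month == 0:  # January rolls back to December of the previous year
            year, month = year - 1, 12
    pairs.reverse()
    return pairs


def match_rel_types(today: date, count: int = 2) -> str:
    """Join the (assumed) MATCH_PROJECT_<year>_<month> type names with '|'."""
    return "|".join(f"MATCH_PROJECT_{y}_{m}" for y, m in months_back(today, count))


def build_query(today: date, count: int = 2) -> str:
    # The type names are built only from integers, so interpolating them
    # into the query string cannot inject arbitrary Cypher.
    types = match_rel_types(today, count)
    return (
        f"MATCH (profile:Profile)-[:{types}]->(pm:ProjectMatch)"
        "-[r:MATCHES]->(p:Project {id: $projectId})\n"
        "RETURN profile ORDER BY r.score DESC"
    )
```

For November 1st, 2019, match_rel_types returns "MATCH_PROJECT_2019_10|MATCH_PROJECT_2019_11"; on January 15th, 2020 it returns "MATCH_PROJECT_2019_12|MATCH_PROJECT_2020_1".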

Does the current modelling make sense though?

Should we just keep the structure and filter ProjectMatch by year and month?

MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch {year: 2019})
WHERE pm.month IN [10, 11]
RETURN profile
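A fixed {year: 2019} filter breaks whenever the window crosses a year boundary (in January, "last month" is December of the previous year). A minimal sketch, assuming the year and month properties on ProjectMatch described above and a hypothetical $periods parameter, computes the periods in application code and passes them as a list of maps, so the query text itself never changes:

```python
from datetime import date
from typing import Dict, List

# Hypothetical query: keeps the existing MATCH_PROJECT structure and matches
# ProjectMatch nodes whose (year, month) appears in the $periods list of maps.
QUERY = """
MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch)
WHERE any(period IN $periods WHERE pm.year = period.year AND pm.month = period.month)
RETURN profile
"""


def recent_periods(today: date, count: int = 2) -> List[Dict[str, int]]:
    """Return the current month and the count - 1 months before it as
    {"year": ..., "month": ...} maps, oldest first, rolling back across years."""
    periods = []
    year, month = today.year, today.month
    for _ in range(count):
        periods.append({"year": year, "month": month})
        month -= 1
        if month == 0:  # January rolls back to December of the previous year
            year, month = year - 1, 12
    periods.reverse()
    return periods
```

The result of recent_periods(...) would be sent as the periods parameter alongside QUERY. Whether the planner can use an index on ProjectMatch year/month for an any() predicate like this depends on the Neo4j version, so it is worth comparing db hits with PROFILE against the plain IN filter.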