Neo4j

pierrick_ganon · 11-08-2019

We have a service that processes potential "matches" between a user (Profile) and a Project (a paid opportunity).

In the graph, we create a relationship with a score property for each match.

The number of relationships pointing at a single :Project node can be 20k or more.

Some stats about the data:

50k projects
500k profiles (growing by 1,000 a day)

Right now we have the following model:

MATCH (p:Project{id:""})<-[r:MATCHES]-(pm:ProjectMatch)
MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch)
RETURN profile
ORDER BY r.score DESC

r holds the score between the project and the profile.
ProjectMatch is a node created per profile for each month and year, with these properties:

year: 2019
month: 8
profileId: ""

We've experienced slow queries (for example, when fetching all matches ordered by score), which made us rethink the model and consider simplifying it to just:

MATCH (p:Project{id:""})<-[r:MATCHES]-(pm:ProjectMatch)
MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch)
RETURN profile
ORDER BY r.score DESC

I'm seeing the same, or a very similar, number of db hits for the two models. Any advice?

Which data model is "better" or is supposed to perform better? Is it scalable in the long run?

Queries we run:

Get all profiles returned by score desc
Get count of all matches
Get all profiles returned by score desc that haven't received an email

WHERE NOT ((profile)-[:HAS_EMAILS]->(:Emails)-[:SENT]->(:Email{projectId: ""}))
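The three queries listed above can be assembled from the fragments in this post. Below is a minimal sketch, assuming the service is written in Python (the thread does not say) and that the project id is passed as a query parameter; the function name `build_match_query` and the parameter name `$projectId` are placeholders for illustration, not part of the original model.

```python
def build_match_query(only_unemailed=False, count_only=False):
    """Return Cypher text for one of the three queries listed in the post.

    Labels, relationship types and properties are taken from the thread;
    the $projectId parameter name is an assumption of this sketch.
    """
    lines = [
        "MATCH (p:Project {id: $projectId})<-[r:MATCHES]-(pm:ProjectMatch)",
        "MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm)",
    ]
    if only_unemailed:
        # Keep only profiles with no Email node recorded for this project.
        lines.append(
            "WHERE NOT (profile)-[:HAS_EMAILS]->(:Emails)"
            "-[:SENT]->(:Email {projectId: $projectId})"
        )
    if count_only:
        # Counts result rows, i.e. one per (profile, ProjectMatch) pair.
        lines.append("RETURN count(profile) AS matches")
    else:
        lines += ["RETURN profile", "ORDER BY r.score DESC"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_match_query(only_unemailed=True))
```

Passing the id as a parameter rather than inlining it lets the database reuse the query plan across projects.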

lju · 11-11-2019

You may find that moving some of the properties from your ProjectMatch node into the relationships from Profile to ProjectMatch, and using them in your queries, helps speed things up.

For example, say we take the year and month properties from ProjectMatch and encode them in the relationship type, giving something like:
(Profile)-[:MATCH_PROJECT_2019_11]->(ProjectMatch)
Querying on that specific relationship type cuts down the number of relationships that need to be traversed to reach ProjectMatch and Project. Of course, this only works if you can be specific about dates, but it gives you an idea of how more fine-grained relationship types can speed up queries.

I would recommend you have a look at the following for some modelling tips and tricks:

https://maxdemarzi.com/2015/08/26/modeling-airline-flights-in-neo4j/

pierrick_ganon · 11-12-2019

Thanks @lju

That wouldn't work for us. For example:

What if it's November 1st and you need matches from last month (a day ago)? You would need to dynamically generate the two relationship type names:

MATCH_PROJECT_2019_10|MATCH_PROJECT_2019_11
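Generating those names on the application side is fairly mechanical. Here is a minimal sketch, assuming a Python service and un-padded month numbers (as in `month: 8` on ProjectMatch); `match_rel_types` and `recent_matches_query` are hypothetical helper names. Relationship types could not be passed as query parameters in Neo4j versions of that era, so the names are spliced into the query text; that is acceptable here only because they are derived from a date, never from user input.

```python
from datetime import date


def match_rel_types(today, months_back=1, prefix="MATCH_PROJECT"):
    """Build a type alternation covering the current month and the
    `months_back` months before it."""
    # Count months from year 0 so stepping back across January works.
    newest = today.year * 12 + (today.month - 1)
    names = []
    for index in range(newest - months_back, newest + 1):
        year, month0 = divmod(index, 12)
        names.append(f"{prefix}_{year}_{month0 + 1}")
    return "|".join(names)


def recent_matches_query(today, months_back=1):
    """Splice the generated types into a MATCH pattern."""
    rel_types = match_rel_types(today, months_back)
    return (
        f"MATCH (profile:Profile)-[:{rel_types}]->(pm:ProjectMatch)\n"
        "RETURN profile, pm"
    )


if __name__ == "__main__":
    print(recent_matches_query(date(2019, 11, 1)))
```

The same helper handles the December/January boundary, where the year changes as well as the month.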

Does the current modelling make sense though?

Should we just keep the structure and filter ProjectMatch by year and month?

(Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch{year: 2019})
WHERE pm.month = 10 OR pm.month = 11
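One thing to watch with the property filter is the year boundary: on January 1st, "last month" belongs to the previous year, so a fixed `year` plus a list of months no longer works. A possible sketch, again assuming a Python service, turns each (year, month) pair into a single month index and filters on a range; `month_window_params`, `$fromIndex` and `$toIndex` are names invented for this example.

```python
from datetime import date

# The window is expressed as year * 12 + month, so it can span two years.
WINDOW_QUERY = (
    "MATCH (profile:Profile)-[:MATCH_PROJECT]->(pm:ProjectMatch)\n"
    "WHERE pm.year * 12 + pm.month >= $fromIndex\n"
    "  AND pm.year * 12 + pm.month <= $toIndex\n"
    "RETURN profile, pm"
)


def month_window_params(today, months_back=1):
    """Parameters covering the current month and `months_back` before it."""
    newest = today.year * 12 + today.month
    return {"fromIndex": newest - months_back, "toIndex": newest}


if __name__ == "__main__":
    print(WINDOW_QUERY)
    print(month_window_params(date(2020, 1, 1)))
```

Because the filter is computed from two properties, it is unlikely to benefit from an index on `year` or `month`; storing the combined value as its own property on ProjectMatch would be one way around that, at the cost of a model change.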

Max Number of relationships to a node - Best Modelling