Neo4j

leelandclay · ‎06-20-2020

I'm building out a dating app that has a two-way distance restriction. I'm also adding in the capability to limit people to your country/state, ignoring the distance.

I've written the query, and tbh, the performance is horrible. I know what the problem is, I currently have to do 3 different scans of of the database looking for profiles that have specific distance/country/state properties. I know that the engine has to read the properties of the profile for each of the 3 sub queries that I'm doing in order to capture all possible matches. I also know that I need to change my model.

I currently have a Person node with (among others) properties for distance, country, state, city, zipCode and location. Location is a Point object. For the distance property, it can be either 10,30,50,100,250,500,999,1000,1001. The first 6 entries are mi/km options, 999 is Anywhere, 1000 is State only and 1001 is Country only.

Some use cases to make it more understandable are these:
Submitter requests 999 (Anywhere) They should see everyone that has anywhere as their entry, as well as those that the distance between the two are within the other person's distance setting.

Submitter requests 1000 (Only my State). They should only see people that are within their state, but not those that the other person's limits would prevent them. Such as they are 100 miles apart and the other person has a 10 mile limit.

So, my question is which way would provide the greatest performance for a query?
A) Create a single node for each of the distance items and then a relationship from every Person object to the matching distance node.
B) Create a Distance node for each Profile and connect it to the Person Node
C) Add a label to each Person node that designates the distance
D) Something else?????

spsingh559 · ‎06-21-2020

Hi Clay, can you share your graph schema to visually see your data model.
call db.schema.visualization

spsingh559 · ‎06-21-2020

I suppose with this data model

create (:Person{name:'PersonA', limit:10, location:point({latitude:24.680541,longitude: 46.653328, crs: 'WGS-84'}), isAnywhere:true, isState:true,isCountry:true, state:'StateX', country:'countryY'})
create (:Person{name:'PersonB', limit:10, location:point({latitude:24.637605,longitude: 46.609432, crs: 'WGS-84'}),isAnywhere:true, isState:true,isCountry:true, state:'StateX', country:'countryY'})
create (:Person{name:'PersonC', limit:10, location:point({latitude:24.677146,longitude: 46.655614, crs: 'WGS-84'}),isAnywhere:true, isState:true,isCountry:true, state:'StateX', country:'countryY'})
create (:Person{name:'PersonD', limit:30, location:point({latitude:24.677542,longitude:44.511428, crs: 'WGS-84'}), isAnywhere:true, isState:true,isCountry:true, state:'StateX', country:'countryY'})

Solution

//Submitter PersonA requests 999 (Anywhere)

match(p:Person) where p.name="PersonA"
with p
match(a:Person) where a.isAnywhere=true and distance (a.location,p.location)/1000<a.limit and a.name<>p.name
return a

Query Explanation
Submitter is PersonA which will find all other person node (a), whose isAnywhere (999) property is true, and distance between PersonA and other person node (a) is less that a's set limit. So PersonB and PersonC fall in that criteria where D is excluded because distance between PeronA and PersonD is 216 Km, more than D's limit i.e 30km

Answer for your queries:
A) distance/limit can be property of node, no need to create as extra node
B) separate the distance value from (999,1000,1001) i.e anywhere, only state and country. see the above person Node in create command
C)No need to make it complex
D) cacluate the distance at run time between two location object for checking each other location limits.

Note : If there is no node except person node, I think neo4j would not be the right choice, you can achieve same thing with other no SQL databases,

Neo4j

Query Optimization Help