Neo4j

leelandclay · ‎12-17-2019

I've been working at this for a while and I've got it down to the best I can do. Before I push it out to a server, I'd like to get some input on possible ways to lower a couple of the high db hits.

I have indexes on all properties that are used in the queries. I am currently planning on converting the lastSeen, signUpStep, minAge and maxAge properties to be nodes, however it will take some time before I can get that completed.

Here's my query. The hardcoded numbers will be converted to parameters that will be passed in.

PROFILE WITH 'test123' as _userId, 48280.2 as _distanceMeters,
date.truncate('day', date() - duration({years: 18})) as ageMax, date.truncate('day', date() - duration({years: 40})) as ageMin,
point({ latitude: 30.4332061, longitude: -97.60057859999999}) as myLocation

MATCH (me:Person {userId: _userId})
MATCH (p:Person)
MATCH (p)-[:ISAGENDER]->(g:Gender)<-[:ISLOOKINGFORGENDER]-(me)
MATCH (p)-[:ISLOOKINGFORGENDER]->(lfg:Gender)<-[:ISAGENDER]-(me)
MATCH (p)-[:ISLOOKINGFORCONNECTIONTYPE]->(c:ConnectionType)<-[:ISLOOKINGFORCONNECTIONTYPE]-(me)

WHERE g.key IN [10] AND lfg.key IN [20] AND c.key IN [10,20,30,40,50]

WITH DISTINCT p, ageMin, ageMax, _distanceMeters, me, g, lfg, c, myLocation
WHERE NOT (p:Test) AND SIZE( (p)-[:DEACTIVATED]->() ) = 0 
AND SIZE( (p)-[:BLOCK|BLOCKED|HAS_HIDDEN|ISAPARTNER|LIKES]->(me) ) = 0

AND distance(myLocation, p.location) < _distanceMeters 
AND (CASE WHEN p.distance = 999 THEN 40072000 ELSE p.distance * 1609.34 END) > distance(myLocation, p.location)


WITH DISTINCT p, g, lfg, c, ageMin, ageMax, me
WHERE p <> me AND p.userId IS NOT NULL AND p.lastSeen IS NOT NULL AND p.signUpStep = 99 AND p.minAge IS NOT NULL AND p.maxAge IS NOT NULL

AND date.truncate('day', date() - duration({years: p.maxAge})) <= date(me.dateOfBirth) <= date.truncate('day', date() - duration({years: p.minAge}))
AND ageMin <= date(p.dateOfBirth) <= ageMax


OPTIONAL MATCH (p)-[:SPEAKS]->(l:Language)
OPTIONAL MATCH (p)-[:IS_A_FOLLOWER_OF]->(r:Religion)
OPTIONAL MATCH (p)-[:ISPOLYAMTYPE]->(pt:PolyAmType)

RETURN distinct p, collect(DISTINCT g.key) as gender, collect(DISTINCT l.key) as languages, collect (DISTINCT r.key) as religions,
collect(DISTINCT lfg.key) as lookingForGender, collect(DISTINCT c.key) as connectionType, collect(DISTINCT pt.key) as polyamtypes,
datetime({epochmillis: p.lastSeen }).month as month, 
apoc.util.md5([id(p), 0]) as hash 

ORDER BY month DESC, hash 

SKIP 0 LIMIT 10

julie_fisher · ‎12-31-2019

Query optimization is a bit of an art. Cypher is designed to move quickly along traversals, but it has to slow down to open properties on the nodes when checking conditionals.

Based on that, I'd suggest trying to make the first match pattern as specific as possible. So something like:

MATCH (me:Person {userId: _userId})-[:ISLOOKINGFORGENDER]-(g:Gender)-[:ISAGENDER]-(p:Person)-[:ISLOOKINGFORGENDER]-(lfg:Gender)-[:ISAGENDER]-(:Person {userId: _userId})

Then play with where you place the conditionals to get the best optimization. You may get better results if you do the basic pattern match on the gender or the connection type, return the results using WITH, then limit the conditionals. Or you may get better results putting the conditionals in the initial match statement.

I was thinking something like this:

PROFILE WITH 'test123' as _userId, 48280.2 as _distanceMeters,
date.truncate('day', date() - duration({years: 18})) as ageMax, date.truncate('day', date() - duration({years: 40})) as ageMin,
point({ latitude: 30.4332061, longitude: -97.60057859999999}) as myLocation

MATCH (me:Person {userId: _userId})-[:ISLOOKINGFORGENDER]->(g:Gender)<-[:ISAGENDER]-(p:Person)-[:ISLOOKINGFORGENDER]->(lfg:Gender)<-[:ISAGENDER]-(:Person {userId: _userId})
WHERE NOT (p:Test)
AND p <> me
AND SIZE( (p)-[:DEACTIVATED]->() ) = 0
AND SIZE( (p)-[:BLOCK|BLOCKED|HAS_HIDDEN|ISAPARTNER|LIKES]->(me) ) = 0
AND p.userId IS NOT NULL
AND p.lastSeen IS NOT NULL
AND p.signUpStep = 99
AND p.minAge IS NOT NULL
AND p.maxAge IS NOT NULL
AND g.key IN [10] AND lfg.key IN [20]

WITH me, p, ageMin, ageMax, myLocation, _distanceMeters
MATCH (me)-[:ISLOOKINGFORCONNECTIONTYPE]->(c:ConnectionType)<-[:ISLOOKINGFORCONNECTIONTYPE]-(p)
WHERE c.key IN [10,20,30,40,50]
AND distance(myLocation, p.location) < _distanceMeters
AND (CASE WHEN p.distance = 999 THEN 40072000 ELSE p.distance * 1609.34 END) > distance(myLocation, p.location)
AND date.truncate('day', date() - duration({years: p.maxAge})) <= date(me.dateOfBirth) <= date.truncate('day', date() - duration({years: p.minAge}))
AND ageMin <= date(p.dateOfBirth) <= ageMax

WITH me, p, c

OPTIONAL MATCH (p)-[:SPEAKS]->(l:Language)
OPTIONAL MATCH (p)-[:IS_A_FOLLOWER_OF]->(r:Religion)
OPTIONAL MATCH (p)-[:ISPOLYAMTYPE]->(pt:PolyAmType)
OPTIONAL MATCH (p)-[:ISLOOKINGFORGENDER]->(lfg:Gender)
OPTIONAL MATCH (p)-[:ISAGENDER]->(g:Gender)

RETURN distinct p, collect(DISTINCT g.key) as gender, collect(DISTINCT l.key) as languages, collect (DISTINCT r.key) as religions,
collect(DISTINCT lfg.key) as lookingForGender, collect(DISTINCT c.key) as connectionType, collect(DISTINCT pt.key) as polyamtypes,
datetime({epochmillis: p.lastSeen }).month as month,
apoc.util.md5([id(p), 0]) as hash

ORDER BY month DESC, hash

SKIP 0 LIMIT 10

leelandclay · ‎12-31-2019

I didn't even consider chaining further relationships to reduce the initial query results. Thanks!

Neo4j

Trying to squeeze as much performance as I can out of this complex query