Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
11-17-2020 08:04 AM
Hello,
I'm new to neo4j and love it so far, but I'm having a little trouble with more complicated cypher queries. Currently I have a node relationship like:
(user:User)-[:MEMBERSHIP]->(:Group)<-[:MEMBERSHIP]-(group_member:User)
I'd like to:
Of course if there is a way to query 1 & 2 in the same "MATCH/WHERE" that would be preferred but I'm not seeing a way to do this.
Any pointers in the right direction would be awesome!
11-20-2020 10:48 AM
MATCH (user1:User {name:"bob"}),(user2:User)
WHERE NOT (user1)-[:MEMBERSHIP]->(g:Group)<-[:MEMBERSHIP]-(user2)
AND g.name='filter me in' // group property restriction
return user2.name
Do you have a small test set you could share? (create statements for test set would help)
11-20-2020 09:15 PM
Thank you for this Joel! I will play with it and let you know how it goes.
11-22-2020 05:28 PM
Yes, this code is helpful! Thank you.
So right now these two queries are working:
MATCH (user1:User {user_id:166}),(user2:User)
WHERE (NOT (user1)-[:MEMBERSHIP]->(:Group)<-[:MEMBERSHIP]-(user2)
AND distance(user1.location, user2.location) < 20000)
RETURN user2 AS user
MATCH (user1:User {user_id:166})-[:MEMBERSHIP]->(:Group)<-[user2_rel:MEMBERSHIP]-(user2:User)
WHERE user2_rel.status='left' AND distance(user1.location, user2.location) < 20000
RETURN user2 AS user
Basically, the first returns all users that are not in any groups with user1 that are within 20 km of user1. The second returns all users that have left a group that they were in with user1 and are within 20 km.
I'd like to combine both sets, sort the union by certain properties of user2, and them limit to a specific number. I know that using UNION is slow and also you have very limited options on the UNION'ed set. So it seems like the best thing would be to combine both into one statement if it can be done.
Thanks again for your help! Getting closer...
11-23-2020 08:45 AM
It is best to not spend time optimizing until it is working and even then, only if required. You should be able to put a UNION between those two, to concatenate the results.
simple example below, here I'm renaming column names so I can union the results
RETURN 'apoc.version' as lib_name, apoc.version() as version
UNION
RETURN 'gds.version' as lib_name, gds.version() as version
Note: If you are neo4j 4.x you can also do this
CALL {
RETURN 'apoc.version' as lib_name, apoc.version() as version
UNION
RETURN 'gds.version' as lib_name, gds.version() as version
}
return lib_name, version
order by lib_name desc
11-28-2020 08:22 AM
So, I now am able to construct a full query that works. The part of the query that deals with existing relationships is performing well. However the part that is pulling no relationships is performing terribly.
Here is that query:
MATCH (current_user:GraphUser {user_id:166}),(potential_user:GraphUser)
WITH current_user AS current_user, potential_user AS potential_user,
distance(current_user.location,potential_user.location) AS distance
WHERE (NOT (current_user)-[:GRAPH_MEMBERSHIP]->()<-[:GRAPH_MEMBERSHIP]-(potential_user))
AND distance < 20000
RETURN potential_user, distance
ORDER BY distance ASC
LIMIT 10
There are about 100K GraphUser nodes, so not even that many.
Here is the profile output:
I don't understand why the filtering needs to do millions of db hits for a set of under 100k rows, but i think it has something to do with the Cartesian product. But at over a second, this is performing much worse than the statement in SQL.
Thanks for the help so far, maybe I'm missing something obvious.
11-28-2020 08:37 AM
Checkout:
One tip I learned from the video, is you can move up the LIMIT
command so that the query will end earlier (which is different from SQL.). I haven't quite got the hang of it myself, so I'm not sure exactly how to transform your query.
11-28-2020 12:45 PM
Regarding optimizations to the existing query
Try doing the filtering in two steps, I'd first try filtering for distance < 20000, then WITH those results apply the WHERE NOT
If/how this helps optimize depends on the data content, but what we're trying to do is quickly narrow the cartesian product list down to something much smaller as fast as possible, so you might also try applying the WHERE NOT first, and then WITH those results calculate the distance of the remaining results applying the distance<20000 filter second. My gut tells me to try the simple distance check first, but you never know for sure until you try it, I really enjoy the surprises..
All the sessions of the conference are now available online