Neo4j

eric7 · 06-03-2021

Hello,

I am trying to optimize a query I have been working on, but I do not understand why the Cypher/Neo4j profiler reports as many db hits as it does.

The query below tries to find all mutual contacts for a given user $user_id.

1st pass

PROFILE MATCH (u1:User {user_id: $user_id})-[:CONTACT]->(u2:User)
WHERE exists((u2)-[:CONTACT]->(u1))
RETURN u1, u2

2nd pass (better but still not great)

PROFILE MATCH (u1:User {user_id: $user_id})-[:CONTACT]->(u2:User)
WITH u1, u2
MATCH (u2)-[:CONTACT]->(u1)
RETURN u1, u2

My understanding is that by using WITH I am signaling to the second MATCH clause that the start and end nodes are already known. However, the profiler shows that this step incurs the most db hits (screenshot: Screen-Shot-2021-06-03-at-2-58-21-PM, ImgBB). I'm unsure about best practices for cases like this and how to optimize my query. Thank you!

andrew_bowman · 06-04-2021

Either approach should work. Generally I would favor the first query.

Given that you already have a unique constraint on :User(user_id), there's not much more tuning you can do here.

As for the number of db hits, perhaps the output of this query will show you how many relationships the query must consider before it arrives at the 28 resulting rows:

MATCH (u1:User {user_id: $user_id})-[:CONTACT]->(u2:User)
WITH u1, u2
MATCH (u2)-[:CONTACT]->()
RETURN count(*) AS relationshipsRequiringFiltering
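
To see where those relationships come from, a per-contact breakdown may help. This is only a sketch built on the diagnostic query above: the aliases contact and outgoingContacts are made up for illustration, and it assumes the same $user_id parameter. The outgoingContacts values should add up to relationshipsRequiringFiltering.

```cypher
// Sketch: count outgoing :CONTACT relationships for each contact of the user.
// "contact" and "outgoingContacts" are illustrative aliases, not from the original query.
MATCH (u1:User {user_id: $user_id})-[:CONTACT]->(u2:User)
// OPTIONAL MATCH keeps contacts that have no outgoing :CONTACT relationships (count 0).
OPTIONAL MATCH (u2)-[r:CONTACT]->()
RETURN u2.user_id AS contact, count(r) AS outgoingContacts
ORDER BY outgoingContacts DESC
```

Contacts near the top of that list are likely the ones contributing most of the db hits when the query checks for the reverse relationship.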

Optimizing a query and understanding the profiler