Neo4j

jiteshjosh · ‎01-03-2021

Which code is optimized and efficient in Neo4j?
A) MATCH(tom:Person{name:"Tom Hanks"})-[:ACTED_IN]->(mov:Movie)<-[:ACTED_IN]-(co:Person)
RETURN co.name,mov.title

B) MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors.name

elaine_rosenber · ‎01-04-2021

I believe that B will be more efficient. We know that all :ACTED_IN relationships must point to Movie nodes, so there is no need to specify the label in the pattern. Also, if you don't need to return the title, that is one less property access that needs to occur.

The real answer to your question, however, depends on the data you are querying. And of course the definitive answer is to prepend these queries with PROFILE, which will give you the answer. Make sure, however, that you take the second run of PROFILE for each of them, as the first run needs to compile the query into the query cache.

All of this is covered in the course, Cypher Query Tuning in Neo4j 4.x.

Elaine

jiteshjosh · ‎01-04-2021

Yes, we know that, but does the database also know that :ACTED_IN must point only to Movie nodes? If it does not, and it has to scan all nodes to find the relationship, that will be overhead.
Thanks for sharing the option to PROFILE the queries. I will check it.

elaine_rosenber · ‎01-04-2021

What is used during the query will also depend upon what is stored and used in the count store.
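As a rough illustration of what the count store can answer by itself (a sketch assuming the Movie example graph; the aliases are only illustrative, and the exact plan may vary by Neo4j version), counting relationships of a type with at most one end labeled is typically planned as a count store lookup. You can check the plan with EXPLAIN or PROFILE:

```cypher
// Count of all ACTED_IN relationships: usually planned as a
// RelationshipCountFromCountStore operator, with no relationship scan.
MATCH ()-[r:ACTED_IN]->()
RETURN count(r) AS actedInTotal;

// Same count with a label on the start node only: this can usually
// still be read from the count store.
MATCH (:Person)-[r:ACTED_IN]->()
RETURN count(r) AS actedInFromPerson;
```

If both counts are equal, every ACTED_IN relationship starts at a node that has the Person label. As far as I know, the count store does not keep counts for patterns with labels on both ends, so the same check cannot be done in one lookup for Person and Movie together.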

This is also covered in the Query Tuning course.

Elaine

clem · ‎01-04-2021

I presume you care about the presence/absence of the labels and not the extra movie title in the RETURN of A vs. B.

Here, the PROFILE command will answer your question:

A)

PROFILE MATCH(tom:Person{name:"Tom Hanks"})-[:ACTED_IN]->(mov:Movie)<-[:ACTED_IN]-(co:Person)
RETURN co.name // ,mov.title // removed

shows 265 total db hits in 120 ms the first time the query is run after restarting the DB.

B)
PROFILE MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) RETURN coActors.name
shows 214 total db hits in 82 ms the first time the query is run after restarting the DB

The profile pictures show why (A on the left, B on the right). The query with the extra labels (Movie, and Person for the co-actors) has two extra Filter operators, which in the Movie DB are unneeded because an ACTED_IN relationship always starts at a Person and ends at a Movie. That might not be true in general.

I will note that having the extra Labels helps a person understand the query better.

I do wonder if this is an opportunity for optimization: could Neo4j keep track of the labels associated with each relationship type, and skip the filter when only one label ever appears at the start or the end of that relationship type?

jiteshjosh · ‎01-05-2021

It was nice to see the explanation about the difference. Thanks!
However, I would be curious to know the results in a huge graph database with many relationship types, etc.

michael_hunger · ‎01-09-2021

The database keeps some track of that, but not at enough granularity (e.g. for multi-label or unlabeled nodes) to determine that correctly.

So it's currently up to the user and their knowledge of the domain to reduce the number of label checks where feasible.
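One way to back up that domain knowledge before dropping labels from a pattern is to ask the data directly. This is only a sketch, assuming the Movie example graph; the aliases are made up for illustration, and the query touches every ACTED_IN relationship, so treat it as a one-off check rather than something to run often on a large graph:

```cypher
// List every distinct combination of start-node and end-node labels
// that actually occurs on ACTED_IN relationships, with a count for each.
MATCH (a)-[:ACTED_IN]->(b)
RETURN labels(a) AS startLabels,
       labels(b) AS endLabels,
       count(*)  AS relationships
ORDER BY relationships DESC;
```

If this returns a single row with only Person on the start side and only Movie on the end side, the label checks in query A are redundant for this data. Any additional rows mean the labels in the pattern change the result, not just the cost.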
