Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-12-2023 12:07 AM - edited 01-12-2023 12:37 AM
I have the following query:
PROFILE MATCH(e:Entity {name:'Mona Lisa'})-[rel]->(e2:Entity)
WITH *, collect( e2.name) as e2Nodes
MATCH(e2)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity) WHERE e2ins.wikidata_id in ['Q5']
WITH *, collect( e2ins.name) as e2InsNodes
return e2Nodes,e2InsNodes
It has 1k db hits and it's plan is as below:
When I add a WHERE condition to this query as follows:
PROFILE MATCH(e:Entity {name:'Mona Lisa'})-[rel]->(e2:Entity)
WITH *, collect( e2.name) as e2Nodes
MATCH(e2)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity) WHERE e2ins.wikidata_id in ['Q5']
WITH *, collect( e2ins.name) as e2InsNodes
return e2Nodes,e2InsNodes
It's db hit count increases to 100k and the plan is as below:
Basically I'm returning entities(e2) have an incoming edge from 'Mona Lisa'. This is 1k db hits. Then I fetch entities with some specific relation(...P31) of these e2 entities. The total number of results is 227. However, there are 100k hits in between the steps when I filtered results by some attribute. Why is this the case?
Thanks.
NOTE: This applies to the following two queries below:
PROFILE MATCH(e:Entity {name:'Mona Lisa'})-[rel]->(e2:Entity)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity)
return e2.name,e2ins.wikidata_id
PROFILE MATCH(e:Entity {name:'Mona Lisa'})-[rel]->(e2:Entity)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity) WHERE e2ins.wikidata_id in ['Q5']
return e2.name,e2ins.wikidata_id
01-12-2023 12:50 AM
As I've figured out, when you put a WHERE condition - no matter the start node is specified by an index lookup - it also fetches those nodes with the specified conditions in the where clause at the beginning of the query.
If the first query is collected beforehand, than it works as expected with few db hits.
PROFILE MATCH(e:Entity{name: 'Mona Lisa'})-[rel]->(e2:Entity)-[rele2ins:`https://www.wikidata.org/wiki/Property:P31`]->(e2ins:Entity) WITH e,e2,collect(e2ins) as e2InsNodes UNWIND e2InsNodes as e2ins WITH * WHERE e2ins.wikidata_id = 'Q5'
return distinct e2.name,e2ins.wikidata_id
I'm not sure about the reason of this implementation and causes of it but I think this types of queries should be handled in a way that they start from the start node, expand and filter queries by WHERE clause. It would be a nice thing to have.
01-12-2023 04:07 PM
In the new query planner it tries to optimize for speed rather than memory and db hits. If a node has an unique index it tries to get those nodes and traverse from both sides and reach the middle. This can have higher db hits, but the total time query taken can get improved.
I have talked about this aspect in my book Graph Data Processing with Cypher .
All the sessions of the conference are now available online