12-11-2019 07:23 PM
Hi all,
I am quite confused about how Neo4j counts db hits.
I have the following two queries:
Query 1:
profile match (com:Company)<-[:IS_CUSTOMER]-(cust:Customer)
with cust
return sum(cust.sysid)
Query 2:
profile match (com:Company)<-[:IS_CUSTOMER]-(cust:Customer)
with cust
with
case when cust.createdYear=2017 then cust.sysid else 0 end as year_2017,
case when cust.createdYear=2018 then cust.sysid else 0 end as year_2018,
case when cust.createdYear>2018 then cust.sysid else 0 end as current_year
return sum(year_2017) as year_2017, sum(year_2018) as year_2018, sum(current_year) as present
For query 1: there are 15 total DB hits.
For query 2: there are 27 total DB hits.
As I understand it, db hits represent the work the storage layer does when I retrieve data from the Neo4j database. In query 2, all the Customer nodes are returned by the match and passed on by the second line of the query, which suggests that their data has already been retrieved from storage.
According to the execution plan, the projection stage accounts for 16 db hits, and this is what confuses me. If all the Customer nodes have already been retrieved from the database, why are db hits still produced? The CASE expressions only work with data that has already been returned, so they should not need to fetch anything from storage again.
12-12-2019 12:55 PM
When we work with nodes within Cypher, we use a lightweight object to represent the node with minimal information (such as the graph id of the node), since the query may not need to access that node's properties at all and property access can at times be expensive. So property access is lazy. When the MATCH finishes, we have not yet accessed the node's properties; that happens when the properties are actually used, such as in your CASE.
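To make the idea of lazy property access concrete, here is a toy Python model. It is only a sketch and not how Neo4j is implemented: `LazyNode`, `prop`, `store`, and the one-hit-per-read accounting are all assumptions made for illustration, so the counts will not match a real PROFILE output.

```python
class LazyNode:
    """Hypothetical stand-in for a lightweight node reference: it holds only an id."""

    def __init__(self, node_id, store, counter):
        self.node_id = node_id
        self._store = store      # simulated storage layer (assumption)
        self._counter = counter  # shared hit counter (assumption)

    def prop(self, key):
        # In this toy model, every property read goes to the store and costs one hit.
        self._counter["hits"] += 1
        return self._store[self.node_id].get(key)


# Made-up sample data, not taken from the thread.
store = {
    1: {"createdYear": 2017, "sysid": 10},
    2: {"createdYear": 2018, "sysid": 20},
    3: {"createdYear": 2019, "sysid": 30},
}
counter = {"hits": 0}

# "MATCH": produce lightweight references; no properties are read yet.
nodes = [LazyNode(i, store, counter) for i in store]
hits_after_match = counter["hits"]

# "CASE ... END": the property reads happen here, when the values are used.
year_2017 = sum(
    n.prop("sysid") if n.prop("createdYear") == 2017 else 0 for n in nodes
)
hits_after_case = counter["hits"]

print(hits_after_match, hits_after_case, year_2017)  # prints: 0 4 10
```

In this model the match step costs nothing in property reads, and the hits only appear once the conditional expression touches `createdYear` and `sysid`.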
12-12-2019 07:01 PM
Hi Andrew,
Thank you so much for your reply.
So in this case, I can collect the properties I need into a map first and then use them later without incurring additional db hits:
profile match (com:Company)<-[:IS_CUSTOMER]-(cust:Customer)
with {createdYear: cust.createdYear, sysid: cust.sysid} as cust
with
case when cust.createdYear=2017 then cust.sysid else 0 end as year_2017,
case when cust.createdYear=2018 then cust.sysid else 0 end as year_2018,
case when cust.createdYear>2018 then cust.sysid else 0 end as current_year
return sum(year_2017) as year_2017, sum(year_2018) as year_2018, sum(current_year) as present
In this case, the total is 19 db hits. But does this approach consume more memory than the previous queries?
If it does, in which cases should we use the previous query, and in which cases this one?