cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Increase query performance

dlyberis
Node Link

I have built the following query that calculates the european air quality index for some entities. The calculation process take a long time to return the results. Is there any way to speed up the calculation, apoc parallel could do this? Any suggestions?

Thanx in advanced

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
with distinct labels(node) as labels,["no2","so2","o3"] as SP_List,["pm2.5","pm10"] as PART_List
unwind labels as label
unwind SP_List as SP_name
unwind PART_List as PART_name
with SP_name,PART_name,label,localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where (label in LABELS(a)) AND (((b.name = SP_name) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name = PART_name) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with distinct a.name as Entity, b.name as measurement, avg(b.value) as value , custom.AQI(b.name,avg(b.value)) as measurement_AQI,["Good","Fair","Moderate","Poor","Very Poor","Extremely Poor"] as AQI_Index 
return Entity, AQI_Index[max(measurement_AQI)] as AQI

1 ACCEPTED SOLUTION

I don't think you need to unwind the node labels, SP_List, and PART_List values and test each combination, as this will cause the match clause to be repeated for each combination. Instead, I think you get the same result by testing if the values are in the list. I did some other refactoring as well.  I believe you should get the same result with the following query, but do check.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
unwind labels(node) as label
with collect(distinct label) as labels, localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where any(i in labels where i in labels(a))
AND (((b.name in ["no2","so2","o3"]) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name in ["pm2.5","pm10"]) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with a.name as Entity, b.name as measurement, avg(b.value) as value
with Entity, custom.AQI(measurement, value) as measurement_AQI
return Entity, ["Good","Fair","Moderate","Poor","Very Poor","Extremely Poor"][max(measurement_AQI)] as AQI

Finally, it is recommended to use parameters instead of literals, so you could extract the SP_List, PART_List, and the AQI lists and pass them as parameters instead of putting them inline as I did in the query. Something like the following:

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
unwind labels(node) as label
with collect(distinct label) as labels, localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where any(i in labels where i in labels(a))
AND (((b.name in $SP_List) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name in $PART_List) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with a.name as Entity, b.name as measurement, avg(b.value) as value
with Entity, custom.AQI(measurement, value) as measurement_AQI
return Entity, $AQI_Values[max(measurement_AQI)] as AQI

 

View solution in original post

2 REPLIES 2

dlyberis
Node Link

I made an improvement at line 8. I replaced the filtering by label with id. it is much faster than the previous one but it is not enough yet. It takes almost 16 seconds to return the results. I have placed as attachment the profile results. Is there any suggestion for the execution time improvement?  

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
with node, ["no2","so2","o3"] as SP_List, ["pm2.5","pm10"] as PART_List
with SP_List,PART_List,collect({id:id(node)}) as node_prop
unwind SP_List as SP_name
unwind PART_List as PART_name
unwind node_prop as nd
with SP_name,PART_name,nd,localdatetime({timezone: 'Europe/Athens'}) as now
match (m)-[r*1..2]->(b:SensorProperty)
where id(m)=nd.id AND (((b.name = SP_name) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name = PART_name) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with distinct m.name as Entity, b.name as measurement, avg(b.value) as value , custom.AQI(b.name,avg(b.value)) as measurement_AQI,["Good","Fair","Moderate","Poor","Very Poor","Extremely Poor"] as AQI_Index 
return Entity, AQI_Index[max(measurement_AQI)] as AQI

  

I don't think you need to unwind the node labels, SP_List, and PART_List values and test each combination, as this will cause the match clause to be repeated for each combination. Instead, I think you get the same result by testing if the values are in the list. I did some other refactoring as well.  I believe you should get the same result with the following query, but do check.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
unwind labels(node) as label
with collect(distinct label) as labels, localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where any(i in labels where i in labels(a))
AND (((b.name in ["no2","so2","o3"]) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name in ["pm2.5","pm10"]) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with a.name as Entity, b.name as measurement, avg(b.value) as value
with Entity, custom.AQI(measurement, value) as measurement_AQI
return Entity, ["Good","Fair","Moderate","Poor","Very Poor","Extremely Poor"][max(measurement_AQI)] as AQI

Finally, it is recommended to use parameters instead of literals, so you could extract the SP_List, PART_List, and the AQI lists and pass them as parameters instead of putting them inline as I did in the query. Something like the following:

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
unwind labels(node) as label
with collect(distinct label) as labels, localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where any(i in labels where i in labels(a))
AND (((b.name in $SP_List) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name in $PART_List) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with a.name as Entity, b.name as measurement, avg(b.value) as value
with Entity, custom.AQI(measurement, value) as measurement_AQI
return Entity, $AQI_Values[max(measurement_AQI)] as AQI