Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
12-05-2022 11:33 PM
I have built the following query that calculates the european air quality index for some entities. The calculation process take a long time to return the results. Is there any way to speed up the calculation, apoc parallel could do this? Any suggestions?
Thanx in advanced
call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node
with distinct labels(node) as labels,["no2","so2","o3"] as SP_List,["pm2.5","pm10"] as PART_List
unwind labels as label
unwind SP_List as SP_name
unwind PART_List as PART_name
with SP_name,PART_name,label,localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where (label in LABELS(a)) AND (((b.name = SP_name) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name = PART_name) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with distinct a.name as Entity, b.name as measurement, avg(b.value) as value , custom.AQI(b.name,avg(b.value)) as measurement_AQI,["Good","Fair","Moderate","Poor","Very Poor","Extremely Poor"] as AQI_Index
return Entity, AQI_Index[max(measurement_AQI)] as AQI
Solved! Go to Solution.
12-08-2022 09:27 AM
I don't think you need to unwind the node labels, SP_List, and PART_List values and test each combination, as this will cause the match clause to be repeated for each combination. Instead, I think you get the same result by testing if the values are in the list. I did some other refactoring as well. I believe you should get the same result with the following query, but do check.
call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node
unwind labels(node) as label
with collect(distinct label) as labels, localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where any(i in labels where i in labels(a))
AND (((b.name in ["no2","so2","o3"]) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name in ["pm2.5","pm10"]) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with a.name as Entity, b.name as measurement, avg(b.value) as value
with Entity, custom.AQI(measurement, value) as measurement_AQI
return Entity, ["Good","Fair","Moderate","Poor","Very Poor","Extremely Poor"][max(measurement_AQI)] as AQI
Finally, it is recommended to use parameters instead of literals, so you could extract the SP_List, PART_List, and the AQI lists and pass them as parameters instead of putting them inline as I did in the query. Something like the following:
call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node
unwind labels(node) as label
with collect(distinct label) as labels, localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where any(i in labels where i in labels(a))
AND (((b.name in $SP_List) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name in $PART_List) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with a.name as Entity, b.name as measurement, avg(b.value) as value
with Entity, custom.AQI(measurement, value) as measurement_AQI
return Entity, $AQI_Values[max(measurement_AQI)] as AQI
12-08-2022 08:15 AM
I made an improvement at line 8. I replaced the filtering by label with id. it is much faster than the previous one but it is not enough yet. It takes almost 16 seconds to return the results. I have placed as attachment the profile results. Is there any suggestion for the execution time improvement?
call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node
with node, ["no2","so2","o3"] as SP_List, ["pm2.5","pm10"] as PART_List
with SP_List,PART_List,collect({id:id(node)}) as node_prop
unwind SP_List as SP_name
unwind PART_List as PART_name
unwind node_prop as nd
with SP_name,PART_name,nd,localdatetime({timezone: 'Europe/Athens'}) as now
match (m)-[r*1..2]->(b:SensorProperty)
where id(m)=nd.id AND (((b.name = SP_name) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name = PART_name) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with distinct m.name as Entity, b.name as measurement, avg(b.value) as value , custom.AQI(b.name,avg(b.value)) as measurement_AQI,["Good","Fair","Moderate","Poor","Very Poor","Extremely Poor"] as AQI_Index
return Entity, AQI_Index[max(measurement_AQI)] as AQI
12-08-2022 09:27 AM
I don't think you need to unwind the node labels, SP_List, and PART_List values and test each combination, as this will cause the match clause to be repeated for each combination. Instead, I think you get the same result by testing if the values are in the list. I did some other refactoring as well. I believe you should get the same result with the following query, but do check.
call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node
unwind labels(node) as label
with collect(distinct label) as labels, localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where any(i in labels where i in labels(a))
AND (((b.name in ["no2","so2","o3"]) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name in ["pm2.5","pm10"]) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with a.name as Entity, b.name as measurement, avg(b.value) as value
with Entity, custom.AQI(measurement, value) as measurement_AQI
return Entity, ["Good","Fair","Moderate","Poor","Very Poor","Extremely Poor"][max(measurement_AQI)] as AQI
Finally, it is recommended to use parameters instead of literals, so you could extract the SP_List, PART_List, and the AQI lists and pass them as parameters instead of putting them inline as I did in the query. Something like the following:
call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node
unwind labels(node) as label
with collect(distinct label) as labels, localdatetime({timezone: 'Europe/Athens'}) as now
match (a)-[:hasSensor*0..1]-()-[:hasSensorProperty]->(b:SensorProperty)
where any(i in labels where i in labels(a))
AND (((b.name in $SP_List) AND (LocalDateTime(b.timestamp) >= now-duration({hours:1}))) or ((b.name in $PART_List) AND (LocalDateTime(b.timestamp) >= now-duration({hours:24}))))
with a.name as Entity, b.name as measurement, avg(b.value) as value
with Entity, custom.AQI(measurement, value) as measurement_AQI
return Entity, $AQI_Values[max(measurement_AQI)] as AQI
All the sessions of the conference are now available online