cypher: Calculate distances

dlyberis

I am trying to calculate the distance between each pair of nodes.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
WITH node,point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
return node.name, entity_point

Is it possible to calculate the distance between the returned nodes and find the closest neighbour to each one? As a continuation of the above query, is there a for-loop-like way to implement something like this?

Thank you in advance.

7 REPLIES

I don't have the library or data to test this. You can try to see if it results in what you are looking for. 

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind range(0,size(points)-2) as index
with index, points[index] as n, points[index+1..] as otherNodes
return index, n.name, [x in otherNodes | {name: x.name, distance: n.point - x.point}] as distances

The distance between two points is represented by the expression 'n.point - x.point' on line 7, which is only a placeholder. Replace it with the actual distance calculation between two entity points. The query returns one row per node, containing the node's name and a collection of the other nodes with their distances from that row's node. The number of calculations decreases by one with each row, because the algorithm computes each pair only once instead of computing both 'n.point - x.point' and 'x.point - n.point'. Let me know if there are issues and we can see if we can resolve them.

dlyberis

Thank you very much for your answer. It worked after I used the distance function on line 7, as you can see in the following Cypher code.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind range(0,size(points)-2) as index
with index, points[index] as n, points[index+1..] as otherNodes
return index, n.name, [x in otherNodes |{name: x.name, distance: distance(n.point,x.point)}] as distances

Part of the result is shown in the following image:

dlyberis_0-1663766063715.png

Is it possible to order each n.name's returned list of distances by its "distance" key? Is there a way to sort it within the code that you provided?
I really appreciate your help!

Try this:

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind range(0,size(points)-2) as index
with index, points[index] as n, points[index+1..] as otherNodes
with index, n.name as name, [x in otherNodes |{name: x.name, distance: distance(n.point,x.point)}] as distances
unwind distances as distance
with index, name, distance
order by distance desc
return index, name, collect(distance) as distances

Do you want each name to have the full list of other names and their corresponding distances? 

dlyberis

With your code I can see how Cypher handles data manipulation and aggregation; it is really helpful. I would prefer each name to have only the 3 elements with the shortest distance.

I tried this one. Is it a proper way to do it?

match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind range(0,size(points)-2) as index
with index, points[index] as n, points[index+1..] as otherNodes
with index, n.name as name, [x in otherNodes |{name: x.name, distance: distance(n.point,x.point)}] as distances
unwind distances as distance
with index, name, distance
order by distance
with index,name,collect(distance) as distances
return index, name,distances[0..3] 

The way I wrote the code, each name is compared against fewer and fewer other nodes, and the last name is not given any output at all. This is because I only calculated the distance between the current node and the remaining nodes in the list, since the distance calculation is symmetric. As a result, the lists as they stand are not necessarily the three closest nodes for each node. Only the first node has this property, because its list contains the distances to all other nodes. This can be fixed. The easiest way is to calculate the distance from each node to all other nodes and keep the closest three for each. This gives up the efficiency of not calculating both distance(a, b) and distance(b, a). The above query can be modified to retain that efficiency, but it is more work and harder to understand. If you want, I can alter it so you get a valid list of the three closest nodes for each node.
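
For reference, here is a minimal sketch of how each distance could be computed only once while still giving every node a full neighbour list. It is untested against the original data: the literal list of points is hypothetical sample data standing in for the collected entity points, the map keys src and dst are made up for illustration, and it assumes Neo4j 4.4 or later, where the function is spelled point.distance() (on older versions, substitute distance() as used elsewhere in this thread).

```cypher
// Hypothetical sample data standing in for the collected entity points.
WITH [
  {name: 'A', point: point({latitude: 37.98, longitude: 23.73})},
  {name: 'B', point: point({latitude: 40.64, longitude: 22.94})},
  {name: 'C', point: point({latitude: 38.25, longitude: 21.73})},
  {name: 'D', point: point({latitude: 35.34, longitude: 25.14})}
] AS points
// Visit each unordered pair (i < j) exactly once.
UNWIND range(0, size(points) - 2) AS i
UNWIND range(i + 1, size(points) - 1) AS j
WITH points[i] AS a, points[j] AS b
WITH a, b, point.distance(a.point, b.point) AS dist
// Emit the pair in both directions, reusing the distance computed above.
UNWIND [{src: a.name, dst: b.name}, {src: b.name, dst: a.name}] AS pair
WITH pair.src AS name, pair.dst AS other, dist
ORDER BY dist
// Keep the three closest neighbours for each node.
RETURN name, collect({name: other, distance: dist})[..3] AS otherNodes
```

The second UNWIND only duplicates rows, so the distance function is still called once per pair; whether that saving matters depends on how many points there are.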

Try this version. It should give you a list of all the nodes, each with its three closest other nodes in a list. To make the code simpler to understand, I just went ahead and calculated the distance between each pair of nodes in both orders. I assume the distance calculation does not take that long.

The double unwind of points results in rows that are the Cartesian product of the elements of points. With this, you can calculate the distance between every two points. Rows that pair a point with itself are filtered out by line 8. I used list slicing to keep only the first three other nodes (which are the closest, since the data is sorted in ascending order). This is what you correctly did too.

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind points as a
unwind points as b
with a, b
where a.name <> b.name
with a, b, distance(a.point, b.point) as distance
order by distance
with a.name as name, collect({name: b.name, distance: distance}) as bNodes
return name, collect(bNodes)[..3] as otherNodes

 

Oops, there is an error in line 12 and I can't edit the previous post. The following is the corrected version. 

call n10s.inference.nodesLabelled('Entity', {catNameProp: "label", catLabel: "Resource", subCatRel: "SCO" }) YIELD node 
match(node)-[:hasLocation]-(b:Location) 
with node, point({latitude: avg(b.lat), longitude: avg(b.long)}) as entity_point
with collect({name: node.name, point: entity_point}) as points
unwind points as a
unwind points as b
with a, b
where a.name <> b.name
with a, b, distance(a.point, b.point) as distance
order by distance
return a.name as name, collect({name: b.name, distance: distance})[..3] as otherNodes
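
Since the queries in this thread were written without access to the library or data, the following self-contained sketch may help when trying the approach out. It replaces the n10s call and the Location match with a hypothetical literal list of points (the names and coordinates are made up), and it assumes Neo4j 4.4 or later, where distance() is available as point.distance(); on older versions keep distance() as above.

```cypher
// Hypothetical sample data in place of the n10s call and Location match.
WITH [
  {name: 'A', point: point({latitude: 37.98, longitude: 23.73})},
  {name: 'B', point: point({latitude: 40.64, longitude: 22.94})},
  {name: 'C', point: point({latitude: 38.25, longitude: 21.73})},
  {name: 'D', point: point({latitude: 35.34, longitude: 25.14})},
  {name: 'E', point: point({latitude: 39.64, longitude: 22.42})}
] AS points
// Cartesian product of the list with itself.
UNWIND points AS a
UNWIND points AS b
WITH a, b
// Drop rows that pair a point with itself.
WHERE a.name <> b.name
WITH a, b, point.distance(a.point, b.point) AS dist
ORDER BY dist
// One row per name, with its three closest other points.
RETURN a.name AS name, collect({name: b.name, distance: dist})[..3] AS otherNodes
```

With five sample points, each returned row should list three of the other four names, nearest first, with distances in metres because the points use latitude and longitude.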