Neo4j

cj2001 · ‎06-05-2020

I am using Neo4j 3.5.17 Enterprise with GDS 1.2.

I am trying to write a query that takes the top n nodes (by PageRank) and, for every node within 2 hops (ego radius = 2) of each of them, computes the Euclidean distance to that top node. For example, suppose A, B, and C are the 3 nodes with the highest PageRank. I then want to get all nodes that are within 2 hops of each of those nodes individually. That might give a set of nodes something like:

Node A:    D, E, F
Node B:    G, H
Node C:    I, J, K, L

So I want to loop through nodes A-C, find their respective nodes in the ego graph, and set the Euclidean distance on nodes D-L based on their relationship to their parent node (A-C). So I might get some result:

Node D has a Euclidean distance of 100.0 from Node A
...
Node G has a Euclidean distance of 200.0 from Node B
...
etc.

I have managed to make this work for single nodes, such as providing node A explicitly using:

MATCH (r1:NodeLabel)-[*..2]-(r2:NodeLabel {nodeName: 'A'})
SET r1.distance = gds.alpha.similarity.euclideanDistance(r1.myVector, r2.myVector)
RETURN DISTINCT r1.nodeName, r1.pagerank, r2.nodeName
ORDER BY r1.distance

However, I would like to be able to loop this over several values of r2.nodeName. To do this, I have tried the following:

MATCH (r1:nodeLabel) WHERE r1.pagerank > 40.
MATCH (r2:nodeLabel)-[*..2]-r1
SET r2.distance = gds.alpha.similarity.euclideanDistance(r1.myVector, r2.myVector)
RETURN DISTINCT r1.nodeName, r1.pagerank, r2.nodeName
ORDER BY r2.distance

However, I get the following error:

Invalid input '(': expected whitespace, comment, '.', node labels, '[', "=~", IN, STARTS, ENDS, CONTAINS, IS, '^', '*', '/', '%', '+', '-', '=', '~', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR, FROM GRAPH, CONSTRUCT, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE UNIQUE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, CALL, RETURN, UNION, ';' or end of input (line 2, column 7 (offset: 56))
"MATCH (r2:nodeLabel)-[*..2]-r1"

Any suggestions? Thanks in advance!

Cobra · ‎06-06-2020

Hello @cj2001

There is a syntax error in your query: you forgot to put r1 in parentheses:

MATCH (r1:nodeLabel) WHERE r1.pagerank > 40.
MATCH (r2:nodeLabel)-[*..2]-(r1)
SET r2.distance = gds.alpha.similarity.euclideanDistance(r1.myVector, r2.myVector)
RETURN DISTINCT r1.nodeName, r1.pagerank, r2.nodeName
ORDER BY r2.distance

Moreover, you can have a look at the ORDER BY clause if you want the top n nodes (by PageRank).
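
As a rough sketch of that idea, the fixed PageRank threshold could be swapped for ORDER BY ... DESC with a LIMIT. This assumes n = 3 and reuses the placeholder names from your query (nodeLabel, pagerank, myVector, nodeName); it has not been run against your data, so treat it as a starting point only:

```cypher
// Pick the top 3 nodes by PageRank (3 is an assumed value for n)
MATCH (r1:nodeLabel)
WITH r1
ORDER BY r1.pagerank DESC
LIMIT 3
// For each of those nodes, visit every node within 2 hops
MATCH (r2:nodeLabel)-[*..2]-(r1)
SET r2.distance = gds.alpha.similarity.euclideanDistance(r1.myVector, r2.myVector)
RETURN DISTINCT r1.nodeName, r1.pagerank, r2.nodeName, r2.distance
ORDER BY r2.distance
```

One thing to keep in mind: if a node lies within 2 hops of more than one top node, SET overwrites the property each time, so only the last distance written is kept.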

Regards,
Cobra

cj2001 · ‎06-08-2020

Oh, yes. That was silly on my part. The typo crept in when I generalized the very specific query on my system for this post. The parentheses are actually in my query, as you have written it above, and I am still getting the original error.

Cobra · ‎06-08-2020

Ok

Have a look at apoc.cypher.doIt() in the APOC documentation.

Put this part in it:

MATCH (r2:nodeLabel)-[*..2]-(r1)
SET r2.distance = gds.alpha.similarity.euclideanDistance(r1.myVector, r2.myVector)
RETURN DISTINCT r1.nodeName, r1.pagerank, r2.nodeName

Moreover, should not it be [*0..2]?

Regards,
Cobra

cj2001 · ‎06-08-2020

OK, this worked, but not for the reasons we thought.

It turns out that what it was objecting to was WHERE r1.pagerank > 40. on the first line. In particular, it did not like that the number ended with a bare decimal point. Once I replaced 40. with 40.0, it worked.
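
For anyone who hits the same error, here is the generalized query from above with only that change applied (same placeholder names as before). My guess, and it is only a guess, is that the parser read the trailing dot as the start of a property lookup that continued onto the next line, which would explain why the error points at line 2 rather than line 1:

```cypher
// 40.0 instead of 40. so the literal does not end in a bare dot
MATCH (r1:nodeLabel) WHERE r1.pagerank > 40.0
MATCH (r2:nodeLabel)-[*..2]-(r1)
SET r2.distance = gds.alpha.similarity.euclideanDistance(r1.myVector, r2.myVector)
RETURN DISTINCT r1.nodeName, r1.pagerank, r2.nodeName
ORDER BY r2.distance
```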

Thank you for your help!

Cobra · ‎06-08-2020

Oh I'm happy to hear this

No problem

Regards,
Cobra

Looping through a subset of nodes to set values of other nodes