04-20-2022 08:28 AM
I am using GDS 2.0 with the Python client and trying to do something pretty simple.
I have run an algorithm and streamed the results to a data frame, all good. Now I want to add a column to that data frame with an identifier from the node properties, since at the moment all I have are node ids.
My goal is to retrieve the "num" property from each node. I am using gds.util.asNode and passing it the previously generated nodeId held in the data frame.
LCC_sub['pat']=gds.util.asNode(LCC_sub['nodeId'])["num"]
This does not work and raises a Neo4j Cypher error:
CypherSyntaxError: {code: Neo.ClientError.Statement.SyntaxError} {message: Invalid input '2': expected whitespace, comment, '.', node labels or rel types, '[', '^', '*', '/', '%', '+', '-', "=~", IN, STARTS, ENDS, CONTAINS, IS, '=', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR, ',' or ')' (line 1, column 30 (offset: 29))
"RETURN gds.util.asNode(0 262921"
^}
Looking at the details, the 0 is the data frame index and 262921 is the actual nodeId whose property value I want. Why is the index of the data frame being passed to the function? I have tested assigning the nodeId column by itself to a new column without issue, i.e. there was no index value along with the nodeId.
Any clues, solutions, workarounds?
Andy
04-20-2022 03:17 PM
Is LCC_sub['nodeId'] a single number?
Or is it multiple numbers?
I think the cause is that the value of LCC_sub['nodeId'] is not a single number.
Sample data
CREATE (nAlice:User {name: 'Alice', num: 1})
CREATE (nBridget:User {name: 'Bridget', num: 2})
CREATE (nCharles:User {name: 'Charles', num: 3})
CREATE (nAlice)-[:LINK]->(nBridget)
CREATE (nBridget)-[:LINK]->(nCharles)
id(n)
Alice is 0
Bridget is 1
Charles is 2
Single id(n)
RETURN gds.util.asNode(1)["name"] AS node
"Bridget"
RETURN gds.util.asNode(1 2)["name"] AS node
Neo.ClientError.Statement.SyntaxError
Invalid input '2': expected whitespace, comment, '.', node labels or rel types, '[', '^', '*', '/', '%', '+', '-', "=~", IN, STARTS, ENDS, CONTAINS, IS, '=', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR, ',' or ')' (line 1, column 26 (offset: 25))
"RETURN gds.util.asNode(1 2)["name"] AS node"
RETURN gds.util.asNodes([1,2]) AS node
[{"name":"Bridget","num":2},{"name":"Charles","num":3}]
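Building on the list form above, here is a rough Python sketch of fetching one property for many ids in a single round trip. It assumes a runner callable with the shape `run(query, params)`; the Python client's `gds.run_cypher` is expected to fit that shape, but treat that as an assumption to check against your client version. `fetch_property` and `fake_runner` are hypothetical names, and the fake runner exists only so the sketch runs without a database.

```python
def fetch_property(run, node_ids, prop):
    """Return whatever `run` returns for a query that looks up `prop` per id."""
    query = (
        "UNWIND $ids AS id "
        "RETURN id AS nodeId, gds.util.asNode(id)[$prop] AS value"
    )
    # Plain ints keep the parameter list driver-friendly (no numpy scalars).
    params = {"ids": [int(i) for i in node_ids], "prop": prop}
    return run(query, params)


def fake_runner(query, params):
    """Hypothetical stand-in for a database call: echoes what would be sent."""
    return {"query": query, "params": params}


sent = fetch_property(fake_runner, [1, 2], "name")
print(sent["query"])
print(sent["params"])
```

With a live connection, something like `fetch_property(gds.run_cypher, ids, "name")` would be the intended call, again assuming the runner accepts the query text followed by a parameter dict.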
04-20-2022 05:11 PM
Hi Koji,
It is a single number that comes straight from the data frame that GDS 2.0 outputs from an algorithm.
The local clustering coefficient in this case.
(index) | nodeId | localClusteringCoefficient |
---|---|---|
0 | 262921 | 0.000000 |
1 | 263036 | 0.000000 |
2 | 264811 | 0.166667 |
3 | 266847 | 0.000000 |
4 | 268280 | 0.000000 |
Andy
04-20-2022 06:40 PM
I think:
The nodeId 262921 is fine.
The nodeId '0 262921' is not valid.
04-20-2022 07:44 PM
The 0 (zero) is the index of the data frame, not part of the cell value.
It appears as if the index is being injected along with the cell value, which shouldn't happen. I have tested this with a simple column copy:
LCC_sub['test'] = LCC_sub['nodeId']
This works correctly: the first entry in the new column is 262921, which is what I expected to be passed to gds.util.asNode. The error message implies that the index value of 0 is also being passed, so I am at a bit of a loss.
Andy
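One possible explanation, offered as an assumption about the client's internals rather than something confirmed in this thread: if the Python client builds the Cypher text by formatting its argument into a string, then passing a whole pandas Series would embed the Series' printed form, which starts with the index label followed by the value. The sketch below reproduces that with plain pandas and no database. `build_query` is a hypothetical stand-in, not the client's actual code.

```python
import pandas as pd

# Small frame shaped like the streamed result in this thread.
LCC_sub = pd.DataFrame({
    "nodeId": [262921, 263036, 264811],
    "localClusteringCoefficient": [0.0, 0.0, 0.166667],
})


def build_query(node_id):
    """Hypothetical stand-in: format the argument straight into Cypher text."""
    return f"RETURN gds.util.asNode({node_id})"


# A single cell value formats as a bare integer.
single_query = build_query(LCC_sub["nodeId"].iloc[0])

# A whole Series formats as its printed form: index label, then value, per row.
series_query = build_query(LCC_sub["nodeId"])

print(single_query)
print(series_query.splitlines()[0])
```

The first line of the Series-based query has the same shape as the failing statement in the error message, with the index label 0 sitting in front of 262921.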
04-21-2022 12:37 AM
nodeId = id() ?
This data may not be what you think it is.
Here is what the data looks like
The node with id() = 0 is the first line.
id() is the real Node ID.
The nodeId is 262921
And localClusteringCoefficient(num) is 0.000000
id() nodeId localClusteringCoefficient
0 262921 0.000000
1 263036 0.000000
2 264811 0.166667
3 266847 0.000000
4 268280 0.000000
Create the data.
CREATE (:LCC_sub {nodeId: 262921, num: 0.000000})
CREATE (:LCC_sub {nodeId: 263036, num: 0.000000})
CREATE (:LCC_sub {nodeId: 264811, num: 0.166667})
CREATE (:LCC_sub {nodeId: 266847, num: 0.000000})
CREATE (:LCC_sub {nodeId: 268280, num: 0.000000})
id(LCC_sub) is the internal node id, and its value is 0.
LCC_sub['nodeId'] is not that id; its value is 262921.
MATCH (LCC_sub:LCC_sub)
WHERE LCC_sub.nodeId = 262921
RETURN gds.util.asNode(id(LCC_sub))["num"]
gds.util.asNode(id(LCC_sub))["num"]
0.0
04-21-2022 07:33 AM
Hi Koji,
Thank you for the response, but I think it does not capture the issue.
The workflow executed in Python via the new GDS client is:
I have a large base graph with 400K+ nodes and 1.6M relationships.
I use GDS to create an in-memory graph on a subset of that graph with 5000 nodes and 33K relationships.
On that in-memory graph I run a label propagation algorithm and write community ids back to the original database. With those ids written, I create another in-memory graph based on one of the community ids. This is the graph I am working on: I run a suite of centrality algorithms and stream the values back to Python. The streamed results are assigned to a variable of type pandas DataFrame. The first column of numbers you see is the data frame index and is not part of the Neo4j graph, either the real one or the in-memory one.
When working with GDS, the in-memory graphs only store the base id from the parent graph, so for human interpretation you need to refer back to the original graph for the properties associated with the nodes. Note that you need the original graph, not the in-memory graph.
To that end, gds.util.asNode takes the nodeId as input and, within Python, returns an object of type neo4j.graph.Node, which behaves like a dictionary. Thus if I execute this in a JupyterLab shell, it returns the value of the property 'num' as a string, which is correct.
gds.util.asNode(262921)['num']
'10199212'
If I then do this to get a list of the 'num' values from my data frame
pat_num = list()
for case in LCC_sub.itertuples():
    pat_num.append(gds.util.asNode(case.nodeId)['num'])
This works and returns a list of the values. I can then merge this list into my data frame, and this is a workaround.
The question is why this does not work as a single line, where LCC_sub is the same data frame.
LCC_sub['pat']=gds.util.asNode(LCC_sub.nodeId)['num']
if I do a test to see if the LCC_sub.nodeId is giving a correct value I do:
The error message when I run this
LCC_sub['pat']=gds.util.asNode(LCC_sub.nodeId)['num']
appears to indicate that gds.util.asNode is not getting the right value in this instance but does in the for loop with in essence the same format. What is amiss?
Andy
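A shorter form of the loop workaround above, sketched under the assumption that gds.util.asNode accepts only one node id per call: `Series.map` calls a function once per cell, so each call receives a single integer rather than the whole column. To keep the example runnable without a database, `lookup_num` is a hypothetical stand-in backed by a dict; with a live connection its body would be along the lines of `gds.util.asNode(int(node_id))['num']`. Only the value '10199212' comes from this thread; the other values are labeled placeholders.

```python
import pandas as pd

LCC_sub = pd.DataFrame({"nodeId": [262921, 263036, 264811]})

# Stand-in for the database: nodeId -> "num" property.
# Only the first value appears in the thread; the others are placeholders.
FAKE_NUM_BY_ID = {
    262921: "10199212",
    263036: "placeholder-a",
    264811: "placeholder-b",
}


def lookup_num(node_id):
    """Hypothetical helper; with a live client this would wrap gds.util.asNode."""
    return FAKE_NUM_BY_ID[int(node_id)]


# map() invokes lookup_num once per row, passing one scalar nodeId each time.
LCC_sub["pat"] = LCC_sub["nodeId"].map(lookup_num)
print(LCC_sub)
```

This still makes one call per row, so for a large data frame a single query that takes the whole list of ids may be the better fit.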