Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Error in graph-algorithms-algo-3.4.7

slygren

Hi,

I've been trying to get algo.pageRank.stream to work on my data, but as far as I can tell there seems to be a bug in how the procedure returns nodeIds.

Following the examples provided for the library, here is my simple query:

call algo.pageRank.stream(
'MATCH (n) RETURN id(n) as id',
'MATCH (n1:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n1) as source, id(n2) as target'
,{graph:'cypher'}
) yield nodeId, score
with nodeId, score
return nodeId limit 10

And the result

Those ids are clearly not related to any :DataState or :DisplacementState nodes. They belong to completely different nodes that have nothing to do with the query.

Any suggestions?


You didn't limit the nodes in the node-list to those labels.

MATCH (n) WHERE n:DataState OR n:DisplacementState RETURN id(n) as id

OK, I get that, but isn't id(n) supposed to cover all ids and act as a 'function' that resolves the ids for the relationship query? I believe I've seen you use it like this in one of your examples.

Anyway, any suggestions for how to get the ids of only the :DataState and :DisplacementState nodes here?

See my query above.

Yes, that's intentional: the node list specifies the graph, and the relationship list fills it in.
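To make that split concrete, here is a small Python model of the behaviour described in this thread. It is a sketch under stated assumptions, not the library's implementation: the ids returned by the node query define the projected node set, and a relationship row is kept only when both its source and target are in that set. The function name project_cypher_graph and the toy ids are hypothetical.

```python
from collections.abc import Iterable


def project_cypher_graph(
    node_ids: Iterable[int],
    rel_rows: Iterable[tuple[int, int]],
) -> tuple[set[int], list[tuple[int, int]]]:
    """Hypothetical model of a Cypher projection: the node query alone decides
    which nodes exist; the relationship query only adds edges between them."""
    nodes = set(node_ids)
    # Drop relationship rows whose endpoints were not returned by the node query.
    rels = [(s, t) for s, t in rel_rows if s in nodes and t in nodes]
    return nodes, rels


# Invented data: ids 1-2 stand in for :DataState, 3-4 for :DisplacementState,
# and 5-9 for unrelated nodes elsewhere in the database.
rel_rows = [(1, 3), (2, 4)]

# Like 'MATCH (n) RETURN id(n) AS id': every node is projected, connected or not.
print(project_cypher_graph(range(1, 10), rel_rows))

# Node query restricted to the two labels: only participating nodes remain.
print(project_cypher_graph([1, 2, 3, 4], rel_rows))

# Node 4 missing from the node query: the (2, 4) relationship is silently dropped.
print(project_cypher_graph([1, 2, 3], rel_rows))
```

The third call illustrates the other half of the rule: the relationship query can never add a node that the node query did not return.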

OK, so following your answer I tried to be very specific and return ids only for nodes that participate in the second query, this time with a slightly more advanced query:

call algo.pageRank.stream(
'MATCH (f:Functional_Location)-[:HAS_TAG]->(:HistoryTag)-[:IN_STATEPERIOD]->(:StatePeriod)-[:HAS_DATASTATE]->(:DataState)-[:MAKE_DISPLACEMENTSTATE]->(n:DisplacementState) with collect(f)+collect(n) as nodes unwind nodes as n return id(n) as id',
'MATCH (n1:Functional_Location)-[:HAS_TAG]->(:HistoryTag)-[:IN_STATEPERIOD]->(:StatePeriod)-[:HAS_DATASTATE]->(:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n2) as source, id(n1) as target, count(r) as weight'
,{graph:'cypher'}
) yield nodeId,score
return nodeId

I tested the first query on its own, and from what I can see it returns only the source and target node ids used by the second query. However, the PageRank call returns all 0s, i.e. none of the nodes/nodeIds I expected from either query. I must be fundamentally misunderstanding how this works; could you please explain what's wrong?

By the way, the intention here is to run the algorithm and then build a virtual graph from the data to stream to Gephi.

Regarding the initial question, it's worth mentioning that a PageRank call yielding node instead of nodeId does in fact give the expected result:

call algo.pageRank.stream(
'MATCH (n) RETURN id(n) as id','MATCH (n1:DataState)-[:MAKE_DISPLACEMENTSTATE]->(d1:DisplacementState) RETURN id(n1) as source, id(d1) as target'
,{graph:'cypher'}
) yield node,score 
WITH node, score ORDER BY score DESC limit 10
return node.type as type, score;

And that's even when using the approach of returning all node ids:

MATCH (n) RETURN id(n) as id

Anyone care to elaborate?

Answering myself: it turns out the key point here is ordering and limiting the results, like so:

ORDER BY score DESC limit 10
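Why the ordering matters: the stream yields one row per node in the projected graph, and LIMIT 10 without ORDER BY just takes the first ten rows in whatever order they arrive, which can easily be unconnected nodes sitting at the base score. The Python sketch below mimics both variants; the (nodeId, score) rows are invented for illustration and the helper names are hypothetical.

```python
# Invented (nodeId, score) rows in the order a stream might emit them;
# unconnected nodes sit at the base score of 0.15.
rows: list[tuple[int, float]] = [
    (0, 0.15), (1, 0.15), (2, 0.15), (3, 0.2775), (4, 0.15), (5, 0.386),
]


def first_k(rows: list[tuple[int, float]], k: int) -> list[tuple[int, float]]:
    """Like '... RETURN nodeId LIMIT k': the first k rows, in stream order."""
    return rows[:k]


def top_k(rows: list[tuple[int, float]], k: int) -> list[tuple[int, float]]:
    """Like '... ORDER BY score DESC LIMIT k': the k highest-scoring rows."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:k]


print(first_k(rows, 2))  # [(0, 0.15), (1, 0.15)]
print(top_k(rows, 2))    # [(5, 0.386), (3, 0.2775)]
```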

So you're all good now?

Not really. Even though the last approach works, I still don't understand why the second one doesn't. I also struggle to understand why, when using the MATCH (n) RETURN id(n) pattern together with a more specific second query, the PageRank call returns all nodes/nodeIds.

As I said, the node query builds up the graph, so you get all nodes in that projected graph.

The relationship query only adds relationships between nodes that are already in that projected graph.

So all ids are returned, even for nodes with no connections, and their PageRank stays at the base value of 0.15 (1 minus the default damping factor of 0.85).
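The 0.15 comes from the non-normalized PageRank formula PR(A) = (1 - d) + d * sum(PR(T) / C(T)), summed over the nodes T that link to A, where C(T) is T's out-degree and d is the damping factor (0.85 by default). A node with no incoming relationships only ever receives the 1 - d term. The plain power-iteration sketch below is a textbook version, not the library's parallel implementation; the function name and the graph are hypothetical. It shows unconnected nodes staying at 0.15 while nodes with incoming relationships rise above it.

```python
def pagerank(nodes, rels, damping=0.85, iterations=20):
    """Textbook non-normalized PageRank: PR(A) = (1 - d) + d * sum(PR(T) / C(T))."""
    out_degree = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for source, target in rels:
        out_degree[source] += 1
        incoming[target].append(source)
    scores = {n: 1 - damping for n in nodes}  # start every node at the base value
    for _ in range(iterations):
        # The comprehension reads the previous iteration's scores throughout.
        scores = {
            n: (1 - damping)
            + damping * sum(scores[t] / out_degree[t] for t in incoming[n])
            for n in nodes
        }
    return scores


# Hypothetical projection: nodes 5..9 came from 'MATCH (n)' but have no relationships.
scores = pagerank(range(1, 10), [(1, 3), (2, 3), (2, 4)])
for node_id, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(node_id, round(score, 4))
```

Here node 3 ends at about 0.3413 and node 4 at about 0.2138. Every other node, including the five with no relationships at all, stays at 0.15. This is why an unordered LIMIT can surface arbitrary nodes.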