10-09-2018 02:49 PM
Hi,
I've been trying to get algo.pageRank.stream to work on my data, but from what I can tell there seems to be a bug in how the procedure returns nodeIds.
Following the examples provided for the library, here is my simple query:
call algo.pageRank.stream(
'MATCH (n) RETURN id(n) as id',
'MATCH (n1:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n1) as source, id(n2) as target'
,{graph:'cypher'}
) yield nodeId, score
with nodeId, score
return nodeId limit 10
And the result: quite obviously, those ids are not related to any :DataState or :DisplacementState nodes at all. They belong to completely different nodes that have nothing to do with the query in question.
Any suggestions?
10-09-2018 03:18 PM
You didn't restrict the nodes in the node list to those labels:
MATCH (n) WHERE n:DataState OR n:DisplacementState RETURN id(n) as id
10-09-2018 03:24 PM
OK, I get that, but isn't the id(n) query supposed to cover all ids and act as a 'function' that resolves the ids for the second MATCH string? (I believe I've seen you use it like this in one of your examples.)
Anyway, any suggestions on how to get the ids for only the :DataState and :DisplacementState nodes here?
10-09-2018 03:32 PM
See my query above.
Yes, that's intentional: the node list defines the graph, and the relationship list fills it in.
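A minimal Python sketch of the projection rule described above, assuming the node query and relationship query results are already available as plain lists of ids (the function name `project_graph` is hypothetical; this is not the library's code). Every id from the node query becomes a node, and a relationship is only kept when both endpoints are in that node set:

```python
def project_graph(node_ids, relationships):
    """Build an adjacency list from a node-query result and a
    relationship-query result (hypothetical stand-ins for the two
    Cypher strings passed with graph:'cypher')."""
    nodes = set(node_ids)
    # Every id from the node query is a node, even with no relationships.
    adjacency = {n: [] for n in nodes}
    for source, target in relationships:
        # Relationships only connect nodes already in the projected graph.
        if source in nodes and target in nodes:
            adjacency[source].append(target)
    return adjacency


# 'MATCH (n) RETURN id(n)' style: ids 0 and 5 stand for unrelated nodes,
# yet they are still part of the projected graph.
print(project_graph([0, 1, 2, 3, 4, 5], [(1, 2), (3, 4)]))
```

This is why `MATCH (n) RETURN id(n) as id` pulls every node in the database into the projected graph, even though the relationship query only touches :DataState and :DisplacementState nodes.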
10-10-2018 12:23 AM
OK, so following your answer I tried to be very specific and return ids only for nodes that participate in the second query, this time with a slightly more advanced query:
call algo.pageRank.stream(
'MATCH (f:Functional_Location)-[:HAS_TAG]->(:HistoryTag)-[:IN_STATEPERIOD]->(:StatePeriod)-[:HAS_DATASTATE]->(:DataState)-[:MAKE_DISPLACEMENTSTATE]->(n:DisplacementState) with collect(f)+collect(n) as nodes unwind nodes as n return id(n) as id',
'MATCH (n1:Functional_Location)-[:HAS_TAG]->(:HistoryTag)-[:IN_STATEPERIOD]->(:StatePeriod)-[:HAS_DATASTATE]->(:DataState)-[r:MAKE_DISPLACEMENTSTATE]->(n2:DisplacementState) RETURN id(n2) as source, id(n1) as target, count(r) as weight'
,{graph:'cypher'}
) yield nodeId,score
return nodeId
I tested the first query on its own, and from what I can see it returns only the source and target node ids used in the second query. However, the result from PageRank is all 0s, i.e. none of the nodes/nodeIds I expected from either query. Obviously I must be fundamentally misunderstanding how this works; could you please explain what is wrong?
By the way, the intention here is to run the algorithm and then build a virtual graph from the data to stream to Gephi.
10-10-2018 03:46 AM
Regarding the initial question, it's worth mentioning that a PageRank call yielding 'node' instead of 'nodeId' does in fact give the expected result:
call algo.pageRank.stream(
'MATCH (n) RETURN id(n) as id','MATCH (n1:DataState)-[:MAKE_DISPLACEMENTSTATE]->(d1:DisplacementState) RETURN id(n1) as source, id(d1) as target'
,{graph:'cypher'}
) yield node,score
WITH node, score ORDER BY score DESC limit 10
return node.type as type, score;
And that's even when using the node query that collects all node ids:
MATCH (n) RETURN id(n) as id
Anyone care to elaborate?
10-10-2018 04:05 AM
Answering myself: it turns out the key point here is ordering and limiting the results, like so:
ORDER BY score DESC limit 10
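To illustrate why the ordering matters, here is a small Python sketch with made-up scores (hypothetical helper names, not part of the library). A bare `LIMIT 10` returns whichever rows happen to come first, which with a `MATCH (n)` node query can be arbitrary nodes that have only the default score, whereas `ORDER BY score DESC LIMIT 10` returns the highest-ranked nodes:

```python
import heapq


def first_k(scores, k=10):
    """Analogue of a bare LIMIT k: the first k rows in whatever order
    they arrive (here, dict insertion order), regardless of score."""
    return list(scores.items())[:k]


def top_k(scores, k=10):
    """Analogue of ORDER BY score DESC LIMIT k: the k (nodeId, score)
    pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical stream result: ids 0 and 1 are unconnected nodes.
scores = {0: 0.15, 1: 0.15, 2: 0.2775, 3: 0.5}
print(first_k(scores, 2))
print(top_k(scores, 2))
```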
10-10-2018 07:57 AM
So you're all good now?
10-10-2018 08:52 AM
Not really. Even though the last approach works, I still don't understand why the second one doesn't. I also struggle to understand why, when using the MATCH (n) RETURN id(n) pattern together with a more specific second query, the PageRank call returns all nodes/nodeIds.
10-10-2018 12:40 PM
As I said, the node query builds up the graph, so you get all nodes in that projected graph.
The relationship-query only adds relationships between nodes that are in that projected graph.
So all ids are returned even if they have no connections, and their PageRank defaults to the initial value of 0.15.
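A minimal Python sketch of where the 0.15 comes from, assuming the non-normalized PageRank formula PR(v) = (1 - d) + d * Σ PR(u) / outdeg(u) with damping factor d = 0.85 (a plain textbook iteration, not the library's implementation). A node with no incoming relationships receives nothing from the sum, so its score stays at 1 - d = 0.15:

```python
def pagerank(adjacency, damping=0.85, iterations=20):
    """Non-normalized PageRank over an adjacency list {node: [targets]}."""
    # Start every node at the base value (1 - d).
    scores = {n: 1.0 - damping for n in adjacency}
    for _ in range(iterations):
        incoming = {n: 0.0 for n in adjacency}
        for source, targets in adjacency.items():
            if targets:
                # Each node splits its score evenly across outgoing edges.
                share = scores[source] / len(targets)
                for target in targets:
                    incoming[target] += share
        scores = {n: (1.0 - damping) + damping * incoming[n] for n in adjacency}
    return scores


# Node 0 is unconnected, as with extra nodes from 'MATCH (n) RETURN id(n)'.
print(pagerank({0: [], 1: [2], 2: [], 3: []}))
```

Here nodes 0, 1 and 3 keep the base score of about 0.15, and only node 2, which has an incoming relationship, rises above it (0.15 + 0.85 * 0.15 = 0.2775).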