Path counting

I need help with an issue I am facing.
Simply put, I need to select 500 random nodes and count the number of paths from each one to a single target node at hops 1, 2, 3, and 4. Right now I generate the random nodes with a separate query and then run the count for one node at a time. This means I have to run the count 500 times, which is time-consuming.

match (n:random), (m:not random)
with n, collect(m) as terminatorNodes
CALL apoc.path.expandConfig(n, {
relationshipFilter:">",
minLevel: 1,
maxLevel: 4,
terminatorNodes:terminatorNodes,
uniqueness: "NODE_PATH"
}) yield path
with count(path) as number_of_paths, length(path) as number_of_hops
with collect({number_of_hops: number_of_hops, number_of_paths: number_of_paths}) as rows, max(number_of_hops) as maxHops
unwind range(1, 4) as number_of_hops
return number_of_hops, coalesce([row IN rows WHERE row.number_of_hops = number_of_hops][0].number_of_paths, 0) as number_of_paths

10 REPLIES

Do you have a specific question you need assistance with? Also, is your 'match' statement pseudo code, or are you really forming the Cartesian product between each node? What are the actual node labels?

If you want to find the number of paths by hops between 500 random nodes and a single target node, could you first match the target node and then run the above analysis for each random node? You could do this by passing in the node IDs of the 500 externally derived nodes and using an 'unwind' clause on the list to analyze the paths between the target node and the current random node. Is the selection code in Cypher? If so, it may be possible to incorporate it into the analysis code, making the process one step instead of two.

Can you provide more details?
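
The 'unwind' approach suggested above could look roughly like the sketch below. It is only an illustration under assumptions: `$targetId` and `$randomIds` are hypothetical query parameters (the target node's id and the list of 500 externally selected node ids), and the paths are expanded from each random node toward the target along outgoing relationships. Random nodes with no path to the target within four hops simply produce no rows.

// Sketch only: $targetId and $randomIds are assumed parameters, not names from the thread
match (target)
where id(target) = $targetId
unwind $randomIds as randomId
match (start)
where id(start) = randomId
CALL apoc.path.expandConfig(start, {
relationshipFilter:">",
minLevel: 1,
maxLevel: 4,
terminatorNodes: [target],
uniqueness: "NODE_PATH"
}) yield path
// one row per (random node, hop count) with the number of paths of that length
return id(start) as start_id, length(path) as number_of_hops, count(path) as number_of_paths
order by start_id, number_of_hops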

Yes, I have a rather large biomedical graph, and my goal is to map 500 or more starting nodes to a single disease. I want the starting nodes to be picked randomly while keeping the target node constant. As for labels, I do not want to specify them, as there are 100+ labels. I already know the id of the target node. The difficulty is combining the two queries into one.
So far I have been using this query to return the random nodes and then mapping them separately:
match (n)
return n.name, n.CUI, rand() as R
order by R
limit 500

I would then manually enter the CUI of each random node into the previous query, 500 times.
My intention is to have those two queries as one.

Do you have a specific target node, chosen a priori, that you want to run this for, or do you want to repeat the mapping of the 500 random nodes for each of the disease nodes in the database?

I have a specific target node.

You mentioned you don’t want to specify labels because there are 100+ variations. Are there a few labels the node can’t have? If so, we could specify just those to avoid terminating on nodes that are ruled out.

Change the ‘id’ of ‘n’ to match your target node.

match (m)
with m, rand() as R
order by R
limit 500
with collect(m) as terminatorNodes
match (n)
where id(n)=100
with n, terminatorNodes
CALL apoc.path.expandConfig(n, {
relationshipFilter:">",
minLevel: 1,
maxLevel: 4,
terminatorNodes: terminatorNodes,
uniqueness: "NODE_PATH"
}) yield path
with count(path) as number_of_paths, length(path) as number_of_hops 
with collect ({hops: number_of_hops, paths: number_of_paths}) as rows
unwind range(1, 4) as index
return index as number_of_hops, coalesce([row in rows where row.hops = index][0].paths, 0) as number_of_paths

Thank you very much, and apologies for giving you a hard time.
I ran your code and this is what it returned. Of course, I get a new number of paths every time I run it.

Is there a way to add the names and their CUIs to the result, and is there a way to get more than just the combined result for the five hundred nodes? As you can see, it only returns one set of counts at a time.

I am looking for a table that looks like this:


The values in the hop columns are the path counts.

No worries. I need to ask questions to get us to a shared understanding.

Try this, but make sure you review the ‘return’ clause to ensure the correct property names. 

 
match (m)
with m, rand() as R
order by R
limit 500
with collect(m) as terminatorNodes
match (n)
where id(n)=100
with n, terminatorNodes
CALL apoc.path.expandConfig(n, {
relationshipFilter:">",
minLevel: 1,
maxLevel: 4,
terminatorNodes: terminatorNodes,
uniqueness: "NODE_PATH"
}) yield path
with n, path, length(path) as hops
with n, nodes(path)[hops] as target, hops
with n, target, hops, count(*) as paths
with n, target, collect({h: hops, p: paths}) as stats
return n.name as `start node (n)`, n.cui, target.rScore as `r score`,
target.name as target_node, target.cui,
coalesce([i in stats where i.h = 1][0].p, 0) as `hop-1`,
coalesce([i in stats where i.h = 2][0].p, 0) as `hop-2`,
coalesce([i in stats where i.h = 3][0].p, 0) as `hop-3`,
coalesce([i in stats where i.h = 4][0].p, 0) as `hop-4`

Thank you very much, I can't thank you enough.

It worked to perfection. I just had to flip the starting and the target nodes. Anyway, I will give you one more hard time: I also want the code to count the number of outgoing relationships that each random node has, something like this, where m is the randomly selected node:

match (m)-[r]->() return count(r) as `count_of_outgoing_relationships_from_the_starting_node`

You used to be able to figure that out very simply with size( (m)-[]->() ), but the use of patterns for anything but testing for the existence of a pattern has been deprecated. I have used a 'call' subquery to accomplish the same. I did not have your modified code, so I used the last version. You will need to make your modifications again...sorry.

match (m)
with m, rand() as R
order by R
limit 500
with collect(m) as terminatorNodes
match (n)
where id(n)=100
with n, terminatorNodes
CALL apoc.path.expandConfig(n, {
relationshipFilter:">",
minLevel: 1,
maxLevel: 4,
terminatorNodes: terminatorNodes,
uniqueness: "NODE_PATH"
}) yield path
with n, path, length(path) as hops
with n, nodes(path)[hops] as target, hops
with n, target, hops, count(*) as paths
with n, target, collect({h: hops, p: paths}) as stats
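call {
    with target
    optional match (target)-[r]->()
    // count(r) skips the null row that optional match yields, so a node with no outgoing relationships gets 0
    return count(r) as targetOutgoingRelationships
}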
return n.name as `start node (n)`, n.cui, target.rScore as `r score`,
target.name as target_node, target.cui,
coalesce([i in stats where i.h = 1][0].p, 0) as `hop-1`,
coalesce([i in stats where i.h = 2][0].p, 0) as `hop-2`,
coalesce([i in stats where i.h = 3][0].p, 0) as `hop-3`,
coalesce([i in stats where i.h = 4][0].p, 0) as `hop-4`,
targetOutgoingRelationships
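
As a possible alternative to the 'call' subquery above, Neo4j 5 provides a COUNT subquery expression that can stand in for the deprecated size( (m)-[]->() ) form. The sketch below assumes Neo4j 5.0 or later and reuses the placeholder id 100 from the queries above; in the full query, the same COUNT expression could be placed directly in the 'return' clause for 'target'.

// Sketch only: assumes Neo4j 5.0+; id 100 is the same placeholder used earlier in the thread
match (target)
where id(target) = 100
return target.name as target_node,
COUNT { (target)-->() } as targetOutgoingRelationships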