Neo4j

ashifshereef2k · 08-12-2020

I have a binary tree structure that I am using to find the aggregate sum of values stored in each node. I achieve it as follows.

MATCH (d1:Distributor)-[*]->(d2:Distributor)
with d1, sum (d2.counter) AS tot_counter 
RETURN d1.cid, tot_counter

Each node has a property called counter. Running the query above groups the aggregate sum of the counter values in each branch. For instance, imagine node 1, which has two children, 2 and 3. The query should sum the counters of the branches below 1, i.e. 2 + 3, return that sum, and repeat this recursively down the entire depth of the graph.

For instance, it will return,

There are two situations in which I am facing problems.

I need control over the node from which the query starts. For instance, I need the query to find the aggregate for all paths starting from 3. I tried the following query, but it doesn't work.

MATCH (d1:Distributor)-[*]->(d2:Distributor)
WITH d1, sum (d2.counter) AS tot_counter WHERE d1.cid="1"
RETURN d1.cid, tot_counter

This query returns the aggregate sum for only one node. I need the paths to start from 1, not to be limited to 1.

Secondly, I want a way to store the aggregate sum of each branch's counter values, obtained this way, in another property of each node called counter_sum.

Is there any way I can achieve this?

tony_chiboucas · 08-13-2020

Well, those cid fields should probably be stored as numbers instead of strings. Especially if they are numeric integer identities. The following will work for you... as long as you don't have a random non-numeric thrown into your cid fields...

MATCH (d1:Distributor)-[*]->(d2:Distributor)
WITH d1, sum (d2.counter) AS tot_counter
WHERE toInteger(d1.cid) >= 1
RETURN d1.cid, tot_counter
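
If the aim is to start at one distributor and cover every node beneath it, rather than filter d1 after aggregating, one option is to anchor the pattern at a start node first. A minimal sketch, assuming the cid values are still strings, that the graph really is a tree (so each descendant is reached by exactly one path), and that the node with cid "1" is the desired start; the variable name root is only illustrative:

```cypher
// Anchor the traversal at the start node; *0.. also keeps the start node itself as d1.
MATCH (root:Distributor {cid: "1"})-[*0..]->(d1:Distributor)
// For each d1 in that subtree, collect everything below it.
// OPTIONAL MATCH keeps leaf nodes, whose sum comes out as 0.
OPTIONAL MATCH (d1)-[*]->(d2:Distributor)
WITH d1, sum(d2.counter) AS tot_counter
RETURN d1.cid, tot_counter
```

Swapping OPTIONAL MATCH for a plain MATCH would drop the leaf rows, which matches the behaviour of the original query.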

Alternatively, and significantly less efficiently, you could exclude the nodes you don't want.

WHERE d1.cid <> "0" AND d1.cid <> "1"

Better yet, you could create an alternate label that includes only the nodes you do want to analyze.

MATCH (d:Distributor)
WHERE toInteger(d.cid) >= 1
SET d :DistTree
;

// then you don't have to worry about the nodes you don't care about
MATCH (d1:DistTree)-[*]->(d2:DistTree)
with d1, sum (d2.counter) AS tot_counter 
RETURN d1.cid, tot_counter
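
On the second part of the question, saving the result in a counter_sum property: a minimal sketch of one way to do it, not tested against the original data, is to compute the sum per node and write it back with SET. The OPTIONAL MATCH reflects an assumption that leaf nodes should get a counter_sum of 0 rather than be skipped.

```cypher
// Compute the sum of counter over all descendants of each node.
MATCH (d1:Distributor)
OPTIONAL MATCH (d1)-[*]->(d2:Distributor)
WITH d1, sum(d2.counter) AS tot_counter
// Store the result on the node itself.
SET d1.counter_sum = tot_counter
RETURN d1.cid, d1.counter_sum
```

The stored value is a snapshot, so the query would need to be rerun whenever a counter changes or nodes are added.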

Last note: I'd strongly advise indexing the cid property, adding a unique constraint on it, and only ever storing integers there. Everything will run much faster that way.

MATCH (d:Distributor)
SET d.cid = toInteger(d.cid);

CREATE CONSTRAINT distributor_cid
ON (d:Distributor)
ASSERT d.cid IS UNIQUE;

// The uniqueness constraint above is backed by its own index on d.cid,
// so a separate CREATE INDEX on the same label and property is not needed.
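
Before running the conversion, it may be worth checking for values that will not convert, since toInteger() returns null for a string it cannot parse and setting a property to null removes it. A small sketch of such a check:

```cypher
// List nodes whose cid cannot be converted to an integer.
// Nodes with no cid at all also show up here, because toInteger(null) is null.
MATCH (d:Distributor)
WHERE toInteger(d.cid) IS NULL
RETURN d.cid
```

If this returns no rows, the SET statement should leave every node with an integer cid.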

How to start a variable length query from a specific node