
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Optimize getting all incoming and outgoing edges for each node and computing the difference in amounts

Hello, I am trying to solve the following problem:

I have transaction data between Accounts (nodes), with each :SENT transaction stored as a directional relationship between nodes. I would like to compute the balance of each Account by taking the difference between the sum of all incoming and the sum of all outgoing transactions, but my query takes a very long time to run. Please bear in mind that I am a Cypher newbie, so I don't really know what I'm doing 🙂

The Account node is the only type of node in the database and the SENT relationship is the only type of relationship. The database is not very large, about 50,000 nodes and 70,000 relationships.

My current query is as follows:

MATCH (n)
WITH n
MATCH ()-[incoming]->(l) WHERE l.id = n.id
WITH n, incoming
WITH n, sum(incoming.amount) as sumIn
MATCH (m) -[outgoing]-> () WHERE m.id = n.id
WITH n, outgoing, sumIn
WITH n, sumIn, sum(outgoing.amount) as sumOut
SET n.balance = sumIn - sumOut

But the runtime of this query was more than 80 minutes, which seems quite strange for a relatively small dataset.
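A rough back-of-envelope explanation for the 80 minutes: each inner MATCH is filtered by comparing an id property instead of starting from n itself, so the planner likely has to look at every relationship for every node. With the figures above, that is about 50,000 × 70,000 = 3.5 billion checks per direction. The Python sketch below contrasts that nested scan with a single pass over the relationships. Plain lists stand in for the graph, the data and function names are hypothetical, and it models only the scanning pattern, not Neo4j's actual query plan.

```python
from collections import defaultdict

# Hypothetical toy graph: (source_id, target_id, amount) for each :SENT relationship.
EDGES = [(0, 1, 2000), (0, 2, 5000), (1, 5, 1000), (2, 1, 500)]
NODES = sorted({node for s, t, _ in EDGES for node in (s, t)})


def balances_nested(nodes, edges):
    """Mimics the scanning pattern of the original query: every node
    filters the full relationship list once per direction."""
    comparisons = 0
    result = {}
    for n in nodes:
        sum_in = sum_out = 0
        for _, t, amount in edges:  # like MATCH ()-[incoming]->(l) WHERE l.id = n.id
            comparisons += 1
            if t == n:
                sum_in += amount
        for s, _, amount in edges:  # like MATCH (m)-[outgoing]->() WHERE m.id = n.id
            comparisons += 1
            if s == n:
                sum_out += amount
        result[n] = sum_in - sum_out
    return result, comparisons


def balances_single_pass(edges):
    """Touches each relationship once: credit the receiver, debit the sender."""
    balance = defaultdict(int)
    for s, t, amount in edges:
        balance[t] += amount
        balance[s] -= amount
    return dict(balance)


nested, comparisons = balances_nested(NODES, EDGES)
print(nested, comparisons)
print(balances_single_pass(EDGES))
```

In Cypher terms, the single-pass idea roughly corresponds to anchoring the pattern on the node, as in (n)<-[r]-(), so each node's own relationships are followed instead of every relationship being filtered by id.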

First I tried a query like this:

MATCH () -[incoming]-> (n) -[outgoing]-> () 
WITH n, incoming, outgoing
return n.id, count(distinct incoming), count(distinct outgoing), sum(distinct incoming.amount) - sum(distinct outgoing.amount) as balance

But while count() in this case returns the correct number of incoming and outgoing transactions, I have not been able to sum the amounts.
When I run the MATCH patterns separately, as
MATCH () -[incoming]-> (n) and
MATCH (n) -[outgoing]-> (), I get the correct counts and sums. But as soon as I combine them as
MATCH () -[incoming]-> (n) -[outgoing]-> (), it returns a much larger number for both incoming and outgoing, and the number is the same for both.
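Large, identical numbers are what you would expect if the combined pattern produces one row per (incoming, outgoing) pair for each node. Below is a small Python model of that row multiplication, using a hypothetical node with two incoming and three outgoing relationships. It shows why count(DISTINCT ...) fixes the counts but sum(DISTINCT ...) does not: DISTINCT on amounts removes equal values, not duplicate relationships. It also shows that a node with no incoming (or no outgoing) relationships yields no rows at all.

```python
from itertools import product

# Hypothetical node n: two incoming and three outgoing relationships as (id, amount).
INCOMING = [("in1", 100), ("in2", 100)]
OUTGOING = [("out1", 30), ("out2", 50), ("out3", 30)]

# ()-[incoming]->(n)-[outgoing]->() yields one row per (incoming, outgoing) pair.
rows = list(product(INCOMING, OUTGOING))

# Plain aggregation over these rows: each side is repeated once per row of the other.
naive_in_count = len(rows)
naive_out_count = len(rows)
naive_sum_in = sum(inc[1] for inc, _ in rows)

# DISTINCT on the relationships themselves recovers the true counts...
distinct_in_count = len({inc[0] for inc, _ in rows})
distinct_out_count = len({out[0] for _, out in rows})

# ...but DISTINCT on the amounts removes equal values, so the sums come out wrong.
distinct_sum_in = sum({inc[1] for inc, _ in rows})   # 100, true total is 200
distinct_sum_out = sum({out[1] for _, out in rows})  # 80, true total is 110

# A node with no incoming relationships produces no rows, so it disappears entirely.
rows_without_incoming = list(product([], OUTGOING))
```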

That prompted me to try the following approach:

MATCH (n)
WITH n, (n)-[:SENT]->() as outgoing, (n)<-[:SENT]-() as incoming
RETURN n.id, size(outgoing), size(incoming)

This gets the correct counts, but since outgoing and incoming here are lists of paths, not relationships, I don't know how to access the amount property to sum it and calculate the balance.
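One way to reach the property without handling whole paths is Cypher's pattern comprehension, e.g. [(n)-[r:SENT]->() | r.amount]. It returns a list of amounts that reduce(total = 0, x IN amounts | total + x) can add up. If you already have paths, relationships(p) is the other route. Since the snippets in this thread cannot be run outside Neo4j, the sketch below mirrors the pattern-comprehension idea in Python, with tuples and dicts as hypothetical stand-ins for paths and relationships.

```python
# Hypothetical stand-ins: a relationship is a dict of its properties and a one-hop
# path is a (start_id, relationship, end_id) tuple.
outgoing_paths = [
    (0, {"amount": 2000}, 1),
    (0, {"amount": 5000}, 2),
]
incoming_paths = []  # node 0 receives nothing in this toy example

# Analogue of [(n)-[r:SENT]->() | r.amount]: project each match to the property.
out_amounts = [rel["amount"] for _, rel, _ in outgoing_paths]
in_amounts = [rel["amount"] for _, rel, _ in incoming_paths]

# Analogue of reduce(total = 0, x IN amounts | total + x); an empty list gives 0.
balance = sum(in_amounts) - sum(out_amounts)
print(balance)
```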

This led me to the current approach, which I listed first: match all nodes, then match and sum all incoming edges for each node, then match and sum all outgoing edges for each node, and set the balance. But its runtime is incredibly long.

What am I missing or overthinking? I feel like there must be a better way to accomplish what seems like a relatively common and simple task. Thanks for any input!

Accepted solution:

Be aware that UNION ALL should be used instead of UNION because UNION will remove duplicates.



Hello @tomas.vrba

This is a solution:

MATCH (n)
CALL {
    WITH n MATCH (n)-[r]->() RETURN count(r) AS occurrences
    UNION
    WITH n MATCH (n)<-[r]-() RETURN count(r) AS occurrences
}
WITH n, collect(occurrences) AS values
SET n.balance = values[0] - values[1]

To speed up this query, you can do several things if you are not working on all nodes of the database:

  • specify a node label or a relationship type
  • create a UNIQUE CONSTRAINT and use the index it creates in a WHERE clause
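To illustrate the second point, here is a Python sketch of the difference an index makes. A dict stands in for the index that a unique constraint creates, and all names and sizes are hypothetical. A lookup anchored on a specific property value, such as a hypothetical MATCH (n:Account {id: $id}), can go straight to the node, while an unanchored filter has to check nodes one by one. A query that touches every node anyway will generally gain less from the index.

```python
# Hypothetical Account nodes, stored so the target id is checked last.
N = 10_000
nodes = [{"id": i, "balance": 0} for i in reversed(range(N))]


def find_by_scan(nodes, wanted_id):
    """No index: check nodes one by one, like filtering after a label scan."""
    checked = 0
    for node in nodes:
        checked += 1
        if node["id"] == wanted_id:
            return node, checked
    return None, checked


# A unique constraint is backed by an index; a dict from id to node models it here.
index = {node["id"]: node for node in nodes}


def find_by_index(index, wanted_id):
    """With the index: a single direct lookup, independent of the number of nodes."""
    return index.get(wanted_id)


node, checked = find_by_scan(nodes, 0)
print(checked)                          # every node was examined
print(find_by_index(index, 0) is node)  # same node, found in one step
```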

Regards,
Cobra

Fantastic, thanks so much!
This set me on the right path: what I needed were the sums. I also had to use UNION ALL, since some accounts have exactly the same totals for transactions going in and out, so those weren't showing up.
But the following query worked with much improved runtime (622ms as opposed to 80 minutes 😄 )

MATCH (n)
CALL {
    WITH n MATCH (n)-[r]->() RETURN sum(r.amount) AS total
    UNION ALL
    WITH n MATCH (n)<-[r]-() RETURN sum(r.amount) AS total
}
WITH n, collect(total) AS values
WITH n,
CASE WHEN values[0] IS NOT NULL THEN values[0] ELSE 0 END AS totalOut,
CASE WHEN values[1] IS NOT NULL THEN values[1] ELSE 0 END AS totalIn
SET n.balance = totalIn-totalOut

The logic in the original post is a little off when the first match returns no result. In that case, values will be a single-element list holding the value for the incoming count. As written, the logic maps that single value to totalOut, and totalIn will be zero. This could be fixed with a combination of OPTIONAL MATCH and coalesce().

In your case, you are returning the sum resulting from the match. From my testing, the sum on a null result will produce a value of zero. As such, you will always get a values list with two values; therefore, I believe you can remove the CASE statements and eliminate the incorrect logic, as it will never be executed.

I will let @Cobra address your other issue.

I think I need to amend my previous statement. Since you are returning 'w' in your RETURN clause, you will not always get a result.

I see you changed the code to use an OPTIONAL MATCH, so you have it corrected. You should still be able to remove the CASE expressions, because you should always get a list of three elements: when there is no match, 'tx' will be null and the sum will be zero.
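The difference between the two cases comes from how aggregation treats an empty input. The Python function below is a rough, hypothetical model of Cypher's sum(), not Neo4j's implementation. Without a grouping key, the aggregation always returns exactly one row, with 0 when nothing matched. Once a grouping key such as w is returned, there is one row per key, so no match means no row. With OPTIONAL MATCH, w survives with tx set to null, and sum() skips the null.

```python
from collections import defaultdict


def cypher_sum(rows, group_key=None):
    """Rough model of Cypher's sum(); None stands in for null and is skipped.

    Without a grouping key the result is always exactly one row. With a grouping
    key (like RETURN w, sum(tx.value)) there is one row per distinct key, so an
    empty input produces no rows at all.
    """
    if group_key is None:
        return [sum(r["value"] for r in rows if r["value"] is not None)]
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_key]] += r["value"] if r["value"] is not None else 0
    return sorted(totals.items())


no_match = []  # plain MATCH found no relationship for this wallet
optional_no_match = [{"w": "wallet-7", "value": None}]  # OPTIONAL MATCH: w kept, tx null

print(cypher_sum(no_match))                          # [0]
print(cypher_sum(no_match, group_key="w"))           # []
print(cypher_sum(optional_no_match, group_key="w"))  # [('wallet-7', 0)]
```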

Be aware that UNION ALL should be used instead of UNION because UNION will remove duplicates.
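Here is a minimal Python model of why this matters for the balance query. It assumes an account whose outgoing and incoming totals happen to be equal, and the helper names are hypothetical. UNION de-duplicates the two one-value rows, so collect() sees only one value. In Cypher, indexing past the end of a list returns null, so values[1] would then be null instead of the second total.

```python
def union(branches):
    """UNION: concatenate the branch results, then drop duplicate rows."""
    seen, out = set(), []
    for rows in branches:
        for row in rows:
            if row not in seen:
                seen.add(row)
                out.append(row)
    return out


def union_all(branches):
    """UNION ALL: concatenate the branch results unchanged."""
    return [row for rows in branches for row in rows]


# Hypothetical account: outgoing and incoming totals are both 500.
out_branch = [(500,)]
in_branch = [(500,)]

values_union = [row[0] for row in union([out_branch, in_branch])]
values_union_all = [row[0] for row in union_all([out_branch, in_branch])]
print(values_union)      # [500]
print(values_union_all)  # [500, 500]
```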

Awesome, thank you!
This seems to work and is very fast.

Hi all,

I have a similar problem. I have a dataset with customers and edges to orders where I try to count all edges from customers to orders.

MATCH (c:Customer)
WITH SIZE([(c)--(o:Order) | o]) AS amount
RETURN min(amount), max(amount), avg(amount)

This query works fine, but I think I could speed it up. The property 'customerID' on vertices labelled Customer is a key and is indexed, yet I cannot get my head around how to rephrase the query to make use of this index.

Thanks for any help, much appreciated.

Best,

Philipp

Hi,

Apologies, I forgot the WITH after the MATCH clause.

Still trying to get my head around how to make use of the index for this query.

Thanks for any help with it.

Philipp

Hello, I am trying to reproduce this query for a very similar problem.
The nodes are accounts that have addresses, and the edges are transactions with some metadata.
The result is always NULL for the balance for each address.

Here is the query:

MATCH (w:Wallet) - [tx:SENT_TO {at: "at"}] - (Wallet)
CALL {
    WITH w MATCH (w)-[tx:SENT_TO {at: "at"}]->(Wallet)
    RETURN SUM(tx.value) AS total
    UNION ALL
    WITH w MATCH (w)<-[tx:SENT_TO {at: "at"}]-(Wallet) 
    RETURN SUM(tx.value) AS total
}
WITH w, collect(total) AS values
WITH w,
CASE WHEN values[0] IS NOT NULL THEN values[0] ELSE 0 END AS totalOut,
CASE WHEN values[1] IS NOT NULL THEN values[1] ELSE 0 END AS totalIn
RETURN w.address, totalIn, totalOut

Hello @milan.keca

I created a little dataset and tested your query and everything worked.
Can you share your dataset?
Which version of Neo4j are you using?

Regards,
Cobra

Hello, thanks for testing it out.

Version 4.

Here is the dataset:

[
  {"from": 0, "to": 1, "value": 2000, "at": "at"},
  {"from": 0, "to": 2, "value": 5000, "at": "at"},
  {"from": 1, "to": 5, "value": 1000, "at": "at"},
  {"from": 2, "to": 4, "value": 1000, "at": "at"},
  {"from": 4, "to": 7, "value": 1000, "at": "at"},
  {"from": 5, "to": 12, "value": 200, "at": "at"},
  {"from": 2, "to": 1, "value": 500, "at": "at"}
]

The expected balances are:
0: -7000
1: 1500
2: 3500
4: 0
5: 800
7: 1000
12: 200
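As a sanity check on those expected balances, here is a short Python sketch that replays the transactions listed above and tallies incoming and outgoing value per address. The function name is hypothetical. The per-address in/out totals it produces are the figures the Cypher query should arrive at.

```python
from collections import defaultdict

# The transactions posted above.
transactions = [
    {"from": 0, "to": 1, "value": 2000, "at": "at"},
    {"from": 0, "to": 2, "value": 5000, "at": "at"},
    {"from": 1, "to": 5, "value": 1000, "at": "at"},
    {"from": 2, "to": 4, "value": 1000, "at": "at"},
    {"from": 4, "to": 7, "value": 1000, "at": "at"},
    {"from": 5, "to": 12, "value": 200, "at": "at"},
    {"from": 2, "to": 1, "value": 500, "at": "at"},
]


def totals_by_address(txs):
    """Return {address: (total_in, total_out)} for every address that appears."""
    total_in = defaultdict(int)
    total_out = defaultdict(int)
    for tx in txs:
        total_out[tx["from"]] += tx["value"]
        total_in[tx["to"]] += tx["value"]
    addresses = sorted(set(total_in) | set(total_out))
    return {a: (total_in[a], total_out[a]) for a in addresses}


totals = totals_by_address(transactions)
balances = {a: t_in - t_out for a, (t_in, t_out) in totals.items()}
print(balances)
```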

I am getting 0s for all (totalIn - totalOut), meaning totalIn and totalOut are always the same.

I have managed to do it with the following query, but it's slower and returns duplicates:

MATCH (w:Wallet) - [tx:SENT_TO {at: "at"}] - (Wallet)
CALL {
    WITH w MATCH (w) <- [tx:SENT_TO {at: "at"}] - (Wallet)
    RETURN SUM(tx.value) AS value_in
}
CALL {
    WITH w MATCH (w) - [tx:SENT_TO {at: "at"}] -> (Wallet)
    RETURN SUM(tx.value) AS value_out
}
RETURN w.address, value_in, value_out
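The duplicates likely come from the first line. (w:Wallet)-[tx]-(Wallet) is undirected, and Wallet there is a new variable rather than a label, so the pattern matches every relationship once from each end. Each wallet is therefore returned once per relationship it touches, and both CALL subqueries run that many times. The Python sketch below counts those rows for the dataset above. Starting from MATCH (w:Wallet) on its own, or adding WITH DISTINCT w, would leave one row per wallet.

```python
from collections import Counter

# Relationships from the dataset above as (from, to).
rels = [(0, 1), (0, 2), (1, 5), (2, 4), (4, 7), (5, 12), (2, 1)]

# An undirected pattern (w)-[tx]-(other) binds w to each end of every relationship,
# so each relationship contributes one row for its sender and one for its receiver.
rows = [sender for sender, _ in rels] + [receiver for _, receiver in rels]
rows_per_wallet = Counter(rows)

# Everything after the MATCH, including both CALL subqueries, runs once per row.
print(len(rows))           # 14 rows for 7 wallets
print(rows_per_wallet[1])  # wallet 1 touches 3 relationships -> 3 rows

# WITH DISTINCT w (or starting from MATCH (w:Wallet)) leaves one row per wallet.
distinct_wallets = sorted(set(rows))
```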

This query works on my side (Neo4j Enterprise 4.4.5):

CALL {
    MATCH (w:Wallet)-[tx:SENT_TO {at: "at"}]-()
    RETURN w, sum(tx.value) AS total
    UNION
    MATCH (w:Wallet)-[tx:SENT_TO {at: "at"}]->()
    RETURN w, sum(tx.value) AS total
    UNION
    MATCH (w:Wallet)<-[tx:SENT_TO {at: "at"}]-() 
    RETURN w, sum(tx.value) AS total
}
WITH w.address AS address, collect(total) AS values
RETURN 
    address,
    CASE WHEN values[0] IS NOT NULL THEN values[0] ELSE 0 END AS total,
    CASE WHEN values[1] IS NOT NULL THEN values[1] ELSE 0 END AS totalOut,
    CASE WHEN values[2] IS NOT NULL THEN values[2] ELSE 0 END AS totalIn

Hmm, still getting wrong numbers. My version is 4.4-aura, enterprise.

This is the data:

"w"            "tx"                                                                                 "w2"
{"address":0}  {"blockchain":"Ethereum","block_number":1,"tx_hash":"1","value":2000,"coin":"test"}  {"address":1}
{"address":0}  {"blockchain":"Ethereum","block_number":1,"tx_hash":"2","value":5000,"coin":"test"}  {"address":2}
{"address":1}  {"blockchain":"Ethereum","block_number":1,"tx_hash":"3","value":1000,"coin":"test"}  {"address":5}
{"address":2}  {"blockchain":"Ethereum","block_number":1,"tx_hash":"4","value":1000,"coin":"test"}  {"address":4}
{"address":4}  {"blockchain":"Ethereum","block_number":1,"tx_hash":"5","value":1000,"coin":"test"}  {"address":7}
{"address":5}  {"blockchain":"Ethereum","block_number":1,"tx_hash":"6","value":200,"coin":"test"}   {"address":12}
{"address":2}  {"blockchain":"Ethereum","block_number":1,"tx_hash":"7","value":500,"coin":"test"}   {"address":1}

This is the query (your last one):

CALL {
    MATCH (w:Wallet)-[tx:SENT_TO {coin: "test"}]-()
    RETURN w, sum(tx.value) AS total
    UNION
    MATCH (w:Wallet)-[tx:SENT_TO {coin: "test"}]->()
    RETURN w, sum(tx.value) AS total
    UNION
    MATCH (w:Wallet)<-[tx:SENT_TO {coin: "test"}]-() 
    RETURN w, sum(tx.value) AS total
}
WITH w.address AS address, collect(total) AS values
RETURN 
    address,
    CASE WHEN values[0] IS NOT NULL THEN values[0] ELSE 0 END AS total,
    CASE WHEN values[1] IS NOT NULL THEN values[1] ELSE 0 END AS totalOut,
    CASE WHEN values[2] IS NOT NULL THEN values[2] ELSE 0 END AS totalIn

This is the result:

╒═════════╤═══════╤══════════╤═════════╕
│"address"│"total"│"totalOut"│"totalIn"│
╞═════════╪═══════╪══════════╪═════════╡
│0        │7000   │0         │0        │
├─────────┼───────┼──────────┼─────────┤
│1        │3500   │1000      │2500     │
├─────────┼───────┼──────────┼─────────┤
│2        │6500   │1500      │5000     │
├─────────┼───────┼──────────┼─────────┤
│5        │1200   │200       │1000     │
├─────────┼───────┼──────────┼─────────┤
│4        │2000   │1000      │0        │
├─────────┼───────┼──────────┼─────────┤
│7        │1000   │0         │0        │
├─────────┼───────┼──────────┼─────────┤
│12       │200    │0         │0        │
╘═════════╧═══════╧══════════╧═════════╛

Is there anything I am missing?

I get this result and it looks right to me, so I don't understand what your issue is.

I have the same results, but those results are not accurate. If you follow through the transactions, it should be:

address  totalIn  totalOut
0        0        7000
1        2500     1000
2        5000     1500
4        1000     1000
5        1000     200
7        1000     0
12       200      0

My bad, it should be UNION ALL instead of UNION in the query:

CALL {
    MATCH (w:Wallet)-[tx:SENT_TO {at: "at"}]-()
    RETURN w, sum(tx.value) AS total
    UNION ALL
    MATCH (w:Wallet)-[tx:SENT_TO {at: "at"}]->()
    RETURN w, sum(tx.value) AS total
    UNION ALL
    MATCH (w:Wallet)<-[tx:SENT_TO {at: "at"}]-() 
    RETURN w, sum(tx.value) AS total
}
WITH w.address AS address, collect(total) AS values
RETURN 
    address,
    CASE WHEN values[0] IS NOT NULL THEN values[0] ELSE 0 END AS total,
    CASE WHEN values[1] IS NOT NULL THEN values[1] ELSE 0 END AS totalOut,
    CASE WHEN values[2] IS NOT NULL THEN values[2] ELSE 0 END AS totalIn

Actually, I just noticed an issue.
Look at the results for addresses 7 and 12: they show a totalOut of 1000 and 200 respectively and a totalIn of 0, but it should be the other way around.

Can't get my head around that one.

We can't be sure there will always be 3 values in the values list, so we have to change the query a bit so that it always returns 3 values:

CALL {
    MATCH (w:Wallet)
    OPTIONAL MATCH (w)-[tx:SENT_TO {at: "at"}]-()
    RETURN w, sum(tx.value) AS total
    UNION ALL
    MATCH (w:Wallet)
    OPTIONAL MATCH (w)-[tx:SENT_TO {at: "at"}]->()
    RETURN w, sum(tx.value) AS total
    UNION ALL
    MATCH (w:Wallet)
    OPTIONAL MATCH (w)<-[tx:SENT_TO {at: "at"}]-() 
    RETURN w, sum(tx.value) AS total
}
WITH w.address AS address, collect(total) AS values
RETURN 
    address,
    CASE WHEN values[0] IS NOT NULL THEN values[0] ELSE 0 END AS total,
    CASE WHEN values[1] IS NOT NULL THEN values[1] ELSE 0 END AS totalOut,
    CASE WHEN values[2] IS NOT NULL THEN values[2] ELSE 0 END AS totalIn
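The positional problem can be reproduced outside Neo4j. The Python model below is a sketch that uses the dataset from earlier in the thread. It treats each UNION ALL branch as RETURN w, sum(tx.value) and collects the branch totals per wallet in branch order, which is an assumption about how the rows arrive. With a plain MATCH, wallet 7 has no outgoing branch row, so its incoming total slides into the totalOut slot. With MATCH (w:Wallet) followed by OPTIONAL MATCH, every branch contributes a row and the three slots line up.

```python
from collections import defaultdict

# Dataset from earlier in the thread: (from, to, value).
RELS = [(0, 1, 2000), (0, 2, 5000), (1, 5, 1000), (2, 4, 1000),
        (4, 7, 1000), (5, 12, 200), (2, 1, 500)]
WALLETS = [0, 1, 2, 4, 5, 7, 12]


def endpoints(direction, sender, receiver):
    """Which end of a relationship w is bound to in each branch's pattern."""
    if direction == "out":     # (w)-[tx]->()
        return [sender]
    if direction == "in":      # (w)<-[tx]-()
        return [receiver]
    return [sender, receiver]  # (w)-[tx]-(), undirected


def branch(direction, optional):
    """Model of one branch: RETURN w, sum(tx.value), grouped by w.

    Plain MATCH: wallets without a matching relationship produce no row.
    MATCH (w:Wallet) + OPTIONAL MATCH: every wallet produces a row (sum 0).
    """
    totals = {w: 0 for w in WALLETS} if optional else {}
    for sender, receiver, value in RELS:
        for w in endpoints(direction, sender, receiver):
            totals[w] = totals.get(w, 0) + value
    return list(totals.items())


def collect_values(optional):
    """Mimics UNION ALL + collect(total), assuming rows arrive in branch order."""
    values = defaultdict(list)
    for direction in ("any", "out", "in"):
        for w, total in branch(direction, optional):
            values[w].append(total)
    return dict(values)


plain = collect_values(optional=False)
fixed = collect_values(optional=True)
print(plain[7])  # [1000, 1000]: the incoming total lands in the totalOut slot
print(fixed[7])  # [1000, 0, 1000]: total, totalOut, totalIn
```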

Ah yeah, I was just debugging and figured out it has something to do with not having outgoing txs. I could probably achieve something similar by reordering the MATCHes in the CALL, given our domain constraints, but thanks.

You helped us a lot.

The only issue with this approach is that it matches all wallets, not just those with {at: "at"} txs, so there are a lot of 0s.
Is there a way to keep the OPTIONAL MATCH but filter out wallets with no txs?

Some good catches by @glilienfield

This query should be the right one:

CALL {
    MATCH (w:Wallet)
    OPTIONAL MATCH (w)-[tx:SENT_TO {at: "at"}]-()
    RETURN w, sum(tx.value) AS total
    UNION ALL
    MATCH (w:Wallet)
    OPTIONAL MATCH (w)-[tx:SENT_TO {at: "at"}]->()
    RETURN w, sum(tx.value) AS total
    UNION ALL
    MATCH (w:Wallet)
    OPTIONAL MATCH (w)<-[tx:SENT_TO {at: "at"}]-() 
    RETURN w, sum(tx.value) AS total
}
WITH w.address AS address, collect(total) AS values
WHERE values[0] > 0
RETURN address, values[0] AS total, values[1] AS totalOut, values[2] AS totalIn

That looks much better. I would make one suggestion: move the MATCH (w) outside the union query, since it is the same query repeated three times.

More importantly, you are assuming each query will return the same results in the same order, so that line n in each result corresponds to the same Wallet node. Theoretically, in a multi-user environment, a new Wallet node could be added while this query is executing, and then the three MATCH (w) queries in the union might not return the same results. I know this is very unlikely, but it seems possible.

MATCH (w:Wallet)
CALL {
    WITH w
    OPTIONAL MATCH (w)-[tx:SENT_TO {at: "at"}]-()
    RETURN sum(tx.value) AS total
    UNION ALL
    WITH w
    OPTIONAL MATCH (w)-[tx:SENT_TO {at: "at"}]->()
    RETURN sum(tx.value) AS total
    UNION ALL
    WITH w
    OPTIONAL MATCH (w)<-[tx:SENT_TO {at: "at"}]-() 
    RETURN sum(tx.value) AS total
}
WITH w.address AS address, collect(total) AS values
WHERE values[0] > 0
RETURN address, values[0] AS total, values[1] AS totalOut, values[2] AS totalIn
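One possible refinement is to stop relying on list positions altogether. Each branch could also return a label such as 'out' AS dir, and the totals could then be read by label instead of by index. The Python sketch below shows the idea with hypothetical rows in deliberately shuffled order. A branch that returned nothing simply defaults to 0.

```python
def summarize(rows):
    """Read branch totals by their direction label instead of by list position.

    rows: (direction, total) pairs, e.g. from a hypothetical
    RETURN 'out' AS dir, sum(tx.value) AS total in each branch.
    A branch that returned no row simply keeps its default of 0.
    """
    totals = {"any": 0, "out": 0, "in": 0}
    for direction, total in rows:
        totals[direction] = total
    totals["balance"] = totals["in"] - totals["out"]
    return totals


# Wallet 1, with the rows in a deliberately shuffled order.
print(summarize([("in", 2500), ("any", 3500), ("out", 1000)]))
# Wallet 7, whose outgoing branch returned nothing.
print(summarize([("any", 1000), ("in", 1000)]))
```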