Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
05-23-2020 06:57 AM
I’m performing an analysis of token distributions in Ethereum using Neo4j and Cypher. In one of my use cases I need to find clusters. For example, if a token transfer for a particular token was made to 5 different accounts and then these 5 accounts resent all of these tokens to one particular account I need to find those 6 accounts.
The token address needs to be the same in both hops so I start my query by getting all of the token addresses for a particular transfer type. There are four transfer types, and for each transfer type there is a different number of distinct token addresses. This table shows how many transfers and token addresses I have per transfer type:
These are my specs and configuration settings: Neo4j Version is 3.5.16 Community, Database size is 77GB, 8 CPU Cores. 128 GB RAM (only 95 available for Neo).
dbms.memory.heap.initial_size=32g
dbms.memory.heap.max_size=32g
dbms.memory.pagecache.size=55g
And this is my query for the Airdrops transfer type:
MATCH (at:Transfer {eventName:'Airdrop'})
WITH collect(DISTINCT at.address) as tokenAddresses
UNWIND tokenAddresses as tokenAddress
MATCH (t1:Transfer {eventName:'Airdrop',address:tokenAddress})-[:sent_to]->(a2:active)<-[:sent_from]-(t2:Transfer
{address:tokenAddress})-[:sent_to]->(a3:account)
USING INDEX t1:Transfer(eventName,address)
USING INDEX t2:Transfer(address)
WITH a3.accountId as a3Account, collect(DISTINCT a2.accountId) AS a2Accounts, count(DISTINCT a2) AS
a2count,collect(DISTINCT t2.address) as tokenAddress
WHERE a2count>=2 AND a2count <=1000
RETURN a2Accounts,a3Account,tokenAddress,a2count+1 AS clusterCount
ORDER BY clusterCount DESC;
So when the transfer type is Airdrop or Distr. my query runs and finishes in 4 seconds and 10 minutes respectively. When the transfer type is Mint it just never ends or just hangs after a while. I haven’t even tried this with Transfer type=Transfer as I know it won’t run. As you can see both accounts and transfers are Nodes, this way I can use indexes on the transfers. Any thoughts or ideas would be greatly and very much appreciated. Thank you.
05-23-2020 07:05 AM
Hello @andrespra and welcome to the Neo4j community
Just to know, did you use UNIQUE CONSTRAINTS on all your nodes?
Regards,
Cobra
05-23-2020 07:19 AM
Thank you for your quick reply. I have a Uniqueness constraint on the account nodes for the "accountId" property. Transfers Nodes also have a unique id on the property "TransferId'. That being said, I don't use these indexes on my query as I don't query accounts or transfers by their id. I query them by their "eventName + tokenAddress" properties or just by their "tokenAddress" property, and in nboth cases I'm using indexes as it can be seen on my query after the second MATCH statement:
USING INDEX t1:Transfer(eventName,address)
USING INDEX t2:Transfer(address)
05-23-2020 07:21 AM
I see, no problem
Try to add UNIQUE CONSTRAINTS also on others Label node like Transfer, it will speed-up your query even if you don't use them, Neo4j will use them
Regards,
Cobra
05-23-2020 08:20 AM
Ok, actually :Transfer(transferId) was a regular index, not a uniqueness constraint, unlike :account(accountId). I dropped the index, created the index again as a uniqueness constraint and launched my queries again but my issue remains. The problem starts as the amount of nodes I have to check increases. Any thoughts on my memory configuration? I ran Memrec and with my available RAM I tried to match the given values as much as I could. Do you think I need more resources for this? The thing is that the graph is only 77gb in size so I think it should work ok.
05-23-2020 08:23 AM
Ok good:)
dbms.memory.heap.initial_size=32g
dbms.memory.heap.max_size=32g
You can let heap.initial_size
at the minimum (I think it's 500Mo or something) and maybe increase heap.max_size
if you can:)
And I never touched dbms.memory.pagecache.size
so I can't advice you on this one
Regards,
Cobra
All the sessions of the conference are now available online