Neo4j

shivaji980980 · ‎02-17-2020

I have been working on some data which is block of transactions in which addresses(identifier) of some person appears as inputs and outputs
I am trying to add a property to address node, based on it occurrence in the transactions, and the property get updated with transaction id(a unique id) based on transaction in which it appeared as input.
**The problem i am facing is, the query giving one result when executing query for each transaction manually(using where condition) and a different result when executing on whole data in single query without specifying any condition **

match (ib:block)
where ib.cl_processed = 0 
with ib order by ib.height
match (ib)<-[includedIn]-(n:tx) 
Set ib.cl_processed = 1
With n 
match (n)<-[:input]-(input:output)-[:lockedTo]->(a:address)
with n.txid as txid, collect(distinct a) as addresses
unwind addresses as address
merge(newcl:cluster{cluster_id:txid})
SET address.oldCluster = address.cluster
SET address.cluster = newcl.cluster_id
WITH address.oldCluster AS oldCluster, newcl AS newCluster, address AS addresses
	MATCH (a1:btc_address) where a1.cluster = oldCluster and a1.address <> addresses.address
    with collect(distinct a1) as atys,newCluster,oldCluster
    unwind atys as aty
    SET aty.oldCluster = aty.cluster
    Set aty.cluster = newCluster.cluster_id
return count(aty)

i am unable to figure out why the query is resulting different results.
please help me with this

david_rosenblum · ‎02-18-2020

Most likely - you are counting all the aty and need to look at distinct aty. If you look at your path perhaps aty appears n times in a one to many circumstance. And it is counting all the aty - not the distinct. Perhaps when you lock to single transaction it changes the cardinality of aty, thus the count.

shivaji980980 · ‎02-18-2020

Thank you @david.rosenblum for your early response.
But my problem is not concerned with the count(aty) , i think you have observed that i am updating cluster attribute for node(address), my problem is concerned there. Query resulting one set of values for cluster attribute when running transaction wise independently(pivoting using id of transaction, running for each transaction once) and another set of values when we run the query on whole number of transactions(without pivoting , running on all transactions on single shot)
I think you got my doubt?

andrew_bowman · ‎02-20-2020

My guess is that your returned count is potentially much higher than what you expect, and that several btc_address nodes are being modified that aren't supposed to be modified. Is that correct? Providing an example and more detail about what's going wrong will better help us troubleshoot.

Absent more specific details, I believe the problem may be in this section:

MATCH (a1:btc_address) where a1.cluster = oldCluster and a1.address <> addresses.address
    with collect(distinct a1) as atys,newCluster,oldCluster
    unwind atys as aty
    SET aty.oldCluster = aty.cluster
    Set aty.cluster = newCluster.cluster_id

as a result of:

with n.txid as txid, collect(distinct a) as addresses
unwind addresses as address

In particular I think the problem may be with this predicate: a1.address <> addresses.address.

(note also that in your WITH clause address AS addresses is discouraged...address is a single node for the row, it is not a collection, so using a plural when it's a singular node may be misleading you later in the query)

The meaning of this is: "for this line with this one specific address, ensure that the :btc_address node we lookup doesn't have that one specific address". It isn't considering the other rows present that have other addresses, so this may be matching to :btc_address nodes that you don't want to work with.

If that's the case, then you may need to collect the addresses first and use a list predicate to ensure the :btc_addresses you match to don't have any of the addresses you're working with.

shivaji980980 · ‎03-08-2020

match (ib:btc_blocks) where ib.processed = 1 and ib.height <=3
** Set ib.processed = 0 with ib order by ib.height**
match (ib)<-[includedIn]-(tx:btc_tx)
with tx order by id(tx)
match (tx)<-[:input]-(input:btc_output)-[:lockedTo]->(a:btc_address)
with tx.txid as txid, collect(distinct a) as addresses,ib.height as hei
MERGE (newcl:cluster{cluster_id : txid})
foreach (addr in addresses | MERGE (addr)-[p:partOf]->(newcl) on create set addr.moved = addr.moved+1)
WITH newcl AS newCluster,addresses,hei
unwind addresses as addr1
**match(addr1)-[y:partOf]->(oldcl:cluster) **
**where oldcl.cluster_id <> newCluster.cluster_id **
set y.markdelete=1
return addr1,oldcl

I want to add the markfordelete to the old relationship( (addr:btc_address)-[r1:partOf]-(old:cluster) whenever same address is partOf newrelationship i.e ( (addr)-[r2:partOf]-(new:cluster) where new.cluster_id <> old.cluster_id ) in this case i want to set the r1.markfordelete=1 . somehow the above query adding markfordelete to all the partOf relationships.could you please help me as it tp urgent work for me.

data i am working with above query:
block(height) | btc_tx(txid) | btc_address(address)
2 |2aa |a1
3 |3aa |a1,a2
3 |3ab |a1,a4
3 |3ac | a1,a2
3 |3ad |a9

expected output:
from above we can say that
(a1)-[r1:partOf)-(2aa) (a2)-[r5:partOf)-(3aa) (a4)-[r7:partOf)-(3ab)
(a1)-[r2:partOf)-(3aa) (a2)-[r6:partOf)-(3ac) (a9)-[r8:partOf)-(3ad)
(a1)-[r3:partOf)-(3ab)
(a1)-[r4:partOf)-(3ac)
i want to add markfordelete=1 for r1,r2,r3(a1) and r5(a2)

intouch_vivek · ‎03-09-2020

I tried one work around for you requirement. Instead of creating a property as markforDelete, I created rest of the relations saying markNoDelete
Dataset used
a,b,c
2,2aa,a1
3,3aa,a1
3,3aa,a2
3,2ab,a1
3,2ab,a4
3,2ac,a1
3,2ac,a2
3,2ad,a9

Query
match(c:C)-[rel:PART_OF]->(b:B)
with c as name, max(id(rel)) as nomark
match(c1:C)-[rel1:PART_OF]->(b1:B) where id(rel1)=nomark set rel1.markNoDelete='Active' return name,c1,rel1

shivaji980980 · ‎03-09-2020

Thank you vivek ,your response solved my problem partially, the other i am concerning about is moving addresses in one relation to other.Here is my requirement
a,b,c
2,2aa,a1 -> 1
3,3aa,a1 -> 2
3,3aa,a2 -> 3
3,2ab,a1 -> 4
3,2ab,a4 -> 5

on processing 1, (a1)-[r1:partOf]-(2aa)
on processing 2, (a1)-[r2:partOf]-(3aa)
on processing 3, (a2)-[r2:partOf]-(3aa)
on processing 4, (a1)-[r3:partOf]-(2ab) (r3.markNoDelete='Active')
now when a1 is moved to new relations i.e r3 , i want all the addresses in latest old relation of a1 i.e r2 addresses[a2] to be also added to (r3) .
(the moved attribute i'm using the query is to serve this purpose,when btc_address.moved<4, i want to the moved the addresses in oldcluster to newcluster else it will remain in oldone

intouch_vivek · ‎03-11-2020

Hi Shivaji,,

I am bit confused with the explanation.
Could to please describe

shivaji980980 · ‎03-12-2020

sorry for little messy explanation.
My doubt is about clustering

Basically, i want to cluster the btc_address into clusters where id of each cluster is transaction_id.
when btc_address is appeared in transaction then immediately that address is made partOf new cluster with occurred transaction_id as cluster_id.
Now after adding the address to the cluster, we will check for whether the current btc_address is already partOf any previous cluster. If it is partOf any other cluster say oldcluster other than current cluster(new cluster) then partOf relation current btc_address to the oldcluster is deleted and check if any other btc_addresses are partOf that oldcluster ,then all those btc_addresses relations are deleted from oldcluster and new relationships are created to new cluster.
Example:
block_height | transaction_id| btc_address -> sno
2,2aa,a1 -> 1
3,3aa,a1 -> 2
3,3aa,a2 -> 3
3,2ab,a1 -> 4
3,2ab,a3 -> 5
on processing 1, (a1)-[r1:partOf]-(2aa) is created
{address a1 is made partOf cluster(2aa)}
on processing 2, (a1)-[r2:partOf]-(3aa) is created
{address a1 is already partOf (2aa) so the relationship r1 is deleted and other addresses partOf cluser(2aa) is not there, so no migration of other address }
on processing 3, (a2)-[r3:partOf]-(3aa) is created
{ address a2 has no partOf other than the current one, so no changes}
on processing 4, (a1)-[r4:partOf]-(2ab) is created
{address a1 is already partOf cluster(3aa) which is not the new one(2ab),so relationship r2 is deleted and other addresses partOf cluster(3aa) are [a2] . so a2 partOf cluster(3aa) relation r3 is deleted and new relation r5 is created to the current cluster(2ab)
(a2)-[r5:partOf]-(2ab) is created .
on processing 5, (a3)-[r6:partOf]-(2ab) is created
{ address a3 is not partOf any previous cluster other than new one, so no changes}
after processing all the above data the final result should be like:
cluster(2aa) is empty
cluster(3aa) is empty
cluster(2ab) -->{a1,a2,a3}
Hope the information is clear

intouch_vivek · ‎03-12-2020

Ingestion
load csv with headers from 'file:///bcc.csv' as line Merge (n:TRX{name:line.b}) Merge (m:BTC{name:line.c}) Merge (m)-[rel:PART_OF]-(n)

Set property to wanted relationship
match(c:BTC)-[rel:PART_OF]->(b:TRX)
with c as name, max(id(rel)) as nomark
match(c1:BTC)-[rel1:PART_OF]->(b1:TRX) where id(rel1)=nomark set rel1.markNoDelete='Active' return name,c1,rel1

Delete unwanted relationship (cluster)
match(n:BTC)-[rel:PART_OF]->(m:TRX) where rel.markNoDelete is null detach delete rel

shivaji980980 · ‎03-12-2020

Hey, data is already loaded into machine and we are not creating clusters at load time as the data is very huge.
And also, i think your query is not concerning the other addresses of oldcluster for migrating into the current cluster.

ameyasoft · ‎03-15-2020

As each cluster has it's own cluster_id we can use that to flag the old relations correctly.
 
Get the clusters (b1) associated with an address (a1):

match (a1)-[r:partOf)-(b1)
with a1, max(b1.cluster_id) as mx

match (a1)-[r1:partOf]-(b) where b.cluster_id < mx
set r1.markfordelete=1
return a1, b

Neo4j

Order of execution of Cypher query