Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-10-2019 01:28 AM
Hi All,
I want to do some analysis(identify a Pattern or see how many times a person is sendering money) on the data using the graphs.
I have a table which have 5 columns.
Txn ID | Trade Number | Sender | Broker | Receiver |
---|---|---|---|---|
1 | 12345 | Bank 1 | Broker 1 | Receiver 1 |
2 | 56789 | Bank 2 | Broker 2 | Receiver 2 |
3 | 56789 | Bank 2 | Broker 2 | Receiver 2 |
4 | 98765 | Bank 2 | Broker 3 | Receiver 3 |
To create a relationships I separated it into 4 CSVs. And independently Load them in Neo4j.
Label 1:-Trade Number
Txn ID | Trade Number |
---|---|
1 | 12345 |
2 | 56789 |
3 | 56789 |
4 | 98765 |
Label 2: SenderBank
Txn ID | Trade Number | Sender |
---|---|---|
1 | 12345 | Bank 1 |
2 | 56789 | Bank 2 |
3 | 56789 | Bank 2 |
4 | 98765 | Bank 2 |
Label 3: Broker
Txn ID | Trade Number | Broker |
---|---|---|
1 | 12345 | Broker 1 |
2 | 56789 | Broker 2 |
3 | 56789 | Broker 2 |
4 | 98765 | Broker 3 |
_Label 4: Receiver
Txn ID | Trade Number | Receiver |
---|---|---|
1 | 12345 | Receiver 1 |
2 | 56789 | Receiver 2 |
3 | 56789 | Receiver 2 |
4 | 98765 | Receiver 3 |
Since for many transactions the Sender and Receiver are same except of "Txn ID" for all the data.
I just want to check with how many Brokers/ Receivers the distinct Sender Banks are having relationships and how many times(aggregations) Like for x connections x relationships popping out of the Bank.
How can I sort of show only 1 Node for "Bank 2" instead of 3 and show 3 relationships going to 3 Brokers with only 1 Node for "Broler 2" Insted of 2. I can perform aggregations result as rows but I want to see the bigger picture.
Note:-
I tried using
apoc.refractor.mergeNodes
which actually merges
all 4 Sender nodes into 2 nodes,
all 4 Broker Nodes into 3
all 4 Receiver Nodes into 3
and multiple relationships pinting towards respective nodes.
But it also diminished the "Txn Id" property of (:SENDER) nodes or all that I chose to merge.
And I used it on my current data of 64k records. It is super slow after all merges.
Please Help if anything can be done or what am i doing wrong.
Solved! Go to Solution.
01-16-2019 02:13 AM
This should give you your summary if you nodes are distinct. Just a regular Cypher aggregation.
MATCH (se:SenderEntity)-[:BROKER]->(cp:CounterParty)
RETURN se, cp, apoc.create.vRelationship(se, 'ALL_BROKER', {count:count(*)}, cp) as rel
If only the names are distinct:
MATCH (se:SenderEntity)-[:BROKER]->(cp:CounterParty)
WITH se.senderEntityName as sender, cp.counterPartyName as counter, count(*) as count
WITH collect(distinct sender) as senders, collect(distinct counter) as counters, collect([sender, counter, count]) as rels
WITH apoc.map.groupBy([seName IN senders | apoc.create.vNode(['SenderEntity'],{name:seName})], 'name') as senders,
apoc.map.groupBy([cpName IN counters | apoc.create.vNode(['CounterParty'],{name:cpName})], 'name') as counters, rels
UNWIND rels as rel
RETURN senders[rel[0]], counters[rel[1]], apoc.create.vRelationship(senders[rel[0]], 'ALL_BROKER', {count:rel[2]}, counters[rel[1]]) as rel
01-10-2019 03:50 AM
You can just use aggregation, with count or sum.
And to visualize it as graph not as tables you can use virtual nodes with apoc, I wrote it up here.
01-15-2019 11:26 PM
Hi Michael,
Thank a lot. That is actually a huge information so worthy of calling a blog.
But for someone like me who is pure novice doing it for sake of PoC, its hard to comprehend.
I did check virtual nodes in APOC libraries, but i fail to verify if it can suit my use case.
MATCH (a)-[r]->(b)
WITH head(labels(a)) AS l, head(labels(b)) AS l2, type(r) AS rel_type, count(*) as count
CALL apoc.create.vNode([l],{name:l}) yield node as a
CALL apoc.create.vNode([l2],{name:l2}) yield node as b
CALL apoc.create.vRelationship(a,rel_type,{count:count},b) yield rel
RETURN *;
In the above documentation provided example they are selecting all the different labels and creating vNode/vRelationships for count of different types of relationships they desire. which is perfect to eyeball what connects with what.
However, my goal is to create different nodes of all distinct values in 1 LABEL and then see the count of relationships of those particular type of value with other nodes.
Like, assuming in LABEL :Person there are 10 records/nodes with Name property as "Jatin".
And "Jatin" has [:ACTED_IN] with 3 times in "Movie 1", 4 times in "Movie 2", 3 times in "Movie 3" from LABEL. :Movie.
The use case is to see:
1 node of "Jatin" with 3 relationships (count as a property) with 3 nodes of "Movie 1", "Movie 2", and "Movie 3".
OR
1 node of "Jatin" and 10 relationships protruding with 3 relationships for "Movie 1", 4 relationships to "Movie 2" and 3 relationships to "Movie 3".
Apart from that I also tried Grouping methods that you mentioned.
But it is not even returning same result as the documentation example.
match (n) set n.century = toInteger(coalesce(n.born,n.released)/100) * 100;
call apoc.nodes.group(['Person','Movie'],['century']);
NOTE:- The dataset used is Movie Graph provided by default.
Please help me with your expertise if above mentioned use case is possible.
Thank you
01-16-2019 01:38 AM
HI,
Also one more thing Michael
In the following
Label :SenderEntity nodes(13k with 151 distinct) have senderEntityName as property.
Label :CounterParty nodes(13k as well) have counterPartyName as property.
The Relationship schema is (SenderEntity)-[:BROKER]->(CounterParty)
Like my original goal is to show which ones and how many times a counterparty is attached to each senderEntity.
I tried the Grouping method in APOC.
call apoc.nodes.group(['SenderEntity','CounterParty'],['senderEntityName','counterPartyName']) yield nodes, relationships
UNWIND nodes as node
UNWIND relationships as rel
RETURN node, rel;
It does give 151 distinct sender nodes with count in each nodes. but it does not give any counterparty nodes or the relationships with it.
01-16-2019 02:13 AM
This should give you your summary if you nodes are distinct. Just a regular Cypher aggregation.
MATCH (se:SenderEntity)-[:BROKER]->(cp:CounterParty)
RETURN se, cp, apoc.create.vRelationship(se, 'ALL_BROKER', {count:count(*)}, cp) as rel
If only the names are distinct:
MATCH (se:SenderEntity)-[:BROKER]->(cp:CounterParty)
WITH se.senderEntityName as sender, cp.counterPartyName as counter, count(*) as count
WITH collect(distinct sender) as senders, collect(distinct counter) as counters, collect([sender, counter, count]) as rels
WITH apoc.map.groupBy([seName IN senders | apoc.create.vNode(['SenderEntity'],{name:seName})], 'name') as senders,
apoc.map.groupBy([cpName IN counters | apoc.create.vNode(['CounterParty'],{name:cpName})], 'name') as counters, rels
UNWIND rels as rel
RETURN senders[rel[0]], counters[rel[1]], apoc.create.vRelationship(senders[rel[0]], 'ALL_BROKER', {count:rel[2]}, counters[rel[1]]) as rel
01-16-2019 02:14 AM
The grouping should actually do exactly what you want.
You might need to add an tempoary "name" property to both node types.
01-16-2019 06:48 AM
Hi Michael,
It is giving this error:-
Neo.ClientError.Statement.SyntaxError: Invalid input 'p': expected 'n/N' (line 8, column 2 (offset: 653))
"apoc.map.groupBy(senders, 'name') as senders, " ^
01-16-2019 07:53 AM
fixed teh copy & paste error
01-17-2019 02:13 AM
I have extended my query to more levels and have included more labels.
Tons of thanks Michael. You have been of great help.
I will make your answer as solution.
God Bless you
All the sessions of the conference are now available online