08-23-2021 08:33 AM
Hello community!
I have User and Group nodes. A user can be a member of any number of groups (or not a member of any) with a directed relationship IN_GROUP.
I want to find all users who are members of exactly the same set of groups, create a separate Cluster node for each such set, connect those users to the Cluster node with an IN_CLUSTER relationship, and also create a RELATED relationship between the cluster and each of the groups those users belong to.
Below are some screenshots of what I need:
I have users, each of which is in a specific set of groups:
As you can see, User_1, User_2 and User_3 belong to the same set of groups (Group_1, Group_2 and Group_3): this is the first cluster. User_4 belongs to all groups: this is the second cluster. And User_5 belongs to only one group, Group_5: this is the third cluster.
Here's what we get:
Now we connect the clusters to the users' groups:
This is what I want to end up with:
I have some code that does the job, but its timing is unacceptable.
MATCH (u:User)
WITH [(u)-[:IN_GROUP]->(g:Group) | g] as groups, u
WITH apoc.coll.sortNodes(groups, "name") as groups, u
WITH apoc.util.md5(groups) as cluster_hash, groups, u
MERGE (c:Cluster {hash: cluster_hash})
CREATE (u)-[:IN_CLUSTER]->(c)
FOREACH (group IN groups |
  MERGE (c)-[:RELATED]->(group))
On my dataset (several hundred thousand users and the same number of groups), this takes about 30 minutes to complete. I need a result in 5 seconds.
I'm able to use the apoc library.
Here's a Cypher query that creates a test data set from the above example:
CREATE (u1:User {name:"User_1"})
CREATE (u2:User {name:"User_2"})
CREATE (u3:User {name:"User_3"})
CREATE (u4:User {name:"User_4"})
CREATE (u5:User {name:"User_5"})
CREATE (g1:Group {name:"Group_1"})
CREATE (g2:Group {name:"Group_2"})
CREATE (g3:Group {name:"Group_3"})
CREATE (g4:Group {name:"Group_4"})
CREATE (g5:Group {name:"Group_5"})
MERGE (u1)-[:IN_GROUP]->(g1)
MERGE (u1)-[:IN_GROUP]->(g2)
MERGE (u1)-[:IN_GROUP]->(g3)
MERGE (u2)-[:IN_GROUP]->(g1)
MERGE (u2)-[:IN_GROUP]->(g2)
MERGE (u2)-[:IN_GROUP]->(g3)
MERGE (u3)-[:IN_GROUP]->(g1)
MERGE (u3)-[:IN_GROUP]->(g2)
MERGE (u3)-[:IN_GROUP]->(g3)
MERGE (u4)-[:IN_GROUP]->(g1)
MERGE (u4)-[:IN_GROUP]->(g2)
MERGE (u4)-[:IN_GROUP]->(g3)
MERGE (u4)-[:IN_GROUP]->(g4)
MERGE (u4)-[:IN_GROUP]->(g5)
MERGE (u5)-[:IN_GROUP]->(g5)
RETURN u1, u2, u3, u4, u5, g1, g2, g3, g4, g5
Neo4j version: 4.3.3
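For checking any solution against the sample data, a read-only query along these lines may help. It is only a sketch: it assumes APOC is installed (as stated above) and that group names are unique, and the aliases groupNames, users and userCount are made up for illustration. It writes nothing to the graph.

```cypher
// Read-only sanity check: list each distinct set of group names
// together with the users that share exactly that set.
MATCH (u:User)
// OPTIONAL MATCH keeps users that are in no group at all (empty list).
OPTIONAL MATCH (u)-[:IN_GROUP]->(g:Group)
// Sort the names so the same set always yields the same list.
WITH u, apoc.coll.sort(collect(g.name)) AS groupNames
RETURN groupNames, collect(u.name) AS users, count(*) AS userCount
ORDER BY userCount DESC
```

On the five-user example this should give three rows, matching the three clusters described above: one for User_1, User_2 and User_3, one for User_4, and one for User_5.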
08-24-2021 10:13 AM
Hi @baturin.egor !
Can you try this small modification? It's hard to measure how much it helps on your complete db.
MATCH (u:User)-[:IN_GROUP]->(g:Group)
WITH collect(g) as groups, u
WITH distinct apoc.coll.sortNodes(groups, "name") as groups, collect(u) as users
WITH apoc.util.md5(groups) as cluster_hash, groups, users
MERGE (c:Cluster {hash: cluster_hash})
FOREACH (group IN groups |
  MERGE (c)-[:RELATED]->(group))
FOREACH (u IN users |
  CREATE (u)-[:IN_CLUSTER]->(c))
Let me know if it helps a bit. By the way, I'm not sure whether you can turn those MERGEs into CREATEs as well.
Bennu
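A side note on the MERGE (c:Cluster {hash: ...}) used in both queries above: without an index on :Cluster(hash), each MERGE has to look through the existing Cluster nodes, which gets more expensive as clusters accumulate. The thread does not say whether such an index exists, so this is an assumption worth checking rather than a confirmed cause of the 30 minutes. A minimal sketch in Neo4j 4.3 syntax (the constraint name cluster_hash_unique is made up for illustration):

```cypher
// Run once before the clustering query.
// A uniqueness constraint also creates the backing index,
// so MERGE on :Cluster {hash} becomes an index lookup.
CREATE CONSTRAINT cluster_hash_unique IF NOT EXISTS
ON (c:Cluster) ASSERT c.hash IS UNIQUE
```

Comparing PROFILE output for the clustering query on a subset of the data, before and after adding the constraint, would show whether the Cluster lookup is where the time goes.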
08-24-2021 11:24 AM
Try this:
match (a:User)-[]-(b:Group)
with distinct id(a) as ID, collect(distinct id(b)) as grps
with distinct grps as n1, size(grps) as cnt order by cnt desc
match (d:Group) where id(d) in n1
with d, n1, cnt
merge (c:Cluster {name: ("Cluster" + " " + cnt)})
merge (c)-[:RELATED]->(d)
return c, d