Find same group of people working for the same group of companies

First, I'm rather new to graphs, so please bear with me.

I want to find groups of people who work for the same groups of companies.
Put differently: which sets of companies share the same employees?
Let me give an example. Below are the people working for several companies.

The final result I want is:

"companies"                             "collect(p.name)"
---------------------------------------------------------------------------------------------------------
["Company A","Company B"]               ["Kreuk","Snijder","Hordijk","Sepers","Vrijhof","Zwijnenberg"]
["Company A","Company B","Company C"]   ["Kreuk","Snijder","Hordijk","Sepers","Zwijnenberg"]
["Company A","Company C"]               ["Kreuk","Snijder","Hagedoorn","Hordijk","Sepers","Zwijnenberg"]
["Company B","Company C"]               ["Berden","Kreuk","Snijder","Hordijk","Sepers","Zwijnenberg"]

The data set
The code to create this example set is at the bottom.

Best effort
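The query below is my best effort, but the result is not what I want: it groups people by their exact set of companies. Everyone who works for Companies A, B and C also works together in Companies A and B, alongside Vrijhof, so they should be listed under ["Company A","Company B"] as well.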

MATCH (c:Company)<-[:WORKS_FOR]-(p:Person)
WITH p, apoc.coll.sort(collect(c.name)) AS companies
WHERE SIZE(companies) > 1
RETURN companies, collect(p.name)
ORDER BY companies

"companies"                             "collect(p.name)"
["Company A","Company B"]               ["Vrijhof"]
["Company A","Company B","Company C"]   ["Kreuk","Snijder","Hordijk","Sepers","Zwijnenberg"]
["Company A","Company C"]               ["Hagedoorn"]
["Company B","Company C"]               ["Berden"]
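To pin down what the expected table means, independent of Neo4j: the query above groups people by their exact company list, whereas the desired output lists, for each such company list, everyone whose companies include all of it. Below is a minimal plain-Python sketch of that difference. The `works_for` dictionary is transcribed from the CREATE statements in this post; the function names `exact_groups` and `superset_groups` are made up for illustration and are not part of Cypher or APOC.

```python
from collections import defaultdict

# Person -> companies, transcribed from the CREATE statements in this post.
works_for = {
    "Kreuk": {"Company A", "Company B", "Company C"},
    "Hordijk": {"Company A", "Company B", "Company C"},
    "Sepers": {"Company A", "Company B", "Company C"},
    "Zwijnenberg": {"Company A", "Company B", "Company C"},
    "Snijder": {"Company A", "Company B", "Company C"},
    "Hagedoorn": {"Company A", "Company C"},
    "Vrijhof": {"Company A", "Company B"},
    "Berden": {"Company B", "Company C"},
    "Ultee": {"Company C"},
    "Dobbelaar": {"Company A"},
}


def exact_groups(works_for):
    """Group people by their exact company set (more than one company only)."""
    groups = defaultdict(list)
    for person, companies in works_for.items():
        if len(companies) > 1:
            groups[tuple(sorted(companies))].append(person)
    return dict(groups)


def superset_groups(works_for):
    """For each company set, list everyone whose companies include all of it."""
    result = {}
    for combo in exact_groups(works_for):
        result[combo] = sorted(
            person
            for person, companies in works_for.items()
            if companies.issuperset(combo)
        )
    return result


for combo, people in sorted(superset_groups(works_for).items()):
    print(list(combo), people)
```

The exact grouping puts only Vrijhof under Company A + Company B, while the superset grouping yields the six names from the desired table (sorted alphabetically here).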

What I have managed so far:

List the companies each employee works for:

MATCH (c:Company)<-[:WORKS_FOR]-(p:Person)
RETURN p.name AS name, apoc.coll.sort(collect(c.name)) AS companies
ORDER BY name

"name"       "companies"   
---------------------------------------------                       
"Berden"     ["Company B","Company C"]            
"Dobbelaar"  ["Company A"]                        
"Hagedoorn"  ["Company A","Company C"]            
"Hordijk"    ["Company A","Company B","Company C"]
"Kreuk"      ["Company A","Company B","Company C"]
"Sepers"     ["Company A","Company B","Company C"]
"Snijder"    ["Company A","Company B","Company C"]
"Ultee"      ["Company C"]                        
"Vrijhof"    ["Company A","Company B"]            
"Zwijnenberg"["Company A","Company B","Company C"]

List of employees for each company

MATCH (c:Company)<-[:WORKS_FOR]-(p:Person)
RETURN c.name AS company, apoc.coll.sort(collect(p.name)) AS employee
ORDER BY company

"company"   "employee"
"Company A" ["Dobbelaar","Hagedoorn","Hordijk","Kreuk","Sepers","Snijder","Vrijhof","Zwijnenberg"]
"Company B" ["Berden","Hordijk","Kreuk","Sepers","Snijder","Vrijhof","Zwijnenberg"]
"Company C" ["Berden","Hagedoorn","Hordijk","Kreuk","Sepers","Snijder","Ultee","Zwijnenberg"]

The dataset

CREATE (Kreuk:Person {name: "Kreuk"})
CREATE (Hordijk:Person {name: "Hordijk"})
CREATE (Sepers:Person {name: "Sepers"})
CREATE (Zwijnenberg:Person {name: "Zwijnenberg"})
CREATE (Snijder:Person {name: "Snijder"})
CREATE (Hagedoorn:Person {name: "Hagedoorn"})
CREATE (Vrijhof:Person {name: "Vrijhof"})
CREATE (Berden:Person {name: "Berden"})
CREATE (Ultee:Person {name: "Ultee"})
CREATE (Dobbelaar:Person {name: "Dobbelaar"})

CREATE (companyA:Company {name: "Company A"})
CREATE (companyB:Company {name: "Company B"})
CREATE (companyC:Company {name: "Company C"})

CREATE (Kreuk)-[:WORKS_FOR]->(companyA)
CREATE (Hordijk)-[:WORKS_FOR]->(companyB)
CREATE (Sepers)-[:WORKS_FOR]->(companyC)
CREATE (Zwijnenberg)-[:WORKS_FOR]->(companyA)
CREATE (Snijder)-[:WORKS_FOR]->(companyB)
CREATE (Hagedoorn)-[:WORKS_FOR]->(companyC)
CREATE (Vrijhof)-[:WORKS_FOR]->(companyA)
CREATE (Berden)-[:WORKS_FOR]->(companyB)
CREATE (Ultee)-[:WORKS_FOR]->(companyC)
CREATE (Dobbelaar)-[:WORKS_FOR]->(companyA)
CREATE (Kreuk)-[:WORKS_FOR]->(companyB)
CREATE (Hordijk)-[:WORKS_FOR]->(companyC)
CREATE (Sepers)-[:WORKS_FOR]->(companyA)
CREATE (Zwijnenberg)-[:WORKS_FOR]->(companyB)
CREATE (Snijder)-[:WORKS_FOR]->(companyC)
CREATE (Hagedoorn)-[:WORKS_FOR]->(companyA)
CREATE (Vrijhof)-[:WORKS_FOR]->(companyB)
CREATE (Berden)-[:WORKS_FOR]->(companyC)
CREATE (Kreuk)-[:WORKS_FOR]->(companyC)
CREATE (Hordijk)-[:WORKS_FOR]->(companyA)
CREATE (Sepers)-[:WORKS_FOR]->(companyB)
CREATE (Zwijnenberg)-[:WORKS_FOR]->(companyC)
CREATE (Snijder)-[:WORKS_FOR]->(companyA)

2 REPLIES

Here is my solution for now.

Any suggestions for improvement?

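MATCH (c:Company)<-[:WORKS_FOR]-(p:Person)
WITH p, apoc.coll.sort(collect(c.name)) AS companies
WHERE SIZE(companies) > 1
// Group by the company list explicitly. Mixing a plain variable and collect()
// in one map relies on implicit grouping, which newer Neo4j versions may reject.
WITH companies, collect(p.name) AS employee
WITH collect({companies: companies, employee: employee}) AS list
UNWIND list AS row
// Add the employees of every group whose company list contains all of this row's companies
WITH row.companies AS companies,
     REDUCE(l = row.employee, other IN list |
       CASE WHEN apoc.coll.containsAll(other.companies, row.companies)
            THEN l + other.employee
            ELSE l
       END) AS emps
// toSet removes the duplicates created when a row is merged with itself
RETURN companies, apoc.coll.toSet(apoc.coll.sort(emps)) AS employee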

This is an overlap question, similar to "finding all movies that are in this set of genres".

Do you have any starting point for this?

One way to do it is to project this to a person-to-person network or a company-to-company network,
and then use a clustering algorithm from the Graph Data Science library to see who ends up in the same cluster.

For example, project via Cypher:

nodes:

MATCH (p:Person) RETURN id(p) as id

relationships:

MATCH (p1:Person)-[:WORKS_FOR]->(c:Company)<-[:WORKS_FOR]-(p2:Person)
RETURN id(p1) as source, id(p2) as target, count(*) as weight

Then run Louvain or WCC (weakly connected components) on top of that projected graph.
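As a rough illustration of what such a projection and WCC would compute, here is a plain-Python sketch. It does not use the Graph Data Science library; the names `coworker_weights` and `connected_components` are hypothetical, the employee lists are copied from the per-company output earlier in the thread, and each pair is counted once rather than in both directions as the Cypher relationship query does.

```python
from collections import Counter
from itertools import combinations

# Company -> employees, copied from the per-company query output in this thread.
employees = {
    "Company A": ["Dobbelaar", "Hagedoorn", "Hordijk", "Kreuk",
                  "Sepers", "Snijder", "Vrijhof", "Zwijnenberg"],
    "Company B": ["Berden", "Hordijk", "Kreuk", "Sepers",
                  "Snijder", "Vrijhof", "Zwijnenberg"],
    "Company C": ["Berden", "Hagedoorn", "Hordijk", "Kreuk",
                  "Sepers", "Snijder", "Ultee", "Zwijnenberg"],
}


def coworker_weights(employees):
    """Person-to-person projection: weight = number of shared companies."""
    weights = Counter()
    for staff in employees.values():
        for a, b in combinations(sorted(staff), 2):
            weights[(a, b)] += 1
    return weights


def connected_components(nodes, edges):
    """Connected components via a simple union-find (a stand-in for WCC)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    components = {}
    for n in nodes:
        components.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in components.values())


weights = coworker_weights(employees)
people = sorted({p for staff in employees.values() for p in staff})
components = connected_components(people, weights)

print(weights[("Kreuk", "Snijder")])  # shared companies for one pair
print(len(components))                # number of connected components
```

On this small data set all ten people end up in a single component, so connectivity alone would not separate the groups; the weights (3 for Kreuk and Snijder, 1 for Berden and Vrijhof) are what a weighted algorithm would have to work with.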