cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

How to find intersections of three node sets based on a common field

I have models Region, Industry, Size, which have common property "month" - a date when the specific entry was created.
We know that all the entities might share the same month.
My algorithm should do the following:
Accept a list of possible values for each entity (region, industry, size) and find their intersection based on the month.
Example:
Input: Industry=["Information Technology", "Medicine"], Region=["Europe", "USA"], Size=["Small", "Medium"]
Output: [201910,202005]

How do I create a relationship between these models? What is the most efficient way to request this intersection?

I tried to build the graph as follows, but seems it's not very optimal because requests are processed too slow:

1 ACCEPTED SOLUTION

Hi @miron4dev,

You may like some indexes like:

CREATE INDEX ON :Industry(industry);
CREATE INDEX ON :Region(region);
CREATE INDEX ON :Size(size);

Then, can you try this quey (sorry if it slower than yours, I hope is not 😄 )

MATCH (i:Industry)-[:OCCURRED]->(m:Month)
WHERE i.industry in ['Medicine', 'Information Technology']
WITH collect(m) as out
MATCH (r:Region)-[:OCCURRED]->(m:Month)
WHERE r.region in ['USA', 'Europe']
WITH apoc.coll.intersection(out,collect(m)) as out
MATCH (s:Size)-[:OCCURRED]->(m:Month)
WHERE s.size in ['Small', 'Medium']
WITH apoc.coll.intersection(out,collect(m)) as out
RETURN out 

H

View solution in original post

3 REPLIES 3

Bennu
Graph Fellow

Hi Miron!

How is the query you tried with this data model? Have you tried collecting them all (Input nodes) and then :

WITH collect(inputs) as inp
MATCH (d:DateMonth)
WHERE ALL(i in inp WHERE (i)-[:OCCURRED]->(m))
return m

H

Hi @Bennu
I tried the following query:

MATCH (i:Industry)-[:OCCURRED]->(m:Month),(r:Region)-[:OCCURRED]->(m:Month),(s:Size)-[:OCCURRED]->(m:Month)
WHERE i.industry in ['Medicine', 'Information Technology'] and r.region in ["USA", "Europe"] and s.size in ["Small", "Medium"]
WITH i,r,s,m
ORDER by m.value desc
return i, r, s, collect(distinct m.value)

and it produces the expected results in 621 ms. Is there any way to optimize it somehow? Change the following structure is also an option.
I already added indexes for Industry, Region, Size, and Month, but I hope I can make it better.

Hi @miron4dev,

You may like some indexes like:

CREATE INDEX ON :Industry(industry);
CREATE INDEX ON :Region(region);
CREATE INDEX ON :Size(size);

Then, can you try this quey (sorry if it slower than yours, I hope is not 😄 )

MATCH (i:Industry)-[:OCCURRED]->(m:Month)
WHERE i.industry in ['Medicine', 'Information Technology']
WITH collect(m) as out
MATCH (r:Region)-[:OCCURRED]->(m:Month)
WHERE r.region in ['USA', 'Europe']
WITH apoc.coll.intersection(out,collect(m)) as out
MATCH (s:Size)-[:OCCURRED]->(m:Month)
WHERE s.size in ['Small', 'Medium']
WITH apoc.coll.intersection(out,collect(m)) as out
RETURN out 

H