Neo4j

miron4dev · ‎08-13-2020

I have models Region, Industry, Size, which have common property "month" - a date when the specific entry was created.
We know that all the entities might share the same month.
My algorithm should do the following:
Accept a list of possible values for each entity (region, industry, size) and find their intersection based on the month.
Example:
Input: Industry=["Information Technology", "Medicine"], Region=["Europe", "USA"], Size=["Small", "Medium"]
Output: [201910,202005]

How do I create a relationship between these models? What is the most efficient way to request this intersection?

I tried to build the graph as follows, but seems it's not very optimal because requests are processed too slow:

Bennu · ‎08-14-2020

Hi @miron4dev,

You may like some indexes like:

CREATE INDEX ON :Industry(industry);
CREATE INDEX ON :Region(region);
CREATE INDEX ON :Size(size);

Then, can you try this quey (sorry if it slower than yours, I hope is not 😄 )

MATCH (i:Industry)-[:OCCURRED]->(m:Month)
WHERE i.industry in ['Medicine', 'Information Technology']
WITH collect(m) as out
MATCH (r:Region)-[:OCCURRED]->(m:Month)
WHERE r.region in ['USA', 'Europe']
WITH apoc.coll.intersection(out,collect(m)) as out
MATCH (s:Size)-[:OCCURRED]->(m:Month)
WHERE s.size in ['Small', 'Medium']
WITH apoc.coll.intersection(out,collect(m)) as out
RETURN out

H

View solution in original post

Bennu · ‎08-13-2020

Hi Miron!

How is the query you tried with this data model? Have you tried collecting them all (Input nodes) and then :

WITH collect(inputs) as inp
MATCH (d:DateMonth)
WHERE ALL(i in inp WHERE (i)-[:OCCURRED]->(m))
return m

H

miron4dev · ‎08-14-2020

Hi @Bennu
I tried the following query:

MATCH (i:Industry)-[:OCCURRED]->(m:Month),(r:Region)-[:OCCURRED]->(m:Month),(s:Size)-[:OCCURRED]->(m:Month)
WHERE i.industry in ['Medicine', 'Information Technology'] and r.region in ["USA", "Europe"] and s.size in ["Small", "Medium"]
WITH i,r,s,m
ORDER by m.value desc
return i, r, s, collect(distinct m.value)

and it produces the expected results in 621 ms. Is there any way to optimize it somehow? Change the following structure is also an option.
I already added indexes for Industry, Region, Size, and Month, but I hope I can make it better.

Bennu · ‎08-14-2020

Hi @miron4dev,

You may like some indexes like:

CREATE INDEX ON :Industry(industry);
CREATE INDEX ON :Region(region);
CREATE INDEX ON :Size(size);

Then, can you try this quey (sorry if it slower than yours, I hope is not 😄 )

MATCH (i:Industry)-[:OCCURRED]->(m:Month)
WHERE i.industry in ['Medicine', 'Information Technology']
WITH collect(m) as out
MATCH (r:Region)-[:OCCURRED]->(m:Month)
WHERE r.region in ['USA', 'Europe']
WITH apoc.coll.intersection(out,collect(m)) as out
MATCH (s:Size)-[:OCCURRED]->(m:Month)
WHERE s.size in ['Small', 'Medium']
WITH apoc.coll.intersection(out,collect(m)) as out
RETURN out

H

Neo4j

How to find intersections of three node sets based on a common field