Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
04-01-2021 04:03 PM
Hello,
How to best use MATCH across 5 different node labels to collect in a list unique values found on all 5 nodes with the same property label? I'm writing a somewhat functional query here but it's getting quite long so I don't know if I'm taking the right approach.
As a hypothetical scenario, I have these 5 nodes with these uniqueness constraints:
Actor, actor_id
Director, director_id
Producer, producer_id
Writer, writer_id
Designer, designer_id
And all of these 5 nodes also contain a business_id as an additional node property for example. I created indexes on these 5 node properties too for faster search performance. Indexes were created as follows:
CREATE INDEX ActorBusID FOR (a:Actor) ON (a.bus_id);
CREATE INDEX DirectorBusID FOR (d:Director) ON (d.bus_id);
CREATE INDEX ProducerBusID FOR (p:Producer) ON (p.bus_id);
CREATE INDEX WriterBusID FOR (w:Writer) ON (w.bus_id);
CREATE INDEX DesignerBusID FOR (dg:Desginer) ON (dg.bus_id);
This is what I'm trying:
MATCH(a:Actor)
WITH a.bus_id AS actorids
ORDER BY actorids ASC
MATCH(d:Director)
WITH d.bus_id AS directorids
ORDER BY directorids ASC
MATCH(p:Producer)
WITH p.bus_id AS producerids
ORDER BY producerids ASC
MATCH(w:Writer)
WITH w.bus_id AS writerids
ORDER BY writerids ASC
MATCH(dg:Desginer)
WITH dg.bus_id AS designerids
ORDER BY designerids ASC
RETURN collect( DISTINCT [actorids, directorids, producerids, writerids, designerids]) AS `Business_IDs`
I'm hoping to return a list of ordered and ascending unique values
My end result should be something like:
Business_IDs
[1, 2, 3, 4, 5]
It seems to me like the query I wrote is super long and it could be a lot shorter or simpler.
Solved! Go to Solution.
04-02-2021 01:27 PM
The simplest approach would be to come up with some label that could be applied to all of these nodes, such as :BusinessEntity or something, and add the label to all of those (remember nodes can be multi-labeled) as well as create a corresponding index (which might replace the individual indexes on bus_id on the other labels).
Then your query simplifies to:
MATCH (n:BusinessEntity)
WITH DISTINCT n.bus_id as bus_id
ORDER BY bus_id ASC
RETURN collect(bus_id) as `Business_IDs`
And if you don't want to do that, then you should be able to get Cypher to perform unions of each label scan under the hood with something like:
MATCH (n)
WHERE n:Actor OR n:Director OR n:Producer OR n:Writer OR n:Designer
WITH DISTINCT n.bus_id as bus_id
ORDER BY bus_id ASC
RETURN collect(bus_id) as `Business_IDs`
Use an EXPLAIN of the plan to verify this. You should see a series of branching at the start, with NodeByLabelScan and Union operations. You do NOT want to see AllNodesScan operators.
04-01-2021 07:14 PM
Try this:
MATCH(a:Actor)
WITH collect(distinct a.bus_id) AS actorids
MATCH(d:Director)
WITH collect(distinct d.bus_id) AS directorids, actorids
with apoc.coll.sort(apoc.coll.union(actorids, directorids)) as actdir
MATCH(p:Producer)
WITH collect(distinct p.bus_id) AS producerids, actdir
with apoc.coll.sort(apoc.coll.union(actdir, producerids)) as actdirpd
MATCH(w:Writer)
WITH Collect(distinct w.bus_id) AS writerids, actdirpd
with apoc.coll.sort(apoc.coll.union(actdirpd, writerids)) as actdirpdwr
MATCH(dg:Desginer)
WITH collect(distinct dg.bus_id) AS designerids, actdirpdwr
with apoc.coll.sort(apoc.coll.union(actdirpdwr, designerids)) as final
RETURN final
04-02-2021 01:27 PM
The simplest approach would be to come up with some label that could be applied to all of these nodes, such as :BusinessEntity or something, and add the label to all of those (remember nodes can be multi-labeled) as well as create a corresponding index (which might replace the individual indexes on bus_id on the other labels).
Then your query simplifies to:
MATCH (n:BusinessEntity)
WITH DISTINCT n.bus_id as bus_id
ORDER BY bus_id ASC
RETURN collect(bus_id) as `Business_IDs`
And if you don't want to do that, then you should be able to get Cypher to perform unions of each label scan under the hood with something like:
MATCH (n)
WHERE n:Actor OR n:Director OR n:Producer OR n:Writer OR n:Designer
WITH DISTINCT n.bus_id as bus_id
ORDER BY bus_id ASC
RETURN collect(bus_id) as `Business_IDs`
Use an EXPLAIN of the plan to verify this. You should see a series of branching at the start, with NodeByLabelScan and Union operations. You do NOT want to see AllNodesScan operators.
04-02-2021 01:36 PM
@ameyasoft Thank you for your reply! I did not know about those two APOC functions. I'll give those a try.
@andrew.bowman
Oh ok, yeah I can definitely add another label to those 5 nodes. These 5 nodes get updated on a daily basis so adding a new label to these nodes when they get updated shouldn't be a problem.
Thank you Neo4j community! I think this is what I was looking for.
04-02-2021 01:57 PM
Just to make sure, when I create this index on a multi-labeled node, is this the correct syntax so that the index is applied to all of the bus_id on all 5 nodes?
CREATE INDEX BusinessIDs FOR (x:BusinessEntity) ON (x.bus_id);
Then, will this index apply to all of the 5 bus_id properties on each node when I run MATCH(n:BusinessEntity)?
n:Actor:BusinessEntity
n:Producer:BusinessEntity
n:Director:BusinessEntity
n:Writer:BusinessEntity
n:Designer:BusinessEntity
If I'm reading the documentation correctly, it seems like Indexes on multi-labeled nodes are not allowed. No?
A named index on a single property for all nodes that have a particular label can be created with CREATE INDEX index_name FOR (n:Label) ON (n.property)
.
I'm trying this out right now, and will report my findings here.
Thanks!
04-02-2021 02:48 PM
You will need to add the new label on all the nodes of all the other labels:
MATCH (n)
WHERE n:Actor OR n:Director OR n:Producer OR n:Writer OR n:Designer
SET n:BusinessEntity
After that, you can use the simpler query on just :BusinessEntity nodes for the bus_id and it will use the index.
However this does mean you will have to make sure for every new node of those labels that you also add the :BusinessEntity label.
All the sessions of the conference are now available online