Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
05-26-2021 10:06 AM
Some Cypher newbies might find this interesting.
I was curious to find Movies and Stars that had a co-star with the name that starts with "Tom" but not return Tom.
Of course, with any large enough DB, you're going to get many results, so it's good to restrict your trial and error to a small subset. It makes it easier to debug and to see what's going on. So, I picked the Movie"Top Gun"
which has two Tom
's in it. (I anticipated having two Tom
's would be trouble!)
My first stab is:
// does not return Tom Skerritt and Tom Cruise
MATCH(p1:Person)-[:ACTED_IN]->(m:Movie{title:'Top Gun'})<-[:ACTED_IN]-(p2:Person )
WHERE p1.name STARTS WITH 'Tom'
AND NOT p2.name STARTS WITH 'Tom'
RETURN m, p2
This is unsatisfying to me because I have to repeat the condition Person:name STARTS WITH 'Tom'
(a second time with a negation). There's always the possibility that you might make a copy and paste error OR if the condition changes and you wanted a different first name, then you have to make two edits to change one condition. This violates the DRY principle (Don't Repeat Yourself.)
I thought if I had the id
of the Tom's, then I could test for equality of id
's.
I suspect that following wouldn't work (and I was right!). The hope is the id
of Tom Skerritt
won't match id
of Tom Cruise
so that it Tom's won't show up in the results. However, I suspected this wouldn't work: id(p1) <> id(p2)
.
// Didn't work: returns Tom Skerritt and Tom Cruise
MATCH(p1:Person)-[:ACTED_IN]->(m:Movie{title:'Top Gun'})<-[:ACTED_IN]-(p2:Person )
WHERE p1.name STARTS WITH 'Tom'
AND id(p1) <> id(p2)
RETURN m, p2
This is the funny thing about Database Declarative Query languages... conceptually things happen in "parallel": e.g. once through for p1= Tom Skerritt and once p1 = Tom Cruise. This can be confusing for people used to Procedural languages (such as Python or Java.)
So, to make this work, you need to first collect all the id
's of all the Tom, then second test each actor, to see if their id
is in the collection.
So, now this occurs in two steps:
MATCH (p1:Person)-[:ACTED_IN]->(m:Movie{title:'Top Gun'})
WHERE p1.name STARTS WITH 'Tom'
RETURN collect(id(p1))
Another conceptual hurdle with using Cypher better is to understand the WITH
statement. This allows you to have multiple steps in your Cypher. So, after getting the persons whose names start with Tom, you can engage in the second step by carrying the variables forward to the next Cypher statement. You also need WITH m
otherwise, the match on Movies won't be carried forward to the next Cypher statement.
MATCH (p1:Person)-[:ACTED_IN]->(m:Movie{title:'Top Gun'})
WHERE p1.name STARTS WITH 'Tom'
WITH COLLECT(id(p1)) AS tomList, m
MATCH (p2:Person)-[:ACTED_IN]->(m) WHERE NOT id(p2) IN tomList
RETURN m, p2
Now that I got this to work for Top Gun, then generalizing it to all movies in the DB, I remove the restriction of the Movie title:
MATCH (p1:Person)-[:ACTED_IN]->(m:Movie)
WHERE p1.name STARTS WITH 'Tom'
WITH COLLECT(id(p1)) AS tomList, m
MATCH (p2:Person)-[:ACTED_IN]->(m) WHERE NOT id(p2) IN tomList
RETURN m, p2
I hope this enlightens.
05-28-2021 08:07 AM
Great walk through...thank you! The use of WITH
is really great, particularly when you have a large amount of data that would otherwise be sent to the next step, potentially slowing things down. Great example!
All the sessions of the conference are now available online