Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
05-28-2019 08:19 AM
I am using neo4j-3.5 community version .And have built a huge graph using neo4j containing ~20 million nodes on movie data. I have also inserted genres & keywords and constructed the graph.
The following query takes > 5 seconds
MATCH
(p:`Program`),
(p:Program)-[genre:of_genre]->(g:Genre),
(p:Program)-[key_rel:associated_keyword]->(k:Keyword)
WHERE
((g.id IN [1010]) OR (k.id IN ['keyword_121'])) AND
((p.show_type IN ['movie'])) AND
(p.imdb_score > 0)
RETURN distinct p.id, p.imdb_score
ORDER BY p.imdb_score desc
LIMIT 50
whereas if i replace OR with AND, it takes < 100 ms
MATCH
(p:`Program`),
(p:Program)-[genre:of_genre]->(g:Genre),
(p:Program)-[key_rel:associated_keyword]->(k:Keyword)
WHERE
((g.id IN [1010]) AND (k.id IN ['keyword_121'])) AND
((p.show_type IN ['movie'])) AND
(p.imdb_score > 0)
RETURN distinct p.id, p.imdb_score
ORDER BY p.imdb_score desc
LIMIT 50
Indexes are there for genre id and keyword id.
The PROFILE response for 'OR' :
THE PROFILE response for 'AND'
Any better way to write the query with OR operator?
05-28-2019 02:54 PM
You can turn it into a UNION
MATCH
(p:`Program`),
(p:Program)-[genre:of_genre]->(g:Genre),
(p:Program)-[key_rel:associated_keyword]->(k:Keyword)
WHERE
((g.id IN [1010])) AND
((p.show_type IN ['movie'])) AND
(p.imdb_score > 0)
RETURN distinct p.id, p.imdb_score
ORDER BY p.imdb_score desc
LIMIT 50
UNION ALL
MATCH
(p:`Program`),
(p:Program)-[genre:of_genre]->(g:Genre),
(p:Program)-[key_rel:associated_keyword]->(k:Keyword)
WHERE
((k.id IN ['keyword_121'])) AND
((p.show_type IN ['movie'])) AND
(p.imdb_score > 0)
RETURN distinct p.id, p.imdb_score
ORDER BY p.imdb_score desc
LIMIT 50
05-28-2019 05:27 PM
@michael.hunger : I will not be able to use UNION ALL as I am building a conversational interface for movie data and the query is constructed in real time with different types of nodes and the expression changes for every query.
Is there any other change I could do to improve the performance?
05-28-2019 10:30 PM
I think something else you run into is the combinatorial explosion.
After all you just use your matches as conditions.
So usually you'd to put a with distinct p
in between the branches but with the OR that doesn't work.
So instead you could use pattern comprehensions for that.
MATCH
(p:`Program`)
WHERE
((p.show_type IN ['movie'])) AND (p.imdb_score > 0)
AND (
size([ (p:Program)-[genre:of_genre]->(g:Genre) WHERE (g.id IN [1010]) | true ]) > 0
OR
size([ (p:Program)-[key_rel:associated_keyword]->(k:Keyword) WHERE k.id IN ['keyword_121'] ]) > 0)
)
RETURN distinct p.id, p.imdb_score
ORDER BY p.imdb_score desc
LIMIT 50
The next version of Neo4j which will be out as a milestone soon, will have existential subqueries (where
exists { MATCH () WHERE ... }
OR
exists { MATCH () WHERE ... }
which should exactly match your requirement.
All the sessions of the conference are now available online