Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
09-04-2019 02:48 PM
Hi,
I have a graph made up of the nodes 'recipe' and 'ingredient'. The links connect the ingredients to the recipes.
I want to be able to retrieve recipes based on the ingredients I have in my fridge. I have noticed that the time of retrieval grows exponentially as I add more ingredients (in fact, it is not possible to search for more than 4 ingredients at a time).
Suppose I want to retrieve all recipes that contain lemon and peach. This is the type of query I have come up with in Cypher.
MATCH (r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) WITH collect(r) AS hRep MATCH (r)-[:CONTAINS]-(i:Ingredient {name:"peach"}) WHERE r IN hRep RETURN r LIMIT 5
Is there a better/more efficient way to write it?
Any advice would be more than welcome.
Thanks a lot
09-04-2019 06:46 PM
Hi Barbara!
A couple questions:
Are you building your query dynamically? It seems you could run your initial query (with two ingredients) as follow: MATCH (i:Ingredient {name:"peach"})-[:CONTAINS]-(r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) by doing so you remove one less NodeByLabelScan.
09-04-2019 10:16 PM
Try this:
With four ingredients:
MATCH (r:Recipe)-[:CONTAINS]-(i:Ingredient {name:"honey"}) WITH collect(r) AS hRep, i
UNWIND hRep as r1
MATCH (r1)-[:CONTAINS]-(i1:Ingredient {name:"peach"}) WITH collect(r1) AS hRep1, i, i1
UNWIND hRep1 as r2
MATCH (r2)-[:CONTAINS]-(i2:Ingredient {name:"banana"}) WITH collect(r2) AS hRep2, i, i1, i2
UNWIND hRep2 as r3
MATCH (r3)-[:CONTAINS]-(i3:Ingredient {name:"guava"}) WITH collect(r3) AS hRep3, i, i1, i2, i3
UNWIND hRep3 as r4
RETURN r4.name as Recipe, i.name as Ingr1, i2.name as Ingr2, i3.name as Ingr3, i4.name as Ingr4
09-05-2019 02:18 AM
Thanks a lot, folks. I'll try both solutions.
09-06-2019 10:22 AM
Actually, I realised that the last version of my query is the following:
CALL db.index.fulltext.queryNodes('ingredients', 'lemons') YIELD node WITH node AS ing0
CALL db.index.fulltext.queryNodes('ingredients', 'carrot') YIELD node WITH node AS ing1, ing0
MATCH (r:Recipe)-[:CONTAINS]-(ing0), (r:Recipe)-[:CONTAINS]-(ing1)
Any idea why it takes forever?
Graph has 80.000 nodes (60.000 recipes + 20.000 ingredients) and 600.000 links.
Thanks again!
10-16-2019 04:03 PM
Hello,
So you may find this article on performing match intersection useful for what you're trying to do.
As for why you're seeing your latest query taking forever, this is due to cardinality issues you're introducing from these back-to-back lookups. Please review this article on understanding Cypher cardinality for more detail, but the gist of it is that operations produce records/rows as results, and operations execute per input record/row. Your second index call is being executed for every result that was generated from your first index call, and the result leads to a multiplicative number of rows. You could better visualize this from a PROFILE of the query, but with the cardinality issue in play here it may not finish.
All the sessions of the conference are now available online