Neo4j

trent_fowler · ‎02-05-2021

What I want to do is find the members of a region who have not bought any of the n most popular nutrients in that region.

Getting the most popular nutrients isn't hard:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]-(:customer)-[:madePurchase]->(t:transaction)
RETURN t.item_name, count(t) AS frequency
ORDER BY frequency DESC
LIMIT 3

And then it's not hard to filter by people who haven't bought any of these:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]-(c:customer)-[:madePurchase]->(t:transaction)
WHERE NOT t.item_name IN  ["ITEM 1", "ITEM 2", "ITEM 3"]
RETURN DISTINCT(c.customer_name)

What I want to do is somehow combine the two, getting the list [ITEM 1...] directly and then filtering.

I've spent half a day messing around with CALL {} and haven't been able to get it to work.

My most recent attempt is here:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]-(c:customer)-[:madePurchase]->(t:transaction)
WHERE NOT t.item_name IN  
	CALL {
          MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]-(:customer)-[:madePurchase]->(t:transaction)
      RETURN t.item_name, count(t) AS frequency
      ORDER BY frequency DESC
      LIMIT 5
    }

RETURN DISTINCT(c.customer_name)

I've also tried various uses of WITH to sort transactions before returning them, but haven't gotten anywhere with that either.

elena_kohlwey · ‎02-08-2021

Hi trent.fowler,

here are some snippets of code that could help you:

what you first need is to make a list out of the transactions / items in order to have it ready for the second part of you query. The following query gives you a list of all transactions.

WITH collect(t.item_name) AS transactions

Now you don't want just any list but an ordered list:

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name) AS transactions

Last but not least, you only want some first entries of the list. The following query only saves the top three items names in the list "transactions":

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name)[0..3] AS transactions

Now, I hope that you can connect your two queries with that middle part.

Regards,
Elena

trent_fowler · ‎02-08-2021

Elena,

Thanks so much for your reply, that was a big leap forward.

When I tried to connect my original queries with this bridge, I encountered behavior I didn't expect.

For example, this:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]->(c:customer)-[:madePurchase]->(t:transaction)

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name)[0..3] AS transactions

RETURN c.customer_name

Throws a Variable c not defined (line 6, column 8 (offset: 262)) "RETURN c.customer_name" error.

Does collect() clear the namespace, or something? I read the documentation but found no answer.

Here was my full attempt to bridge my original queries with your code:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]->(c:customer)-[:madePurchase]->(t:transaction)

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name)[0..3] AS transactions

WHERE NOT t.item_name IN  transactions
RETURN DISTINCT(c.customer_name)

This is awfully close to what I'm after, if I can figure out what's going wrong with Neo4j not recognizing the variable names.

ameyasoft · ‎02-12-2021

trent.fowler:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]->(c:customer)-[:madePurchase]->(t:transaction)

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name)[0..3] AS transactions

WHERE NOT t.item_name IN  transactions
RETURN DISTINCT(c.customer_name)

Try this:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]
->(c:customer)-[:madePurchase]->(t:transaction)

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name)[0..3] AS transactions

MATCH (c1:customer)-[:madePurchase]->(t1:transaction)

WHERE NOT t1.item_name IN  transactions
RETURN DISTINCT(c1.customer_name)

paltusplintus · ‎02-15-2021

I would further improve the query:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]
->(c:customer)-[:madePurchase]->(t:transaction)

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name)[0..3] AS transactions

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]-(c1:customer)
WHERE NOT any(t_item_name in transactions WHERE EXISTS ( (c1)-[:madePurchase]->(:transaction{item_name:t_item_name}) ) )
RETURN c1.customer_name

This should be more performant and will account for customers without any transactions

trent_fowler · ‎02-15-2021

Thanks so much for the help! I hope you don't mind my asking a follow up question, as I want to understand what's happening.

I ran a 'naive' version of the query which has a less complicated WHERE clause, and the list it returns is not the same as the one returned by your query (though there's plenty of overlap):

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]
->(c:customer)-[:madePurchase]->(t:transaction)

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name)[0..3] AS transactions

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]-(c1:customer)-[:madePurchase]->(t:transaction)
WHERE NOT t.item_name IN  transactions
RETURN DISTINCT c1.customer_name

The list returned by this query and by manually checking is the same.

I read WHERE NOT any(t_item_name in transactions WHERE EXISTS ( (c1)-[:madePurchase]->(:transaction{item_name:t_item_name}) ) ) as filtering out any customers with no transactions, but that can't be right because I grabbed one of the customers from the list returned by the naive query that wasn't in the list returned by yours and there were transactions.

So my question is, what exactly is going on in your WHERE clause?

paltusplintus · ‎02-15-2021

Your query excludes transactions of items we identified in the first step and returns unique customers. But if a customer bought something of the 3 and as well something else, the customer will be returned by your query.

My query excludes every customer that had at least 1 transaction including one of the three items identified in the first step. I. e. Relationship from a customer to a transaction of the items specified in the list not exists.

paltusplintus · ‎02-16-2021

btw thanks for mentioning the "CALL" for subqueries - I've never used it before 🙂
For the purpose of exercise I have rewritten the query with OPTIONAL MATCH - it might better self-explain what is happening:

MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]
->(c:customer)-[:madePurchase]->(t:transaction)

WITH t, count(t) as frequency
ORDER BY frequency DESC
WITH collect(t.item_name)[0..3] AS transactions

   MATCH  (:region {reg: 'Central Cornbelt'})-[:hasDiv]->(:division)-[:hasRep]->(:Sales_rep)-[:hasCustomer]-(c1:customer)   
   OPTIONAL MATCH (c1:customer)-[:madePurchase]->(t:transaction)
   WHERE t.item_name in transactions
   WITH c1, count(t) as cnt
   WITH c1, (cnt=0) as no_such_transation_flag  

WHERE no_such_transation_flag  
RETURN DISTINCT c1.customer_name

elena_kohlwey · ‎02-09-2021

Hi,

by using "WITH" you loose all variables that you are not taking "with" you ;-). So, by adding a ", c" to both your WITH-statements, the error should disappear.

Regards,
Elena

Neo4j

Best way to embed one query within another query?