Neo4j

Gosforth · 02-22-2022

I wonder whether I can use the GDS library to find similar clients, i.e. clients who have bought similar products. Ideally, the similarity would be based not only on buying a similar variety of products but also on the quantities of those products.

My sample data:

CREATE
  // Clients
  (karol:Client {name: 'Karol'}),
  (mike:Client {name: 'Mike'}),
  (anna:Client {name: 'Anna'}),
  // Products
  (shoes:Product {name: 'Shoes', product_no: 100}),
  (coat:Product {name: 'Coat', product_no: 101}),
  (pants:Product {name: 'Pants', product_no: 102}),
  (jacket:Product {name: 'Jacket', product_no: 103}),
  (skirt:Product {name: 'Skirt', product_no: 104}),

  // Purchases, with the number of items bought in `quantity`
  (karol)-[:BOUGHT {date: '2022-03-01', quantity: 1}]->(shoes),
  (karol)-[:BOUGHT {date: '2022-03-02', quantity: 1}]->(coat),
  (karol)-[:BOUGHT {date: '2022-03-04', quantity: 1}]->(pants),
  (mike)-[:BOUGHT {date: '2022-03-01', quantity: 1}]->(shoes),
  (mike)-[:BOUGHT {date: '2022-03-02', quantity: 1}]->(coat),
  (mike)-[:BOUGHT {date: '2022-05-11', quantity: 2}]->(jacket),
  (anna)-[:BOUGHT {date: '2022-05-14', quantity: 3}]->(jacket),
  (anna)-[:BOUGHT {date: '2022-04-20', quantity: 1}]->(skirt);

So Karol and Mike should be close, since they bought the same shoes and coat. But if Anna and Mike both bought a lot of jackets, they should come out as more similar than the others.
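One way to express "same products, weighted by quantity" is a weighted Jaccard score: for each pair of clients, add up the smaller quantity per product and divide by the sum of the larger quantities. As far as I know, GDS's Node Similarity algorithm computes this kind of weighted Jaccard when given a relationship weight property (here that would be `quantity`). The sketch below reproduces the idea in plain Python, outside Neo4j, so the numbers can be checked by hand; the `PURCHASES` list and the function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

# Sample purchases from the CREATE statement: (client, product, quantity).
PURCHASES = [
    ("Karol", "Shoes", 1), ("Karol", "Coat", 1), ("Karol", "Pants", 1),
    ("Mike", "Shoes", 1), ("Mike", "Coat", 1), ("Mike", "Jacket", 2),
    ("Anna", "Jacket", 3), ("Anna", "Skirt", 1),
]


def quantity_vectors(purchases):
    """Sum the bought quantities per client and product."""
    vectors = defaultdict(lambda: defaultdict(int))
    for client, product, qty in purchases:
        vectors[client][product] += qty
    return vectors


def weighted_jaccard(a, b):
    """Sum of per-product minimum quantities divided by the sum of maxima."""
    products = set(a) | set(b)
    num = sum(min(a.get(p, 0), b.get(p, 0)) for p in products)
    den = sum(max(a.get(p, 0), b.get(p, 0)) for p in products)
    return num / den if den else 0.0


def similar_clients(purchases):
    """Weighted Jaccard score for every unordered pair of clients."""
    vectors = quantity_vectors(purchases)
    return {
        (c1, c2): weighted_jaccard(vectors[c1], vectors[c2])
        for c1, c2 in combinations(sorted(vectors), 2)
    }


if __name__ == "__main__":
    scores = similar_clients(PURCHASES)
    for (c1, c2), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{c1} - {c2}: {score:.3f}")
```

With the sample data Karol and Mike score 2/5 = 0.4 and Anna and Mike score 2/6 ≈ 0.333. If both Mike and Anna had bought 10 jackets instead, Anna and Mike would score 10/13 while Karol and Mike would drop to 2/13, which is the behaviour described above.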

Could someone help me write such a query?

Regards

glilienfield · 02-22-2022

I have no idea if this makes sense for your application or is an accurate way to measure similarity, but I'm throwing it out there as an idea. The query estimates the similarity between two customers by counting the number of times the two customers bought the same product.

If you have some classification for the products, you could alter it to count the number of times they bought products in the same category.

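// Count how many BOUGHT relationships each client has to each product
MATCH (n:Client)-[r1:BOUGHT]->(p:Product)
WITH n, p, count(r1) AS cn
// Pair the client with every other client who bought the same product;
// id(n) < id(m) keeps only one ordering of each pair
MATCH (m:Client)-[r2:BOUGHT]->(p)
WHERE id(n) < id(m)
WITH n, m, p, cn, count(r2) AS cm
// Per shared product, keep the smaller of the two counts (requires APOC)
WITH n, m, apoc.coll.min([cn, cm]) AS cnt
RETURN n AS cust1, m AS cust2, sum(cnt) AS similarity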
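To see what this query returns on the sample data, here is a plain-Python sketch of the same idea, written outside Neo4j and assuming the data fits in memory: for each pair of clients, take the smaller number of BOUGHT relationships to each product they share and add those up. The `BOUGHT` list and `shared_purchase_counts` name are illustrative only.

```python
from collections import Counter
from itertools import combinations

# (client, product) pairs, one per BOUGHT relationship in the sample data;
# the quantity property is ignored, as in the Cypher query.
BOUGHT = [
    ("Karol", "Shoes"), ("Karol", "Coat"), ("Karol", "Pants"),
    ("Mike", "Shoes"), ("Mike", "Coat"), ("Mike", "Jacket"),
    ("Anna", "Jacket"), ("Anna", "Skirt"),
]


def shared_purchase_counts(bought):
    """For each client pair, sum the smaller relationship count per shared product."""
    per_client = {}
    for client, product in bought:
        per_client.setdefault(client, Counter())[product] += 1
    scores = {}
    for c1, c2 in combinations(sorted(per_client), 2):
        a, b = per_client[c1], per_client[c2]
        shared = a.keys() & b.keys()
        if shared:  # the Cypher MATCH returns no row for pairs with nothing in common
            scores[(c1, c2)] = sum(min(a[p], b[p]) for p in shared)
    return scores


print(shared_purchase_counts(BOUGHT))
```

It prints `{('Anna', 'Mike'): 1, ('Karol', 'Mike'): 2}`. Because every client in the sample has at most one BOUGHT relationship per product, the score here is simply the number of shared products, and the `quantity` property plays no part.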

Gosforth · 02-23-2022

Thank you Gary, this could be one approach. But it only takes into account the number of common relationships; it does not include the type of products.
I'll keep it in my notebook; it may be useful for other projects.
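Gary's suggestion of comparing product categories can also be combined with the `quantity` property. In this sketch the `CATEGORY` mapping is hypothetical (the sample data has no category property), and the score for a pair of clients is the sum, over categories both bought from, of the smaller total quantity.

```python
from collections import Counter
from itertools import combinations

# Hypothetical product categories; not part of the original sample data.
CATEGORY = {
    "Shoes": "Footwear",
    "Coat": "Outerwear",
    "Jacket": "Outerwear",
    "Pants": "Bottoms",
    "Skirt": "Bottoms",
}

# (client, product, quantity) rows from the sample CREATE statement.
PURCHASES = [
    ("Karol", "Shoes", 1), ("Karol", "Coat", 1), ("Karol", "Pants", 1),
    ("Mike", "Shoes", 1), ("Mike", "Coat", 1), ("Mike", "Jacket", 2),
    ("Anna", "Jacket", 3), ("Anna", "Skirt", 1),
]


def category_totals(purchases):
    """Total quantity bought per client and category."""
    totals = {}
    for client, product, qty in purchases:
        totals.setdefault(client, Counter())[CATEGORY[product]] += qty
    return totals


def category_overlap(purchases):
    """Per client pair, sum over shared categories of the smaller total quantity."""
    totals = category_totals(purchases)
    return {
        # Counter & Counter keeps the minimum count for each common key
        (c1, c2): sum((totals[c1] & totals[c2]).values())
        for c1, c2 in combinations(sorted(totals), 2)
    }


print(category_overlap(PURCHASES))
```

With these made-up categories, Anna and Mike score 3 (each has three Outerwear items), while Karol scores 2 against each of them. The score is not normalised, so clients who buy a lot will tend to score higher; dividing by the sum of the larger quantities, as in a weighted Jaccard, would correct for that.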
Thanks

glilienfield · 02-23-2022

I agree with you. It is fairly simple. I think the metric makes sense, but it would be a lot more accurate with a fuzzier similarity criterion between products instead of the exact match used in the query.
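A fuzzy match between products could be sketched with a soft cosine measure: cosine similarity over per-client quantity vectors, except that two different products still contribute according to a product-to-product similarity score. The scores in `PRODUCT_SIM` are invented for illustration; they must form a positive semi-definite matrix for the square roots below to be meaningful (these do).

```python
import math

# Hypothetical product-to-product similarity; 1.0 for the same product, 0.0 if unlisted.
PRODUCT_SIM = {
    frozenset({"Coat", "Jacket"}): 0.8,
    frozenset({"Pants", "Skirt"}): 0.5,
}


def product_sim(p, q):
    return 1.0 if p == q else PRODUCT_SIM.get(frozenset({p, q}), 0.0)


def soft_dot(a, b):
    """Sum of qa * qb * sim(p, q) over every product pair of the two clients."""
    return sum(qa * qb * product_sim(p, q) for p, qa in a.items() for q, qb in b.items())


def soft_cosine(a, b):
    denom = math.sqrt(soft_dot(a, a)) * math.sqrt(soft_dot(b, b))
    return soft_dot(a, b) / denom if denom else 0.0


# Per-client quantity vectors built from the sample data.
KAROL = {"Shoes": 1, "Coat": 1, "Pants": 1}
MIKE = {"Shoes": 1, "Coat": 1, "Jacket": 2}
ANNA = {"Jacket": 3, "Skirt": 1}

for name, (a, b) in {"Karol/Mike": (KAROL, MIKE), "Mike/Anna": (MIKE, ANNA),
                     "Karol/Anna": (KAROL, ANNA)}.items():
    print(f"{name}: {soft_cosine(a, b):.3f}")
```

With these made-up scores, Karol and Anna get about 0.53 even though they share no product, because Coat/Jacket and Pants/Skirt count as partial matches. With `PRODUCT_SIM` empty, the measure falls back to plain cosine similarity over quantities.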


How to find similar clients?