Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
03-08-2021 09:22 AM
I am trying to create a graph where nodes are products and they're linked by the ids of customer who've bought both products. The basic idea is contained in "Graph-Based Structures for the Market Baskets Analysis", and the key figure is here:
Eventually I want to assign a weight to the relationships based on how many customers bought two products, probably with a Jaccard algorithm in the data science library.
It wasn't hard to create a radial structure:
LOAD CSV WITH HEADERS FROM 'file:///BV-1350-Market_Basket_Input_Data.csv' AS line
WITH
line.HUB_PRODUCT_KEY AS HUB_PRODUCT_KEY,
line.HUB_CUSTOMER_KEY AS HUB_CUSTOMER_KEY
MERGE (p:Product {hub_product_key: HUB_PRODUCT_KEY})
MERGE (c:Customer {hub_product_key: HUB_PRODUCT_KEY,
hub_customer_key: HUB_CUSTOMER_KEY})
WITH p,c
MATCH (p),(c)
WHERE
p.hub_product_key = c.hub_product_key
MERGE (p)-[:BOUGHT_BY]->(c)
RETURN p,c
And my hope was that I'd be able to create a subgraph consisting of products linked by customers who'd bought more than one product:
MATCH path=(p1:Product)-[:BOUGHT_BY]->(c1:Customer)<-[:BOUGHT_BY]-(p2:Product)
RETURN path
This returned no changes, no records
, even though I've verified in Snowflake that there are customers who have bought more than one product.
I don't think this is the best approach anymore, but I include it for completeness in case it's helpful.
Then I thought I might create product nodes with a 'customer id' property containing a list of all the customers who've bought that product:
LOAD CSV WITH HEADERS FROM 'file:///BV-1350-Market_Basket_Input_Data.csv' AS line
MERGE (p:Product {hub_product_key: line.HUB_PRODUCT_KEY})
ON CREATE SET p.hub_customer_key = [line.hub_customer_key]
ON MATCH SET p.hub_customer_key = p.hub_customer_key + line.hub_customer_key
With this I would create some sort of relationship which looks at the list of customer ids for products and links them that way.
I've since discovered that Neo4j hates you trying to put more than one value in a property, and at any rate I keep getting Property values can only be of primitive types or arrays thereof
errors.
This still seems like the best approach, but I could be wrong.
Finally, I tried loading product nodes in with the customer ids as properties, which I'd use to make links between products by matching on the customer ids, and this tanked the database so hard I have to completely reset it:
LOAD CSV WITH HEADERS FROM 'file:///BV-1350-Market_Basket_Input_Data.csv' AS line
WITH
line.HUB_PRODUCT_KEY AS HUB_PRODUCT_KEY,
line.HUB_CUSTOMER_KEY AS HUB_CUSTOMER_KEY
MERGE (p:Product {hub_product_key: HUB_PRODUCT_KEY,
hub_customer_key: HUB_CUSTOMER_KEY})
WITH p
MATCH (p1:Product), (p2:Product)
WHERE
p1.hub_customer_key = p2.hub_customer_key AND
NOT p1.hub_product_key = p2.hub_product_key
MERGE (p1)-[:CUSTOMER {customer_key:p1.hub_customer_key}]->(p2)
RETURN p1, p2
I'm not sure why, because when I inspect the product nodes they each contain one customer id each, so I don't know why there'd be so much draw it'd crash my local instance.
So I guess my question is: given what I've tried so far, how can I set up a graph of products that are connected by customers who've purchased both products such that I can then assign weights between the products.
03-08-2021 09:26 AM
@alicia.frame1 Given your excellent webinars on graph algorithms this seems like something you'd know how to solve.
03-08-2021 12:08 PM
You might want to borrow the architecture I use for my retail demos with neo4j: GitHub - AliciaFrame/GDS_Retail_Demo: This repo contains the data and queries from the Neo4j Connect...
BOUGHT
relationship, but I added it in to make running certain algorithms easier).(:Customer)-[:MADE_TRANSACTION]->(:transaction)-[:CONTAINS]->(:Item)
) or if you want a simpler schema, just (:Customer)-[:BOUGHT]->(:Item)
)PURCHASED_SAME_ITEM
relationship) or you can use the collapsePath algorithm to just modify the in-memory graph.MATCH (c1:Customer)-[:BOUGHT]->(i:Item)<-[:BOUGHT]-(c2:Customer)
WITH c1, c2, count(i) as weight
MERGE (c1)-[:BOUGHT_SAME_ITEM{weight:weight}]->(c2)
Another approach you could take (there are a lot!) is to run node similarity to create SIMILAR_TO
relationships (using Jaccard) between customers based on the number of items they purchased in common.
All the sessions of the conference are now available online