05-11-2020 10:09 PM
Hello friends,
I am using the query below to return a CSV of recommended products for each customer, using content-based filtering on each customer's recent browsing history. We look at the categories and prices of the products they have browsed and recommend a list of products for each customer.
There are around 38,000 customer nodes, and building the product list for every customer takes the query more than 1 hour.
Here is my query:
"
match(p1:Customer) where((p1) -[:InteractsWith]->(:Product))
with COLLECT({CustomerId: p1.CustomerId }) as ws
unwind ws as ws1
match(p1:Customer{CustomerId: ws1.CustomerId})-[x:InteractsWith]->(pr:Product),
(c:Category)-[:HasProduct]->(pr)
where not exists ((p1)-[:InteractedWith]->(pr))
with c.Category as Category, x.Date as date, p1.CustomerId as Id
order by date desc
with collect(Category)[..5] as category, Id
unwind category as w2
match(c:Category{Category: w2})-[:HasProduct]->(pr:Product)
return Id, w2, collect(pr.ProductId)[..3] as pid "
Now, if I break the query into separate steps and run them from Python, the time taken is around 35 minutes.
Here is my Python code for the same:
"
from py2neo import Graph
from tqdm import tqdm
import threading

# connecting to the database
graph = Graph('bolt://192.168.1.156:11002', user='neo4j', password='***')

# query for customer ids and developing a list of customers
query_customer_id = 'match(p1:Customer) where((p1) -[:InteractsWith]->(:Product)) return p1.CustomerId'
customer = graph.run(query_customer_id).to_data_frame()
history_based_product = []  # value missing in the post; an empty list is assumed

# getting the categories from the recent browsing history
def get_category_history(c):
    query_category_history = '''
    match(p1:Customer{CustomerId: "'''+c+'''"})-[x:InteractsWith]->(pr:Product),
    (c:Category)-[:HasProduct]->(pr)
    return x.Date as date,
    c.Category as product_category
    order by x.Date desc
    limit 20'''
    return graph.run(query_category_history).to_data_frame()

# getting up to 3 products for a category
def get_product_for_5(i):
    query_getting_product_from_category_for_5 = '''
    match(c:Category{Category: "'''+i+'''" })-[:HasProduct]->(pr:Product)
    return pr.ProductId as pid limit 3'''
    return graph.run(query_getting_product_from_category_for_5).to_data_frame().iloc[:,0].to_list()
"
Is there any problem with the query itself?
Please help me out with this.
Thank you.
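The Python version above makes one round trip per customer and splices each ID into the Cypher text. A minimal sketch of an alternative, assuming py2neo's `Graph.run` accepts query parameters as keyword arguments: send the IDs in batches through a `$ids` parameter and let `UNWIND` fan them out on the server. The names `chunked` and `fetch_recent_categories`, the batch size of 1000, and the `run` argument (any callable shaped like `graph.run`) are illustrative assumptions, not part of the original code.

```python
from itertools import islice

# Hypothetical batched query: one round trip handles many customers.
# $ids is a query parameter, so no string concatenation is needed.
RECENT_CATEGORIES = """
UNWIND $ids AS id
MATCH (p1:Customer {CustomerId: id})-[x:InteractsWith]->(pr:Product)<-[:HasProduct]-(c:Category)
WITH id, c.Category AS category, x.Date AS date
ORDER BY date DESC
RETURN id, collect(category)[..5] AS categories
"""


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fetch_recent_categories(run, customer_ids, batch_size=1000):
    """Call run(query, ids=batch) once per batch and gather the records."""
    rows = []
    for batch in chunked(customer_ids, batch_size):
        rows.extend(run(RECENT_CATEGORIES, ids=batch))
    return rows
```

With py2neo this might be called as `fetch_recent_categories(graph.run, list_of_customer_ids)`; the best batch size depends on the data and would need to be measured.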
05-12-2020 03:24 AM
Hi Shubham,
I see that the two versions are not the same: in the Python code you are not checking the condition where not exists ((p1)-[:InteractedWith]->(pr)).
You are using a Cartesian product; is there any specific reason for that?
I am not sure how much data your code is hitting, but 1 hour or 35 minutes is too much.
I am ready to brainstorm with you on the performance.
Although I am not sure how much it will help, please try the query below after creating an index on the node properties that are used in the WHERE clause.
match(p1:Customer) -[:InteractsWith]->(:Product)
with COLLECT( p1.CustomerId ) as ws
unwind ws as ws1
match(p1:Customer{CustomerId: ws1.CustomerId})-[x:InteractsWith]->(pr:Product),
Optional Match (c:Category)-[:HasProduct]->(pr)
where not exists ((p1)-[:InteractedWith]->(pr))
with c.Category as Category, x.Date as date, p1.CustomerId as Id
order by date desc
with collect(Category)[..5] as category, Id
unwind category as w2
match(c:Category{Category: w2})-[:HasProduct]->(pr:Product)
return Id, w2, collect(pr.ProductId)[..3] as pid
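The index suggestion above can be sketched as follows. The queries in this thread look nodes up by `Customer.CustomerId` and `Category.Category`, so those are the two properties indexed here. This assumes a Neo4j version that supports the `CREATE INDEX ... FOR ... ON` form with `IF NOT EXISTS`; older 3.x servers use `CREATE INDEX ON :Customer(CustomerId)` instead. The helper `index_statement` and the index names it generates are hypothetical.

```python
def index_statement(label, prop):
    """Build a CREATE INDEX statement for one label/property pair."""
    name = f"{label.lower()}_{prop.lower()}_idx"  # hypothetical naming scheme
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON (n.{prop})"


# Label/property pairs used for node lookups in the queries of this thread.
LOOKUPS = [("Customer", "CustomerId"), ("Category", "Category")]
STATEMENTS = [index_statement(label, prop) for label, prop in LOOKUPS]

for statement in STATEMENTS:
    print(statement)
```

Each printed statement can be run once in the Neo4j browser or through `graph.run(statement)` before re-running the recommendation query.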
05-12-2020 11:02 AM
Hi @intouch.vivek, thanks for the support.
05-12-2020 12:21 PM
Could you please try the query below?
match(p1:Customer) -[:InteractsWith]->(:Product)
with COLLECT(p1.CustomerId ) as ws
unwind ws as ws1
match(p1:Customer{CustomerId: ws1.CustomerId})-[x:InteractsWith]->(pr:Product)<-[:HasProduct]-(c:Category)
with c.Category as Category, x.Date as date, p1.CustomerId as Id
order by date desc
with collect(Category)[..5] as category, Id
unwind category as w2
match(c:Category{Category: w2})-[:HasProduct]->(pr:Product)
return Id, w2, collect(pr.ProductId)[..3] as pid
05-12-2020 10:50 PM
Hi @intouch.vivek, I tried executing the above code.
It returned the error: Type mismatch: expected a map but was String("100000911")
So I replaced the line with COLLECT( p1.CustomerId ) as ws with
with COLLECT({CustomerId: p1.CustomerId }) as ws. It had been running for about 500 seconds when I checked the query log, which showed around 399065748 page hits. After that there were no further query-log entries for the query I executed; it was still running in the terminal but did not return anything.
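The final MATCH in the query expands the products of a category again for every customer row, which may account for a good part of the page hits reported above. One possible way to avoid that repetition on the client side, sketched under assumptions: look up each category's products once and reuse the result for all customers. `make_product_lookup` and the `run` argument (any callable shaped like py2neo's `graph.run`) are hypothetical names, and whether this helps in practice would need to be measured.

```python
from functools import lru_cache

# Parameterised version of the per-category lookup from the thread.
TOP_PRODUCTS = """
MATCH (c:Category {Category: $category})-[:HasProduct]->(pr:Product)
RETURN pr.ProductId AS pid
LIMIT 3
"""


def make_product_lookup(run):
    """Return a function that queries each category at most once."""

    @lru_cache(maxsize=None)
    def products_for(category):
        # tuple() reads the whole result and gives an immutable cached value.
        return tuple(row["pid"] for row in run(TOP_PRODUCTS, category=category))

    return products_for
```

A caller would build the lookup once, for example `products_for = make_product_lookup(graph.run)`, and then call `products_for(category)` for every category in every customer's list; repeated categories are served from the cache.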
05-13-2020 01:09 AM
Hi Shubham
Can we see it now?