Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
05-17-2020 12:11 AM
Hello all,
I am using this query to extract products similar to a product in a certain category within a price range and it is taking a lot of time.
**match(p1:Product) **
with p1
limit 5000
match(p1)<-[:HasProduct]-(c:Category)-[:HasProduct]->(p:Product)
where tofloat(p.Price)>0.5tofloat(p1.Price) and tofloat(p.Price)<2tofloat(p1.Price)
with p1, collect(p)[..3] as products
return p1.ProductId, [product in products | product.ProductId] as pid
There are 87000 product nodes and 219 category nodes. With a limit 0f 5000, it is taking around 8 minutes to return.
Here is the explain plan for the query:
I have already created required indexes while creating the Graph DB itself. Snippet for the same is below:
05-17-2020 03:44 AM
Hello,
Did you add unique constraints on your nodes?
https://neo4j.com/docs/cypher-manual/current/administration/constraints/
05-17-2020 06:09 AM
Hello there,
I haven't added any constraints on my nodes (Probably because I am not very sure about them). Maybe you can help me with the same. I have already showed you my created indexes.
Here is my query for the node and relationship creation with all the labels and their properties.
:auto using periodic commit 5000
load csv with headers from "file:///data.csv" as line
merge (s:State{name:line.State})
merge (p:Product{ProductId:line.ProductId,Price:tofloat(line.ProductPrice),ProductName:line.ProductName})
merge (c:Category{Category:line.Category})
merge (cr:CategoryRollUp{CategoryRollUp:line.CategoryRollUp})
merge (cu:Customer{CustomerId:line.CustomerId,Grouping:line.NewGrouping})
merge (cu)-[:InteractsWith{Date:date(line.TransactionDate),Month:line.Month,EventScore:tofloat(line.EventScore),Event:line.Event,Quantity:toInteger(line.QtySold)}]->(p)
merge (s)-[:HasUser]->(cu)
merge (c)-[:HasProduct]->(p)
merge (cr)-[:HasCategory]->(c)
I have tried creating a constraint on the Price property of the Product Label but certainlt that is not unique so it showed me an error.
Can you guide me in creating contraints for the same?
05-17-2020 06:29 AM
A unique constraint must be unique:) So for each Label Node, you must create an index and create a unique constraint on them:)
You can use uuid
to generate unique ID: https://neo4j.com/docs/labs/apoc/current/graph-updates/uuid/ if you don't have unique ID. But I can see you have a ProductId
property, so you can do this:
Before creating nodes, you must create a unique constraint, for exemple:
CREATE CONSTRAINT ON (p:Product) ASSERT p.ProductId IS UNIQUE
05-21-2020 12:52 AM
Hii there, I have created a unique id constraint by following the link you shared after adding a uuid property to the Product node. The code and screen grab for the same is here:
match(p1:Product)
SET p1.uuids = apoc.create.uuid()
return p1.uuids
But surprisingly the execution time is the same. Where am I going wrong?
05-21-2020 01:01 AM
I think it's because of your Cypher request, it's doing several Merge at the same time. On my projects, I have at least 2 requests:
and I call them several times and I can create thousand of nodes and relations in a few seconds:)
Do you currently have your data in the same CSV file? So you use a single line to get everything you need, how many times does it take to create all the nodes and relationships?
05-21-2020 01:05 AM
I have no issue with the node and relationship creation query with that query I am able to make around 100000 nodes and 1.7 million relationships in 9 minutes but I am having issue with the following read type query:
match(p1:Product)
with p1
limit 5000
match(p1)<-[:HasProduct]-(c:Category)-[:HasProduct]->(p:Product)
where tofloat(p.Price)>0.5 tofloat(p1.Price) and tofloat(p.Price)<2 tofloat(p1.Price)
with p1, collect(p)[..3] as products
return p1.ProductId, [product in products | product.ProductId] as pid
With only 5000 products it took around 15 minutes to return and there are 87000 unique products
05-21-2020 01:15 AM
You also have unique constraint on Category?
I suppose the real request is:
MATCH (p1:Product)<-[:HasProduct]-(c:Category)-[:HasProduct]->(p:Product)
WHERE tofloat(p.Price) > 0.5 * tofloat(p1.Price)
AND tofloat(p.Price) < 2 * tofloat(p1.Price)
WITH p1, collect(p)[..3] AS products
RETURN p1.ProductId, [product in products | product.ProductId] AS pid
Can you show me the results of these requests:
CALL db.constraints
CALL db.schema.visualization()
05-21-2020 01:34 AM
Here are the results:
05-21-2020 01:37 AM
Your schema and your request are very simple so that's weird, did you try to increase the RAM of Neo4j in the neo4j.conf file?
05-21-2020 01:40 AM
I have already set the initial and maximum heap size to 20 and 24GB respectively but still it is not running the query. I have a 7 node cluster on the server each with similar specifications.
05-21-2020 01:42 AM
What do you mean by: a 7 node cluster on the server each with similar specifications.
?
05-21-2020 01:43 AM
Probably this might help with your question:
05-21-2020 01:45 AM
Oh you are using Causal Clustering right?
05-21-2020 01:46 AM
Yes I am using Causal Clustering.
05-21-2020 01:48 AM
I didn't use it at the moment, did you try to create a simple local database to test without Causal Clustering if the same query was taking the same time or not?
05-21-2020 01:49 AM
Yes sure, I have created the same DB on desktop version too and it is taking time there also.
05-21-2020 01:53 AM
Did you try PROFILE and EXPLAIN on your request? Can you show me the results with them?
05-21-2020 02:08 AM
The PROFILE is only for limit 100 since it was taking time for the whole DB.
12-03-2021 03:13 AM
I know it is a bit too late for my answer but did you check on this:
To give hints to the query executor which index to use because there might be any possible indexes confusion.
05-21-2020 03:36 AM
You can force Neo4j to use index: https://neo4j.com/docs/cypher-manual/current/query-tuning/using/
To be honest, after I have no idea
All the sessions of the conference are now available online