01-19-2020 08:24 PM
I am using Neo4j to support an abandoned cart/search use case.
I store User -> Product relationships.
However, at peak load the Neo4j CPU usually chokes and query times are poor.
The CPU mostly chokes on IOWait.
NeoVersion - 3.5.6
Instance - i3.2xlarge (8 vCPUS)
Page cache- 40g
Heap-15g
This is a causal cluster with 3 read replicas, queried by ~20 Python servers.
Concurrent writes/deletions (in batches) are also happening on the database.
Read throughput is 10k/min. Do we need to tune the Bolt configs? I am using the Python driver with bolt+routing.
My graph size is ~200GB.
I want to find the most recent products that a user has abandoned in the cart or in search.
User -[r:ADDED_TO_CART]-> Product
User -[r:PURCHASED]-> Product
The query reads in bulk, 100 users at a time.
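For illustration, the 100-users-at-a-time batching could look roughly like this on the Python side. The helper name `batches` and the sample IDs are made up; each yielded list would be passed as the `$user_ids` parameter of the query.

```python
from typing import Iterator, List, Sequence


def batches(user_ids: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of user_ids with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(user_ids), size):
        yield list(user_ids[start:start + size])


# Hypothetical sample data: 250 user ids split into batches of 100.
ids = [f"u{i}" for i in range(250)]
batch_sizes = [len(batch) for batch in batches(ids)]
print(batch_sizes)
```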
Query -
MATCH (user:mapUser20201_14)-[action_rel:ADDED_TO_CART]->(product:mapProduct20201_14)
WHERE user.user_id IN $user_ids
  AND action_rel.action_time >= $start_time
  AND action_rel.action_time < $end_time
WITH action_rel, user, product ORDER BY action_rel.action_time DESC
RETURN DISTINCT user{.user_id}, product{.product_id},
  head(collect(action_rel{.action_time})) AS action_rel
I am using DISTINCT as I want the top unique product and time of that action.
Checkpointing was taking ~50m, so I set dbms.checkpoint.iops.limit=-1 (no IOPS limit), which reduced it to ~2m. I also changed the checkpoint interval from the default 15m to 1h.
Queries usually take ~500-2000ms, and CPU is very high (User + IOWait > 200%).
As there is no way to index a relationship property, I can't do much here.
I already have indexes on mapUser20201_14{user_id} and mapProduct20201_14{product_id} (unique constraints, which also create indexes).
I have tried many things, but nothing is working out. Perhaps Neo4j is not a fit for use cases where the query involves timestamps or relationship properties.
01-21-2020 12:47 AM
I see basically 2 strategies to improve the runtime of your queries:
Have more RAM, e.g. use a 256 GB machine. Assign 200 GB to the page cache; then all your queries will operate 100% on cache and won't produce IO load.
If adding RAM is not an option, you can go with an instance using directly attached SSD instead of EBS volumes.
If this is also not an option, make sure you have ENA enabled: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html
The other strategy is a data model change: add a time bucket such as _2019 as a suffix to your ADDED_TO_CART relationship type -> ADDED_TO_CART_2019. Now your Cypher statement can be more selective before going through the filter, which results in less CPU activity.
01-21-2020 12:56 AM
@stefan.armbruster We are using i3.2xlarge, which has instance-store (directly attached) SSD. The IO stats are from that setup.
I will check again after changing the data model. With ADDED_TO_CART_2019 there is still no index, though.
So will CPU usage be lower because the time bucket is part of the Cypher pattern?
01-21-2020 01:02 AM
I'd really give a machine with more RAM a test shot.
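A rough back-of-the-envelope check of why the RAM suggestion matters, using the numbers from the first post (~200 GB graph, 40g page cache). The function name is made up for illustration, and it assumes the store size on disk is a fair proxy for what the page cache would need to hold.

```python
def page_cache_coverage(page_cache_gb: float, store_size_gb: float) -> float:
    """Fraction of the store that can be resident in the page cache at once."""
    if store_size_gb <= 0:
        raise ValueError("store_size_gb must be positive")
    return min(1.0, page_cache_gb / store_size_gb)


# Numbers from the thread: 40g page cache today, ~200 GB graph,
# and the suggested 200 GB page cache on a 256 GB machine.
current = page_cache_coverage(40, 200)
proposed = page_cache_coverage(200, 200)
print(f"current:  {current:.0%} of the store fits in the page cache")
print(f"proposed: {proposed:.0%} of the store fits in the page cache")
```

With only about a fifth of the store cacheable, reads that touch cold data have to go to disk, which is consistent with the IOWait described above.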
01-21-2020 01:58 AM
@stefan.armbruster Do you see a benefit in the data model change? Shall I try it?
01-21-2020 02:11 AM
I'd probably give increasing RAM a quick shot first. Data model changes will certainly help as well, but I guess they require more effort.
01-23-2020 04:43 PM
From a modeling perspective, you can elevate the time property of your relationships to the relationship type, e.g. :ADDED_TO_CART_yyyy_mm_dd,
so you can sub-filter much more quickly on the time information.
Otherwise, as Stefan said, it will just thrash memory all the time, having to reload data from disk.
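As a sketch of how a client could use date-bucketed relationship types, the helper below expands a date range into daily type names and joins them into a Cypher pattern of the form [:A|B|C]. The function names are hypothetical, daily buckets are assumed, and `max(action_rel.action_time)` is a simplification swapped in for the ordering and `head(collect(...))` used earlier in the thread. A filter on action_time would still be needed for sub-day precision.

```python
from datetime import date, timedelta
from typing import List


def rel_types_for_range(start: date, end: date, base: str = "ADDED_TO_CART") -> List[str]:
    """Daily relationship type names from start (inclusive) to end (exclusive)."""
    days = (end - start).days
    return [f"{base}_{(start + timedelta(days=i)):%Y_%m_%d}" for i in range(days)]


def build_query(start: date, end: date,
                user_label: str = "mapUser20201_14",
                product_label: str = "mapProduct20201_14") -> str:
    """Build a Cypher query that only traverses the buckets in the date range."""
    types = rel_types_for_range(start, end)
    if not types:
        raise ValueError("empty date range")
    rel_pattern = "|".join(types)  # alternative relationship types
    return (
        f"MATCH (user:{user_label})-[action_rel:{rel_pattern}]->(product:{product_label})\n"
        "WHERE user.user_id IN $user_ids\n"
        "WITH user, product, max(action_rel.action_time) AS action_time\n"
        "RETURN user.user_id AS user_id, product.product_id AS product_id, action_time"
    )


query = build_query(date(2020, 1, 19), date(2020, 1, 21))
print(query)
```

Relationship types cannot be passed as query parameters, which is why the type names are built into the query text here; they come from dates, not from user input.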
You should also not do DISTINCT on properties. Aggregate on the nodes instead and read the properties at the end:
WITH DISTINCT user, product, head(collect(action_rel)) AS rel
RETURN user.user_id, product.product_id, rel.action_time
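To make the intended result concrete, here is a plain-Python model of what the grouping is meant to produce: one row per (user, product) pair carrying the latest action_time. It only illustrates the semantics with made-up sample rows; it says nothing about how Neo4j executes the query.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # (user_id, product_id, action_time)


def latest_action_per_pair(rows: Iterable[Row]) -> Dict[Tuple[str, str], int]:
    """Keep only the most recent action_time for each (user_id, product_id)."""
    latest: Dict[Tuple[str, str], int] = {}
    for user_id, product_id, action_time in rows:
        key = (user_id, product_id)
        if key not in latest or action_time > latest[key]:
            latest[key] = action_time
    return latest


# Hypothetical sample: u1 added p1 to the cart twice, at times 10 and 30.
sample = [("u1", "p1", 10), ("u1", "p2", 20), ("u1", "p1", 30), ("u2", "p1", 5)]
result = latest_action_per_pair(sample)
print(result)
```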