Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
01-06-2021 09:23 PM
I have a MERGE query (on relationship) of the below form, and about 2000 queries are invoked around the same time, its taking ~5 minutes to complete all of them.
How can I refactor the query or application logic so that this can be drastically faster.
MERGE (a: category { name: {name} })
MERGE (b: item { hash: {hash}, timestamp: {timestamp} } )
MERGE (b)-[r:PRESENT_IN{count: {count}}]->(a)
Conditions:
category.name
and item.hash
PRESENT_IN
relationships created should be unique , no duplicatesprotocol
or item
already existsprotocol
and item
already exists01-07-2021 12:17 AM
Hi @tinqnit
Can You add timestamp index to this DB to make it faster?
CREATE INDEX item_timestamp FOR (n:item) ON (n.timestamp)
01-07-2021 01:02 AM
I already have a unique constraint placed on item.hash
, assuming this implicitly behaves like an index and suffices?
01-07-2021 12:26 AM
Can you provide the code where this request is used?
You might use the EXPLAIN and PROFILE clause before your MERGE clauses to see what is happening there?
I wonder if we have a cartesian product here between each MERGE, I have to double check that.
I don't thing MERGE does cartesian product, must be in your code.
01-07-2021 01:04 AM
Not sure what you mean by Cartesian product in my code.
I dont think it matters what the data actually is, we have ~2000 relationships that needs to be inserted by processing an input file at once, and I feel ~5 minutes is too long.
01-07-2021 01:07 AM
Five minutes is huge for a tiny 2000 relations indeed
01-07-2021 01:10 AM
Yes, and the query structure is very simple, which is confusing me.
The actual query we are running in our code has a lot more properties in the nodes and relationships. But the basic essence is the same.
Any other clues on this please?
01-07-2021 01:22 AM
The way the session and transactions are coded in you language have a huge influence on the speed of your data creation process.
If, as an example, you create a session for each query in a loop the time will increase drastically.
( Don't do that, I did, it was funny )
Your MERGE must include the strict necessary properties to check if it's already exist only.
The other properties on creation must be set with the ON CREATE subclause.
Matching or merging with the clause MERGE with too much properties inside {} can slow down the process significantly too. Unless using a really big composite index. Do not hesitate to use the EXPLAIN or PROFILE clause. It's the neo4j magic debugger.
03-24-2022 01:20 AM
Hi,
Why are you saying "If, as an example, you create a session for each query in a loop the time will increase drastically." ? (see this link : Should the session be closed each query? · Issue #384 · neo4j/neo4j-javascript-driver · GitHub where neo4j is saying that is it not a bad practice.)
That's exactly what i am doing lol. Got many microservices pushing data, and everytime a microservices found new data, he's opening new session, making his transaction thought it and then closing it.
You think it's a better idea to create one session peer microservices, and then make all transaction thought the same session, even if the session is alive for many days ?
Thanks,
Gautier
03-24-2022 02:37 PM
Excuse me gautier
It's not what I mean, I didn't want to put you in the wrong way.
Yes indeed the session are extrêmely lightweight and disposable, but I only mean that often your code or the way you code your transactions / query / sessions in the language for your choisce might have a huge impact on the performance.
It's not necessary your query, but in this case yes your query and some catesian products in it which can be deadly for huge dataset.
01-07-2021 01:31 AM
I am reusing the session. The same query is invoked concurrently in batches of 50 at a time, until all the 2000 requests are completed.
I did try the PROFILE on my query, and it seems to show (786,579 db hits) and (455,803 db hits) for category and items. I believe the existing unique constraint should be sufficient.
However, I will make changes to include the ON CREATE subclause you pointed you! Not sure if it will make a big change in execution time though.
01-07-2021 01:49 AM
This is pretty lame, but just realised on this test machine the constraints were not setup correctly.
Thanks for the help!
All the sessions of the conference are now available online