05-14-2021 01:26 AM
Hi,
Our use case is to bulk-load data into a Neo4j graph after processing (not from Excel or other sources). We are using the Neo4j Java driver's async API to do this.
My question is: What should be the transaction boundary? How many queries to run in one transaction?
Sharing sample code:
CompletionStage<Boolean> ss = driver.asyncSession()
    .beginTransactionAsync()
    .thenComposeAsync(transaction -> {
        // Some code here to get/generate queries and parameters
        CompletionStage<ResultCursor> rs1 = transaction.runAsync(query1.toString(), arguments1);
        CompletionStage<ResultCursor> rs2 = transaction.runAsync(query2.toString(), arguments2);
        CompletionStage<ResultCursor> rs3 = transaction.runAsync(query3.toString(), arguments3);
        CompletionStage<ResultCursor> rs4 = transaction.runAsync(query4.toString(), arguments4);
        CompletionStage<ResultCursor> rs5 = transaction.runAsync(query5.toString(), arguments5);
        // some more
        CompletionStage<ResultCursor> rs500 = transaction.runAsync(query500.toString(), arguments500);
        // code to compile the result of all
    })
    .exceptionally(e -> {
        // logger.error(" 2-- Error occurred: {} ", e.getMessage(), e);
        // .. do something
    });
Note: Each query creates or deletes multiple nodes and relationships, so each one effectively updates a small graph by itself.
Please suggest.
05-17-2021 12:37 PM
There is no general answer other than "it depends". What it depends on is whether the different async transactions you're running lock each other's data, or whether they depend on changes made by previous transactions.
Simple example, let's say you have a graph (A)->(B)->(C)->(D)
Now let's imagine that your async transactions are like this:
Notice how none of these transactions depend on one another or touch data that another transaction needs. You can run them all in parallel and you'll be fine.
What if all 4 transactions were changing A? Then it would be a very bad idea to run them asynchronously: the end result would depend on the order in which they succeeded, and the transactions would likely fail because they'd all be trying to take a lock on A at the same time.
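One way to act on this is to sort pending updates by the node they contend on before deciding what may run in parallel. The sketch below is only an illustration, not driver code: `ConflictGrouper` and `groupByKey` are hypothetical names, and it assumes each update touches a single node whose key you can compute up front. Updates sharing a key land in the same group, in arrival order, so a group can be run sequentially (for example inside one transaction) while different groups run concurrently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of the Neo4j driver.
public class ConflictGrouper {

    /**
     * Groups work items by the key they contend on (e.g. the id of the node
     * they modify). Arrival order is preserved inside each group, and groups
     * appear in the order their key was first seen.
     */
    public static <T, K> Map<K, List<T>> groupByKey(List<T> items, Function<T, K> keyOf) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T item : items) {
            // Create the group on first sight of a key, then append the item.
            groups.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(item);
        }
        return groups;
    }
}
```

If your queries touch several nodes each, a single key per update is too simple and the grouping would need to account for overlapping sets of nodes.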
Principles:
The real answer is that you should read about Neo4j's "causal consistency" model here, because once you understand these principles you'll be able to work out the answer for your workload.
05-18-2021 04:17 AM
Hey @david.allen, thank you for the in-depth description. I have gone through the material you shared.
But my question is: What should be the transaction boundary? How many queries to run in ONE TRANSACTION?
driver.asyncSession().beginTransactionAsync()
gives ONE transaction, and we are using that same transaction to run multiple (1000+) queries in one go.
Can we combine any number of queries in ONE TRANSACTION? What is the recommended number?
05-21-2021 07:54 AM
You can combine as many queries as you like in one transaction, subject to memory limitations. If the transaction becomes too big or changes too much in one go, it could cause an out-of-memory error.
How many queries should you put in one transaction? I can't answer this because it depends 100% on what the queries do, and whether they need to modify data created by the previous queries.
The purpose of a transaction is to be an atomic unit of work that either succeeds or fails. You never want transactions to "partly work". So the transactional boundary is up to your business need of what you're trying to do. If you are streaming a bunch of records in, and you want to write them no matter if there are errors or not, then it doesn't matter.
If you want to add either 1,000 or zero records, then you put them all in one TX, so they all either atomically succeed or fail.
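To make the trade-off concrete, here is a minimal sketch of splitting a list of queries into batches, where each batch is meant to be run in its own transaction. `BatchPlanner` and `partition` are hypothetical names, not part of the Neo4j driver, and the batch size is an assumption you would tune against your memory limits. A batch size equal to the total count gives the all-or-nothing behaviour described above; a smaller size bounds the memory used per transaction at the cost of atomicity across batches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Neo4j driver.
public class BatchPlanner {

    /**
     * Splits items into consecutive batches of at most batchSize elements.
     * Each returned batch is intended to be executed as one transaction.
     */
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

With 1,000 queries, `partition(queries, 1000)` yields a single batch that succeeds or fails as a whole, while `partition(queries, 300)` yields four batches that commit independently, so a failure in a later batch leaves the earlier ones written.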