Neo4j

poem_daga · 05-14-2021

Hi,

Our use case is to bulk-load data into a Neo4j graph after processing it (not from Excel or other sources). We are using the Neo4j Java driver's async API to do this.

My question is: What should the transaction boundary be? How many queries should we run in one transaction?

Sample code:

CompletionStage<Boolean> ss = driver.asyncSession()
        .beginTransactionAsync()
        .thenComposeAsync(transaction -> {

            // Some code here to get/generate queries and parameters

            CompletionStage<ResultCursor> rs1 = transaction.runAsync(query1.toString(), arguments1);
            CompletionStage<ResultCursor> rs2 = transaction.runAsync(query2.toString(), arguments2);
            CompletionStage<ResultCursor> rs3 = transaction.runAsync(query3.toString(), arguments3);
            CompletionStage<ResultCursor> rs4 = transaction.runAsync(query4.toString(), arguments4);
            CompletionStage<ResultCursor> rs5 = transaction.runAsync(query5.toString(), arguments5);
            // some more
            CompletionStage<ResultCursor> rs500 = transaction.runAsync(query500.toString(), arguments500);

            // code to compile the result of all

        }).exceptionally(e -> {
            //logger.error(" 2-- Error occurred: {} ", e.getMessage(), e);
            /// .. do something
        });

Note: Each query creates/deletes multiple nodes and relationships, so each query effectively updates a small graph by itself.

Please suggest.

david_allen · 05-17-2021

There is no general answer other than "it depends". What it depends on is whether the different async transactions you're running lock each other's data, or whether they depend on changes made by previous transactions.

Simple example: let's say you have a graph (A)->(B)->(C)->(D)

Now let's imagine that your async transactions are like this:

Change properties of A
Change properties of C
Change properties of D
Create a new node E and link it to B

Notice how none of these transactions depends on another, or touches data that another transaction needs. So yes, you can do them all in parallel and you're fine.

What if all 4 transactions were changing A? Then it would be a very bad idea to do them async because what you'd end up with would depend on which order they succeeded in, or otherwise it's likely your transactions would fail because they're all trying to take a lock on A at the same time.
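One rough way to tell the two cases apart is to write down which nodes each transaction touches and check whether any node shows up more than once. This is a minimal sketch, assuming you can list the touched nodes up front; `IndependenceCheck` and `allIndependent` are hypothetical names, not part of the Neo4j driver, and the check only approximates what Neo4j actually locks.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the Neo4j driver.
class IndependenceCheck {

    // Returns true when no node appears in more than one transaction's write set,
    // i.e. the transactions would not compete for a lock on the same node.
    static boolean allIndependent(List<Set<String>> writeSets) {
        Set<String> seen = new HashSet<>();
        for (Set<String> writeSet : writeSets) {
            for (String node : writeSet) {
                if (!seen.add(node)) {
                    return false; // node already claimed by an earlier transaction
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The four transactions from the example: A, C, D, and "create E, link it to B".
        List<Set<String>> independent =
                List.of(Set.of("A"), Set.of("C"), Set.of("D"), Set.of("E", "B"));
        // Four transactions that all change A.
        List<Set<String>> conflicting =
                List.of(Set.of("A"), Set.of("A"), Set.of("A"), Set.of("A"));

        System.out.println(allIndependent(independent)); // true
        System.out.println(allIndependent(conflicting)); // false
    }
}
```

If the check returns false, the overlapping transactions are the ones to run one after another.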

Principles:

When you need a transaction to depend on what happened before (order is important), don't do them async. Do them synchronously within one session object.
When two or more transactions are all touching the same data (but order isn't important), do them synchronously.
When transactions are truly independent, feel free to do them in parallel.
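These principles can be sketched with plain `CompletableFuture` chaining: work that touches the same key (say, a node id) is chained so it runs in order, and work on different keys is left unchained so it can overlap. This is only an illustration under assumptions. `KeyedSerializer`, `submit`, and `fakeTransaction` are hypothetical names, and the sleeping task stands in for a real transaction run through the driver.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Neo4j driver.
class KeyedSerializer {
    // Tail of the chain of work scheduled so far for each key (e.g. a node id).
    private final Map<String, CompletableFuture<Void>> tails = new HashMap<>();

    // Work submitted under the same key starts only after the previous work for
    // that key has completed; work under different keys is not chained at all.
    // If an earlier step fails, later steps for that key are skipped.
    synchronized CompletableFuture<Void> submit(String key, Supplier<CompletableFuture<Void>> work) {
        CompletableFuture<Void> tail =
                tails.getOrDefault(key, CompletableFuture.completedFuture(null));
        CompletableFuture<Void> next = tail.thenCompose(ignored -> work.get());
        tails.put(key, next);
        return next;
    }

    // Stand-in for "run one transaction": sleeps, then records its name.
    static CompletableFuture<Void> fakeTransaction(List<String> log, String name, long millis) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            log.add(name);
        });
    }

    static List<String> runDemo() {
        KeyedSerializer serializer = new KeyedSerializer();
        List<String> log = new CopyOnWriteArrayList<>();
        // Three writes to A: the slowest is submitted first, yet order is preserved.
        CompletableFuture<Void> a1 = serializer.submit("A", () -> fakeTransaction(log, "A1", 30));
        CompletableFuture<Void> a2 = serializer.submit("A", () -> fakeTransaction(log, "A2", 10));
        CompletableFuture<Void> a3 = serializer.submit("A", () -> fakeTransaction(log, "A3", 0));
        // One write to C: independent of A, so it is free to run in parallel.
        CompletableFuture<Void> c1 = serializer.submit("C", () -> fakeTransaction(log, "C1", 0));
        CompletableFuture.allOf(a1, a2, a3, c1).join();
        return log;
    }

    public static void main(String[] args) {
        // A1, A2, A3 always appear in that order; C1 can land anywhere among them.
        System.out.println(runDemo());
    }
}
```

The design choice here is that ordering is enforced per key rather than globally, so only the transactions that actually share data pay the cost of waiting.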

The real answer is that you should read about Neo4j's "causal consistency" model here, because once you understand these principles you'll be able to work out the answer for your workload.

poem_daga · 05-18-2021

Hey @david.allen, thank you for the in-depth description. I have gone through the material you shared.
But my question is: What should the transaction boundary be? How many queries should we run in ONE TRANSACTION?

driver.asyncSession().beginTransactionAsync()

gives ONE transaction, and we are using that same transaction to run multiple (1000+) queries in one go.

Can we combine any number of queries in ONE TRANSACTION? What is the recommended number?

david_allen · 05-21-2021

You can combine as many queries as you like in one transaction, subject to memory limitations. If the TX becomes too big or changes too much in one go, it could cause an out-of-memory error.

How many queries should you put in one transaction? I can't answer this because it depends 100% on what the queries do, and whether they need to modify data created by the previous queries.

The purpose of a transaction is to be an atomic unit of work that either succeeds or fails as a whole. You never want transactions to "partly work". So the transactional boundary is up to your business need and what you're trying to do. If you are streaming a bunch of records in, and you want to write them whether or not some of them hit errors, then it doesn't matter where you draw the boundary.

If you want to add either 1,000 or zero records, then you put them all in one TX, so they all either atomically succeed or fail.
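To make the trade-off concrete, here is a minimal sketch of splitting a list of queries into batches, where each batch is meant to run in its own transaction. The names `BatchPlanner` and `partition` are hypothetical, the batch sizes are arbitrary illustrations rather than recommendations, and the strings stand in for real query/parameter pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Neo4j driver.
class BatchPlanner {

    // Splits the queries into consecutive batches of at most batchSize items.
    // Each batch is meant to be run inside its own transaction.
    static <T> List<List<T>> partition(List<T> queries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < queries.size(); start += batchSize) {
            int end = Math.min(start + batchSize, queries.size());
            batches.add(new ArrayList<>(queries.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> queries = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            queries.add("query" + i);
        }

        // All or nothing: one batch, so one transaction holds all 1,000 queries.
        System.out.println(partition(queries, 1000).size()); // 1
        // Smaller transactions: a failure rolls back only the batch it occurred in.
        System.out.println(partition(queries, 250).size());  // 4
        // Uneven split: batches of 300, 300, 300 and 100.
        System.out.println(partition(queries, 300).size());  // 4
    }
}
```

With a single batch you get the "1,000 or zero" behavior at the cost of a larger transaction; with several batches each transaction stays smaller, but earlier batches remain committed if a later one fails.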

How many queries to run in one Async Session - Transaction