12-05-2019 01:50 AM
Hi all,
I am currently ingesting millions of lines of data into my graph and looking for the most stable and performant way to do so. I have already settled on batching my data and feeding it via UNWIND, so the query itself is largely solved performance-wise. In a testing environment I simply ran
session.run(self.statement, batch=batch)
and it processed roughly 30-40 batches per second. Then I went on to make it more stable and wrapped it in a write_transaction as follows:
def commit_batch(tx, batch):
    return tx.run(self.statement, batch=batch)

session.write_transaction(commit_batch, batch)
and the performance drops to ~10 batches per second. Since I am processing 160k batches, this is an issue. I would be very happy if someone could point me to further optimizations, or explain why the performance drops that drastically inside write_transaction.
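For reference, a minimal self-contained version of the setup described above might look like this; the Cypher statement, label, property names, and connection details are made up purely for illustration:

from neo4j import GraphDatabase

# Hypothetical UNWIND statement: merge one node per dict in the batch
statement = """
UNWIND $batch AS row
MERGE (n:Item {id: row.id})
SET n.name = row.name
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def commit_batch(tx, batch):
    # run the batched statement inside the managed write transaction
    tx.run(statement, batch=batch)

with driver.session() as session:
    for batch in batches:  # batches: an iterable of lists of dicts
        session.write_transaction(commit_batch, batch)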
Thanks!
PS: I am only merging and creating nodes and relationships; no RETURN statement is included. Still, I have to collect some returned values in an object and occasionally call results.consume() to keep my DB from crashing due to a full outgoing buffer. Is there any way to circumvent this so that nothing is returned into the buffer at all? It seems like quite a waste of resources. Thanks again!
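For what it's worth, one pattern that might help with the buffer concern is to consume the result inside the transaction function, so that only the summary (counters, no records) is kept around; the names below are just illustrative:

def commit_batch(tx, batch):
    # consume() discards any buffered records on the client side and
    # returns a ResultSummary; counters still reports what was written
    summary = tx.run(self.statement, batch=batch).consume()
    return summary.counters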
12-05-2019 04:52 AM
Hi @florin.ratajczak1, do you have indexes created on the node properties you are merging on?
If not, creating them should improve performance drastically.
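For example, with a hypothetical :Item label and id property (Neo4j 3.x syntax), the index could be created once up front via the driver:

with driver.session() as session:
    # one-off schema operation; MERGE on :Item(id) can then use the index
    session.run("CREATE INDEX ON :Item(id)")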
12-05-2019 05:55 AM
Hi! Thank you for your interest in the question. Yes, I have set an index on the property I am merging on. I had the batches-per-second figure wrong by a factor of 10, but we are still talking hours for the complete import.