12-05-2019 01:50 AM
Hi all,
I am currently ingesting millions of lines of data into my graph and looking for the most stable and performant way to do so. I have already settled on batching my data and feeding it via UNWIND, so the query itself is largely solved performance-wise. In a testing environment I simply ran
session.run(self.statement, batch=batch)
and it processed roughly 30-40 batches per second. Then I went on to make it more stable and wrapped it in a write_transaction as follows:
def commit_batch(tx, batch):
    return tx.run(self.statement, batch=batch)

session.write_transaction(commit_batch, batch)
and the performance drops to ~10 batches per second. Since I am processing 160k batches, this is an issue. I would be very happy if someone could point me to further optimizations, or explain why the performance drops that drastically inside write_transaction.
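For reference, a minimal self-contained version of the setup described above might look like this; the Cypher statement, label, property names, and connection details are made up purely for illustration:

from neo4j import GraphDatabase

# Hypothetical UNWIND statement: merge one node per dict in the batch
statement = """
UNWIND $batch AS row
MERGE (n:Item {id: row.id})
SET n.name = row.name
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def commit_batch(tx, batch):
    # run the batched statement inside the managed write transaction
    tx.run(statement, batch=batch)

with driver.session() as session:
    for batch in batches:  # batches: an iterable of lists of dicts
        session.write_transaction(commit_batch, batch)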
Thanks!
PS: I am only merging and creating nodes and relationships; no RETURN statement is included. Still, I have to collect some returned values in an object and occasionally call results.consume() to keep my DB from crashing due to a full outgoing buffer. Is there any way to circumvent this so that nothing is returned into the buffer at all? It seems like quite a waste of resources. Thanks again!
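For what it's worth, one pattern that might help with the buffer concern is to consume the result inside the transaction function, so that only the summary (counters, no records) is kept around; the names below are just illustrative:

def commit_batch(tx, batch):
    # consume() discards any buffered records on the client side and
    # returns a ResultSummary; counters still reports what was written
    summary = tx.run(self.statement, batch=batch).consume()
    return summary.counters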
12-05-2019 04:52 AM
Hi @florin.ratajczak1, do you have indexes created on the node properties you are merging on?
If not, creating them should improve performance drastically.
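For example, with a hypothetical :Item label and id property (Neo4j 3.x syntax), the index could be created once up front via the driver:

with driver.session() as session:
    # one-off schema operation; MERGE on :Item(id) can then use the index
    session.run("CREATE INDEX ON :Item(id)")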
12-05-2019 05:55 AM
Hi! Thank you for your interest in the question. Yes, I have set an index on the property I am merging on. I had the batches-per-second figure wrong by a factor of 10, but we are still talking hours for the complete import.