06-02-2022 11:09 AM
I tried to ingest my edges and nodes from a Delta file into a Neo4j database using the Spark connector, but it gets slower and slower: the first 4 million edges took 1 hour, and it keeps degrading from there.
I ingested 130 million nodes in 6 hours, yet I see other people ingest their billions of nodes and edges in 1-2 hours. What did I do wrong here?
06-02-2022 11:21 AM
I think you'll need a bit more memory on the Neo4j machine.
Did you create constraints, so the database can look up existing nodes efficiently when creating the relationships during ingest?
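On the memory point, the two knobs that matter most for bulk loads are the heap and the page cache. A hedged example of Neo4j 4.x settings in neo4j.conf (sizes are illustrative for a machine with roughly 32 GB of RAM, not a recommendation from this thread):

```
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g
dbms.memory.pagecache.size=16g
```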
06-02-2022 12:46 PM
What do you mean by creating constraints? I used
"schema.optimization.type": "NODE_CONSTRAINTS"
06-03-2022 01:46 AM
There are a ton of reasons that can contribute to slowing the process down.
The first thing to check is the query.log, in order to understand which queries are slow.
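To act on that advice, you can grep query.log for the elapsed-time figure each entry carries. A minimal sketch, assuming lines that contain "<n> ms:" before the query text (the exact layout varies by Neo4j version, so treat the pattern as an assumption to adjust):

```python
import re

# Hedged sketch: pull slow entries out of Neo4j's query.log.
# Assumes each line contains "<elapsed> ms:" before the query text;
# adjust the pattern to match your Neo4j version's log layout.

def slow_queries(lines, threshold_ms=1000):
    """Return (elapsed_ms, line) pairs at or above the threshold, slowest first."""
    pat = re.compile(r"\b(\d+) ms:")
    out = []
    for line in lines:
        m = pat.search(line)
        if m and int(m.group(1)) >= threshold_ms:
            out.append((int(m.group(1)), line.strip()))
    return sorted(out, reverse=True)

sample = [
    "2022-06-03 01:46:00.000+0000 INFO 5321 ms: MERGE (n:Node {id: $id})",
    "2022-06-03 01:46:01.000+0000 INFO 12 ms: RETURN 1",
]
```

If the slow entries are the connector's MERGE statements, that usually points back at missing constraints/indexes rather than at Spark itself.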
06-10-2022 03:01 PM
@santand84 called it out. We have a graph that is ~32M nodes / 1.7B edges that we load from Apache Spark. We've had to work our way through quite a number of performance issues on the loading side, mostly by tuning the batch size and the partitioning/executor count.
The bigger issue we run into with large loads, where there is significant overlap in relationship/node coverage, is node locking from parallel/concurrent transactions.
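The locking problem arises because Neo4j locks both endpoint nodes when creating a relationship, so two Spark partitions writing edges that share a node can block (or deadlock) each other. One partial mitigation is to partition edges by one endpoint, so all edges from a given source land in the same partition and only one writer ever locks that node. A toy sketch of the idea (target-side collisions can still occur, which is why many people fall back to coalescing relationship writes to a single partition):

```python
# Hedged sketch: reduce lock contention by grouping edges by source node.
# Relationship creation locks both endpoints, so keeping all edges of a
# given source in one partition means only one writer locks that node.

def partition_for(edge, num_partitions):
    """Assign an edge to a partition by hashing its source node id."""
    src, _dst = edge
    return hash(src) % num_partitions

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")]

partitions = {}
for e in edges:
    partitions.setdefault(partition_for(e, 4), []).append(e)
# All edges with source "a" land in the same partition, so no two
# concurrent writers contend for the lock on node "a".
```

In Spark terms this corresponds to something like `edges_df.repartition("src")` before the write; the pure-Python version above just demonstrates the grouping property.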
06-15-2022 07:25 AM
@brianmartin the best practice for batch importing data with Spark is:
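The list this reply points to did not survive in the archive. For what it's worth, a batched relationship write with the connector typically looks something like the following sketch (the DataFrame `edges_df`, the `KNOWS` type, labels, and key names are illustrative assumptions; option names follow the connector's documented API):

```python
# Sketch of a batched Neo4j Spark connector relationship write.
# `edges_df` is assumed to have `src` and `dst` columns; names are illustrative.

def neo4j_edge_write_options(batch_size=5000):
    """Connector options commonly tuned for large relationship loads."""
    return {
        "relationship": "KNOWS",
        "relationship.save.strategy": "keys",
        "relationship.source.labels": ":Person",
        "relationship.source.node.keys": "src:id",
        "relationship.target.labels": ":Person",
        "relationship.target.node.keys": "dst:id",
        "batch.size": str(batch_size),
    }

def write_edges(edges_df, url, batch_size=5000):
    """Write relationships via the Spark connector (requires a Spark session)."""
    (edges_df
     .coalesce(1)  # serialize relationship writes to avoid endpoint-lock contention
     .write
     .format("org.neo4j.spark.DataSource")
     .mode("Append")
     .options(**neo4j_edge_write_options(batch_size))
     .option("url", url)
     .save())
```

Coalescing to one partition trades parallelism for the absence of lock contention, which, per the discussion above, is often the faster path for relationship-heavy loads.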