05-07-2020 12:25 AM
I've been using the Python driver to generate a very large graph: tens of millions of nodes. The problem is, I only seem to be generating a few hundred nodes per second. At this rate it will take days to load what is essentially only a few megabytes of data.
This leads me to believe that the data is being bottlenecked somewhere. Any idea where this might be, and how I could get around it?
05-07-2020 01:40 AM
Hi @m.a.klaczynski,
Welcome to the Neo4j community!
There are multiple ways to speed up the ingestion. You can try these points: batch your writes with UNWIND so each transaction handles many rows at once, and index (or add a uniqueness constraint on) the properties you MERGE on. A sketch of the batching approach follows.
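For example, here is a minimal sketch of batched UNWIND ingestion with the official neo4j Python driver. The URI, credentials, label, batch size, and row fields are illustrative assumptions, not taken from your setup:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "test"))

CQL = """
UNWIND $rows AS row
MERGE (n:Node {id: row.id})
"""

def ingest(rows, batch_size=1000):
    with driver.session() as session:
        # Index the MERGE key so each MERGE is a lookup, not a label scan
        # (Neo4j 4.x syntax; adjust for your server version).
        session.run("CREATE INDEX node_id IF NOT EXISTS FOR (n:Node) ON (n.id)")
        # One round trip and one transaction per batch, not per node.
        for i in range(0, len(rows), batch_size):
            session.run(CQL, rows=rows[i:i + batch_size])

ingest([{"id": i} for i in range(100_000)])
driver.close()

Sending a few hundred to a few thousand rows per transaction is usually the difference between hundreds and tens of thousands of nodes per second, because the per-transaction commit cost is amortised over the whole batch.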
05-07-2020 04:31 AM
Here's a generic Python script that is driven by a config YAML file to ingest data in batch mode for CSV and JSON files.
A sample config YAML file would be:
server_uri: bolt://localhost:7687
admin_user: neo4j
admin_pass: test
files:
  - url: file://my/file/a.csv
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
      MERGE (a:ALabel {p1: row.p1, p2: row.p2, p3: row.p3})
  - url: file://my/file/b.csv
    chunk_size: 100
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
      MERGE (b:BLabel {p1: row.p1, p2: row.p2, p3: row.p3})
Increase the chunk size based on your heap and transaction size.
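The original script isn't reproduced here, but a minimal sketch of such a config-driven loader might look like the following. The function names, the default chunk size, and the simplistic file:// handling are illustrative assumptions, not the author's actual script:

import csv
import yaml  # PyYAML
from neo4j import GraphDatabase

def ingest_file(session, cql, rows, chunk_size):
    # One transaction per chunk; the cql consumes $dict.rows via UNWIND.
    for i in range(0, len(rows), chunk_size):
        session.run(cql, dict={"rows": rows[i:i + chunk_size]})

def main(config_path="config.yml"):
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    driver = GraphDatabase.driver(cfg["server_uri"],
                                  auth=(cfg["admin_user"], cfg["admin_pass"]))
    with driver.session() as session:
        for spec in cfg["files"]:
            # Naive file:// handling, enough for this sketch.
            path = spec["url"][len("file://"):]
            with open(path, newline="") as f:
                rows = list(csv.DictReader(f))
            ingest_file(session, spec["cql"], rows,
                        spec.get("chunk_size", 1000))
    driver.close()

if __name__ == "__main__":
    main()

Because the Cypher lives in the config, you can add new files and labels without touching the loader itself.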
I was able to import a few hundred thousand CSV rows into
81,708 nodes (13 labels)
311,730 relationships (15 types)
in under a minute with a chunk size of 1000.