05-07-2020 12:25 AM
I've been using the Python driver to generate a very large graph: tens of millions of nodes. The problem is, I only seem to be generating a few hundred nodes per second. At this rate it will take days to load what is essentially only a few megabytes of data.
This leads me to believe that the data is being bottlenecked somewhere. Any idea where this might be, and how I could get around it?
05-07-2020 01:40 AM
Hi @m.a.klaczynski,
Welcome to the Neo4j community!
There are multiple ways to speed up the ingestion. You can try these points: batch your writes with UNWIND so each transaction handles many rows at once, and index (or add a uniqueness constraint on) the properties you MERGE on. A sketch of the batching approach follows.
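For example, here is a minimal sketch of batched UNWIND ingestion with the official neo4j Python driver. The URI, credentials, label, batch size, and row fields are illustrative assumptions, not taken from your setup:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "test"))

CQL = """
UNWIND $rows AS row
MERGE (n:Node {id: row.id})
"""

def ingest(rows, batch_size=1000):
    with driver.session() as session:
        # Index the MERGE key so each MERGE is a lookup, not a label scan
        # (Neo4j 4.x syntax; adjust for your server version).
        session.run("CREATE INDEX node_id IF NOT EXISTS FOR (n:Node) ON (n.id)")
        # One round trip and one transaction per batch, not per node.
        for i in range(0, len(rows), batch_size):
            session.run(CQL, rows=rows[i:i + batch_size])

ingest([{"id": i} for i in range(100_000)])
driver.close()

Sending a few hundred to a few thousand rows per transaction is usually the difference between hundreds and tens of thousands of nodes per second, because the per-transaction commit cost is amortised over the whole batch.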
05-07-2020 04:31 AM
Here's a generic Python script that is driven by a config YAML file to ingest data in batch mode for CSV and JSON files.
A sample config YAML file would be:
server_uri: bolt://localhost:7687
admin_user: neo4j
admin_pass: test
files:
  - url: file://my/file/a.csv
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
      MERGE (a:ALabel {p1: row.p1, p2: row.p2, p3: row.p3})
  - url: file://my/file/b.csv
    chunk_size: 100
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
      MERGE (b:BLabel {p1: row.p1, p2: row.p2, p3: row.p3})
Increase the chunk size based on your heap and transaction size.
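The original script isn't reproduced here, but a minimal sketch of such a config-driven loader might look like the following. The function names, the default chunk size, and the simplistic file:// handling are illustrative assumptions, not the author's actual script:

import csv
import yaml  # PyYAML
from neo4j import GraphDatabase

def ingest_file(session, cql, rows, chunk_size):
    # One transaction per chunk; the cql consumes $dict.rows via UNWIND.
    for i in range(0, len(rows), chunk_size):
        session.run(cql, dict={"rows": rows[i:i + chunk_size]})

def main(config_path="config.yml"):
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    driver = GraphDatabase.driver(cfg["server_uri"],
                                  auth=(cfg["admin_user"], cfg["admin_pass"]))
    with driver.session() as session:
        for spec in cfg["files"]:
            # Naive file:// handling, enough for this sketch.
            path = spec["url"][len("file://"):]
            with open(path, newline="") as f:
                rows = list(csv.DictReader(f))
            ingest_file(session, spec["cql"], rows,
                        spec.get("chunk_size", 1000))
    driver.close()

if __name__ == "__main__":
    main()

Because the Cypher lives in the config, you can add new files and labels without touching the loader itself.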
I was able to import a few hundred thousand CSV rows into
81,708 nodes (13 labels)
311,730 relationships (15 types)
in under a minute with a chunk size of 1000.