
Creating a lot of nodes at once

I've been using the Python driver to generate a very large graph: tens of millions of nodes. The problem is, I only seem to be generating a few hundred nodes per second. At this rate it will take days to load in what is essentially only a few megabytes of data.
This leads me to believe that somewhere the data is being bottlenecked. Any idea where this might be, and how I could get around it?

2 REPLIES

intouch_vivek
Graph Steward

Hi @m.a.klaczynski,

Welcome to the Neo4j community!

You can try these approaches:

  1. Use apoc.periodic.iterate to batch the work server-side: https://neo4j.com/docs/labs/apoc/current/graph-updates/periodic-execution/
  2. Try parallel execution from Python.
  3. Chunk the data with pandas before ingestion (see the sketch after this list).

There are multiple ways to speed up the ingestion.
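
For point 3, here is a minimal sketch of chunked ingestion with the official Python driver. The file name data.csv, the label ALabel, the property names, and the connection details are all placeholders; the key idea is to send thousands of rows per transaction via UNWIND instead of committing one node per round trip.

import pandas as pd
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "test"))

# One Cypher statement handles a whole batch of rows.
CQL = """
UNWIND $rows AS row
MERGE (n:ALabel {p1: row.p1, p2: row.p2, p3: row.p3})
"""

def write_chunk(tx, rows):
    tx.run(CQL, rows=rows)

with driver.session() as session:
    # pandas streams the CSV in chunks, so each transaction carries
    # 1000 rows instead of a single node per commit.
    for chunk in pd.read_csv("data.csv", chunksize=1000):
        session.write_transaction(write_chunk, chunk.to_dict("records"))

driver.close()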

anthapu
Graph Fellow

Here's a generic Python script, driven by a config yml file, that ingests data in batch mode from CSV and JSON files.

A sample config yml file would be

server_uri: bolt://localhost:7687
admin_user: neo4j
admin_pass: test

files:
  - url: file://my/file/a.csv
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
          MERGE (a:ALabel {p1: row.p1, p2: row.p2, p3: row.p3})
  - url: file://my/file/b.csv
    chunk_size: 100
    cql: |
      WITH $dict.rows as rows UNWIND rows as row
        MERGE (b:BLabel {p1: row.p1, p2: row.p2, p3: row.p3})

Increase the chunk size based on your heap and txn size.
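
The script itself isn't reproduced here, but a minimal sketch of a loader driven by this kind of config might look like the following. It assumes the pyyaml and neo4j packages, plain CSV files, and the $dict.rows parameter shape used in the cql snippets above; the file:// URL handling is simplified for illustration.

import csv
import yaml
from neo4j import GraphDatabase

with open("config.yml") as f:
    config = yaml.safe_load(f)

driver = GraphDatabase.driver(
    config["server_uri"],
    auth=(config["admin_user"], config["admin_pass"]),
)

def run_batch(tx, cql, rows):
    # Each cql statement expects its rows under $dict.rows.
    tx.run(cql, dict={"rows": rows})

with driver.session() as session:
    for file_cfg in config["files"]:
        path = file_cfg["url"].replace("file://", "")
        chunk_size = file_cfg.get("chunk_size", 1000)
        with open(path) as csv_file:
            batch = []
            for row in csv.DictReader(csv_file):
                batch.append(row)
                if len(batch) >= chunk_size:
                    session.write_transaction(run_batch, file_cfg["cql"], batch)
                    batch = []
            if batch:
                # Flush the final partial chunk.
                session.write_transaction(run_batch, file_cfg["cql"], batch)

driver.close()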

I was able to import a few hundred thousand CSV rows into 81,708 nodes (13 labels) and 311,730 relationships (15 types) in under a minute with a chunk size of 1000.