Load large volume of data in neo4j

12kunal34
Graph Fellow

Hi Graphis,

I have some doubts regarding importing data into Neo4j.
I have a large volume of data (100k JSON files, each containing 200k records).
What is the best way to import this data?
I am currently using PySpark and neo4j-admin import. Is there an alternative method, or can I import data at this scale using PySpark alone?

3 REPLIES

paulare
Graph Buddy

Hi @12kunal34

Maybe this blog is helpful

If you can describe the specific issue you're having with your current method, the community may be able to give you more ideas.

krisgeus
Node Clone

Using Apache Spark alone will most likely result in deadlocks for large graphs. Creating files and using neo4j-admin import is currently the best option, I believe.
You might be able to run a clustering algorithm on your graph in Spark and ingest the clusters separately so you get rid of the deadlocks; at the end you would, of course, need to create the cross-cluster relationships again (see the sketch below).
This is by no means an easy solution, though.
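To make that concrete, here is a rough sketch of the cluster-and-split idea, assuming the graphframes package is installed and that your nodes and relationships already sit in DataFrames with id, src, and dst columns (all names and paths here are placeholders):

```python
# Rough sketch: tag nodes with a connected component, so relationships
# inside a component can be ingested per component without contending
# for the same node locks; cross-component relationships are kept for
# a final, single-threaded pass.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("cluster-split").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/gf-checkpoints")  # required by connectedComponents

vertices = spark.read.json("json/nodes/*.json")   # must expose an "id" column
edges = spark.read.json("json/rels/*.json")       # must expose "src" and "dst" columns

g = GraphFrame(vertices, edges)
components = g.connectedComponents()              # adds a "component" column

e = (edges
     .join(components.selectExpr("id as src", "component as src_comp"), "src")
     .join(components.selectExpr("id as dst", "component as dst_comp"), "dst"))

intra = e.filter("src_comp = dst_comp")           # safe to load per component
cross = e.filter("src_comp != dst_comp")          # load these last, sequentially

intra.write.partitionBy("src_comp").json("out/intra")  # one folder per cluster
cross.write.json("out/cross")
```

Connected components is only one choice of clustering; any partitioning that keeps most relationships inside a partition serves the same purpose.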

anthapu
Graph Fellow

You could try this utility, written in Python.

It has a config YAML file where you specify the file URL and the corresponding Cypher to ingest the data. It imports each file in sequence.

If you want to parallelize the import, you can create multiple config YAML files and run them in parallel.
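The utility's exact config format may differ, but a minimal sketch of this config-driven idea, assuming PyYAML, the official neo4j Python driver, and APOC for apoc.load.json (the YAML shape shown is hypothetical), could look like this:

```python
# Hypothetical sketch of a config-driven loader. Expects a YAML file like:
#
#   files:
#     - url: "file:///data/part-0001.json"
#       cypher: >
#         CALL apoc.load.json($url) YIELD value
#         MERGE (p:Person {id: value.id}) SET p += value
#
import yaml
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))  # placeholder credentials

with open("import.yaml") as f:       # one config per process; start several
    config = yaml.safe_load(f)       # processes to parallelize

with driver.session() as session:
    for entry in config["files"]:    # each file is ingested in sequence
        session.run(entry["cypher"], url=entry["url"]).consume()

driver.close()
```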

As others mentioned, when you run in parallel there is a possibility of deadlocks, since creating a relationship takes locks on the nodes at both ends.
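For what it's worth, the driver's managed transactions retry transient failures such as deadlocks automatically; a minimal sketch, with placeholder labels and connection details:

```python
# Minimal sketch: execute_write re-runs the transaction function on
# transient errors (deadlocks included). In older driver versions the
# same method is called write_transaction.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def create_rels(tx, pairs):
    tx.run(
        "UNWIND $pairs AS p "
        "MATCH (a:Person {id: p.src}) "
        "MATCH (b:Person {id: p.dst}) "
        "MERGE (a)-[:KNOWS]->(b)",
        pairs=pairs,
    )

with driver.session() as session:
    session.execute_write(create_rels, [{"src": 1, "dst": 2}])

driver.close()
```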

Neo4j admin import would still be the fastest way to import a huge amount of initial data.
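Since you are already on PySpark plus neo4j-admin import, here is a rough sketch of preparing the tool's CSV input from the JSON files; field names and paths are placeholders, while the header conventions (:ID, :LABEL, :START_ID, :END_ID, :TYPE) are the ones neo4j-admin import documents:

```python
# Rough sketch: reshape JSON into the CSV layout neo4j-admin import
# expects. "id", "name", "src", "dst" and the Person/KNOWS names are
# placeholders for your actual schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("admin-import-prep").getOrCreate()

nodes = (spark.read.json("json/nodes/*.json")
         .select(F.col("id").alias("id:ID"),
                 F.col("name"),
                 F.lit("Person").alias(":LABEL")))
# coalesce(1) keeps a single header line; at your volumes you would
# instead write a separate header file and many header-less parts.
nodes.coalesce(1).write.option("header", True).csv("csv/nodes")

rels = (spark.read.json("json/rels/*.json")
        .select(F.col("src").alias(":START_ID"),
                F.col("dst").alias(":END_ID"),
                F.lit("KNOWS").alias(":TYPE")))
rels.coalesce(1).write.option("header", True).csv("csv/rels")

# Then, against an empty store (neo4j-admin import only does initial
# loads), something like:
#   neo4j-admin import --nodes=<nodes.csv> --relationships=<rels.csv>
```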