
Neo4j on Ubuntu

How is the write performance of Neo4j on Ubuntu? I'm trying to load the Yelp dataset into Neo4j using a Cypher query, and it took me 7 hours to load a 300 MB JSON file with 200k nodes. I'm using an Ubuntu 19.04 laptop with 32 GB RAM, a 4-core i7 CPU, and an NVMe drive. I tried changing the I/O scheduler from deadline to none and increased the heap size and memory settings in the config files, but with little improvement. Is this due to the Linux file system, or is load performance this poor in all environments?

If anyone has faced a similar problem and has figured a way out, please let me know.
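
(For reference, the config changes I mentioned were to the memory settings in neo4j.conf; the values below are only illustrative for a 32 GB machine, not necessarily what I used.)

# neo4j.conf -- illustrative memory settings (values are assumptions)
dbms.memory.heap.initial_size=8g
dbms.memory.heap.max_size=8g
# the page cache holds the graph store files; leave headroom for heap and OS
dbms.memory.pagecache.size=12g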

1 ACCEPTED SOLUTION

Common issues with large imports are:

  1. lack of indexes (in case your statements use MATCH or MERGE)
  2. too large transactions

To get a better understanding, please share what exactly you do for importing.
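
As a rough sketch of both points (the label, property, and file names below are placeholders):

// an index lets MERGE/MATCH find existing nodes without scanning everything
CREATE INDEX ON :Business(id);

// batching keeps each transaction small instead of one huge commit,
// e.g. with apoc.periodic.iterate and a batchSize in the thousands
CALL apoc.periodic.iterate(
  "CALL apoc.load.json('file:///business.json') YIELD value RETURN value",
  "MERGE (b:Business {id: value.business_id})",
  {batchSize: 10000});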


3 REPLIES

Hi Stephan,

This is the Cypher query I'm using to import.

CALL apoc.periodic.iterate(
  // outer statement: stream one map per JSON record from the file
  "CALL apoc.load.json('file:///business.json')
   YIELD value
   RETURN value",
  // inner statement: upsert a Business node and copy over the remaining properties
  "MERGE (b:Business {id: value.business_id})
   SET b += apoc.map.clean(value,
     ['attributes','hours','business_id','categories',
      'address','postal_code'],
     [])",
  {iterateList: true, batchSize: 10000, parallel: true});

and this is a sample json record in the file
{"business_id":"1SWheh84yJXfytovILXOAQ","name":"Arizona Biltmore Golf Club","address":"2818 E Camino Acequia Drive","city":"Phoenix","state":"AZ","postal_code":"85016","latitude":33.5221425,"longitude":-112.0184807,"stars":3.0,"review_count":5,"is_open":0,"attributes":{"GoodForKids":"False"},"categories":"Golf, Active Life","hours":null}

The Cypher statement looks good to me.

Do you have an index created upfront, prior to running the import: create index on :Business(id)? Note that if you use a unique constraint instead, it requires a global lock upon writes, so parallel:true will not work in that case.
For maximum performance, use a regular index.
You can also play with the batchSize value; try e.g. 1000, 10000 and maybe 100000 to see what is fastest in your case.
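
For reference, a rough sketch of the two options (Neo4j 3.x syntax, matching the create index on :Business(id) form above):

// regular index: fast MERGE lookups and compatible with parallel:true
CREATE INDEX ON :Business(id);

// unique constraint: also index-backed, but the locking described above
// means parallel:true will not work with it
// CREATE CONSTRAINT ON (b:Business) ASSERT b.id IS UNIQUE;

// then re-run the import with batchSize 1000, 10000, 100000 and compare timings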