03-26-2021 01:50 AM
Hi, I have CSV files with a fairly large number of rows (<10M for now). I have to read each CSV and create nodes and relationships among them in Neo4j. I'm using pandas for the CSV data manipulation and py2neo for node and relationship creation. The problem is that for a dataset as small as 500,000 rows it is taking hours (>10 hours) to read the data and create the nodes and relationships in the graph DB. Is there any solution to this?
Thanks
03-26-2021 11:40 AM
The absolute fastest way (by far) to load large datasets into Neo4j is the bulk loader, neo4j-admin import.
It is orders of magnitude faster, partly because it only ever builds a database from the ground up, so transaction tracking can be (and is) turned off during the load.
Caveat: it only works for new databases; it can't be used to add data to an existing database.
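For illustration, a minimal neo4j-admin import invocation might look like the following. The file names, paths, and CSV header layout here are assumptions, not from the thread; the tool must be run against a stopped, empty database:

# CSV headers follow the bulk-loader conventions (:ID, :LABEL, :START_ID, :END_ID, :TYPE), e.g.:
#   cables.csv -> ObjectID:ID,Identifier,Status,RouteType,:LABEL
#   nodes.csv  -> Identifier:ID,:LABEL
#   links.csv  -> :START_ID,:END_ID,:TYPE
bin/neo4j-admin import \
    --database=neo4j \
    --nodes=import/cables.csv \
    --nodes=import/nodes.csv \
    --relationships=import/links.csv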
03-26-2021 02:51 PM
apoc.load.csv is your new friend, whether it's a new database or one that needs to be updated.
It's explicitly built for your use case, and it won't fear your tiny 500,000 lines.
Be aware that you must always create constraints before importing any data with a MERGE or MATCH clause, or the import will still be running when your grandchildren check on it: without an index, every MERGE does a full label scan.
I used Python for my data injection before; now I use apoc.load.csv to standardize the process and improve its speed.
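A minimal sketch of that pattern, borrowing the labels, property names, and file name from the script posted later in this thread. The constraint syntax shown is the pre-4.4 form (newer versions use CREATE CONSTRAINT ... FOR ... REQUIRE), and apoc.load.csv needs apoc.import.file.enabled=true in the server config:

// Constraints first, so each MERGE is an index lookup rather than a label scan.
CREATE CONSTRAINT ON (n:Node) ASSERT n.Identifier IS UNIQUE;
CREATE CONSTRAINT ON (f:`Fiber Cable`) ASSERT f.ObjectID IS UNIQUE;

// Batch the import so no single transaction has to hold 500,000 rows.
CALL apoc.periodic.iterate(
  "CALL apoc.load.csv('file:///Links.csv') YIELD map AS row RETURN row",
  "MERGE (f:`Fiber Cable` {ObjectID: row.ObjectID})
   SET f.Identifier = row.Identifier, f.Status = row.Status, f.RouteType = row.Type",
  {batchSize: 10000, parallel: false}
);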
05-03-2021 03:34 AM
I'm facing a problem while loading the CSV data using Cypher. The script I am using works fine for both node and relationship creation from the CSV, but only for a limited number of rows (400-500). When I run the same script on the original dataset with its large number of rows, it runs indefinitely and finally throws an error:

"ServiceUnavailable: WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver. Please use your browser's development console to determine the root cause of the failure. Common reasons include the database being unavailable, using the wrong connection URL or temporary network problems. If you have enabled encryption, ensure your browser is configured to trust the certificate Neo4j is configured to use. WebSocket readyState is: 3"

I'm not able to find a working solution to this problem. Can you guide me through it?
05-03-2021 03:37 AM
Following is the Cypher script I'm using:

LOAD CSV WITH HEADERS FROM 'file:///Links.csv' AS row
WITH row WHERE row.ObjectID IS NOT NULL
MERGE (f:`Fiber Cable`
  {ObjectID: row.ObjectID, Identifier: row.Identifier, Status: row.Status, RouteType: row.Type})
WITH f, row
UNWIND split(row.Segments, ' ') AS node
MERGE (n:Node {Identifier: node})
MERGE (f)-[r:ATTACHED_TO]->(n)
RETURN count(f)
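A side note on the error in the previous post: Neo4j Browser runs this whole load as a single transaction, which is a common way to lose the WebSocket connection on a large file. Below is a batched sketch of the same script, assuming a Neo4j version before 4.4, where USING PERIODIC COMMIT still exists (in Browser 4.x, prefix the query with :auto). The RETURN count(f) is dropped because an eager aggregation would defeat the periodic commit:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///Links.csv' AS row
WITH row WHERE row.ObjectID IS NOT NULL
MERGE (f:`Fiber Cable` {ObjectID: row.ObjectID, Identifier: row.Identifier, Status: row.Status, RouteType: row.Type})
WITH f, row
UNWIND split(row.Segments, ' ') AS node
MERGE (n:Node {Identifier: node})
MERGE (f)-[:ATTACHED_TO]->(n)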
05-04-2021 11:40 PM
Create an index on :Node(Identifier) and on :`Fiber Cable`(Identifier) so that the nodes are looked up quickly (or add a Node label to the first node you're creating, as the revised script below does):

LOAD CSV WITH HEADERS FROM 'file:///Links.csv' AS row
WITH row WHERE row.ObjectID IS NOT NULL
MERGE (f:Node {Identifier: row.Identifier})
ON CREATE SET f:`Fiber Cable`, f.ObjectID = row.ObjectID, f.Status = row.Status, f.RouteType = row.Type
WITH f, row
UNWIND split(row.Segments, ' ') AS node
MERGE (n:Node {Identifier: node})
MERGE (f)-[r:ATTACHED_TO]->(n)
RETURN count(f)
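For reference, the indexes this reply describes could be created roughly like this; the index names are made up, and the syntax shown is the Neo4j 4.x form (on 3.x it would be CREATE INDEX ON :Node(Identifier)):

// Index the lookup keys used by the MERGE clauses above.
CREATE INDEX node_identifier IF NOT EXISTS FOR (n:Node) ON (n.Identifier);
CREATE INDEX cable_identifier IF NOT EXISTS FOR (f:`Fiber Cable`) ON (f.Identifier);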