Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
09-23-2021 07:26 AM
Hi everyone,
I am loading a CSV file to create nodes in Neo4j. With a CSV of around 1,000 rows the nodes were created in about 1 second, but when I increased the dataset to 3,000 rows it took 15 seconds.
Can someone please suggest how to reduce this time, and explain why it takes so much longer for 3,000 rows?
What is the best way to create a graph from a CSV with a large dataset?
Below is the query that I use to create my graph nodes and set their properties:
LOAD CSV WITH HEADERS FROM 'file:///storage.csv' AS line
MERGE (a:Storage {name: line.code + " " + date(line.Date).month + "-" + date(line.Date).year + " " + line.Product})
ON CREATE SET a.Incoming_Stock = toFloat(line.incoming_stock),
  a.Opening_Inv_Physical = toFloat(line.opening_inventory_physical),
  a.Target_Closing_Inv = toFloat(line.target_closing_inventory),
  a.Outflow_Requirement = toFloat(line.outflow_requirement),
  a.date = date(line.Date),
  a.Product = line.Product,
  a.Node = line.code;
Thanks
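A minimal sketch of the fix that comes up later in this thread (an index behind the MERGE key), assuming Neo4j 4.4 or later: create a uniqueness constraint on :Storage(name) before the import, so each MERGE can look the name up through the constraint's index instead of scanning every :Storage node. The constraint name `storage_name` is a hypothetical choice; the load query is otherwise the one above, with the date parsed once per row.

```cypher
// Step 1: run once, on its own, BEFORE the import.
// Syntax for Neo4j 4.4 and later. On 4.0-4.3 the equivalent is:
// CREATE CONSTRAINT storage_name ON (s:Storage) ASSERT s.name IS UNIQUE
CREATE CONSTRAINT storage_name IF NOT EXISTS
FOR (s:Storage) REQUIRE s.name IS UNIQUE;

// Step 2: the import. MERGE on :Storage(name) can now do an index seek
// per row instead of a label scan over all :Storage nodes.
LOAD CSV WITH HEADERS FROM 'file:///storage.csv' AS line
// Parse the date once per row and reuse it below
WITH line, date(line.Date) AS d
MERGE (a:Storage {name: line.code + " " + d.month + "-" + d.year + " " + line.Product})
ON CREATE SET
  a.Incoming_Stock       = toFloat(line.incoming_stock),
  a.Opening_Inv_Physical = toFloat(line.opening_inventory_physical),
  a.Target_Closing_Inv   = toFloat(line.target_closing_inventory),
  a.Outflow_Requirement  = toFloat(line.outflow_requirement),
  a.date                 = d,
  a.Product              = line.Product,
  a.Node                 = line.code;
```

For files much larger than this, the import is also commonly split into smaller transactions: `USING PERIODIC COMMIT` on Neo4j 4.x, or `CALL { ... } IN TRANSACTIONS` from 4.4 onward (the latter must run as an auto-commit transaction, for example prefixed with `:auto` in Neo4j Browser).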
09-23-2021 07:34 AM
09-23-2021 07:41 AM
Hi,
No, I haven't created a UNIQUE constraint, but the name property I create for my nodes will always be unique, because that's how I build my input CSV. So I know that duplicate nodes will not be created.
Just now, after importing the CSV and creating the nodes, I tried creating a unique constraint on my :Storage nodes. But how will this reduce the time taken to load the CSV?
It's taking 15 seconds just to create 3,000 nodes in my graph.
Thanks,
Pragya
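On the question of how a constraint helps: in Neo4j a uniqueness constraint is backed by an index, so later MERGE or MATCH lookups on :Storage(name) can seek through that index rather than scan the label. Creating it after the nodes are loaded does not speed up that finished load; it affects the queries that run afterwards. A quick way to check, assuming Neo4j 4.2 or later for `SHOW CONSTRAINTS` (the name value below is a hypothetical placeholder):

```cypher
// List existing constraints (on older 4.x versions: CALL db.constraints())
SHOW CONSTRAINTS;

// EXPLAIN shows the plan without running the query.
// With the constraint in place, expect a NodeUniqueIndexSeek operator;
// with no index on :Storage(name), expect NodeByLabelScan followed by a Filter.
EXPLAIN
MATCH (s:Storage {name: 'SOME_CODE 9-2021 SOME_PRODUCT'})
RETURN s;
```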
09-24-2021 03:06 AM
Hi,
My nodes are now created within milliseconds since I changed my Cypher query from MERGE to CREATE, but creating the relationships takes around 60 seconds. Any suggestions on how to speed up relationship creation from a CSV?
Below is the query that I used:
LOAD CSV WITH HEADERS FROM 'file:///transport_laporte_db10.csv' AS line
MATCH (sender:Storage {name: line.sender_node + " " + date(line.sender_date).month + "-" + date(line.sender_date).year + " " + line.Product})
MATCH (receiver:Storage {name: line.receiver_node + " " + date(line.receiver_date).month + "-" + date(line.receiver_date).year + " " + line.Product})
MERGE (sender)-[rel:transport {mode: line.mode, lead_time: toInteger(line.lead_time), quota: toInteger(line.quota)}]->(receiver);
Thanks
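A sketch of the relationship load with an index in place, assuming Neo4j 4.1 or later for this index syntax. Both MATCH clauses look up :Storage by name, so an index on that property lets each of them seek instead of scanning. The index name `storage_name_idx` is hypothetical, and the dates are parsed once per row.

```cypher
// Step 1: run once before the relationship load.
// Skip this if a uniqueness constraint on :Storage(name) already exists:
// that constraint is already backed by an index on the same property.
CREATE INDEX storage_name_idx IF NOT EXISTS FOR (s:Storage) ON (s.name);

// Wait (up to 300 seconds) for the index to come online before loading
CALL db.awaitIndexes(300);

// Step 2: both MATCH clauses can now be index seeks instead of label scans.
LOAD CSV WITH HEADERS FROM 'file:///transport_laporte_db10.csv' AS line
WITH line,
     date(line.sender_date)   AS sd,
     date(line.receiver_date) AS rd
MATCH (sender:Storage {name: line.sender_node + " " + sd.month + "-" + sd.year + " " + line.Product})
MATCH (receiver:Storage {name: line.receiver_node + " " + rd.month + "-" + rd.year + " " + line.Product})
MERGE (sender)-[rel:transport {mode: line.mode, lead_time: toInteger(line.lead_time), quota: toInteger(line.quota)}]->(receiver);
```

If every row in the transport file describes a distinct relationship, using CREATE instead of MERGE for the relationship skips the existence check, the same trade-off as switching the nodes from MERGE to CREATE.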
09-24-2021 10:36 AM
Creating an index helped me reduce the time taken to create the relationships.
09-24-2021 05:54 PM
Just some context for this...
When no index is present, then for every row Cypher performs a label scan over all :Storage nodes, reading the property on each one to check whether a matching node exists.
So if you have 10,000 :Storage nodes in the database and 3,000 rows in the CSV, it will perform 3,000 label scans, ultimately doing 3,000 × 10,000 = 30,000,000 node comparisons. The load time therefore grows in proportion to the number of nodes with that label multiplied by the number of rows in your CSV, and that's only considering a single MERGE (or MATCH). If there are multiple MERGEs on nodes that aren't index-backed, the problem compounds.
By contrast, when an index is in place, only one index lookup is performed per row: 3,000 index lookups in total, each of which is fast.
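The counting argument above can be written out as a rough model. It assumes that every unindexed lookup reads the property of every :Storage node once and that nothing else affects the cost; N is the number of existing :Storage nodes, R the number of CSV rows, and k the number of unindexed MERGE/MATCH lookups per row (k = 2 for the relationship query, which has two MATCH clauses on :Storage(name)):

$$
C_{\text{scan}} = k \cdot R \cdot N, \qquad C_{\text{index}} = k \cdot R \ \text{index lookups}
$$

$$
k = 1,\ R = 3000,\ N = 10000: \quad C_{\text{scan}} = 3000 \times 10000 = 3 \times 10^{7}
$$

For the original node import (MERGE into an empty graph, no index, all n names distinct), the label grows while loading: row i, counting from 0, scans the i nodes created before it.

$$
C_{\text{scan}}(n) = \sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}, \qquad C_{\text{scan}}(1000) = 499\,500, \qquad C_{\text{scan}}(3000) = 4\,498\,500
$$

Under this model, tripling the rows multiplies the scan work by about 9, while the indexed version grows only linearly (n lookups). That is in line with the much-larger-than-3x jump reported in the first post, though real timings also depend on other factors.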