Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

LOAD CSV long loading times

I know this question has been asked many times, but the existing threads didn't give me a straightforward solution that would fix my loading times.

I am fairly sure my loading times are too long, but I am not sure whether the cause is the code, the software, or the hardware. So far LOAD CSV takes minutes, while neo4j-admin import takes a few seconds.

For example, for a CSV file of 14k lines I get the following result (about three minutes):

Added 14019 labels, created 14019 nodes, set 28037 properties, created 14018 relationships, completed after 183360 ms

The file contains timestamped value measurements in this format:

2019-07-10T19:38:00.062000|51.744617
2019-07-10T19:39:00.065000|52.153733
2019-07-10T19:40:00.066000|51.226583
2019-07-10T19:41:00.069000|51.341583
2019-07-10T19:42:00.070000|51.524967

My query is:

LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
FIELDTERMINATOR '|'
MERGE (d:Data{time:datetime(row.Time),value:row.Value})
MERGE (s:Sensor{Name:"Test"})   
MERGE (s)-[:HAS_DATA]->(d)

I am using Neo4j Desktop 1.4.5 and Neo4j database 4.2.5.

4 replies

Bennu
Graph Fellow

Hi @IFC_modeller!

Slow loading times may indicate that your MERGE spends too much time deciding whether the node it is about to create already exists in the database. Without an index on the merged properties, each MERGE has to check every existing :Data node, so the import slows down as more rows are loaded. In this case, the variability of your :Data nodes is likely the issue: since 14k rows produced 14k nodes, every MERGE turned out to be a CREATE, so just use CREATE instead. Otherwise, you may want to rethink how this data is represented in your model.

Last but most important: if you plan to use MERGE, create uniqueness constraints on your database, MERGE only on the constrained property, and set the remaining properties with ON CREATE SET or ON MATCH SET.
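A minimal sketch of that advice for this data set, assuming Neo4j 4.x constraint syntax (matching the 4.2.5 database above; Neo4j 5 uses `FOR ... REQUIRE ... IS UNIQUE` instead) and assuming each timestamp identifies exactly one measurement, which holds for the single "Test" sensor here. The constraint names `sensor_name` and `data_time` are hypothetical.

```cypher
// Run once. Each uniqueness constraint also creates an index,
// so MERGE can look nodes up directly instead of scanning all of them.
CREATE CONSTRAINT sensor_name ON (s:Sensor) ASSERT s.Name IS UNIQUE;
CREATE CONSTRAINT data_time ON (d:Data) ASSERT d.time IS UNIQUE;

// Import: MERGE only on the constrained key, set the other property separately.
LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
FIELDTERMINATOR '|'
MERGE (s:Sensor {Name: "Test"})
MERGE (d:Data {time: datetime(row.Time)})
  ON CREATE SET d.value = row.Value
  // Assumption: when a timestamp is loaded again, overwrite the stored value
  ON MATCH SET d.value = row.Value
MERGE (s)-[:HAS_DATA]->(d);
```

With the constraints in place the MERGE calls become index lookups, so re-running the import does not duplicate data. If the file is known to contain only new measurements, plain CREATE remains the cheaper option.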

Hope this is useful.

H

Thank you for your help. It seems I was overusing MERGE as a shortcut instead of doing it properly.

Benoit_d
Graph Buddy
Accepted solution

Hi @IFC_modeller,

Actually, I presume your sensor "Test" exists only once. You should create it separately and then look it up in the import query with MATCH instead of MERGE.
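Creating the sensor separately could be as simple as running the statement below once before the import. Using MERGE here rather than CREATE is my own assumption; it only makes re-running the statement harmless, and the import query itself then needs just a MATCH.

```cypher
// Run once, before LOAD CSV: create the single "Test" sensor if it is missing
MERGE (s:Sensor {Name: "Test"})
RETURN s;
```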

Second, as @Bennu said, MERGE is meant for cases where you want to avoid creating the same node or relationship twice. If a Data node existed twice, how would you handle it? Always take the first one? From the information given, nothing suggests that a measurement needs to be stored twice, since they are all bound to the same sensor. One strategy is to create them all with CREATE and then, if a measurement does appear twice, delete one of the copies.
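The clean-up step described above could look like the following sketch. It assumes two Data nodes count as duplicates only when both `time` and `value` match, and it keeps one node per group without guaranteeing which one, because `collect()` does not define an order.

```cypher
// Group the "Test" sensor's measurements by timestamp and value
MATCH (:Sensor {Name: "Test"})-[:HAS_DATA]->(d:Data)
WITH d.time AS time, d.value AS value, collect(d) AS copies
WHERE size(copies) > 1
// Keep the first collected node (arbitrary), delete the rest with their relationships
UNWIND copies[1..] AS extra
DETACH DELETE extra;
```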

LOAD CSV WITH HEADERS FROM 'file:///data.csv' AS row
FIELDTERMINATOR '|'
MATCH (s:Sensor{Name:"Test"})   
CREATE (s)-[:HAS_DATA]->(d:Data{time:datetime(row.Time),value:row.Value})

Hi @Benoit_d,
Thank you! Using CREATE instead of MERGE, the import took only about a second.