ETL for loading csv

Hi,
I am trying to load a CSV file of relationships. I am using LOAD CSV and it is very slow. I do not want to use import, because that requires a clean database and I already have data present. I am wondering whether Neo4j ETL would be faster?
I am currently running the following code through the Neo4j Browser:

:auto USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:///ACTIVE_INGREDIENTS_BY_COUNTRY_COUNTRY_MAPPING.csv" AS row
MATCH (ac:ActiveIngredientsByCountry {ACTIVE_INGREDIENT_BY_COUNTRY_ID: row.ACTIVE_INGREDIENT_BY_COUNTRY_ID}),
      (c:Country {COUNTRY_CODE: row.COUNTRY_CODE})
CREATE (ac)-[:ACTIVE_INGREDIENT_BY_COUNTRY_COUNTRY_ASSOCIATION]->(c)

It creates relationships between two kinds of entities: ACTIVE_INGREDIENTS_BY_COUNTRY (about 400k nodes) and COUNTRY (about 3 nodes).

This query takes about 3 days and we need to make it faster. How can I do this in a database that already contains data?

Thanks,
Samik

1 REPLY

webtic
Graph Fellow

What I would do is write a script that reads the CSV, does any pre-processing needed, and applies the Cypher to the database. Personally I would reach for Python because I am proficient in it.

Without seeing the actual data it is always a guess, but from what you describe I would not be surprised if you could optimise it into something that takes hours instead of days.
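
A minimal sketch of such a script, under some assumptions: it uses the official `neo4j` Python driver, sends rows in batches through `UNWIND $rows`, and treats "pre-processing" as trimming whitespace and skipping rows with a missing key. The function names (`clean_rows`, `read_batches`, `load_relationships`), the batch size and the connection details are illustrative and not taken from the question; the labels, property names and relationship type are copied from the query above. It also assumes indexes exist on the two lookup properties, since without them each MATCH has to scan every node with that label.

```python
import csv
from itertools import islice

# Columns used for the node lookups (names taken from the question's query).
KEYS = ("ACTIVE_INGREDIENT_BY_COUNTRY_ID", "COUNTRY_CODE")

# Cypher applied once per batch; $rows is a list of maps, one per CSV row.
BATCH_QUERY = """
UNWIND $rows AS row
MATCH (ac:ActiveIngredientsByCountry
       {ACTIVE_INGREDIENT_BY_COUNTRY_ID: row.ACTIVE_INGREDIENT_BY_COUNTRY_ID})
MATCH (c:Country {COUNTRY_CODE: row.COUNTRY_CODE})
CREATE (ac)-[:ACTIVE_INGREDIENT_BY_COUNTRY_COUNTRY_ASSOCIATION]->(c)
"""


def clean_rows(lines):
    """Example pre-processing: keep only the key columns, trim whitespace,
    and drop rows where either key is missing or empty."""
    for row in csv.DictReader(lines):
        cleaned = {k: (row.get(k) or "").strip() for k in KEYS}
        if all(cleaned.values()):
            yield cleaned


def read_batches(lines, batch_size=10000):
    """Yield lists of at most batch_size cleaned rows."""
    rows = clean_rows(lines)
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


def load_relationships(uri, user, password, csv_path, batch_size=10000):
    """Send each batch in its own auto-commit transaction; returns rows sent."""
    # Imported here so the CSV helpers work without the driver installed.
    from neo4j import GraphDatabase  # pip install neo4j

    sent = 0
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session, open(csv_path, newline="") as f:
            for batch in read_batches(f, batch_size):
                session.run(BATCH_QUERY, rows=batch).consume()
                sent += len(batch)
    return sent
```

It could be called as `load_relationships("bolt://localhost:7687", "neo4j", "<password>", "ACTIVE_INGREDIENTS_BY_COUNTRY_COUNTRY_MAPPING.csv")`, with the URI and credentials replaced by your own. The values are sent as strings, as LOAD CSV does, so they only match if the node properties are stored as strings too.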