11-26-2018 07:12 AM
Hello!
I'm brand new to this forum (and to Neo4j actually).
I would like to import a CSV data file, with its nodes and relationships, into Neo4j using a Python script.
I have 2 nodes with a few properties, and would like to establish a relationship between them.
Running Cypher commands in the browser (Chrome, up to date) works fine; it's just extremely slow (the file has 17M lines, so I have to split it to load it in Chrome, and it's still very slow, hence the Python import).
I'm splitting the CSV file into a header part and a "core" data part, as required.
I'm calling the shell command with --import for both nodes and relationships. It works fine for the nodes, but breaks for the relationships.
It tells me that the TYPE of the relationship is missing, but I have no idea what it should be or where to define it. In the Neo4j documentation I didn't find a clear answer about what relationship types are. Are they different from labels?
Here is the part of my code where I define the relationship between my 2 nodes:
test_rel = node1[node2['common_variable'] != 'NaN']
test_rel['Label'] = 'CREATES'
#write data
test_rel.to_csv(export_path+'/test_rel.csv',index=False, header=False)
#write header
with open(export_path+'/test_rel-header.csv','w',newline='') as f:
    writer=csv.writer(f)
    writer.writerow([':START_ID','common_variable',':END_ID', ':TYPE'])
And the error I get:
original error: start:B010 (global id space) type:null end:CREATES (global id space) is missing TYPE field
Any idea? I can provide a lot more details if needed, but I don't really know what would be relevant for you guys.
Thank you!
Arnaud
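For readers hitting the same error: in Neo4j, labels belong to nodes, while a relationship has exactly one type (here CREATES), which the import tool reads from the :TYPE column. The error above shows end:CREATES and type:null, which suggests that the data rows put the type value in the third position, so the rows and the four-column header are probably out of step. Below is a minimal sketch, using only the standard csv module, of writing both files from one shared column list so they cannot drift apart. The function name, the tuple layout of the rows, and the example end ID are assumptions for illustration, not the poster's actual data.

```python
import csv
import os

# Column order shared by the header file and every data row.
# This layout mirrors the header posted above.
REL_COLUMNS = [":START_ID", "common_variable", ":END_ID", ":TYPE"]


def write_relationship_files(rows, export_path, rel_type="CREATES"):
    """Write test_rel-header.csv and test_rel.csv with matching columns.

    `rows` is an iterable of (start_id, common_variable, end_id) tuples
    (a hypothetical layout; adapt it to the cleaned data).
    """
    header_file = os.path.join(export_path, "test_rel-header.csv")
    data_file = os.path.join(export_path, "test_rel.csv")

    # Header file: a single line naming the columns.
    with open(header_file, "w", newline="") as f:
        csv.writer(f).writerow(REL_COLUMNS)

    # Data file: one line per relationship, in the same column order,
    # with the relationship type always in the last position.
    with open(data_file, "w", newline="") as f:
        writer = csv.writer(f)
        for start_id, common_variable, end_id in rows:
            writer.writerow([start_id, common_variable, end_id, rel_type])

    return header_file, data_file
```

With a pandas DataFrame, the equivalent step would be to select the columns explicitly in header order before calling to_csv, rather than relying on the order they happen to have.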
11-26-2018 10:21 PM
Could you please share your CSV's header and let us know which properties you want to have on both nodes?
I think you can do it directly in Python: create a connection to Neo4j from Python and then pass your query in the script.
It would work like a charm.
11-27-2018 12:59 AM
Hello,
Thank you for your answer.
My CSV header is
:START_ID,common_variable,:END_ID,:TYPE
The properties for node1 would be prop1, common_variable, prop2
And for node2 it'd be common_variable, prop3.
I have already set up a connection to Neo4j from Python and it connects fine, but I don't see the advantage of running the queries directly in Python compared to running them in the browser. It should be as slow as in the browser, right?
The point here is to extract and load the CSV files much faster (and later in an automated way), by splitting each file into header and body and defining node properties and relationships in Python. Does that make sense?
Thanks!
11-27-2018 01:55 PM
You can basically feed a list of pairs to your Cypher statement, e.g. in batches of 10k pairs,
and then use this in Cypher:
UNWIND $rows AS row
MATCH (a:Label),(b:Label2) WHERE a.id = row.from AND b.id = row.to
MERGE (a)-[:REL]->(b)
see:
11-27-2018 01:57 PM
neo4j-import is an offline bulk loader
so it creates a new database from your CSV files.
See: https://neo4j.com/docs/operations-manual/current/tutorial/import-tool/
11-28-2018 02:41 AM
Thank you for your answers Michael.
But in addition to creating a new database with the neo4j-import command, my script performs some cleansing on my data (which is not really clean), so I would like to do everything in this script.
The python import of the whole 17M lines file is working fine and is pretty quick (a few minutes at worst). My problem is when I try to define the relationships between nodes created from this file.
11-28-2018 03:30 AM
Do you mean creating them manually via Cypher?
There are some ways of speeding it up, depending on what exactly you need.
Did you try my statement?
If you need to create a lot of data, send it in smaller batches (10k-100k) from Python.
I guess you already have indexes/constraints on the key fields that you use to look up the nodes?
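A minimal sketch of that batching from Python might look like the following. It assumes a session object with a run(query, **parameters) method, as provided by the official neo4j Python driver; the helper names and the default batch size are made up for illustration, and Label, Label2 and REL are the placeholders from the statement earlier in the thread.

```python
# UNWIND statement from earlier in the thread, with both lookup
# conditions combined in a single WHERE clause.
MERGE_RELS = """
UNWIND $rows AS row
MATCH (a:Label), (b:Label2)
WHERE a.id = row.from AND b.id = row.to
MERGE (a)-[:REL]->(b)
"""


def batches(pairs, size=10_000):
    """Yield lists of {"from": ..., "to": ...} maps, at most `size` long."""
    batch = []
    for from_id, to_id in pairs:
        batch.append({"from": from_id, "to": to_id})
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # last, possibly shorter, batch
        yield batch


def load_relationships(session, pairs, size=10_000):
    """Send one statement per batch and return the number of pairs sent."""
    total = 0
    for batch in batches(pairs, size):
        # Each batch is passed as the $rows parameter of the statement.
        session.run(MERGE_RELS, rows=batch)
        total += len(batch)
    return total
```

With the official driver, the session would typically come from GraphDatabase.driver(uri, auth=(user, password)).session(), where the URI and credentials depend on the installation. Sending parameterized batches keeps each transaction at a manageable size and avoids re-parsing a different query string for every pair.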
There is one extra trick by using apoc.map.groupBy to create an in-memory cache.
MATCH (n:Label)
WITH apoc.map.groupBy(collect(n),"id") as cache1
MATCH (m:Label2)
WITH cache1, apoc.map.groupBy(collect(m),"id") as cache2
UNWIND $rows AS row
WITH cache1[row.from] AS a, cache2[row.to] as b
MERGE (a)-[:REL]->(b)
11-28-2018 03:30 AM
Oh, and there is apoc.import.csv, which can read the neo4j-import
files directly into a live database.
06-12-2020 01:21 PM
Hello guys, this is the first time I am using Neo4j, and I am not sure about most things in it.
Is there any Python script that can read a CSV file and feed it into Neo4j?