11-11-2019 12:07 AM
Hi there, I am brand new to neo4j
Does anyone know how to export data to CSV file including relationships and nodes using py2neo?
Thanks for any help that anyone can offer
khaled
11-12-2019 05:35 AM
Rather than writing your own CSV exporter, you could use APOC; the export functions are described at https://neo4j.com/docs/labs/apoc/current/export/
Installation is also covered in that same document.
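For reference, a minimal export call might look like the following (the file name is illustrative, and exporting to a file requires apoc.export.file.enabled=true in neo4j.conf):

```cypher
// Export every node and relationship to a single CSV file,
// streaming in batches so the whole graph is never held in memory.
CALL apoc.export.csv.all("all.csv", {batchSize: 10000})
```

For a subset of the graph, apoc.export.csv.query takes an arbitrary Cypher query instead of exporting everything.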
11-12-2019 05:48 AM
Thank you for replying.
I have tried to use APOC export, but it is too slow since I have more than 200M nodes.
Any suggestions to speed up the export process?
11-12-2019 05:51 AM
Do you have more detail on "it is too slow"? If you change the Cypher query to simply RETURN count(*)
rather than, for example, RETURN person.name, person.age, person.address,
does that significantly affect performance? If not, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.
Can you post the EXPLAIN plan of the query?
Have you configured min/max heap and pagecache in neo4j.conf?
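For reference, the relevant neo4j.conf settings look like the fragment below. The values shown are illustrative for a small machine and depend on what else is running on the box; they are not a recommendation for your workload.

```
# Fixed-size heap (initial == max avoids resize pauses)
dbms.memory.heap.initial_size=3g
dbms.memory.heap.max_size=3g

# Page cache holds the store files; size it to fit as much
# of the graph on-heap-adjacent as RAM allows
dbms.memory.pagecache.size=3g
```

Heap plus page cache must leave headroom for the OS and any other processes, which is why an 8G machine is tight for a 200M-node export.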
11-12-2019 06:07 AM
This is the query I used:
CALL apoc.export.csv.query("MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) RETURN m.userID, u.address, v.txhash, v.n_inputs, v.unixtim, r.value, m.balances", "inputs.csv", {batchSize:200000, parallel:false})
I have configured the min/max heap and pagecache in the neo4j.conf
Note that my RAM is 8G
Regarding py2neo, I have come up with the following script (note that it actually uses the official neo4j Python driver rather than py2neo):
from neo4j import GraphDatabase
import csv

driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=("neo4j", "123"))

with open('result.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    session = driver.session()
    q1 = ("MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) "
          "RETURN m.userID, u.address, v.txhash, v.n_inputs, v.unixtim, r.value, m.balances")
    nod = session.run(q1)
    for j in nod:          # each record is iterable over its values
        writer.writerow(j)
    session.close()
It works fine, but it is also slow. Any suggestions?
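One common pattern for keeping a streaming export like this fast and memory-flat is to buffer rows and write them in batches rather than one writerow call per record. The sketch below is only an illustration: the helper name and batch size are made up here, and with the neo4j driver you would pass the result of session.run(q1) as the rows argument, since each record is iterable over its values.

```python
import csv

def write_rows_in_batches(rows, path, batch_size=50_000):
    """Stream an iterable of row tuples to a CSV file, writing in batches.

    batch_size is illustrative; tune it against your disk and RAM.
    Returns the number of rows written.
    """
    written = 0
    batch = []
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            batch.append(tuple(row))
            if len(batch) >= batch_size:
                writer.writerows(batch)  # one syscall-friendly bulk write
                written += len(batch)
                batch.clear()
        if batch:                        # flush the final partial batch
            writer.writerows(batch)
            written += len(batch)
    return written
```

Because the iterable is consumed lazily, the full 200M-row result set never needs to fit in memory; only one batch is held at a time.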
11-12-2019 06:32 AM
8G RAM is quite small. Is that 8G to be split between pagecache and heap?
And what about my earlier question:
if you change the Cypher query to simply RETURN count(*) rather than, for example, RETURN person.name, person.age, person.address, does that significantly affect performance? If not, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.
Also, for what it's worth: since there is no WHERE clause in your MATCH statement, the query is effectively a NodeByLabelScan, so there is no opportunity to use indexes. It is the equivalent of a full table scan in the RDBMS world.