Export data to csv using py2neo

Khaled

Hi there, I am brand new to Neo4j.
Does anyone know how to export data, including nodes and relationships, to a CSV file using py2neo?

Thanks for any help anyone can offer.
Khaled

5 REPLIES

Rather than writing your own CSV exporter, you could use APOC. Its export procedures are described at https://neo4j.com/docs/labs/apoc/current/export/

Installation is described in the same document.

Thank you for replying.
I have tried APOC export, but it is too slow because I have more than 200M nodes.
Any suggestions for speeding up the export?

Do you have more detail on 'it is too slow'? If you change the Cypher query to simply RETURN count(*) rather than, for example, RETURN person.name, person.age, person.address, does this significantly affect performance? If so, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.
Can you post the EXPLAIN plan of the query?
Have you configured min/max heap and pagecache in neo4j.conf?
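
A client-side variant of the same check can be sketched in Python: run one pass that only consumes the rows and a second pass that also writes them as CSV, then compare the timings. This is a minimal sketch and not the RETURN count(*) test itself. The helper name timed_pass and the stand-in sample rows are assumptions made for illustration, and with a real database the rows would come from a driver result instead.

```python
import csv
import io
import time


def timed_pass(rows, out=None):
    """Consume rows, optionally writing them as CSV to `out`.

    Returns (row_count, elapsed_seconds).
    """
    writer = csv.writer(out) if out is not None else None
    count = 0
    start = time.perf_counter()
    for row in rows:
        if writer is not None:
            writer.writerow(row)
        count += 1
    return count, time.perf_counter() - start


# Stand-in rows (hypothetical data) so the sketch runs without a database.
sample = [(i, f"name{i}") for i in range(1000)]

# Pass 1: only pull the rows, write nothing.
drain_count, drain_seconds = timed_pass(iter(sample))

# Pass 2: pull the rows and write them as CSV to an in-memory buffer.
buffer = io.StringIO()
write_count, write_seconds = timed_pass(iter(sample), out=buffer)
```

If the two timings are close when run against the real query, the time is probably going into the query and transfer rather than into writing the file.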

This is the query I used:

CALL apoc.export.csv.query("MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) RETURN m.userID, u.address, v.txhash, v.n_inputs, v.unixtim, r.value, m.balances", "inputs.csv", {batchSize:200000, parallel:false})

I have configured the min/max heap and pagecache in neo4j.conf.
Note that my machine has 8 GB of RAM.

Regarding py2neo, I have come up with the following script (it uses the official neo4j Python driver rather than py2neo):

from neo4j import GraphDatabase
import csv

driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=("neo4j", "123"))

q1 = "MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) RETURN m.userID, u.address, v.txhash, v.n_inputs, v.unixtim, r.value, m.balances"

# Open the output file and a session together so both are closed on exit.
with open('result.csv', 'w', newline='') as csvFile, driver.session() as session:
    writer = csv.writer(csvFile)
    # Stream records from the result and write one CSV row per record.
    for record in session.run(q1):
        writer.writerow(record.values())

driver.close()

It works fine, but it is also slow. Any suggestions?
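
One thing that could be tried on the Python side is writing rows in batches through a larger file buffer. The following is a minimal sketch, assuming the rows arrive as an iterable of sequences (the records from session.run(q1) would qualify). The function name export_rows, the batch size, and the buffer size are illustrative assumptions, and whether this helps depends on where the time is actually spent.

```python
import csv
import itertools
import os
import tempfile


def export_rows(rows, path, batch_size=10_000, buffer_bytes=1 << 20):
    """Write an iterable of row sequences to a CSV file in batches.

    Returns the number of rows written.
    """
    total = 0
    iterator = iter(rows)
    # A larger buffer means fewer, bigger writes to disk.
    with open(path, "w", newline="", buffering=buffer_bytes) as csv_file:
        writer = csv.writer(csv_file)
        while True:
            # Pull at most batch_size rows without loading the whole result.
            batch = list(itertools.islice(iterator, batch_size))
            if not batch:
                break
            writer.writerows(batch)
            total += len(batch)
    return total


# Stand-in rows (hypothetical data) so the sketch runs without a database.
fake_rows = ((i, f"addr{i}", i * 0.5) for i in range(25))
demo_path = os.path.join(tempfile.mkdtemp(), "result.csv")
written = export_rows(fake_rows, demo_path, batch_size=10)
```

With a live session this would be called as export_rows(session.run(q1), "result.csv"). If the query itself is the bottleneck, batching the writes will make little difference.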

8 GB of RAM is quite small. Is that 8 GB to be split between pagecache and heap?

What about my earlier question:

If you change the Cypher query to simply RETURN count(*) rather than, for example, RETURN person.name, person.age, person.address, does this significantly affect performance? If so, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.

Also, for what it's worth, as you have no WHERE clause in your MATCH statement, the query is effectively a NodeByLabelScan and there is no opportunity to use indexes etc. It is effectively a table scan in RDBMS terms.
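
To illustrate what a WHERE clause might look like here, the export could be split into time windows on v.unixtim and run one window at a time. This is only a sketch under stated assumptions: that unixtim holds a numeric Unix timestamp and that an index exists on :Txhash(unixtim), neither of which is confirmed in this thread. The names time_windows, WINDOWED_QUERY, $lo and $hi are made up for the example, and EXPLAIN would show whether the planner actually uses an index.

```python
def time_windows(start, end, step):
    """Yield half-open (lo, hi) windows that together cover [start, end)."""
    if step <= 0:
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


# Same pattern as the original query, restricted to one window per run.
# $lo and $hi are query parameters supplied for each window.
WINDOWED_QUERY = (
    "MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) "
    "WHERE v.unixtim >= $lo AND v.unixtim < $hi "
    "RETURN m.userID, u.address, v.txhash, v.n_inputs, v.unixtim, r.value, m.balances"
)

# Small illustrative range; real bounds would come from the data.
windows = list(time_windows(0, 250, 100))
```

With the official Python driver, each window could then be run as session.run(WINDOWED_QUERY, lo=lo, hi=hi) and the rows appended to the same CSV file.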