11-11-2019 12:07 AM
Hi there, I am brand new to neo4j
Does anyone know how to export data to CSV file including relationships and nodes using py2neo?
Thanks for any help that anyone can offer
khaled
11-12-2019 05:35 AM
Rather than writing your own CSV exporter, you could use APOC; the export functions are described at https://neo4j.com/docs/labs/apoc/current/export/
Installation is also covered in that same document.
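For reference, a minimal export call might look like the following (the file name is illustrative, and exporting to a file requires apoc.export.file.enabled=true in neo4j.conf):

```cypher
// Export every node and relationship to a single CSV file,
// streaming in batches so the whole graph is never held in memory.
CALL apoc.export.csv.all("all.csv", {batchSize: 10000})
```

For a subset of the graph, apoc.export.csv.query takes an arbitrary Cypher query instead of exporting everything.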
11-12-2019 05:48 AM
Thank you for replying.
I have tried to use APOC export, but it is too slow since I have more than 200M nodes.
Any suggestions to speed up the export process?
11-12-2019 05:51 AM
Do you have more detail on "it is too slow"? If you change the Cypher query to simply RETURN count(*)
rather than, for example, RETURN person.name, person.age, person.address,
does that significantly affect performance? If not, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.
Can you post the EXPLAIN plan of the query?
Have you configured min/max heap and pagecache in neo4j.conf?
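For reference, the relevant neo4j.conf settings look like the fragment below. The values shown are illustrative for a small machine and depend on what else is running on the box; they are not a recommendation for your workload.

```
# Fixed-size heap (initial == max avoids resize pauses)
dbms.memory.heap.initial_size=3g
dbms.memory.heap.max_size=3g

# Page cache holds the store files; size it to fit as much
# of the graph on-heap-adjacent as RAM allows
dbms.memory.pagecache.size=3g
```

Heap plus page cache must leave headroom for the OS and any other processes, which is why an 8G machine is tight for a 200M-node export.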
11-12-2019 06:07 AM
This is the query I used:
CALL apoc.export.csv.query("MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) RETURN m.userID, u.address, v.txhash, v.n_inputs, v.unixtim, r.value, m.balances", "inputs.csv", {batchSize:200000, parallel:false})
I have configured the min/max heap and pagecache in the neo4j.conf
Note that my RAM is 8G
Regarding py2neo, I have come up with the following script (note that it actually uses the official neo4j Python driver rather than py2neo):
from neo4j import GraphDatabase
import csv

driver = GraphDatabase.driver(uri="bolt://localhost:7687", auth=("neo4j", "123"))

with open('result.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    session = driver.session()
    q1 = ("MATCH (v:Txhash)-[r:TO]->(u:Address)-[:PARTOF]-(m:User) "
          "RETURN m.userID, u.address, v.txhash, v.n_inputs, v.unixtim, r.value, m.balances")
    nod = session.run(q1)
    for j in nod:          # each record is iterable over its values
        writer.writerow(j)
    session.close()
It works fine, but it is also slow. Any suggestions?
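One common pattern for keeping a streaming export like this fast and memory-flat is to buffer rows and write them in batches rather than one writerow call per record. The sketch below is only an illustration: the helper name and batch size are made up here, and with the neo4j driver you would pass the result of session.run(q1) as the rows argument, since each record is iterable over its values.

```python
import csv

def write_rows_in_batches(rows, path, batch_size=50_000):
    """Stream an iterable of row tuples to a CSV file, writing in batches.

    batch_size is illustrative; tune it against your disk and RAM.
    Returns the number of rows written.
    """
    written = 0
    batch = []
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            batch.append(tuple(row))
            if len(batch) >= batch_size:
                writer.writerows(batch)  # one syscall-friendly bulk write
                written += len(batch)
                batch.clear()
        if batch:                        # flush the final partial batch
            writer.writerows(batch)
            written += len(batch)
    return written
```

Because the iterable is consumed lazily, the full 200M-row result set never needs to fit in memory; only one batch is held at a time.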
11-12-2019 06:32 AM
8G RAM is quite small. Is that 8G to be split between pagecache and heap?
And what about my earlier question:
if you change the Cypher query to simply RETURN count(*) rather than, for example, RETURN person.name, person.age, person.address, does that significantly affect performance? If not, then maybe the slowness is simply disk I/O from writing the 200M nodes to a file.
Also, for what it's worth: since there is no WHERE clause in your MATCH statement, the query is effectively a NodeByLabelScan, so there is no opportunity to use indexes. It is the equivalent of a full table scan in the RDBMS world.