09-25-2021 06:35 AM
Hi,
I want to write Python code to manage huge volumes of bibliographic data. Specifically, I want to write and run Python code efficiently that checks dataset quality, reads large volumes of bibliographic data in streaming JSON format, and produces CSV files from it. From those CSV files I then want to quickly build a Neo4j database that will be accessible over the internet. I would also like to use a GPU at runtime to speed up execution. What infrastructure can handle this kind of workload efficiently? The workload needs to run in a fully automated way.
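For context, here is a minimal sketch of the streaming JSON-to-CSV step I have in mind (the ijson library and the field names are just assumptions for illustration):

```python
# Sketch of streaming a large JSON array of bibliographic records to CSV.
# ijson and the fields (title, year, doi) are assumptions for illustration.
import csv
import ijson

def json_stream_to_csv(json_path, csv_path):
    """Stream records one at a time and write selected fields to CSV."""
    with open(json_path, "rb") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["title", "year", "doi"])  # assumed columns
        # ijson.items() yields one record at a time, so memory use stays flat
        for record in ijson.items(src, "item"):
            writer.writerow([
                record.get("title", ""),
                record.get("year", ""),
                record.get("doi", ""),
            ])

if __name__ == "__main__":
    json_stream_to_csv("publications.json", "publications.csv")
```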
Thanks in advance for your time!
09-28-2021 01:46 PM
That's a lot of questions:
Python and data import.
The Python driver is not the fastest driver for high volumes, but it should be good enough if you have some patience.
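For the import itself, a minimal sketch of batched UNWIND writes with the official neo4j Python driver; the URI, credentials, label, and property names below are placeholders:

```python
# Batched import with the official neo4j Python driver.
# Connection details, label, and properties are placeholders.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def import_rows(rows, batch_size=1000):
    """Send rows in UNWIND batches instead of one query per record."""
    query = """
    UNWIND $batch AS row
    MERGE (p:Publication {doi: row.doi})
    SET p.title = row.title, p.year = row.year
    """
    with driver.session() as session:
        for i in range(0, len(rows), batch_size):
            session.run(query, batch=rows[i:i + batch_size])
```

Batching keeps the number of round trips low, which is usually where the Python driver loses time on large imports.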
Some people have had better throughput from Python using the HTTP API.
They even created a separate library for it: Announcing neo4j-connector 1.0.0 (python 3.5+)
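If you want to try the HTTP route without the extra library, here is a rough sketch against the transactional HTTP endpoint using plain requests; the host, database name, and credentials are assumptions:

```python
# Rough sketch of a batched write via Neo4j's transactional HTTP endpoint.
# Host, database name, and credentials are assumptions.
import requests

URL = "http://localhost:7474/db/neo4j/tx/commit"
AUTH = ("neo4j", "password")

def run_batch(rows):
    payload = {
        "statements": [{
            "statement": (
                "UNWIND $batch AS row "
                "MERGE (p:Publication {doi: row.doi}) "
                "SET p.title = row.title"
            ),
            "parameters": {"batch": rows},
        }]
    }
    response = requests.post(URL, json=payload, auth=AUTH)
    response.raise_for_status()
    return response.json()
```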
Otherwise, for queries on data quality etc., I suggest the Cypher query tuning course to make sure your queries are always fast.
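As an illustration, a couple of example data-quality checks you could run through the driver (the label and property names are assumptions); running them with PROFILE in Neo4j Browser is a good way to apply what the tuning course covers:

```python
# Example data-quality checks; label and property names are assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

quality_checks = {
    "missing_doi": (
        "MATCH (p:Publication) WHERE p.doi IS NULL RETURN count(p) AS missing"
    ),
    "duplicate_doi": (
        "MATCH (p:Publication) WITH p.doi AS doi, count(*) AS c "
        "WHERE c > 1 RETURN doi, c"
    ),
}

with driver.session() as session:
    for name, cypher in quality_checks.items():
        print(name, session.run(cypher).data())

driver.close()
```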
09-28-2021 02:13 PM
Thank you very much. Your tips are very valuable!