Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

APOC Periodic Iterate with APOC Load JDBC To Load Data in Batches

carib
Graph Buddy

Hello,

What is the best way to import 2,000,000 rows of data into Neo4j from a PostgreSQL database? Following the docs on apoc.load.jdbc, I managed to install the PostgreSQL driver and query the PostgreSQL database, yielding the rows I need in the Neo4j Browser.

But I'm confused by the part of the docs about loading data in batches:

Does this mean that the first statement in apoc.periodic.iterate ('CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company")') runs first, the entire result set is partitioned into batches of 10,000 rows, and each batch is then loaded using the second statement (CREATE (p:Person) SET p += value)?
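For reference, the batching example from the apoc.load.jdbc docs looks roughly like this (batchSize and parallel are the tuning knobs; the exact values shown are from the docs' example, not requirements):

```cypher
CALL apoc.periodic.iterate(
  // First statement: produces the stream of rows from the JDBC source
  'CALL apoc.load.jdbc("jdbc:mysql://localhost:3306/northwind?user=root","company")',
  // Second statement: executed once per row, committed per batch
  'CREATE (p:Person) SET p += value',
  {batchSize: 10000, parallel: true})
```

So yes: the first statement supplies the data stream, and apoc.periodic.iterate commits the second statement's work in transactions of batchSize rows.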

If my understanding is correct, does that mean that if the SQL query passed to apoc.load.jdbc is something like:

SELECT dt, employee_id, department_id
FROM my_table
WHERE dt > now() - INTERVAL '24 hours'
ORDER BY dt DESC, employee_id

and this returns 2,000,000 rows, then apoc.periodic.iterate will take the results of this query and load them to Neo4j in batchSize of 10,000 at a time?
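Assuming that reading is right, the combined call might look like the sketch below. The connection string, credentials, and the :Event label are placeholders, and note that single quotes inside the SQL must be escaped within the outer Cypher string:

```cypher
CALL apoc.periodic.iterate(
  // First statement: stream rows from PostgreSQL via JDBC
  'CALL apoc.load.jdbc(
     "jdbc:postgresql://localhost:5432/mydb?user=me&password=secret",
     "SELECT dt, employee_id, department_id
      FROM my_table
      WHERE dt > now() - INTERVAL \'24 hours\'") YIELD row RETURN row',
  // Second statement: applied per row, committed every batchSize rows
  'CREATE (e:Event) SET e += row',
  {batchSize: 10000, parallel: false})
```

The ORDER BY can likely be dropped for a bulk import, since batches don't need to arrive in any particular order and sorting millions of rows adds cost on the Postgres side. parallel: true can speed things up when batches touch disjoint nodes, but it risks lock contention if batches create relationships to shared nodes.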

What would be a faster way to load data from postgresql to Neo4j?

Thanks

3 REPLIES

Hello

One approach could be to export the data from Postgres into CSV files of 20,000 rows each and load them into Neo4j periodically via a cron scheduler.
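For each exported file, the load could be sketched like this (the file name is a placeholder, and USING PERIODIC COMMIT is the batching idiom for older Neo4j versions):

```cypher
// Commit every 20,000 rows so one large file doesn't become one huge transaction
USING PERIODIC COMMIT 20000
LOAD CSV WITH HEADERS FROM 'file:///export_part_001.csv' AS row
CREATE (p:Person) SET p += row
```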

Thanking you
Yours faithfully
Sameer Gijare

Hi Sameer,

Thank you for your input. If I export CSV files with 20,000 rows in each, we're talking about 100 files. There are days when my SQL query returns 5,000,000 rows, so breaking up the output from Postgres into multiple CSV files may not scale over time.

What if I do batch transactions with apoc.periodic.iterate?

You could write your own script to read the input CSVs, whether large or small. It could be a small Java program that polls a particular folder periodically, connects to the graph DB, and ingests the data. If required, you can transform the data into your domain business objects. Later on, once tested, you could install it as a plugin in Neo4j with a configuration file for tuning dynamic parameters.
Thanking you
Sameer Sudhir G