07-24-2022 06:55 AM
Hey Guys,
I have two needs right now where I desperately need some guidance (I am new to Neo4j as well):
So, for one of my projects, I need to build ETL pipelines within GCP for:
1. Transferring the historical data from BigQuery to Neo4j
2. Transferring the incremental data from BigQuery to Neo4j
Now, I am done creating my pipelines in BQ for the incremental load.
How should I proceed?
I can see some suggested approaches:
1. Airflow. Will it work for a huge amount of data? Will it be able to scale? Also, I think it will take time to load the data into Neo4j if the data volume is huge. (P.S.: The client doesn't have an existing Composer instance, so it will be painful to convince them to set one up. If need be, and if there is no other option, then I will do it.)
2. The second approach I could find was using Cloud Dataproc, where Apache Spark jobs can be written with the Neo4j Spark Connector (a rough sketch of what I imagine such a job would look like is below).
Any suggestions on which approach I should follow? If there is another approach besides these, feel free to let me know. It would help me a ton!
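Here is a minimal sketch of what I imagine such a Dataproc (PySpark) job could look like, reading through the spark-bigquery-connector and writing through the Neo4j Connector for Apache Spark. The project/dataset/table names, label, key column, Neo4j URL, and credentials are all just placeholders:

```python
# PySpark sketch for a Dataproc job: read a table from BigQuery and
# write each row as a node into Neo4j via the Neo4j Spark Connector.
# All names, URLs, and credentials below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("bq-to-neo4j")
    .getOrCreate()
)

# Read the source table through the spark-bigquery-connector
customers = (
    spark.read.format("bigquery")
    .option("table", "my_project.my_dataset.customers")
    .load()
)

# Write the rows as (:Customer) nodes; "Overwrite" mode together with
# node.keys makes the connector MERGE on customer_id, so incremental
# runs upsert instead of creating duplicates.
(
    customers.write.format("org.neo4j.spark.DataSource")
    .mode("Overwrite")
    .option("url", "neo4j://<neo4j-host>:7687")
    .option("authentication.basic.username", "neo4j")
    .option("authentication.basic.password", "<password>")
    .option("labels", ":Customer")
    .option("node.keys", "customer_id")
    .save()
)
```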
Regards,
Pinakin
07-24-2022 10:14 AM
If your data is that huge, you can also choose to export your tables as Neo4j-formatted CSVs in Cloud Storage, then retrieve those CSVs in your Neo4j environment and use the neo4j-admin import tool.
It's THE tool for massive data loads.
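As a rough sketch (project, dataset, table, and bucket names are just examples), the export side can be a plain BigQuery extract job, and the load side is a single neo4j-admin call on the Neo4j server:

```python
# Sketch: export a BigQuery table to CSV files in Cloud Storage.
# Project, dataset, table, and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    print_header=False,  # keep headers in a separate header file for neo4j-admin import
)

extract_job = client.extract_table(
    "my_project.my_dataset.persons",
    "gs://my-bucket/export/persons-*.csv",  # wildcard lets BigQuery shard large tables
    job_config=job_config,
)
extract_job.result()  # wait for the export to finish

# After copying the files to the Neo4j server, the bulk load is roughly
# (Neo4j 4.x syntax; file names are illustrative):
#   neo4j-admin import --database=neo4j \
#       --nodes=Person=persons-header.csv,persons-000000000000.csv \
#       --relationships=KNOWS=knows-header.csv,knows-000000000000.csv
```

Note that neo4j-admin import is intended for an initial bulk load into an empty database, so it fits the historical load; the incremental batches would still go through LOAD CSV, the Spark connector, or a driver-based pipeline.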