02-21-2022 03:09 PM
Hello all,
I'm loading data from Azure SQL Server into Neo4j hosted on Docker on a Linux server. One of the tables, which contains ~7 million rows, cannot be loaded in one run and keeps causing connection timeouts.
Can someone help with configuring the timeout for the JDBC connection?
Below are my connection string and import query:
:param azure_url => 'jdbc:sqlserver://SERVER;databaseName=DATABASE;user=User;password=PW';
CALL apoc.periodic.iterate(
  'CALL apoc.load.jdbc($azure_url, "select distinct m.person_id, m.measurement_id, m.measurement_concept_id, m.measurement_date, m.unit_concept_id, m.unit_concept_name, m.value_as_number from omop.measurement m where m.measurement_id > 200000000") YIELD row',
  'WITH row
   WHERE row.unit_concept_id IS NOT NULL
     AND row.value_as_number IS NOT NULL
   MATCH (p:Person {person_id: row.person_id})
   MATCH (m:Measurement {measurement_concept_id: row.measurement_concept_id})
   MERGE (p)-[r:HAS_MEASUREMENT {measurement_id: row.measurement_id}]->(m)
   ON CREATE SET
     r.measurement_date = row.measurement_date,
     r.unit_concept_id = row.unit_concept_id,
     r.unit_concept_name = row.unit_concept_name,
     r.value_as_number = row.value_as_number',
  {batchSize: 100000, iterateList: true, parallel: false, params: {azure_url: $azure_url}, retries: 5}
);
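From the driver docs it looks like the Microsoft JDBC driver for SQL Server accepts timeout settings directly in the connection URL, so maybe extending the :param line as sketched below is the right place to tune it? (loginTimeout, queryTimeout and socketTimeout are standard mssql-jdbc connection properties; the values here are just placeholders to experiment with.)

:param azure_url => 'jdbc:sqlserver://SERVER;databaseName=DATABASE;user=User;password=PW;loginTimeout=60;queryTimeout=600;socketTimeout=900000';
// loginTimeout is in seconds (time to establish the connection),
// queryTimeout is in seconds per statement (-1 means no limit),
// socketTimeout is in milliseconds (0 means no timeout).

Lowering batchSize (e.g. to 10000) would also keep each inner transaction shorter, so a mid-run failure costs less.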
02-21-2022 03:43 PM
apoc.load.jdbc is a bit difficult to control and resume in case of failure.
There are two options: use an ETL tool such as Apache Hop, or export the data from SQL Server in chunks (e.g. CSV files) and load the chunks one at a time.
The advantage of this approach is that you can start from the point of failure instead of restarting the whole process; you can fix the Cypher and continue, as sketched below.
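A rough sketch of the chunked-CSV route (the file name and paths are placeholders, and it assumes person_id and the concept ids are stored as integers on the existing nodes): each exported chunk is loaded on its own, so only the failing file needs to be re-run.

CALL apoc.periodic.iterate(
  'LOAD CSV WITH HEADERS FROM "file:///measurement_part_01.csv" AS row RETURN row',
  'WITH row
   WHERE row.unit_concept_id IS NOT NULL AND row.value_as_number IS NOT NULL
   MATCH (p:Person {person_id: toInteger(row.person_id)})
   MATCH (m:Measurement {measurement_concept_id: toInteger(row.measurement_concept_id)})
   MERGE (p)-[r:HAS_MEASUREMENT {measurement_id: toInteger(row.measurement_id)}]->(m)
   ON CREATE SET r.measurement_date = row.measurement_date,
                 r.unit_concept_id = toInteger(row.unit_concept_id),
                 r.unit_concept_name = row.unit_concept_name,
                 r.value_as_number = toFloat(row.value_as_number)',
  {batchSize: 10000, parallel: false}
);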
02-22-2022 09:08 AM
Thanks a lot @anthapu for the suggestions.
I haven't tried Apache Hop yet but will look into it. As for CSV, I don't think it can store more than ~1 million rows, so I might need to use TXT (not sure whether Neo4j supports that) or JSON.
02-22-2022 04:39 PM
You could load JSON into Neo4j.
The Python ingest utility pyingest (GitHub: neo4j-field/pyingest) can also ingest JSON into Neo4j.
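apoc.load.json can drive the same kind of batched write as well. A minimal sketch, assuming the rows are exported to a hypothetical JSON file with the same field names as the JDBC query:

CALL apoc.periodic.iterate(
  'CALL apoc.load.json("file:///measurement_part_01.json") YIELD value RETURN value AS row',
  'MATCH (p:Person {person_id: row.person_id})
   MATCH (m:Measurement {measurement_concept_id: row.measurement_concept_id})
   MERGE (p)-[r:HAS_MEASUREMENT {measurement_id: row.measurement_id}]->(m)
   ON CREATE SET r.value_as_number = row.value_as_number',
  {batchSize: 10000, parallel: false}
);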
02-23-2022 08:31 AM
I think one can export CSV from SQL Server in manageable volumes and write a custom ETL pipeline to ingest the data into Neo4j, rather than relying on ready-made open-source libraries until you are sure of their design goals.
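Whatever tool drives such a pipeline, the write side usually comes down to sending a list of rows as a parameter and unwinding it. A sketch of the Cypher the pipeline would run per batch ($rows being whatever chunk it has just read from SQL Server):

UNWIND $rows AS row
MATCH (p:Person {person_id: row.person_id})
MATCH (m:Measurement {measurement_concept_id: row.measurement_concept_id})
MERGE (p)-[r:HAS_MEASUREMENT {measurement_id: row.measurement_id}]->(m)
ON CREATE SET r.measurement_date = row.measurement_date,
              r.unit_concept_id = row.unit_concept_id,
              r.unit_concept_name = row.unit_concept_name,
              r.value_as_number = row.value_as_number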
03-02-2022 02:28 PM
Thank you all for the nice suggestions!
Just wanted to update that this was solved by using neo4j-admin import (Import - Operations Manual).
It loaded super fast compared to apoc.load.jdbc or apoc.load.csv, as shown below:
IMPORT DONE in 29s 560ms.
Imported:
1944 nodes
12987721 relationships
45588552 properties
Peak memory usage: 76.06MiB
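For anyone finding this later, the general shape of a neo4j-admin import call for this kind of data looks roughly like the sketch below (Neo4j 4.x syntax; the file names and header columns are simplified placeholders, not my exact files):

# header files (one line each):
# persons_header.csv:       person_id:ID(Person)
# measurements_header.csv:  measurement_concept_id:ID(Measurement)
# rels_header.csv:          :START_ID(Person),:END_ID(Measurement),measurement_id:long,measurement_date,unit_concept_id:long,unit_concept_name,value_as_number:double

bin/neo4j-admin import \
    --database=neo4j \
    --nodes=Person=import/persons_header.csv,import/persons.csv \
    --nodes=Measurement=import/measurements_header.csv,import/measurements.csv \
    --relationships=HAS_MEASUREMENT=import/rels_header.csv,import/rels.csv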
03-03-2022 01:02 PM
@mengjiakang2017
neo4j-admin import will be faster than apoc.load.jdbc, but neo4j-admin import can only operate on a new/empty database. You cannot run neo4j-admin import against an existing/populated database, and it is run against an offline database. apoc.load.jdbc behaves in the exact opposite manner.
03-09-2022 11:32 AM
Yes, I agree that APOC is great for database updates while neo4j-admin import works well for building a database from the ground up. I haven't checked Apache Hop yet, but it might be a good candidate for large data imports, based on anthapu's suggestion.