02-07-2020 04:42 AM
Hi there.
I tried neo4j-etl cli tool V1.3.1.
It is very fast when importing only nodes, but it seems to be very slow when importing relations.
Does it check the connection target table while generating the CSV?
Details are as follows,
The RDB (PostgreSQL) has one node table and one relation table, as in the ER diagram below.
There are 3 sets of tables with this same structure, and the actual tables have more columns.
I edited the mapping.json like this.
[ {
  "name" : "NODE1_db1",
  "schema" : "db1",
  "graph-object-type" : "Node",
  "sql" : "SELECT id FROM db1.company AS company",
  "mappings" : [ {
    "column" : {
      "type" : "CompositeColumn",
      "table" : "company",
      "schema" : "db1",
      "role" : "PrimaryKey",
      "columns" : [ {
        "type" : "SimpleColumn",
        "role" : "Data",
        "table" : "company",
        "schema" : "db1",
        "name" : "id",
        "alias" : "id",
        "sql-data-type" : "INTEGER",
        "column-value-selection-strategy" : "SelectColumnValue"
      } ]
    },
    "field" : {
      "type" : "Id",
      "name" : "",
      "id-space" : "db1.company"
    }
  } ]
}, {
  "name" : "Relation_db1",
  "schema" : "db1",
  "graph-object-type" : "Relation",
  "sql" : "SELECT start_id, end_id, relation_type FROM (SELECT start_id, end_id, relation_type FROM db1.company AS start_company JOIN db1.relation AS relation ON start_company.id = relation.start_id JOIN db1.company AS end_company ON end_company.id = relation.end_id) AS company_relation",
  "mappings" : [ {
    "column" : {
      "type" : "SimpleColumn",
      "role" : "Data",
      "table" : "company",
      "schema" : "db1",
      "name" : "start_id",
      "alias" : "start_id",
      "sql-data-type" : "INTEGER",
      "column-value-selection-strategy" : "SelectColumnValue"
    },
    "field" : {
      "type" : "startId",
      "name" : "start_id",
      "id-space" : "db1.company_relation"
    }
  }, {
    "column" : {
      "type" : "SimpleColumn",
      "role" : "Data",
      "table" : "company",
      "schema" : "db1",
      "name" : "end_id",
      "alias" : "end_id",
      "sql-data-type" : "INTEGER",
      "column-value-selection-strategy" : "SelectColumnValue"
    },
    "field" : {
      "type" : "endId",
      "name" : "end_id",
      "id-space" : "db1.company_relation"
    }
  }, {
    "column" : {
      "type" : "SimpleColumn",
      "role" : "Data",
      "table" : "company",
      "schema" : "db1",
      "name" : "relation_type",
      "alias" : "relation_type",
      "sql-data-type" : "INTEGER",
      "column-value-selection-strategy" : "SelectColumnValue"
    },
    "field" : {
      "type" : "RELATION_TYPE",
      "name" : "relation_type",
      "id-space" : "db1.company_relation"
    }
  } ]
} ]
The result is 700,000 nodes and 2,500,000 relations.
The import took 10,300 seconds, about 3 hours.
When I deleted the Relation mapping and ran it with nodes only, it finished in about 20 seconds. Running the same SQL directly against the RDB takes only a few seconds, so I suspect the bottleneck is somewhere in the Neo4j ETL logic.
Please let me know if you have any information.
Thanks.
02-07-2020 04:52 AM
Importing relationships is very, very slow. When creating a relationship between two nodes, the database locks both nodes. So it cannot even be sped up with parallel processing unless you can guarantee that no two concurrent processes create relationships that touch the same node.
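Because of that locking, bulk relationship loads are usually done by a single writer that batches many relationships per transaction, amortizing overhead without risking two transactions locking the same node. A minimal sketch of the batching helper in Python (the driver/Cypher usage in the comments is illustrative only, assuming the official neo4j Python driver and the column names from the mapping above):

```python
# Batch rows so each transaction creates many relationships in one UNWIND,
# while a single writer ensures no concurrent locks on the same node.

def chunks(rows, size):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

# Illustrative usage (assumes a running database and the neo4j driver):
#
# from neo4j import GraphDatabase
# driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
# with driver.session() as session:
#     for batch in chunks(rel_rows, 10_000):
#         session.run(
#             "UNWIND $rows AS r "
#             "MATCH (a:Company {id: r.start_id}), (b:Company {id: r.end_id}) "
#             "CREATE (a)-[:RELATED {type: r.relation_type}]->(b)",
#             rows=batch,
#         )
```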
I don't have any experience with the ETL tool, but if you are loading data into a new database, I would definitely look into the neo4j-admin import tool: export the PostgreSQL data to CSV, preprocess it with a scripting language of your choice (make sure there are no duplicates, escape quotes, and write the CSV files as documented), and then run the import.
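The preprocessing step can be sketched in Python. This is a minimal, hedged example assuming the column names from the mapping above (id, start_id, end_id, relation_type) and a single Company label; the `:ID`/`:START_ID`/`:END_ID`/`:TYPE` headers are the format neo4j-admin import expects:

```python
import csv

def write_admin_import_csvs(nodes, rels, node_path, rel_path):
    """Write node and relationship CSVs for neo4j-admin import.

    `nodes` is an iterable of (id,) tuples; `rels` of (start_id, end_id,
    relation_type) tuples. Duplicates are dropped, since neo4j-admin
    import rejects duplicate node IDs; the csv module handles quoting.
    """
    with open(node_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id:ID(Company)", ":LABEL"])
        for (node_id,) in dict.fromkeys(nodes):  # dedupe, keep order
            w.writerow([node_id, "Company"])
    with open(rel_path, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow([":START_ID(Company)", ":END_ID(Company)", ":TYPE"])
        for start, end, rtype in dict.fromkeys(rels):  # dedupe, keep order
            w.writerow([start, end, rtype])
```

The files can then be fed to something like `neo4j-admin import --nodes nodes.csv --relationships rels.csv` (the exact flag syntax varies across Neo4j versions, so check the docs for yours).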
02-07-2020 05:09 AM
Lock!!!! That's the point!
As you say, neo4j-admin import on its own is very fast, so I'm going to go that way: CSV export from PostgreSQL.
Your explanation was really helpful!
Thanks a lot