Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
09-21-2020 06:57 AM
Hi everyone,
I'm struggling to efficiently import multiple types of nodes from a single CSV file.
The setup is as follows:
I have a file containing (unique) patent application IDs. The file also holds the associated patent family ID for each application (not unique, multiple patents can be assigned to the same family). I want to create applications and families as distinctly labeled nodes and record the affiliation relation between them. Currently I do so as follows:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
MERGE (fam:Family {family_id: toInteger(row.family_id)})
CREATE (app)-[:BELONGS_TO]->(fam)
SET app.filing_date = date(row.appln_filing_date),
app.filing_year = toInteger(row.appln_filing_year),
app.granted = row.granted
Because the family IDs are not unique in the file (and there also exists no file with unique family IDs), I have to use the MERGE
clause (without an index on family IDs). Running this query seems prohibitively expensive even for a small sample of the final dataset, which has many millions of applications.
I'd appreciate any help in making this feasible!
PS: I hope I'm not overlooking similar questions that have been answered before, im still very new to Neo4j and don't know yet how to effectively search for stuff.
Solved! Go to Solution.
09-21-2020 11:53 PM
Here's my solution after some fiddling, just in case another newbie like me has a similar problem :
CREATE CONSTRAINT ON (a:Application) ASSERT a.appln_id IS UNIQUE;
CREATE CONSTRAINT ON (f:Family) ASSERT f.family_id IS UNIQUE;
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
SET app.filing_date = date(row.appln_filing_date),
app.filing_year = toInteger(row.appln_filing_year),
app.granted = row.granted;
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
MERGE (fam:Family {family_id: toInteger(row.docdb_family_id)});
09-21-2020 07:00 AM
You can still use the index even though the values might not be unique.
The syntax is:
CREATE INDEX index_name FOR (n:Person)
ON (n.surname)
09-21-2020 07:24 AM
Thanks for the tip!
I tried splitting the import into two steps by first importing the family IDs as an attribute, putting an index on them and then using MERGE
to create nodes for the family IDs and the relationship between the two sets of nodes:
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
SET app.docdb_family_id = toInteger(row.docdb_family_id),
app.filing_date = date(row.appln_filing_date),
app.filing_year = toInteger(row.appln_filing_year),
app.granted = row.granted;
CREATE INDEX idx_docdb_family_id FOR (a:Application) ON (a.docdb_family_id);
MATCH (app:Application)
MERGE (fam:Family {family_id: app.docdb_family_id})
MERGE (app)-[:BELONGS_TO]->(fam)
However, this seems to still take very long so I'm not sure if I'm utilizing the index correctly.
09-21-2020 11:53 PM
Here's my solution after some fiddling, just in case another newbie like me has a similar problem :
CREATE CONSTRAINT ON (a:Application) ASSERT a.appln_id IS UNIQUE;
CREATE CONSTRAINT ON (f:Family) ASSERT f.family_id IS UNIQUE;
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
CREATE (app:Application {appln_id: toInteger(row.appln_id)})
SET app.filing_date = date(row.appln_filing_date),
app.filing_year = toInteger(row.appln_filing_year),
app.granted = row.granted;
:auto USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///path/to/file.csv' AS row
FIELDTERMINATOR ';'
MERGE (fam:Family {family_id: toInteger(row.docdb_family_id)});
All the sessions of the conference are now available online