Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
10-22-2021 02:46 AM
Neo4j Server Community edition 4.3.3 Ubuntu 20.04
Hi all,
For a long time I used a query to import a complex CSV file, and it run nicely.
Yesterday night I 'dream' an horrible bug
in the query, but I have no idea how to solve it.
The problem resides in the structure of the data, and in the fact that I must maintain the existing DB content:
Each line describe the following info:
- a primary node, each one with:
- three different labels for the node;
- one of more relationships with another node (regions);
- one relationship with a complex object
So, in my import query, I do the following:
LOAD CSV WITH HEADERS FROM 'xxx' as row FIELDTERMINATOR ';'
// Select a primary node and change its name
WITH row, split(row.Appellation,",") as appellationNames
MERGE (appellation:Appellation { uuid: row.`Appellation UUID` })
SET appellation.name = appellationNames
// Replace the primary node labels
WITH row, appellation
CALL apoc.create.setLabels(appellation,[row.`EU Classification`,row.`Classification`, 'Appellation']) YIELD node
// delete and replace the relationships with the regions
WITH row, appellation, split(row.Region,",") as regions
UNWIND regions as aRegion
MATCH (region:Region)
WHERE ToLower(region.name) = trim(ToLower(aRegion))
WITH row, appellation, region
MATCH (appellation)-[oldRelationship:IS_PRODUCED_IN]->(region)
DETACH DELETE oldRelationship
WITH row, appellation, region
MERGE (appellation)-[r:IS_PRODUCED_IN]->(region)
// then finally start working on the core content which differ for each line
with row, appellation
<DO A LOT OF OTHER THINGS WITH REMAINING INFO>
Where is the problem?
The input file is ordered, and I have a bunch of sequential rows having the same appellation, labels and regions (because these are common data), but with all the remaining information that differs.
So, if I have a sequence of, just to say, 100 rows with the same initial info, I repeat 100 time the query described above, deleting and recreating the same info for each row, when I would execute just the latest query content <DO A LOT OF THINGS WITH REMAINING INFO>
for all the lines except the first.
If I have been able to describe the problem, there someone who see an approach to solve it?
A nice workaround, for me, would be to have the ability to access the previous row ....
Thanks
Solved! Go to Solution.
10-25-2021 05:57 AM
Of course. Let's suppose I have the following csv test.csv
:
Appellation;EU Classification;Classification;Region;Appellation UUID;Other
q,w,e;AAA;1;1,2,3;123-31312;123123
q,w,e;AAA;1;1,2,3;4324-555;6456565656
q,w,e;AAA;1;1,2,3;322-554;wfhaffsdak
a,s,d;AAA;1;1,2,3;342-6545;6456565656
q,w,e;ZZZ;1;1,2,3;23432-3534;dsbjkhffdbkjh
I could do:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as row FIELDTERMINATOR ';'
with split(row.Appellation,",") as appellationNames, row.`EU Classification` as label, row.`Classification` as classification, split(row.Region,",") as regions, collect(row) as commonRows
MERGE (appellation:Appellation { uuid: commonRows[0].`Appellation UUID` })
SET appellation.name = appellationNames
// Replace the primary node labels
with appellation, label, classification, regions, commonRows
CALL apoc.create.setLabels(appellation,[label,classification, 'Appellation']) YIELD node
WITH node, regions, appellation, commonRows
UNWIND regions as aRegion
MATCH (region:Region)
WHERE ToLower(region.name) = trim(ToLower(aRegion))
WITH commonRows, appellation, region
MATCH (appellation)-[oldRelationship:IS_PRODUCED_IN]->(region)
DETACH DELETE oldRelationship
WITH commonRows, appellation, region
MERGE (appellation)-[r:IS_PRODUCED_IN]->(region)
WITH commonRows, appellation , region
// <DO A LOT OF OTHER THINGS WITH REMAINING INFO>
10-22-2021 06:33 AM
@paolodipietro58
If I understood well, your problem is that the UNWIND
cycle every row, even if appellation, labels and regions are equal.
Therefore, you can aggregate rows based on the same above properties, for example as follows (up to the line 6 is mock data, just to show you how I thought about this one, that is, first 3 row with equal props, and other differents. You have to change it with LOAD CSV WITH HEADERS FROM 'xxx' as row FIELDTERMINATOR ';'
😞
unwind [{Appellation: "q,w,e", `EU Classification`: 'AAA', `Classification`: "1", Region: "1,2,3", `Appellation UUID`: '123-31312', Other: 123123},
{Appellation: "q,w,e", `EU Classification`: 'AAA', `Classification`: "1", Region: "1,2,3", `Appellation UUID`: '4324-555', Other: 6456565656},
{Appellation: "q,w,e", `EU Classification`: 'AAA', `Classification`: "1", Region: "1,2,3", `Appellation UUID`: '322-554',Other: 'wfhaffsdak'},
{Appellation: "a,s,d", `EU Classification`: 'AAA', `Classification`: "1", Region: "1,2,3", `Appellation UUID`: '342-6545',Other: '6456565656'},
{Appellation: "q,w,e", `EU Classification`: 'ZZZ', `Classification`: "1", Region: "1,2,3", `Appellation UUID`: '23432-3534',Other: 'dsbjkhffdbkjh'}] as row
// --end mock data
with split(row.Appellation,",") as appellationNames, row.`EU Classification` as label, row.`Classification` as classification, split(row.Region,",") as regions, collect(row) as commonRows
MERGE (appellation:Appellation { uuid: commonRows[0].`Appellation UUID` })
SET appellation.name = appellationNames
// Replace the primary node labels
with appellation, label, classification, regions, commonRows
CALL apoc.create.setLabels(appellation,[label,classification, 'Appellation']) YIELD node
WITH node, regions, appellation, commonRows
UNWIND regions as aRegion
MATCH (region:Region)
WHERE ToLower(region.name) = trim(ToLower(aRegion))
WITH commonRows, appellation, region
MATCH (appellation)-[oldRelationship:IS_PRODUCED_IN]->(region)
DETACH DELETE oldRelationship
WITH commonRows, appellation, region
MERGE (appellation)-[r:IS_PRODUCED_IN]->(region)
WITH commonRows, appellation , region
// <DO A LOT OF OTHER THINGS WITH REMAINING INFO>
The "core" thing is the line with split(row.Appellation,",") as appellationNames, row.
EU Classification as label, row.
Classification as classification, split(row.Region,",") as regions, collect(row) as commonRows
, where I collect all rows based on appellationNames, label, classification, regions.
Then, you can possibly cycle commonRows in <DO A LOT OF OTHER THINGS WITH REMAINING INFO>
part.
In this example I put first Appellation UUID
found (based on common properties), otherwise you can change commonRows[0]
to commonRows[size(commonRows) - 1]
10-22-2021 06:42 AM
Hi Giuseppe,
Yes, I understand what you are trying to tell me, but, IMHO, this is a solution that can be used with a standard query.
I don't understand, and you would try to describe, how can I obtain such result starting from LOAD CSV
.
10-25-2021 05:57 AM
Of course. Let's suppose I have the following csv test.csv
:
Appellation;EU Classification;Classification;Region;Appellation UUID;Other
q,w,e;AAA;1;1,2,3;123-31312;123123
q,w,e;AAA;1;1,2,3;4324-555;6456565656
q,w,e;AAA;1;1,2,3;322-554;wfhaffsdak
a,s,d;AAA;1;1,2,3;342-6545;6456565656
q,w,e;ZZZ;1;1,2,3;23432-3534;dsbjkhffdbkjh
I could do:
LOAD CSV WITH HEADERS FROM 'file:///test.csv' as row FIELDTERMINATOR ';'
with split(row.Appellation,",") as appellationNames, row.`EU Classification` as label, row.`Classification` as classification, split(row.Region,",") as regions, collect(row) as commonRows
MERGE (appellation:Appellation { uuid: commonRows[0].`Appellation UUID` })
SET appellation.name = appellationNames
// Replace the primary node labels
with appellation, label, classification, regions, commonRows
CALL apoc.create.setLabels(appellation,[label,classification, 'Appellation']) YIELD node
WITH node, regions, appellation, commonRows
UNWIND regions as aRegion
MATCH (region:Region)
WHERE ToLower(region.name) = trim(ToLower(aRegion))
WITH commonRows, appellation, region
MATCH (appellation)-[oldRelationship:IS_PRODUCED_IN]->(region)
DETACH DELETE oldRelationship
WITH commonRows, appellation, region
MERGE (appellation)-[r:IS_PRODUCED_IN]->(region)
WITH commonRows, appellation , region
// <DO A LOT OF OTHER THINGS WITH REMAINING INFO>
All the sessions of the conference are now available online