cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Multiple uses of WITH in LOAD CSV - impact on variable scope and iteration

The use case is to link an 'app' node to 'locations' for that app's development and hosting

For the DevLocation and the HostLocationI am loading multi-value comma-separated fields from a CSV as an array, so that I can iterate and create a Location for each element of the array

I'm looking for a better way to do this; the approach I am using feels wrong and ultimately doesn't seem to work and I am sure I am misunderstanding the indexing/iteration/scope model in use

Sample (simplified) data:

App: Application 1
dLoc: CRO, India
hLoc: SLO, CRO

Doing this once works for the location, but the scope of 'app' is lost for the final statement line:

UNWIND split(csvLine.dLoc, ",") AS locd 
//provide readable labels for obvious countries
WITH locd, 
(CASE trim(locd)
 WHEN 'CRO' THEN 'Croatia'
 WHEN 'SLO' THEN 'Slovakia'
 ELSE trim(locd) END) AS devPlace,
MERGE (DevLoc:Location {name: devPlace})
CREATE (app)-[:DEVELOPED_IN]->(DevLoc)

I can pass 'app' through the WITH statement:

UNWIND split(csvLine.dLoc, ",") AS locd 
//provide readable labels for obvious countries
WITH locd, 
(CASE trim(locd)
 WHEN 'CRO' THEN 'Croatia'
 WHEN 'SLO' THEN 'Slovakia'
 ELSE trim(locd) END) AS devPlace, app
MERGE (DevLoc:Location {name: devPlace})
CREATE (app)-[:DEVELOPED_IN]->(DevLoc)

But this feels wrong and gets worse when I try to do the same thing for the hosting location afterwards, requiring both 'app' and 'csvLine' to be passed through with an additional WITH, to remain in scope:

WITH csvLine, app
UNWIND split(csvLine.dLoc, ",") AS locd 
//provide readable labels for obvious countries
WITH locd, 
(CASE trim(locd)
 WHEN 'CRO' THEN 'Croatia'
 WHEN 'SLO' THEN 'Slovakia'
 ELSE trim(locd) END) AS devPlace, csvLine, app
MERGE (DevLoc:Location {name: devPlace})
CREATE (app)-[:DEVELOPED_IN]->(DevLoc)

WITH csvLine, app
UNWIND split(csvLine.hLoc, ",") AS loch  
WITH loch,
(CASE trim(loch)
 WHEN 'CRO' THEN 'Croatia'
 WHEN 'SLO' THEN 'Slovakia'
 ELSE trim(loch) END) AS hostPlace, csvLine, app
MERGE (HostLoc:Location {name: hostPlace})
CREATE (app)-[:HOSTED_IN]->(HostLoc)

This appears to work, but for more complex scenarios, I have noticed significant anomalies, such as the same app with multiple duplicated DEVELOPED_IN and HOSTED_IN location relationships

All suggestions for improvement, or references to more detailed examples, gratefully received

1 ACCEPTED SOLUTION

You're running into cardinality issues here.

UNWIND takes list elements and changes them into rows.

Cypher operations execute per row (this is why UNWIND seems like it's an iteration structure, but it's really not).

As you enter your next WITH, the cardinality hasn't reset from the previous UNWIND, you're going to have multiple rows with the same csvLine and app values, this is why the latter part of your query is generating duplicates.

You either need to reset cardinality through:

...
WITH DISTINCT csvLine, app
...

or change from using UNWIND to FOREACH (which is bounded and won't alter row cardinality), though you'll need to make sure all clauses in the FOREACH are updating clauses (no MATCHes or WITHs):

WITH csvLine, app
FOREACH (locd IN split(csvLine.dLoc, ",") | 
  MERGE (DevLoc:Location {name: CASE trim(locd)
                 WHEN 'CRO' THEN 'Croatia'
                 WHEN 'SLO' THEN 'Slovakia'
                 ELSE trim(locd) END})
  CREATE (app)-[:DEVELOPED_IN]->(DevLoc))

FOREACH (loch IN split(csvLine.hLoc, ",") | 
  MERGE (HostLoc:Location {name: CASE trim(loch)
                 WHEN 'CRO' THEN 'Croatia'
                 WHEN 'SLO' THEN 'Slovakia'
                 ELSE trim(loch) END})
  CREATE (app)-[: HOSTED_IN]->(HostLoc))

View solution in original post

2 REPLIES 2

You're running into cardinality issues here.

UNWIND takes list elements and changes them into rows.

Cypher operations execute per row (this is why UNWIND seems like it's an iteration structure, but it's really not).

As you enter your next WITH, the cardinality hasn't reset from the previous UNWIND, you're going to have multiple rows with the same csvLine and app values, this is why the latter part of your query is generating duplicates.

You either need to reset cardinality through:

...
WITH DISTINCT csvLine, app
...

or change from using UNWIND to FOREACH (which is bounded and won't alter row cardinality), though you'll need to make sure all clauses in the FOREACH are updating clauses (no MATCHes or WITHs):

WITH csvLine, app
FOREACH (locd IN split(csvLine.dLoc, ",") | 
  MERGE (DevLoc:Location {name: CASE trim(locd)
                 WHEN 'CRO' THEN 'Croatia'
                 WHEN 'SLO' THEN 'Slovakia'
                 ELSE trim(locd) END})
  CREATE (app)-[:DEVELOPED_IN]->(DevLoc))

FOREACH (loch IN split(csvLine.hLoc, ",") | 
  MERGE (HostLoc:Location {name: CASE trim(loch)
                 WHEN 'CRO' THEN 'Croatia'
                 WHEN 'SLO' THEN 'Slovakia'
                 ELSE trim(loch) END})
  CREATE (app)-[: HOSTED_IN]->(HostLoc))

Hi Andrew - both options work well for the requirement (but you knew that :-))

The FOREACH is much simpler and cleaner, thank you.

Tim