cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

String Equality Comparison Not Working in FOREACH

ddarmon
Node Clone

I have a CSV that I'm loading into Neo4j where one of the columns is a date in yyyyMMdd format, i.e. 20210628 for June 28, 2021.

Some of the dates have '00' for the day of the month, and I would like to modify those to be '01', while leaving the dates with appropriate days-of-month as-is.

I have tried to use the approach from this blog post using FOREACH, but the FOREACH is matching every date as having a terminal 00, when I know this isn't the case.

An example load statement is below.

LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row

MERGE (p:Entity {bvd_id: row.bvd_id})
  FOREACH(tmp IN CASE WHEN right(trim(row.information_date), 2) = "00" THEN [1] ELSE [] END | SET p.information_date = apoc.date.convertFormat(left(row.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd"))
  FOREACH(tmp IN CASE WHEN right(trim(row.information_date), 2) <> "00" THEN [1] ELSE [] END | SET p.information_date = apoc.date.convertFormat(row.information_date, "yyyyMMdd", "yyyy-MM-dd"))

RETURN count(p);
1 ACCEPTED SOLUTION

ddarmon
Node Clone

@andrew.bowman

I was able to find a solution.

First, I loaded in the data using apoc.load.csv(), which was much faster than the LOAD CSV command.

Then, after the load, I converted the dates-as-strings to dates-as-dates using:

MATCH ()-[r]->()
WITH r,
CASE WHEN right(trim(r.information_date), 2) = '00' THEN apoc.date.convertFormat(left(r.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
     WHEN right(trim(r.information_date), 2) <> '00' THEN apoc.date.convertFormat(r.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET r.information_date = info_date

MATCH (n)
WITH n,
CASE WHEN right(trim(n.information_date), 2) = '00' THEN apoc.date.convertFormat(left(n.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
     WHEN right(trim(n.information_date), 2) <> '00' THEN apoc.date.convertFormat(n.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET n.information_date = info_date

I don't know why the original approach didn't work, but using apoc.load.csv() seems to be the preferred way to load a CSV now, so hopefully I'll take this route from now on.

Thanks for your help!

View solution in original post

4 REPLIES 4

You may want to sanity check your data and the function you're applying. Try this and inspect the return to see if anything looks off:

LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row
WITH row
LIMIT 10
WITH row, row.information_date as original, right(trim(row.information_date), 2) as lastTwo
RETURN original, lastTwo, 
 CASE WHEN lastTwo = "00" 
 THEN apoc.date.convertFormat(left(original,6) + '01', "yyyyMMdd", "yyyy-MM-dd") 
 ELSE apoc.date.convertFormat(original, "yyyyMMdd", "yyyy-MM-dd") END as converted

ddarmon
Node Clone

Thanks for the recommendation!

I tried your query, and all of the data parses correctly. I had a similar idea before posting here (I printed the Booleans for the conditional to check that I wasn't going crazy), and again it worked then.

But the original MERGE I posted still only ever hits the = "00" case, and it hits that for every date. The <> "00" case is never run. If I remove the former FOREACH() and keep the latter FOREACH(), none of the nodes get the information_date property.

I'm glad to know that I didn't miss something obvious. But I'm still baffled by why the conditional works everywhere except inside the FOREACH() calls.

ddarmon
Node Clone

I tried using the CASE WHEN statement with the MERGE, e.g.

LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row
WITH row, 
 CASE WHEN right(row.information_date,2) = "00"
 THEN apoc.date.convertFormat(left(row.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd") 
 ELSE apoc.date.convertFormat(row.information_dat, "yyyyMMdd", "yyyy-MM-dd") END as information_date
MERGE (p:Entity {bvd_id: row.bvd_id})
SET p.information_date = information_date

RETURN count(p);

but this now skips over the cases where the last two digits aren't 00, i.e. those nodes do not get an information_date property.

If it helps, here is the head of an example CSV file:

bvd_id,name,country_iso_code,entity_type,independence_indicator,guo_25,guo_50,guo_25c,guo_50c,information_date
CN9433079978,Wen ling jian feng qi ye guan li zi xun he huo qi ye ( you xian he huo ),CN,C,U,,,,,20190927
RU57156762,MUNITSIPALNOE KAZENNOE DOSHKOLNOE OBRAZOVATELNOE UCHREZHDENIE DETSKII SAD #4 SELO GRACHEVKA GRACHEVSKOGO MUNITSIPALNOGO OKRUGA STAVROPOLSKOGO KRAYA,RU,C,U,,,,,20210200
BG040846481,ALKOK 3 EOOD,BG,C,D,BG5809304609,BG160111419,BG160111419,BG160111419,20210314
TJ1830079882R,Gadoeva Safargul,TJ,C,D,TJP185955918,TJP185955918,TJ1830079882R,TJ1830079882R,20210100
CN*110293794693,WEINONG SHEN,CN,I,-,,,,,20210303
IT03141010839,MEDILAB S.R.L.,IT,C,D,ITPLZZSST75B16G377U,ITPLZZSST75B16G377U,IT03141010839,IT03141010839,20200100
DE8230272776-1000,CARRUS Fahrzeuge GmbH,DE,Q,D,DE*110005813233,DE8230272776,DE8230272776,DE8230272776,20210300
WW*893498078,MR DMITRO VOLODIMIROVICH GORYACHKIN,,I,-,WW*893498078,WW*893498078,,,20131231
TH0105556198399,ORIENT INTERNATIONAL (THAILAND) COMPANY LIMITED,TH,C,B+,TH*110179655203,,TH0105556198399,,20180131

ddarmon
Node Clone

@andrew.bowman

I was able to find a solution.

First, I loaded in the data using apoc.load.csv(), which was much faster than the LOAD CSV command.

Then, after the load, I converted the dates-as-strings to dates-as-dates using:

MATCH ()-[r]->()
WITH r,
CASE WHEN right(trim(r.information_date), 2) = '00' THEN apoc.date.convertFormat(left(r.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
     WHEN right(trim(r.information_date), 2) <> '00' THEN apoc.date.convertFormat(r.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET r.information_date = info_date

MATCH (n)
WITH n,
CASE WHEN right(trim(n.information_date), 2) = '00' THEN apoc.date.convertFormat(left(n.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
     WHEN right(trim(n.information_date), 2) <> '00' THEN apoc.date.convertFormat(n.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET n.information_date = info_date

I don't know why the original approach didn't work, but using apoc.load.csv() seems to be the preferred way to load a CSV now, so hopefully I'll take this route from now on.

Thanks for your help!