cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

String Equality Comparison Not Working in FOREACH

ddarmon
Node Clone

I have a CSV that I'm loading into Neo4j where one of the columns is a date in yyyyMMdd format, i.e. 20210628 for June 28, 2021.

Some of the dates have '00' for the day of the month, and I would like to modify those to be '01', while leaving the dates with appropriate days-of-month as-is.

I have tried to use the approach from this blog post using FOREACH, but the FOREACH is matching every date as having a terminal 00, when I know this isn't the case.

An example load statement is below.

LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row

MERGE (p:Entity {bvd_id: row.bvd_id})
  FOREACH(tmp IN CASE WHEN right(trim(row.information_date), 2) = "00" THEN [1] ELSE [] END | SET p.information_date = apoc.date.convertFormat(left(row.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd"))
  FOREACH(tmp IN CASE WHEN right(trim(row.information_date), 2) <> "00" THEN [1] ELSE [] END | SET p.information_date = apoc.date.convertFormat(row.information_date, "yyyyMMdd", "yyyy-MM-dd"))

RETURN count(p);
1 ACCEPTED SOLUTION

ddarmon
Node Clone

@andrew.bowman

I was able to find a solution.

First, I loaded in the data using apoc.load.csv(), which was much faster than the LOAD CSV command.

Then, after the load, I converted the dates-as-strings to dates-as-dates using:

MATCH ()-[r]->()
WITH r,
CASE WHEN right(trim(r.information_date), 2) = '00' THEN apoc.date.convertFormat(left(r.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
     WHEN right(trim(r.information_date), 2) <> '00' THEN apoc.date.convertFormat(r.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET r.information_date = info_date

MATCH (n)
WITH n,
CASE WHEN right(trim(n.information_date), 2) = '00' THEN apoc.date.convertFormat(left(n.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
     WHEN right(trim(n.information_date), 2) <> '00' THEN apoc.date.convertFormat(n.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET n.information_date = info_date

I don't know why the original approach didn't work, but using apoc.load.csv() seems to be the preferred way to load a CSV now, so hopefully I'll take this route from now on.

Thanks for your help!

View solution in original post

4 REPLIES 4

You may want to sanity check your data and the function you're applying. Try this and inspect the return to see if anything looks off:

LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row
WITH row
LIMIT 10
WITH row, row.information_date as original, right(trim(row.information_date), 2) as lastTwo
RETURN original, lastTwo, 
 CASE WHEN lastTwo = "00" 
 THEN apoc.date.convertFormat(left(original,6) + '01', "yyyyMMdd", "yyyy-MM-dd") 
 ELSE apoc.date.convertFormat(original, "yyyyMMdd", "yyyy-MM-dd") END as converted

ddarmon
Node Clone

Thanks for the recommendation!

I tried your query, and all of the data parses correctly. I had a similar idea before posting here (I printed the Booleans for the conditional to check that I wasn't going crazy), and again it worked then.

But the original MERGE I posted still only ever hits the = "00" case, and it hits that for every date. The <> "00" case is never run. If I remove the former FOREACH() and keep the latter FOREACH(), none of the nodes get the information_date property.

I'm glad to know that I didn't miss something obvious. But I'm still baffled by why the conditional works everywhere except inside the FOREACH() calls.

ddarmon
Node Clone

I tried using the CASE WHEN statement with the MERGE, e.g.

LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row
WITH row, 
 CASE WHEN right(row.information_date,2) = "00"
 THEN apoc.date.convertFormat(left(row.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd") 
 ELSE apoc.date.convertFormat(row.information_dat, "yyyyMMdd", "yyyy-MM-dd") END as information_date
MERGE (p:Entity {bvd_id: row.bvd_id})
SET p.information_date = information_date

RETURN count(p);

but this now skips over the cases where the last two digits aren't 00, i.e. those nodes do not get an information_date property.

If it helps, here is the head of an example CSV file:

bvd_id,name,country_iso_code,entity_type,independence_indicator,guo_25,guo_50,guo_25c,guo_50c,information_date
CN9433079978,Wen ling jian feng qi ye guan li zi xun he huo qi ye ( you xian he huo ),CN,C,U,,,,,20190927
RU57156762,MUNITSIPALNOE KAZENNOE DOSHKOLNOE OBRAZOVATELNOE UCHREZHDENIE DETSKII SAD #4 SELO GRACHEVKA GRACHEVSKOGO MUNITSIPALNOGO OKRUGA STAVROPOLSKOGO KRAYA,RU,C,U,,,,,20210200
BG040846481,ALKOK 3 EOOD,BG,C,D,BG5809304609,BG160111419,BG160111419,BG160111419,20210314
TJ1830079882R,Gadoeva Safargul,TJ,C,D,TJP185955918,TJP185955918,TJ1830079882R,TJ1830079882R,20210100
CN*110293794693,WEINONG SHEN,CN,I,-,,,,,20210303
IT03141010839,MEDILAB S.R.L.,IT,C,D,ITPLZZSST75B16G377U,ITPLZZSST75B16G377U,IT03141010839,IT03141010839,20200100
DE8230272776-1000,CARRUS Fahrzeuge GmbH,DE,Q,D,DE*110005813233,DE8230272776,DE8230272776,DE8230272776,20210300
WW*893498078,MR DMITRO VOLODIMIROVICH GORYACHKIN,,I,-,WW*893498078,WW*893498078,,,20131231
TH0105556198399,ORIENT INTERNATIONAL (THAILAND) COMPANY LIMITED,TH,C,B+,TH*110179655203,,TH0105556198399,,20180131

ddarmon
Node Clone

@andrew.bowman

I was able to find a solution.

First, I loaded in the data using apoc.load.csv(), which was much faster than the LOAD CSV command.

Then, after the load, I converted the dates-as-strings to dates-as-dates using:

MATCH ()-[r]->()
WITH r,
CASE WHEN right(trim(r.information_date), 2) = '00' THEN apoc.date.convertFormat(left(r.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
     WHEN right(trim(r.information_date), 2) <> '00' THEN apoc.date.convertFormat(r.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET r.information_date = info_date

MATCH (n)
WITH n,
CASE WHEN right(trim(n.information_date), 2) = '00' THEN apoc.date.convertFormat(left(n.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
     WHEN right(trim(n.information_date), 2) <> '00' THEN apoc.date.convertFormat(n.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET n.information_date = info_date

I don't know why the original approach didn't work, but using apoc.load.csv() seems to be the preferred way to load a CSV now, so hopefully I'll take this route from now on.

Thanks for your help!

Nodes 2022
Nodes
NODES 2022, Neo4j Online Education Summit

All the sessions of the conference are now available online