Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
06-28-2021 08:31 AM
I have a CSV that I'm loading into Neo4j where one of the columns is a date in yyyyMMdd
format, i.e. 20210628
for June 28, 2021.
Some of the dates have '00' for the day of the month, and I would like to modify those to be '01', while leaving the dates with appropriate days-of-month as-is.
I have tried to use the approach from this blog post using FOREACH
, but the FOREACH
is matching every date as having a terminal 00
, when I know this isn't the case.
An example load statement is below.
LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row
MERGE (p:Entity {bvd_id: row.bvd_id})
FOREACH(tmp IN CASE WHEN right(trim(row.information_date), 2) = "00" THEN [1] ELSE [] END | SET p.information_date = apoc.date.convertFormat(left(row.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd"))
FOREACH(tmp IN CASE WHEN right(trim(row.information_date), 2) <> "00" THEN [1] ELSE [] END | SET p.information_date = apoc.date.convertFormat(row.information_date, "yyyyMMdd", "yyyy-MM-dd"))
RETURN count(p);
Solved! Go to Solution.
06-29-2021 01:58 PM
I was able to find a solution.
First, I loaded in the data using apoc.load.csv()
, which was much faster than the LOAD CSV
command.
Then, after the load, I converted the dates-as-strings to dates-as-dates using:
MATCH ()-[r]->()
WITH r,
CASE WHEN right(trim(r.information_date), 2) = '00' THEN apoc.date.convertFormat(left(r.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
WHEN right(trim(r.information_date), 2) <> '00' THEN apoc.date.convertFormat(r.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET r.information_date = info_date
MATCH (n)
WITH n,
CASE WHEN right(trim(n.information_date), 2) = '00' THEN apoc.date.convertFormat(left(n.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
WHEN right(trim(n.information_date), 2) <> '00' THEN apoc.date.convertFormat(n.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET n.information_date = info_date
I don't know why the original approach didn't work, but using apoc.load.csv()
seems to be the preferred way to load a CSV now, so hopefully I'll take this route from now on.
Thanks for your help!
06-28-2021 02:35 PM
You may want to sanity check your data and the function you're applying. Try this and inspect the return to see if anything looks off:
LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row
WITH row
LIMIT 10
WITH row, row.information_date as original, right(trim(row.information_date), 2) as lastTwo
RETURN original, lastTwo,
CASE WHEN lastTwo = "00"
THEN apoc.date.convertFormat(left(original,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
ELSE apoc.date.convertFormat(original, "yyyyMMdd", "yyyy-MM-dd") END as converted
06-28-2021 03:33 PM
Thanks for the recommendation!
I tried your query, and all of the data parses correctly. I had a similar idea before posting here (I printed the Booleans for the conditional to check that I wasn't going crazy), and again it worked then.
But the original MERGE
I posted still only ever hits the = "00"
case, and it hits that for every date. The <> "00"
case is never run. If I remove the former FOREACH()
and keep the latter FOREACH()
, none of the nodes get the information_date
property.
I'm glad to know that I didn't miss something obvious. But I'm still baffled by why the conditional works everywhere except inside the FOREACH()
calls.
06-29-2021 05:55 AM
I tried using the CASE WHEN
statement with the MERGE
, e.g.
LOAD CSV WITH HEADERS FROM 'file:///example-vertex-file__clean.csv' AS row
WITH row,
CASE WHEN right(row.information_date,2) = "00"
THEN apoc.date.convertFormat(left(row.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
ELSE apoc.date.convertFormat(row.information_dat, "yyyyMMdd", "yyyy-MM-dd") END as information_date
MERGE (p:Entity {bvd_id: row.bvd_id})
SET p.information_date = information_date
RETURN count(p);
but this now skips over the cases where the last two digits aren't 00
, i.e. those nodes do not get an information_date
property.
If it helps, here is the head of an example CSV file:
bvd_id,name,country_iso_code,entity_type,independence_indicator,guo_25,guo_50,guo_25c,guo_50c,information_date
CN9433079978,Wen ling jian feng qi ye guan li zi xun he huo qi ye ( you xian he huo ),CN,C,U,,,,,20190927
RU57156762,MUNITSIPALNOE KAZENNOE DOSHKOLNOE OBRAZOVATELNOE UCHREZHDENIE DETSKII SAD #4 SELO GRACHEVKA GRACHEVSKOGO MUNITSIPALNOGO OKRUGA STAVROPOLSKOGO KRAYA,RU,C,U,,,,,20210200
BG040846481,ALKOK 3 EOOD,BG,C,D,BG5809304609,BG160111419,BG160111419,BG160111419,20210314
TJ1830079882R,Gadoeva Safargul,TJ,C,D,TJP185955918,TJP185955918,TJ1830079882R,TJ1830079882R,20210100
CN*110293794693,WEINONG SHEN,CN,I,-,,,,,20210303
IT03141010839,MEDILAB S.R.L.,IT,C,D,ITPLZZSST75B16G377U,ITPLZZSST75B16G377U,IT03141010839,IT03141010839,20200100
DE8230272776-1000,CARRUS Fahrzeuge GmbH,DE,Q,D,DE*110005813233,DE8230272776,DE8230272776,DE8230272776,20210300
WW*893498078,MR DMITRO VOLODIMIROVICH GORYACHKIN,,I,-,WW*893498078,WW*893498078,,,20131231
TH0105556198399,ORIENT INTERNATIONAL (THAILAND) COMPANY LIMITED,TH,C,B+,TH*110179655203,,TH0105556198399,,20180131
06-29-2021 01:58 PM
I was able to find a solution.
First, I loaded in the data using apoc.load.csv()
, which was much faster than the LOAD CSV
command.
Then, after the load, I converted the dates-as-strings to dates-as-dates using:
MATCH ()-[r]->()
WITH r,
CASE WHEN right(trim(r.information_date), 2) = '00' THEN apoc.date.convertFormat(left(r.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
WHEN right(trim(r.information_date), 2) <> '00' THEN apoc.date.convertFormat(r.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET r.information_date = info_date
MATCH (n)
WITH n,
CASE WHEN right(trim(n.information_date), 2) = '00' THEN apoc.date.convertFormat(left(n.information_date,6) + '01', "yyyyMMdd", "yyyy-MM-dd")
WHEN right(trim(n.information_date), 2) <> '00' THEN apoc.date.convertFormat(n.information_date, "yyyyMMdd", "yyyy-MM-dd")
END AS info_date
SET n.information_date = info_date
I don't know why the original approach didn't work, but using apoc.load.csv()
seems to be the preferred way to load a CSV now, so hopefully I'll take this route from now on.
Thanks for your help!
All the sessions of the conference are now available online