Parsing date and time separately when loading CSV file

Hey guys. I am using the following code to load a CSV file and parse the date and time columns.

CALL apoc.periodic.iterate(
'CALL apoc.load.csv("file:///newfile.csv") yield map as row',
'MERGE (s:Sender{from_send:row.From})
MERGE (r:Receiver{to_send:row.To})
MERGE (s)-[e:EMAILED {
date_d:datetime({epochMillis:apoc.date.parse(row.Date,'ms','dd/MM/yyyy')}),
time_d:time(datetime({epochMillis:apoc.date.parse(row.Time,'ms','hh:mm:ss')})),
subject:row.Subject, message_id : row.MessageID}]->(r)',
{batchSize:10000, iterateList:True, parallel:false}
)

but I'm getting a syntax error and I can't figure out what I'm doing wrong. Can anyone please help out? Thanks.

NOTE: the "Date" column is in the format "08/06/2020",
while the "Time" column is in the format "23:59:52".

Ch3xMat3

My initial thought when looking at the script you are running: you need to use different quotes for the strings inside the script than for the quotes around the entire script. I would suggest changing the quotes around the entire Cypher script to double quotes and the ones around the inner strings to single quotes (I like using double quotes around my whole script; you could do it the other way if you prefer). This changes the script to look like this...

CALL apoc.periodic.iterate(
"CALL apoc.load.csv('file:///newfile.csv') yield map as row",
"MERGE (s:Sender{from_send:row.From})
MERGE (r:Receiver{to_send:row.To})
MERGE (s)-[e:EMAILED {
date_d:datetime({epochMillis:apoc.date.parse(row.Date,'ms','dd/MM/yyyy')}),
time_d:time(datetime({epochMillis:apoc.date.parse(row.Time,'ms','hh:mm:ss')})),
subject:row.Subject, message_id : row.MessageID}]->(r)",
{batchSize:10000, iterateList:True, parallel:false}
)

Let me know if this does not fix the issue.

Thank you for the answer. The query seems to load and throws no errors, but the execution is taking too long; it still hadn't completed after 4+ hours. The previous file (300k rows) completed in about an hour, while this file contains 700k rows. Can you also advise on what else I should do? I have already increased the heap size and reduced the batch size further.

Hello @ahmedfazal405

I think your query is taking time because of the date and time conversion. Did you try to load without the parsing?

Regards,
Cobra
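For reference, a minimal sketch of that idea, reusing the query above but deferring all conversion: the raw strings are stored as-is (the property names date_raw and time_raw here are just illustrative, not from the thread).

CALL apoc.periodic.iterate(
"CALL apoc.load.csv('file:///newfile.csv') yield map as row",
"MERGE (s:Sender{from_send:row.From})
MERGE (r:Receiver{to_send:row.To})
MERGE (s)-[e:EMAILED {
date_raw:row.Date,
time_raw:row.Time,
subject:row.Subject, message_id:row.MessageID}]->(r)",
{batchSize:10000, iterateList:true, parallel:false}
)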

@Cobra I was wondering about the parsing of the date and time. Do you have another way to get this data properly from the CSV into the database as a datetime without the parsing?

Hey @Cobra

I did load the previous CSV file (300k rows) without parsing the date and time, but I need queries that can handle the date and time columns in their appropriate types rather than as strings. Is there another way around this if it is taking too much time? Can these columns be parsed later on?
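For what it's worth, they can be converted in place afterwards. A minimal sketch, assuming the raw values were stored as date_raw/time_raw as in the sketch above: split() rearranges the dd/MM/yyyy string into ISO order so the native date() function can read it, and "23:59:52" is already a valid ISO time. Using the native temporal functions also sidesteps a pitfall in the apoc.date.parse patterns earlier in the thread, where 'hh' means a 12-hour clock and would silently misread "23" (a 24-hour pattern would be 'HH').

CALL apoc.periodic.iterate(
"MATCH (:Sender)-[e:EMAILED]->(:Receiver) WHERE e.date_raw IS NOT NULL RETURN e",
"WITH e, split(e.date_raw, '/') AS d
SET e.date_d = date(d[2] + '-' + d[1] + '-' + d[0]),
    e.time_d = time(e.time_raw)
REMOVE e.date_raw, e.time_raw",
{batchSize:10000, parallel:false}
)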

If possible, please send me the actual value of the date as it appears in your .csv file.

To be honest, I avoid formatting my data in Cypher; I always format the data in Python and load it afterwards.

You can do this in a few seconds and with a few lines of code in Python.
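For example, a minimal pandas sketch along those lines (the file names are assumptions; the column names follow the thread):

import pandas as pd

# Read the CSV; Date is dd/MM/yyyy and Time is HH:mm:ss per the thread
df = pd.read_csv("newfile.csv")

# Parse both columns in one pass into real timestamps
dt = pd.to_datetime(df["Date"] + " " + df["Time"], format="%d/%m/%Y %H:%M:%S")

# Write ISO-8601 strings, which Cypher's datetime()/date()/time() accept directly
df["DateTime"] = dt.dt.strftime("%Y-%m-%dT%H:%M:%S")
df.to_csv("newfile_formatted.csv", index=False)

During the load, the relationship property can then be set with plain datetime(row.DateTime), with no per-row apoc.date.parse calls.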