06-18-2020 01:19 AM
Hey guys. I am using the following code to load a CSV file and parse the date and time columns.
CALL apoc.periodic.iterate(
'CALL apoc.load.csv("file:///newfile.csv") yield map as row'
,
'MERGE (s:Sender{from_send:row.From})
MERGE (r:Receiver{to_send:row.To})
MERGE (s)-[e:EMAILED {
date_d:datetime({epochMillis:apoc.date.parse(row.Date,'ms','dd/MM/yyyy')}),
time_d:time(datetime({epochMillis:apoc.date.parse(row.Time,'ms','hh:mm:ss')})),
subject:row.Subject, message_id : row.MessageID}]->(r)'
,
{batchSize:10000, iterateList:True, parallel:false}
)
but I'm getting a syntax error and I can't figure out what I'm doing wrong. Can anyone please help out? Thanks
NOTE: the "Date" column is in the format "08/06/2020",
while the "Time" column is in the format "23:59:52".
06-19-2020 12:13 PM
Initial thought when looking at the script you are running: you need to use different quotes for the strings inside the script than the quotes around the entire script. I would suggest putting double quotes around each Cypher statement and single quotes around the strings inside them (I like using double quotes around my whole script; you could do it the other way if you prefer). This changes the script to look like this...
CALL apoc.periodic.iterate(
"CALL apoc.load.csv('file:///newfile.csv') yield map as row",
"MERGE (s:Sender{from_send:row.From})
MERGE (r:Receiver{to_send:row.To})
MERGE (s)-[e:EMAILED {
date_d:datetime({epochMillis:apoc.date.parse(row.Date,'ms','dd/MM/yyyy')}),
time_d:time(datetime({epochMillis:apoc.date.parse(row.Time,'ms','HH:mm:ss')})),
subject:row.Subject, message_id : row.MessageID}]->(r)",
{batchSize:10000, iterateList:true, parallel:false}
)
Let me know if this does not fix the issue.
06-23-2020 03:36 AM
Thank you for the answer. The query now runs and throws no errors, but the execution is taking too long: it still hadn't completed after 4+ hours. The previous file (300k rows) completed in about an hour; this file, however, contains 700k rows. Can you also advise on what else I should do? I have already increased the heap size and reduced the batch size further.
06-23-2020 08:48 AM
Hello @ahmedfazal405
I think your query is slow because of the date and time conversion. Did you try loading without the parsing?
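For comparison, here is a minimal sketch of the same load that just stores the raw strings (assuming the same file and column headers as your corrected query), so you can time it against the parsing version:
CALL apoc.periodic.iterate(
  "CALL apoc.load.csv('file:///newfile.csv') YIELD map AS row",
  "MERGE (s:Sender {from_send: row.From})
   MERGE (r:Receiver {to_send: row.To})
   // Store Date and Time as plain strings for now; no apoc.date.parse calls.
   MERGE (s)-[e:EMAILED {date_d: row.Date, time_d: row.Time,
                         subject: row.Subject, message_id: row.MessageID}]->(r)",
  {batchSize: 10000, iterateList: true, parallel: false}
)
If this version finishes quickly, the parsing is the bottleneck.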
Regards,
Cobra
06-23-2020 10:53 AM
@Cobra I was wondering about the parsing of the date and time. Do you have another way to get this data from the CSV into the database as a proper datetime without the parsing?
06-23-2020 11:54 AM
Hey @Cobra
I did load the previous CSV file (300k rows) without parsing the date and time, but I need queries that can handle the date and time columns in their appropriate temporal types, not as strings. Is there another way around this if parsing is taking too much time? Can these columns be parsed later on?
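For reference, the columns can be converted after the load. A minimal sketch, assuming the raw strings were stored on the EMAILED relationships as date_d ("dd/MM/yyyy") and time_d ("HH:mm:ss"), uses a second apoc.periodic.iterate pass to convert them in place:
CALL apoc.periodic.iterate(
  "MATCH (:Sender)-[e:EMAILED]->(:Receiver)
   // Only pick up relationships whose date property is still a string.
   WHERE apoc.meta.cypher.type(e.date_d) = 'STRING'
   RETURN e",
  "SET e.date_d = date(datetime({epochMillis: apoc.date.parse(e.date_d, 'ms', 'dd/MM/yyyy')})),
       // '23:59:52' is already a valid ISO local time, so no apoc parsing is needed here.
       e.time_d = localtime(e.time_d)",
  {batchSize: 10000, parallel: false}
)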
06-23-2020 12:22 PM
If possible, please send me the actual value of the date as it appears in your .csv file.
06-23-2020 01:11 PM
To be honest, I avoid formatting my data in Cypher; I always format the data in Python and load it afterwards.
You can do this in a few seconds with a few lines of Python.
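A minimal sketch of that preprocessing with pandas (assuming the file and column names from earlier in the thread):
import pandas as pd

df = pd.read_csv('newfile.csv')
# Rewrite the dd/MM/yyyy dates as ISO yyyy-MM-dd so Cypher's date() can read them directly.
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.strftime('%Y-%m-%d')
# "23:59:52" is already a valid ISO local time, so the Time column can stay as-is.
df.to_csv('newfile_clean.csv', index=False)
After that, the load can use date(row.Date) and localtime(row.Time) with no apoc.date.parse calls.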