Neo4j

pw6163 · ‎07-08-2021

I'm importing a CSV which includes vehicle details. Here in Australia some classes of vehicle do not need to have registration, and the CSV has a blank field for these. I'd like to generate an "artificial" rego for these rows with a pattern like "UNREG1", "UNREG2", and so on.

Whilst loading the CSV can I maintain a variable which I increment as necessary to produce these values? I've started with:

WITH 1 AS unreg
LOAD CSV WITH HEADERS FROM 'file:///file.csv' AS row
WITH unreg, row CASE trim(row.REGO) WHEN "" THEN "UNREG" + toString(unreg) ELSE row.REGO END AS NewRego
MERGE (v:Vehicle {Rego: NewRego})
WITH row CASE trim(row.REGO) WHEN "" THEN unreg+1 ELSE unreg END AS unreg
RETURN unreg;

It's not working unfortunately, but I'm not particularly experienced with Cypher (unreg is always 1 and there are a dozen rows with no supplied REGO value). I'm trying to avoid pre-processing the CSV before loading into neo4j, but if I have to then I will.

I'm using neo4j Desktop 4.3.1

Cobra · ‎07-08-2021

Hello @pw6163 and welcome to the Neo4j community

With LOAD CSV, you can use linenumber() as shown in the doc:

LOAD CSV WITH HEADERS FROM 'file:///file.csv' AS row
WITH row, CASE trim(row.REGO) WHEN "" THEN "UNREG" + toString(linenumber()) ELSE row.REGO END AS NewRego
MERGE (v:Vehicle {Rego: NewRego})
RETURN NewRego;

Regards,
Cobra

pw6163 · ‎07-08-2021

Many thanks @cobra. I saw mention of linenumber() but somehow hadn't put it in the right context.

That easy huh?

Cobra · ‎07-08-2021

Normally, it shoud work , let me know!

pw6163 · ‎07-09-2021

@Cobra: Unfortunately not, but it's likely to be the way I'm loading the data. When a CSV file has an empty cell, will it be presented in Cypher as NULL or an empty string? At the moment the CASE statement looks for both but it feels clumsy. The script is:

LOAD CSV WITH HEADERS FROM 'file:///file.csv' AS row
WITH row
(CASE trim(row.REGO)
WHEN "" THEN "UNREG" + toString(linenumber())
WHEN NULL THEN "UNREG" + toString(linenumber())
ELSE row.REGO END) AS NewRego
MERGE (v:Vehicle {Rego: toUpper(replace(NewRego, " ", ""))})
RETURN count(v);

Searching for the data returns nothing, so I'm obviously doing something wrong.

Cobra · ‎07-09-2021

I don't understand what you mean. Can you share the CSV file?

dana_canzano · ‎07-09-2021

and linenumber() may work but if for example you run multiple LOAD CSVs then linenumber() will repeat, for example if the 1st LOAD CSV reads a file with 100 rows of data this is thus linenumber 1 though 100. If you run another LOAD CSV and this reads a file with 250 rows of data this is thus linenumbers 1 through 250. So you could have multiple nodes with the same UNREG##

pw6163 · ‎07-10-2021

Thanks. I've copied the suggestion from the Cypher guide and re-read the import file a number of times to get different clusters of nodes (i.e. once per label). The vehicle details are the only one where I need to artificially create missing data. In an RDBMS the vehicle rego would be the primary key, with neo4j it's the unique name used to create the relationships between a vehicle node and an incident, or policy.

pw6163 · ‎07-10-2021

@Cobra: Sure, here's a sample of the kind of data I'm using:

DATE_RECEIVED,REGO,VEHICLE_YEAR,VEHICLE_DESCRIPTION,CLIENT
12/03/2020,1eid123,2010,Landrover 90,Insurer 1
14/03/2020,,2009,Baler,Insurer 2
27/06/2020,abc987,2019,BMW 320i,Insurer 1
30/06/2020,,2020,Road bike,Insurer 3
30/06/2020,qgad34,2018,Iveco ambulance,Insurer 4

I've only shown a few rows and a few fields as an illustration. I haven't included the VIN (Vehicle Identification) for example because it's rarely provided and considerably less important.

Regards,
Paul

Cobra · ‎07-10-2021

Yeah @dana.canzano is rigth, that's why I guess the best option is to use the function apoc.create.uuid() in the APOC plugin.

If the cell is empty, the value will be considered null, so this query should work by using coalesce() function:

LOAD CSV WITH HEADERS FROM 'file:///file.csv' AS row
WITH coalesce(row.REGO, 'UNREG_'+apoc.create.uuid()) AS Rego
MERGE (v:Vehicle {Rego: Rego})
RETURN Rego;

Regards,
Cobra

pw6163 · ‎07-10-2021

@Cobra: I thought about using something like a UUID although I hadn't looked for that functionality in detail.

Unfortunately that's going to create a rego that's too long to be believable - that sounds like a small thing, and if neo4j was the only data source then it wouldn't apply. But the data is being taken from a SaaS system and I'm creating a graph to illustrate the value of a different view of the data we have. Here a rego has to be 9 characters or less and I pretty much have to stick to that limit.

Using a pattern like "UNREGnnnn" I can handle a 10,000 line import file if the digits are taken from the line number, and that's easily long enough for a demo.

I'll make a note of the suggestion though, it's sure to come up in future.

Thanks.
Paul

Cobra · ‎07-11-2021

The only solution I see is this one:

LOAD CSV WITH HEADERS FROM 'file:///test.csv' AS row
WITH coalesce(row.REGO, 'UNREG'+linenumber()) AS Rego
MERGE (v:Vehicle {Rego: Rego})
RETURN Rego;

As I said, If the cell is empty, the value will be considered null so with your dataset example, the first vehicle will have "UNREG3" as Rego value.

In your case, I think you should fill empty cells before loading the data into Neo4j, it will be easier for everything.

pw6163 · ‎07-11-2021

Thanks, that's the approach I'll go with for the moment.

I'm trying to avoid pre-processing the CSV for a couple of reasons - the more capability I can show from Cypher the more credibility neo4j has as a useful tool, but also we have a strong preference for reducing the amount of code that will have to be supported. Mind you, if this is approved there will be automated regular data extraction from the SaaS product as well as the Cypher code, so a bit of Python is unlikely to make much difference.

Thanks to @Cobra and @dana.canzano for all your help.

ameyasoft · ‎07-13-2021

Here is my solution:

I generated unique number from current date, time and linenumber. I used the data you presented.
Here is the Cypher:
LOAD CSV WITH HEADERS FROM "file:/rego.csv" AS row
with row, linenumber() as lnbr
with row, lnbr, apoc.date.format(apoc.date.parse(toSTring(date()), 'ms', 'yyyy-MM-dd'), 'ms', 'yy/MM/dd') as dt
with row, lnbr, split(dt, '/') as dt2
unwind dt2 as dt3
with row, lnbr, sum(toInteger(dt3)) as dt31

with row, lnbr, localtime({timezone: 'America/Los Angeles'}) AS sfo, dt31
with row, lnbr, split(toString(sfo), ':') as sfo1, dt31
unwind sfo1 as sfo2
with row, lnbr, dt31, sum(toInteger(sfo2))  as sfo3

with row, toString(dt31) + toString(sfo3) + toString(lnbr) as final

with coalesce(row.REGO, 'UNREG'+final) AS Rego
MERGE (v:Vehicle {Rego: Rego})

Unreg nodes: Rego:UNREG41363, Rego:UNREG41365

Result:

pw6163 · ‎07-14-2021

Wow! Thank you for that - there's a lot I haven't come across before in that answer. It's helpful to see what looks to me to be a sophisticated solution. Cypher is clearly more able that I'd realised !

Paul

ameyasoft · ‎07-14-2021

Thanks for your appreciation!

Neo4j

Incrementing variables in Cypher?