cancel
Showing results for 
Search instead for 
Did you mean: 

Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Import cell from CSV based on latest date

hlow
Node Link

Hi
I have some data in a CSV file in the format below. Let's assume that the date format is Neo4j friendly.

ID | User | Date
1 | 01 | 1/1/21
2 | 02 | 1/1/21
3 | 01 | 1/2/21

I'd like to create a node named User with the latest date. In this case, for User 01, how would I import the Date property as 1/2/21 assuming I don't want to use a 2nd merge/match.

Thanks!

1 ACCEPTED SOLUTION

Yeah, that does add a large wrinkle, as you can not include the 'id' in the 'with' clause, since it will remove the grouping. Instead, you need to collect all the user's rows and filter out all but the one with the max date. At least this is the approach I came up with. Others may have additional approaches.

The following two queries achieve this, but with slightly different implementations. I added both for learning purposes. Both approach pass the entire row through, so the solution will work if you add more columns.

Implemented with 'unwind'

load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate, collect(row{.*, convertedDate: date(row.Date)}) as userRows
unwind userRows as userRow
with user, maxDate, userRow
where maxDate = userRow.convertedDate
merge(n:User{user: user}) set n.id = userRow.ID, n.date = userRow.convertedDate

Implemented with collection filtering:

load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate, collect(row{.*, convertedDate: date(row.Date)}) as userRows
with user, maxDate, [x in userRows where x.convertedDate = maxDate][0] as rowWithMaxDate
merge(n:User{user: user}) set n.id = rowWithMaxDate.ID, n.date = rowWithMaxDate.convertedDate

Note, the two approaches behave slightly different if your data has multiple rows for a user with the same max date.

View solution in original post

3 REPLIES 3

I guess you could group by user to get the max date, then create the nodes just with the max date.

Assuming your dates are in 'yyyy-mm-dd' format, the following works:

load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate
merge (n:User {user: user}) set n.date = maxDate

Can I add one more wrinkle to this problem?

How would I also include the ID from the row with the maxDate?

Yeah, that does add a large wrinkle, as you can not include the 'id' in the 'with' clause, since it will remove the grouping. Instead, you need to collect all the user's rows and filter out all but the one with the max date. At least this is the approach I came up with. Others may have additional approaches.

The following two queries achieve this, but with slightly different implementations. I added both for learning purposes. Both approach pass the entire row through, so the solution will work if you add more columns.

Implemented with 'unwind'

load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate, collect(row{.*, convertedDate: date(row.Date)}) as userRows
unwind userRows as userRow
with user, maxDate, userRow
where maxDate = userRow.convertedDate
merge(n:User{user: user}) set n.id = userRow.ID, n.date = userRow.convertedDate

Implemented with collection filtering:

load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate, collect(row{.*, convertedDate: date(row.Date)}) as userRows
with user, maxDate, [x in userRows where x.convertedDate = maxDate][0] as rowWithMaxDate
merge(n:User{user: user}) set n.id = rowWithMaxDate.ID, n.date = rowWithMaxDate.convertedDate

Note, the two approaches behave slightly different if your data has multiple rows for a user with the same max date.