Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
04-27-2022 03:41 PM
Hi
I have some data in a CSV file in the format below. Let's assume that the date format is Neo4j friendly.
ID | User | Date
1 | 01 | 1/1/21
2 | 02 | 1/1/21
3 | 01 | 1/2/21
I'd like to create a node named User with the latest date. In this case, for User 01, how would I import the Date property as 1/2/21 assuming I don't want to use a 2nd merge/match.
Thanks!
Solved! Go to Solution.
04-29-2022 12:03 PM
Yeah, that does add a large wrinkle, as you can not include the 'id' in the 'with' clause, since it will remove the grouping. Instead, you need to collect all the user's rows and filter out all but the one with the max date. At least this is the approach I came up with. Others may have additional approaches.
The following two queries achieve this, but with slightly different implementations. I added both for learning purposes. Both approach pass the entire row through, so the solution will work if you add more columns.
Implemented with 'unwind'
load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate, collect(row{.*, convertedDate: date(row.Date)}) as userRows
unwind userRows as userRow
with user, maxDate, userRow
where maxDate = userRow.convertedDate
merge(n:User{user: user}) set n.id = userRow.ID, n.date = userRow.convertedDate
Implemented with collection filtering:
load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate, collect(row{.*, convertedDate: date(row.Date)}) as userRows
with user, maxDate, [x in userRows where x.convertedDate = maxDate][0] as rowWithMaxDate
merge(n:User{user: user}) set n.id = rowWithMaxDate.ID, n.date = rowWithMaxDate.convertedDate
Note, the two approaches behave slightly different if your data has multiple rows for a user with the same max date.
04-27-2022 04:30 PM
I guess you could group by user to get the max date, then create the nodes just with the max date.
Assuming your dates are in 'yyyy-mm-dd' format, the following works:
load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate
merge (n:User {user: user}) set n.date = maxDate
04-29-2022 11:17 AM
Can I add one more wrinkle to this problem?
How would I also include the ID from the row with the maxDate?
04-29-2022 12:03 PM
Yeah, that does add a large wrinkle, as you can not include the 'id' in the 'with' clause, since it will remove the grouping. Instead, you need to collect all the user's rows and filter out all but the one with the max date. At least this is the approach I came up with. Others may have additional approaches.
The following two queries achieve this, but with slightly different implementations. I added both for learning purposes. Both approach pass the entire row through, so the solution will work if you add more columns.
Implemented with 'unwind'
load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate, collect(row{.*, convertedDate: date(row.Date)}) as userRows
unwind userRows as userRow
with user, maxDate, userRow
where maxDate = userRow.convertedDate
merge(n:User{user: user}) set n.id = userRow.ID, n.date = userRow.convertedDate
Implemented with collection filtering:
load csv with headers from 'file:///Book1.csv' as row
with row.User as user, max(date(row.Date)) as maxDate, collect(row{.*, convertedDate: date(row.Date)}) as userRows
with user, maxDate, [x in userRows where x.convertedDate = maxDate][0] as rowWithMaxDate
merge(n:User{user: user}) set n.id = rowWithMaxDate.ID, n.date = rowWithMaxDate.convertedDate
Note, the two approaches behave slightly different if your data has multiple rows for a user with the same max date.
All the sessions of the conference are now available online