Head's Up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.
02-26-2022 11:17 AM
Hi,
I'm new to Neo4j and Cypher, will appriciate help with a query, don't know where to start.
my dataset (CSV) is structured as demostrated below:
EntityID | EventType | Timestamp
A 11 02/05/21
A 54 03/05/21
A 11 04/05/21
B 62 01/02/20
B 54 08/03/20
B 49 15/08/20
I created a node fo each Entity and each unique EventType,
i would like to create a relationship between each Entity and each EventType whose timestamp is the earliest for that Entity (e.g. Entity A's node will be connected to EventType's 11 node)
I have over 1M rows and 40K entities in my dataset.
Thanks!
Solved! Go to Solution.
02-27-2022 08:44 AM
I think this works. I tried it with the data sorted by date in both ascending and defending order, to ensure the result was not dependent on the order of the data.
Basically, it creates a new relationship between the entity and event in each row, setting the relationship date to the new date, or the min date if there was an existing relationship between the two entities. It then finds if there is a relationship between the row's entity and another event. If found, it deletes the one extra one with the greater date, leaving the relationship with the minimum event date.
LOAD CSV WITH HEADERS
FROM 'file:///xxx.csv' as rowStore
with rowStore.Entity as Entity, rowStore.EventType as EventType, date(rowStore.Timestamp) as EventDate
merge(n:Entity{id: Entity})
merge(m:EventType{id: EventType})
merge(n)-[r:EventDate]->(m)
on create set r.EventDate = EventDate
on match set r.EventDate = apoc.coll.min([r.EventDate, EventDate])
with n, EventDate
match(n)-[r:EventDate]->(p:EventType)
where r.EventDate <> apoc.coll.min([r.EventDate, EventDate])
delete r
02-26-2022 02:08 PM
If you connect Entity "A" to EventType "11" with timestamp "02/05/21", then EventType "11" with timestamp "04/05/21" will not be connected to "A". This way, you cannot find if "A" attended the event on "04/05/21". It would be helpful if you can specify the Entity and EventType node properties in your data model.
02-26-2022 08:15 PM
Since you have over 1M rows, I would suggest you use neo4j-admin import . For that you would need two CSV files representing the Nodes (EntityId and EventType) and one for the Relationships.
02-27-2022 08:44 AM
I think this works. I tried it with the data sorted by date in both ascending and defending order, to ensure the result was not dependent on the order of the data.
Basically, it creates a new relationship between the entity and event in each row, setting the relationship date to the new date, or the min date if there was an existing relationship between the two entities. It then finds if there is a relationship between the row's entity and another event. If found, it deletes the one extra one with the greater date, leaving the relationship with the minimum event date.
LOAD CSV WITH HEADERS
FROM 'file:///xxx.csv' as rowStore
with rowStore.Entity as Entity, rowStore.EventType as EventType, date(rowStore.Timestamp) as EventDate
merge(n:Entity{id: Entity})
merge(m:EventType{id: EventType})
merge(n)-[r:EventDate]->(m)
on create set r.EventDate = EventDate
on match set r.EventDate = apoc.coll.min([r.EventDate, EventDate])
with n, EventDate
match(n)-[r:EventDate]->(p:EventType)
where r.EventDate <> apoc.coll.min([r.EventDate, EventDate])
delete r
02-27-2022 09:42 AM
Solution is good But, we end up with lot of orphan nodes.
03-11-2022 05:07 AM
Thanks! this is very helpful.
02-27-2022 10:17 AM
What is an orphan node? Do you not want the event nodes that are not connected to an entity? If that is the case, change the last line to:
delete r, p
All the sessions of the conference are now available online