01-25-2022 11:04 AM
I have an events database that is still active (new events are added, deleted, and updated). When a new event is added, I need to find the "previous" and "next" events so that I can add a "FOLLOWS" relationship between them, meaning I can "follow" one event to the next, and so on, such as:
(Basketball (3pm)) - FOLLOWS -> (Tennis (1pm)) - FOLLOWS -> (Music Concert (11am))
Events can be added in the future but can also be retrospectively updated if in the past.
An event has the following properties:
Each event has a "SHOWN_IN" relationship to a "venue", therefore a single venue has many events related to it.
(Football 10am) - SHOWN_IN -> (Venue 1)
There are many historical events for a venue, sometimes up to 15000 events in total, therefore performance is key.
I have tried to write a Cypher query that is quick (100ms), but it produces a large number of db "hits"; this is what I would like to resolve.
There are a few caveats I also have to consider:
The query I have so far is as follows:
MATCH (e:Event) WHERE e.id = "123"
// Closest earlier event at the same venue
MATCH (previous:Event)
WHERE previous.venueId = "abc"
AND previous.startTime < e.startTime
WITH e, previous ORDER BY previous.startTime DESC LIMIT 1
// Closest later event at the same venue
MATCH (next:Event)
WHERE next.venueId = "abc"
AND next.startTime > e.startTime
WITH e, previous, next ORDER BY next.startTime ASC LIMIT 1
CREATE (next)-[:FOLLOWS]->(e)
CREATE (e)-[:FOLLOWS]->(previous)
I have run PROFILE on this query and I can see that the number of db hits is very large. Ideally I would like to see fewer than 100 hits regardless of how many events a venue has, if possible.
01-25-2022 02:33 PM
It looks like you are inserting a new event between two existing events. If so, you can probably reduce the number of nodes tested by restricting your search to a pattern of connected nodes. Also, after inserting the new event between two existing connected events, don't you have to remove the existing FOLLOWS relationship between them, since it is no longer true?
The following query assumes an insertion scenario, i.e. connected events exist both before and after the new event. Don't you also have to account for the case where a new event occurs after the last event in the chain, and for the case of a historical event being added before all events in the chain? This could easily be handled if, every time you add a new venue, you also create two events connected to the venue and linked to each other by a FOLLOWS relationship: a start node whose time is the earliest possible Neo4j date and an end node whose time is the latest possible Neo4j date. That way, every new event is always an insert.
MATCH (e:Event) WHERE e.id = "123"
// FOLLOWS points from the later event to the earlier one
MATCH (next:Event)-[r:FOLLOWS]->(previous:Event)
WHERE previous.venueId = "abc"
AND previous.startTime < e.startTime
AND next.venueId = "abc"
AND next.startTime > e.startTime
MERGE (next)-[:FOLLOWS]->(e)
MERGE (e)-[:FOLLOWS]->(previous)
WITH r
DELETE r
Sorry, I don't have test data, but it executes without error.
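The sentinel-event idea described above could be sketched roughly as follows. This is only an illustration under assumptions: the `Venue` label, the `sentinel` flag, the sentinel ids, and the choice of years 0001 and 9999 as "earliest" and "latest" are all hypothetical, and it assumes `startTime` is stored as a Cypher datetime.

```cypher
// Hypothetical venue creation with two sentinel events (names are assumptions)
CREATE (v:Venue {id: "abc"})
CREATE (first:Event {id: "abc-start", venueId: "abc", sentinel: true,
                     startTime: datetime("0001-01-01T00:00:00Z")})
CREATE (last:Event {id: "abc-end", venueId: "abc", sentinel: true,
                    startTime: datetime("9999-12-31T23:59:59Z")})
CREATE (first)-[:SHOWN_IN]->(v)
CREATE (last)-[:SHOWN_IN]->(v)
// Same direction as the rest of the thread: the later event FOLLOWS the earlier one
CREATE (last)-[:FOLLOWS]->(first)
```

With these two nodes in place, every real event falls between a connected pair, so the insert query never has to handle an empty "before" or "after" side. Queries that list events would need to filter out the sentinels.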
01-25-2022 02:37 PM
Hi, thanks for your reply. The existing data does NOT have any FOLLOWS relationships; it is only now that we are looking to add these retrospectively. Unfortunately, that means a MATCH on an existing FOLLOWS relationship between the previous and next events
will not return any results straight off the bat
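For the case where no FOLLOWS relationships exist yet, one possible shape for the query is sketched below. It is a sketch, not a tested solution: it uses OPTIONAL MATCH so that a missing neighbour on either side does not discard the row, removes a direct link between the two neighbours if one happens to exist, and uses the FOREACH-over-CASE idiom as a conditional write. The `ignored` variable is just a placeholder name.

```cypher
MATCH (e:Event) WHERE e.id = "123"
// Closest earlier event at the venue, or null if there is none
OPTIONAL MATCH (previous:Event)
WHERE previous.venueId = "abc" AND previous.startTime < e.startTime
WITH e, previous ORDER BY previous.startTime DESC LIMIT 1
// Closest later event at the venue, or null if there is none
OPTIONAL MATCH (next:Event)
WHERE next.venueId = "abc" AND next.startTime > e.startTime
WITH e, previous, next ORDER BY next.startTime ASC LIMIT 1
// Drop a direct link between the two neighbours, if one already exists
OPTIONAL MATCH (next)-[old:FOLLOWS]->(previous)
DELETE old
// A zero- or one-element list makes each MERGE run only when the neighbour exists
FOREACH (ignored IN CASE WHEN previous IS NULL THEN [] ELSE [1] END |
  MERGE (e)-[:FOLLOWS]->(previous))
FOREACH (ignored IN CASE WHEN next IS NULL THEN [] ELSE [1] END |
  MERGE (next)-[:FOLLOWS]->(e))
```

This does not by itself reduce db hits; how many nodes the two neighbour lookups touch depends on whether the planner can serve the `venueId` and `startTime` predicates, and the ordering, from an index.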
01-26-2022 12:56 PM
Got it, so are you going to run a script to remediate the data to add the FOLLOWS relationship between every Event node? If so, performance is not as critical, but you need a script to do that.
Have you considered adding an index to improve performance during the remediation? An index on the combination of Event label and startTime property should prove helpful.
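As an illustration, an index could be created like this (Neo4j 4.x syntax; both index names are made up for the example). The second, composite index is an additional suggestion on my part rather than something described above: since the query filters on `venueId` by equality and on `startTime` by range, an index covering both properties may let the planner narrow the search to a single venue before applying the time range.

```cypher
// Single-property index on Event.startTime, as suggested above
CREATE INDEX event_start_time IF NOT EXISTS
FOR (e:Event) ON (e.startTime);

// Assumption: a composite index may serve "venueId = ... AND startTime < ..." better
CREATE INDEX event_venue_start IF NOT EXISTS
FOR (e:Event) ON (e.venueId, e.startTime);
```

Running the query with PROFILE before and after is the only reliable way to see whether either index is actually used.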
Are you seeking help with a remediation script to add the FOLLOWS relationships?
I see the following steps if that is what you are looking for:
After remediation, you will need a query that adds a new event for a given venue by inserting it into that venue's chain of events.
Does this sound like what you are looking for?
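For reference, a one-off remediation could look something like the sketch below. It is not the set of steps referred to above, just one assumed approach: sort each venue's events newest first, collect them, and link each event to the one immediately before it in time. The `Venue` label is an assumption based on the SHOWN_IN description earlier in the thread.

```cypher
// For each venue, order its events from latest to earliest
MATCH (v:Venue)<-[:SHOWN_IN]-(e:Event)
WITH v, e ORDER BY e.startTime DESC
WITH v, collect(e) AS events
// Walk adjacent pairs; a venue with a single event yields an empty range
UNWIND range(0, size(events) - 2) AS i
WITH events[i] AS later, events[i + 1] AS earlier
MERGE (later)-[:FOLLOWS]->(earlier)
```

On a large graph this would probably need to be run one venue (or a batch of venues) at a time to keep transaction sizes manageable.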
01-26-2022 01:25 PM
Thanks for your reply.
We are not planning on remediating the old data; we are planning on adding these FOLLOWS relationships ad hoc, as we get changes on events (either inserts or updates).
We have added an index to improve performance, specifically on the label and startTime, which brings the query run time to ~150ms. However, the number of "hits" for the query is still high due to the large result set (15000 events per venue). We get hundreds of events a minute, so any query with a large number of hits ends up backing up the service and causing throughput issues.
01-26-2022 03:22 PM
If you have that many events and they are generated that frequently, could you insert a ':Date' node between the venue and the event to group events by date? Then you would only need to find the correct date and perform the insert considering only the events on the same day.
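A rough sketch of that grouping is shown below. The `HAS_DATE` and `ON_DATE` relationship types and the `date` property are hypothetical names, and the example assumes `startTime` is a Cypher datetime so that a calendar date can be derived from it.

```cypher
MATCH (e:Event)-[:SHOWN_IN]->(v) WHERE e.id = "123"
// One Date node per venue per calendar day (names are assumptions)
MERGE (v)-[:HAS_DATE]->(d:Date {date: date({date: e.startTime})})
MERGE (e)-[:ON_DATE]->(d)
WITH e, d
// Only events on the same day are candidates for the previous event
OPTIONAL MATCH (d)<-[:ON_DATE]-(previous:Event)
WHERE previous.startTime < e.startTime
WITH e, previous ORDER BY previous.startTime DESC LIMIT 1
RETURN e.id, previous.id
```

If the event is the first of its day, `previous` comes back null and the search would have to fall back to the latest event on the nearest earlier Date node; the same applies in reverse for the next event.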