01-15-2023 03:36 AM
I am building a time-series-based pattern identification project using Neo4j. Below is a sample schema of the graph I have created.
Below is the distinct count of the data we have,
Student: 30000, Sports: 45, Academics: 20, Extracurricular: 30
Below is the relationship count formed between the 4 labels,
STUDIES: 62000, PLAYS: 35000, PERFORMS: 41000
I would like to find a pattern of students performing similar activities in a set period of time and what will be the next set of activities they may perform.
I am trying to achieve a time series based model like below,
Performing the same operation in regular time series based approach is challenging due to the large number of nodes in my actual project.
Please provide some achievable solutions using Neo4j and applicable GDS Algorithms that I can implement for the above problem.
01-15-2023 05:54 AM - edited 01-15-2023 05:56 AM
I think you are going to have difficulties with your data model, because you are storing the dates in a list. Instead, create a new relationship for each month a student participated in an activity, with the date as a relationship property. Recent versions of Neo4j introduced indexing on relationship properties, so you can leverage that to find all interactions for a date or range of dates quickly.
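A minimal sketch of that remodelling, under several assumptions: the existing relationships carry a list property named `dates` (the name used later in this thread), its entries are already Cypher `Date` values, and `PLAYS` runs from `Student` to `Sports`. The index name `plays_date` is hypothetical, and the `CREATE INDEX ... FOR ()-[r:TYPE]-()` form needs Neo4j 4.3 or later. The same steps would be repeated for `STUDIES` and `PERFORMS`.

```cypher
// 1. Create one PLAYS relationship per entry in the assumed `dates` list.
MATCH (s:Student)-[r:PLAYS]->(a:Sports)
WHERE r.dates IS NOT NULL
UNWIND r.dates AS d
CREATE (s)-[:PLAYS {date: d}]->(a);

// 2. Remove the old list-carrying relationships once the new ones exist.
MATCH (:Student)-[r:PLAYS]->(:Sports)
WHERE r.dates IS NOT NULL
DELETE r;

// 3. Index the new relationship property (index name is hypothetical).
CREATE INDEX plays_date IF NOT EXISTS
FOR ()-[r:PLAYS]-() ON (r.date);

// 4. Example range lookup that the index can support.
MATCH (s:Student)-[r:PLAYS]->(a:Sports)
WHERE date('2022-01-01') <= r.date <= date('2022-12-31')
RETURN s.name AS student, a.name AS activity, r.date AS date
ORDER BY date;
```

If the dates are stored as strings rather than `Date` values, the comparison in step 4 would need to be adapted accordingly.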
Addressing your timeline requirement would be easier with my suggested data model. You can get the timeline data for a specific user as follows.
match(u:User{id: 100})
match(u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
return u.name as student, r.date as date, collect(e.name) as activities
order by date
The above will return a row for each date, with a list of the activities the user participated in on that date.
Keeping the dates in a list makes them difficult to search and sort by.
This is just one option. There are others; the best one is whichever lets you answer your analytic questions.
01-15-2023 09:28 AM
Hi @glilienfield,
Thanks for the quick response. I have a couple of questions about your suggestion.
1. Can we have different relationships between same 2 nodes, but with different properties (i.e., 'date')?
2. Does exploding the relationship property ('month') from a list into individual relationships affect the performance of the graph? And what is the maximum relationship count in the Community Edition?
Thanks in advance.
01-15-2023 11:55 AM
Answers:
The best solution depends on your needs: choose whichever gives you the ability to efficiently answer your analytic questions.
match(u:User{id: 100})
match(u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
return u.name as student, r.date as date, collect(e.name) as activities
order by date
Query for relationships with list of dates:
match(u:User{id: 100})
match(u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
with u, e, r
unwind r.dates as date
return u.name as student, date, collect(e.name) as activities
order by date
01-22-2023 10:01 AM
Hi @glilienfield ,
Apologies for getting back a little delayed on this.
Thanks for your suggestions. I have recreated the data structure and modified the graph schema accordingly. I have also added a NEXT relationship between the events so that they form a chain.
As a next step, could you please point me to the GDS algorithms that best address the journey identification challenge?
Thanks a lot for your help!!
01-22-2023 10:35 AM
Glad you have made progress. I have to apologize; I am not a GDS user, so I am not familiar with the algorithms. You can find them with the link below. Maybe the node similarity algorithm would be a place to start.
https://neo4j.com/docs/graph-data-science/current/algorithms/
I would think you would project a filtered version of your graph that only projects the entities that meet your time frame.
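A rough sketch of that idea, assuming GDS 2.3 or later (where `gds.graph.project` can be used as an aggregation function in a Cypher projection), a `Date`-valued `date` property on each relationship, and a `name` property on the nodes. The graph name `activities2022` and the date window are hypothetical.

```cypher
// Project only the student-activity pairs that fall inside the time frame.
// DISTINCT collapses repeated months into a single relationship per pair.
MATCH (s:Student)-[r:PLAYS|PERFORMS|STUDIES]->(a)
WHERE date('2022-01-01') <= r.date < date('2023-01-01')
WITH DISTINCT s, a
WITH gds.graph.project('activities2022', s, a) AS g
RETURN g.graphName AS graph, g.nodeCount AS nodes, g.relationshipCount AS rels;

// Compare students by the overlap of the activities they point to.
CALL gds.nodeSimilarity.stream('activities2022', {topK: 5})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS student1,
       gds.util.asNode(node2).name AS student2,
       similarity
ORDER BY similarity DESC
LIMIT 20;

// Drop the in-memory graph when finished.
CALL gds.graph.drop('activities2022') YIELD graphName
RETURN graphName;
```

Node similarity scores pairs of students by the Jaccard overlap of their activity sets by default, so the result is a list of student pairs with similar participation in the chosen window.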
01-22-2023 11:10 AM
Thanks @glilienfield, I will explore the GDS algorithms and keep this thread updated with my work.
As a first step, I am taking hints from @alicia_frame's project at
https://github.com/AliciaFrame/GDS_Patient_Journey
01-27-2023 07:30 AM
Hello @Dineshramk 😊
To tell you which algorithm to use, I need to know what you want to do. What questions are you trying to answer?
Regards,
Cobra
02-03-2023 07:41 AM
I am trying to detect
1. Communities of students having a similar pattern of activities across multiple years.
2. Which students are likely to register for an activity, based on similarities with other students' prior activities.
3. A ranking of activities based on the number of students taking part, leaving after a certain time, etc.
Kindly let me know if you require further details.
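For the third item, a plain Cypher aggregation may be enough as a starting point, without GDS. The sketch below assumes one relationship per student, activity, and date, with a `date` property on each relationship and a `name` property on activity nodes, as suggested earlier in the thread.

```cypher
// Rank activities by the number of distinct participating students,
// and show the first and last recorded participation date per activity.
MATCH (s:Student)-[r:PLAYS|PERFORMS|STUDIES]->(a)
RETURN labels(a)[0] AS kind,
       a.name AS activity,
       count(DISTINCT s) AS students,
       min(r.date) AS firstSeen,
       max(r.date) AS lastSeen
ORDER BY students DESC;
```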