
Heads up! These forums are read-only. All users and content have migrated. Please join us at community.neo4j.com.

Date list as relationship property - create a Student Journey

I am building a time-series-based pattern identification project using Neo4j. Below is a sample schema of the graph I have created.

Below are the distinct node counts in our data:

Student: 30,000, Sports: 45, Academics: 20, Extracurricular: 30

Below are the relationship counts between the four labels:

STUDIES: 62,000, PLAYS: 35,000, PERFORMS: 41,000

[Screenshot: sample graph schema]

I would like to find patterns of students performing similar activities within a given period of time, and to predict the next set of activities they are likely to perform.

I am trying to build a time-series-based model like the one below:

[Screenshot: target time-series model]

Performing the same operation with a conventional time-series approach is challenging because of the large number of nodes in my actual project.

Could you suggest some practical solutions using Neo4j, along with GDS algorithms that I could apply to this problem?


glilienfield
Ninja

I think you are going to have difficulties with your data model because you are storing the dates in a list. Instead, create a new relationship for each month a student participated in an activity, with the date as a relationship property. Newer versions of Neo4j support indexes on relationship properties, so you can use one to quickly find all interactions for a date or a range of dates.
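The remodeling above amounts to "exploding" each list-valued relationship into one relationship per date. The following is a minimal in-memory Python sketch of that transformation, not Neo4j code. The tuple layout and the `DatedRel` and `explode_date_lists` names are hypothetical. In the database itself, the same migration would typically be written in Cypher, using UNWIND over the list and CREATE for each new relationship.

```python
from datetime import date
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class DatedRel(NamedTuple):
    """Hypothetical record: one relationship per (student, activity, date)."""
    student: str
    rel_type: str
    activity: str
    when: date


def explode_date_lists(
    rels: Iterable[Tuple[str, str, str, Sequence[date]]]
) -> List[DatedRel]:
    """Turn (student, rel_type, activity, [dates]) tuples into one DatedRel per date."""
    exploded: List[DatedRel] = []
    for student, rel_type, activity, dates in rels:
        for d in dates:  # one new relationship per list element
            exploded.append(DatedRel(student, rel_type, activity, d))
    return exploded


# Hypothetical sample: two list-valued relationships for one student.
list_model = [
    ("s100", "PLAYS", "Football", [date(2022, 1, 1), date(2022, 2, 1)]),
    ("s100", "STUDIES", "Maths", [date(2022, 1, 1)]),
]
dated = explode_date_lists(list_model)
print(len(dated))  # 3
```

Each exploded record carries exactly one date. A date predicate then becomes a single comparison instead of a scan over a list.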

Addressing your timeline requirement would be easier with my suggested data model. You can get the timeline data for a specific user as follows.

MATCH (u:Student {id: 100})
MATCH (u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
RETURN u.name AS student, r.date AS date, collect(e.name) AS activities
ORDER BY date

The above returns one row per date, with a list of the activities the student participated in on that date.

Having the dates in a list makes them difficult to search and sort by.

This is just one option. There are others, but the best one is whichever lets you answer your analytic questions.
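In the timeline query above, Cypher implicitly groups by the non-aggregated columns (`student` and `date`), and `collect(e.name)` gathers the activities in each group. The Python sketch below spells out that grouping on hypothetical in-memory (student, rel_type, activity, date) tuples. It only illustrates the semantics and is not how Neo4j executes the query. Cypher does not guarantee the order of elements inside `collect()`, whereas this sketch keeps input order.

```python
from collections import defaultdict
from datetime import date


def timeline(rels, student):
    """Return [(date, [activity, ...]), ...] for one student, ordered by date.

    The grouping key is the date (the student is fixed by the filter), and
    each list plays the role of collect(e.name). `rels` holds hypothetical
    (student, rel_type, activity, date) tuples.
    """
    by_date = defaultdict(list)
    for s, _rel_type, activity, d in rels:
        if s == student:  # like MATCH (u:Student {id: ...})
            by_date[d].append(activity)
    return sorted(by_date.items())  # ORDER BY date (dates are unique keys)


rels = [
    ("s100", "PLAYS", "Football", date(2022, 2, 1)),
    ("s100", "STUDIES", "Maths", date(2022, 1, 1)),
    ("s100", "PLAYS", "Football", date(2022, 1, 1)),
    ("s200", "PERFORMS", "Drama", date(2022, 1, 1)),
]
print(timeline(rels, "s100"))
```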

Hi @glilienfield,

Thanks for the quick response. I have a couple of questions about your suggestion.

1. Can we have multiple relationships between the same two nodes, each with a different property value (e.g., 'date')?

2. When exploding the relationship property ('month') from a list into individual relationships, does it affect the performance of the graph? And what is the maximum number of relationships supported in the Community Edition?

Thanks in advance. 

 

Answers:

  1. You can have as many relationships between the same two nodes as needed. They can even be exactly identical.
  2. It will hurt performance in some scenarios and help in others. In your scenario, the Cypher query needs to retrieve all relationships of these types for a specific student and group them by date, so you can get the activities for each date. Below are queries for each data model. The trouble with storing the dates in a list is searching and filtering on them in queries, since the list has to be iterated through each time a filter predicate is evaluated.

The best solution depends on your needs: whichever one lets you answer your analytic questions efficiently.

Query for relationships with a single date:
MATCH (u:Student {id: 100})
MATCH (u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
RETURN u.name AS student, r.date AS date, collect(e.name) AS activities
ORDER BY date

Query for relationships with a list of dates:

MATCH (u:Student {id: 100})
MATCH (u)-[r:PERFORMS|STUDIES|PLAYS]->(e)
UNWIND r.dates AS date
RETURN u.name AS student, date, collect(e.name) AS activities
ORDER BY date
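To illustrate the filtering cost mentioned above: with the list model, a date-range predicate has to examine every element of every relationship's list. With one date per relationship, the dates can be kept in sorted order, so a range lookup only touches the matching entries. The Python sketch below uses a sorted list plus `bisect` as a stand-in for a range index. This is an analogy under that assumption, not a description of Neo4j's index internals, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def scan_list_model(list_rels, start, end):
    """List model: test every date in every relationship's list."""
    return [
        (student, activity, d)
        for student, activity, dates in list_rels
        for d in dates
        if start <= d <= end
    ]


def build_date_index(dated_rels):
    """Single-date model: sort once by date (stand-in for a range index)."""
    ordered = sorted(dated_rels, key=lambda r: r[2])
    return [r[2] for r in ordered], ordered


def range_lookup(index, start, end):
    """Binary-search the sorted dates and slice out the matching range."""
    keys, ordered = index
    return ordered[bisect_left(keys, start):bisect_right(keys, end)]


list_rels = [
    ("s100", "Football", [date(2022, 1, 1), date(2022, 3, 1)]),
    ("s200", "Maths", [date(2022, 2, 1)]),
]
dated_rels = [(s, a, d) for s, a, dates in list_rels for d in dates]
window = (date(2022, 1, 15), date(2022, 3, 31))

index = build_date_index(dated_rels)
print(range_lookup(index, *window))
```

Both approaches return the same matches. The difference is how much data each one has to look at to find them.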

 

Hi @glilienfield,

Apologies for the delay in getting back to you on this.

Thanks for your suggestions. I have recreated the data structure and modified the graph schema accordingly. I have also added a NEXT relationship between the events so that they form a chain.
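Once each student's events are chained with NEXT, a simple first look at "what comes next" is to count activity-to-next-activity transitions across all students. The Python sketch below builds the chain by sorting one student's (activity, date) events and pairing consecutive entries, then counts the pairs. It is a basic frequency count rather than a GDS algorithm, and the input layout and function names are hypothetical.

```python
from collections import Counter
from datetime import date


def next_pairs(events):
    """Order one student's (activity, date) events by date and pair each
    event with its successor, mirroring (a)-[:NEXT]->(b) relationships."""
    ordered = sorted(events, key=lambda e: e[1])  # stable for equal dates
    return [(a[0], b[0]) for a, b in zip(ordered, ordered[1:])]


def transition_counts(events_by_student):
    """Count activity -> next-activity transitions across all students."""
    counts = Counter()
    for events in events_by_student.values():
        counts.update(next_pairs(events))
    return counts


journeys = {
    "s100": [("Maths", date(2022, 1, 1)), ("Football", date(2022, 2, 1)), ("Drama", date(2022, 3, 1))],
    "s200": [("Football", date(2022, 2, 1)), ("Maths", date(2022, 1, 1))],
}
print(transition_counts(journeys).most_common(1))  # [(('Maths', 'Football'), 2)]
```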

As a next step, could you point me to the GDS algorithms that best address the journey identification challenge?

Thanks a lot for your help!!

Glad you have made progress. I have to apologize; I am not a GDS user, so I am not familiar with the algorithms. You can find them at the link below. The node similarity algorithm might be a good place to start.

https://neo4j.com/docs/graph-data-science/current/algorithms/
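For reference, GDS's Node Similarity algorithm compares nodes by the sets of nodes they are connected to, and by default scores each pair with the Jaccard index |A ∩ B| / |A ∪ B|. Applied here, two students are similar when they share many activities. The Python sketch below computes that score and keeps the top k most similar students for each student, using hypothetical in-memory activity sets. It illustrates the metric only. It is not the GDS implementation, which has its own parameters such as topK and similarityCutoff.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_k_similar(activities_by_student, k=1):
    """For each student, return up to k (other_student, score) pairs,
    highest score first (ties broken by student id), skipping zero scores."""
    scored = {s: [] for s in activities_by_student}
    for s1, s2 in combinations(activities_by_student, 2):
        score = jaccard(activities_by_student[s1], activities_by_student[s2])
        if score > 0:
            scored[s1].append((s2, score))
            scored[s2].append((s1, score))
    return {
        s: sorted(pairs, key=lambda p: (-p[1], p[0]))[:k]
        for s, pairs in scored.items()
    }


students = {
    "s100": {"Maths", "Football"},
    "s200": {"Maths", "Football", "Drama"},
    "s300": {"Drama"},
}
print(top_k_similar(students, k=1))
```

The all-pairs loop is quadratic (about 450 million pairs for 30,000 students), so a sketch like this is only useful for checking results on small samples.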

I would think you would project a filtered version of your graph that includes only the entities within your time frame.
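Here is a minimal sketch of that filtering step. It assumes the one-date-per-relationship model and hypothetical (student, rel_type, activity, date) tuples. It keeps only the relationships inside the time frame and collapses them to per-student activity sets, which is the shape of input a similarity step such as Node Similarity needs. In GDS itself the filter would be applied when projecting the graph; the Python below only shows the logic.

```python
from collections import defaultdict
from datetime import date


def project_window(dated_rels, start, end):
    """Keep relationships dated within [start, end] (inclusive) and return
    {student: set_of_activities}. Students with no activity in the window
    are left out, as they would be from a filtered projection."""
    activities = defaultdict(set)
    for student, _rel_type, activity, d in dated_rels:
        if start <= d <= end:
            activities[student].add(activity)
    return dict(activities)


dated_rels = [
    ("s100", "PLAYS", "Football", date(2021, 11, 1)),
    ("s100", "STUDIES", "Maths", date(2022, 2, 1)),
    ("s200", "PERFORMS", "Drama", date(2022, 5, 1)),
]
print(project_window(dated_rels, date(2022, 1, 1), date(2022, 12, 31)))
```

Running the same similarity step over consecutive windows (for example, one per year) is one way to see whether a student's peer group changes over time.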

Thanks @glilienfield, I will explore the GDS algorithms and keep this thread updated with my work.

As a first step, I am taking hints from @alicia_frame's project:
https://github.com/AliciaFrame/GDS_Patient_Journey

 

Hello @Dineshramk 😊

To tell you which algorithm to use, I need to know what you want to do. What questions are you trying to answer?

Regards,
Cobra

@Cobra 

I am trying to:

1. Detect communities of students with similar activity patterns across multiple years.

2. Predict which students will register for an activity, based on their similarity to other students' prior activities.

3. Rank activities by the number of participating students, how many leave after a certain time, etc.

Kindly let me know if you require further details.
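Question 3 in the list above can be prototyped with plain aggregation before reaching for GDS. The Python sketch below works on hypothetical (student, rel_type, activity, date) tuples. It ranks activities by the number of distinct participating students. It also counts, per activity, the students whose last participation falls before a cutoff date. Treating that as "leaving" is an assumption, not a definition from this thread.

```python
from collections import defaultdict
from datetime import date


def rank_activities(dated_rels):
    """Return [(activity, distinct_students), ...], most popular first."""
    students = defaultdict(set)
    for student, _rel_type, activity, _d in dated_rels:
        students[activity].add(student)
    ranked = [(activity, len(members)) for activity, members in students.items()]
    return sorted(ranked, key=lambda x: (-x[1], x[0]))


def leavers(dated_rels, cutoff):
    """Per activity, count students whose latest participation is before
    `cutoff` (a hypothetical definition of having left the activity).
    Activities with no leavers are omitted."""
    last_seen = {}
    for student, _rel_type, activity, d in dated_rels:
        key = (activity, student)
        last_seen[key] = max(d, last_seen.get(key, d))
    counts = defaultdict(int)
    for (activity, _student), last in last_seen.items():
        if last < cutoff:
            counts[activity] += 1
    return dict(counts)


dated_rels = [
    ("s100", "PLAYS", "Football", date(2022, 1, 1)),
    ("s100", "PLAYS", "Football", date(2022, 6, 1)),
    ("s200", "PLAYS", "Football", date(2022, 1, 1)),
    ("s200", "STUDIES", "Maths", date(2022, 1, 1)),
]
print(rank_activities(dated_rels))  # [('Football', 2), ('Maths', 1)]
print(leavers(dated_rels, date(2022, 3, 1)))  # {'Football': 1, 'Maths': 1}
```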