

Which algorithm should I use to predict churn?

Hello,

I am currently writing my thesis about customer analysis with Neo4j.
One task is to predict churn, given an example dataset with customers, contracts, services and a flag indicating whether each customer has churned.

The example dataset (~7000 records in a .csv-file) looks like this:

I created the data model as follows:
All services (phone, internet, ...) were created once as a Service node, e.g.

CREATE (ts:Service {type: 'Tech Support'})
CREATE (ps:Service {type: 'Phone Service'})
CREATE (dp:Service {type: 'Device Protection'})
...

All contract specific info was created once as:

CREATE (ctone:ContractType {type: 'One Year'})
CREATE (ctmtm:ContractType {type: 'Month To Month'})
CREATE (cttwo:ContractType {type: 'Two Year'})
CREATE (pmmc:PaymentMethod {type: 'Mailed Check'})
CREATE (pmec:PaymentMethod {type: 'Electronic Check'})
CREATE (pmcc:PaymentMethod {type: 'Credit Card'})
CREATE (pmbt:PaymentMethod {type: 'Bank Transfer'})
CREATE (churn:Churn {type: 'Churn'})

Then I loaded the CSV file with LOAD CSV and created the :Customer and :Contract nodes and the matching relationships.
This leads to a graph like this:

So my question is: is there a graph algorithm that can predict churn with this set of data?
I've read through the 'Graph Algorithms' book by Mark Needham and Amy Hodler but didn't find a matching algorithm for this specific use case.

Any small help or hint is appreciated and will surely help me a lot.

Thanks in advance.

5 REPLIES

Hi,

welcome to the Community!

Have you looked at the link prediction algorithms? https://neo4j.com/docs/graph-algorithms/current/labs-algorithms/linkprediction/

Because basically that is what you are looking for, isn't it? You are looking for a possible future link from a customer to the Churn node.

You could also have a look at online videos where people do machine learning with Neo4j. Maybe this will give you a better idea of how to translate your data into a model that is better suited to ML.

Hi Elena,

thanks for the welcome and your reply.
Yes, I've already looked at the link prediction algorithms, but I can't find a way to use them to predict the probability that a :Contract will link to the :Churn node in the future (for example, based on the subscribed services).
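One possible reason, sketched here under assumptions: neighbor-based link prediction scores such as Common Neighbors count the nodes that two endpoints share. If each :Contract is linked to its :Service, :ContractType and :PaymentMethod nodes, and only churned contracts are linked to :Churn (an assumption, since the graph picture is not reproduced here), then a contract and the :Churn node never share a neighbor, so the score is 0 for every contract. The node names and edges below are hypothetical.

```python
# Hypothetical toy graph; node names and relationships are assumptions
# made for illustration, not the thread's actual data.
from collections import defaultdict

edges = [
    ("contract_1", "Tech Support"),
    ("contract_1", "Month To Month"),
    ("contract_1", "Electronic Check"),
    ("contract_1", "Churn"),  # contract_1 has churned
    ("contract_2", "Tech Support"),
    ("contract_2", "Month To Month"),
    ("contract_2", "Electronic Check"),
]

# Build an undirected adjacency map.
neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)


def common_neighbors(u, v):
    """Number of nodes adjacent to both u and v."""
    return len(neighbors[u] & neighbors[v])


# contract_2 has the same attributes as the churned contract_1, yet it
# shares no neighbor with the Churn node, whose only neighbors are contracts.
score_to_churn = common_neighbors("contract_2", "Churn")

# Comparing the two contracts directly does pick up the shared attributes.
score_to_contract = common_neighbors("contract_2", "contract_1")

print(score_to_churn, score_to_contract)
```

Under these assumptions, comparing contracts (or customers) with each other carries more signal than scoring a contract-to-:Churn link directly.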

Check out this webinar from our Nodes conference on using algos for machine learning: https://youtu.be/jx1_oSl6Yow

There isn't a specific algorithm to predict churn. Instead, there are classes of algorithms that tell you about the topology of the graph: community detection algorithms tell you which nodes are more connected to each other than to the rest of the graph, centrality algorithms tell you which nodes are important, and so on.

What you'll want to try is running graph algorithms on monopartite or bipartite projections of your graph and then exporting the results as features into a machine learning pipeline. Examples of such projections are the customer-to-customer network, with weights based on shared attributes, for something like a centrality algorithm, or a similarity algorithm between customers based on shared behaviours. So, for example, if you're building a classifier that predicts whether a customer will churn, you would use the graph algorithm results as additional features (on top of your standard tabular data), and then use variable selection to identify which features are most predictive.
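A minimal sketch of that workflow in plain Python, outside Neo4j: build a customer-to-customer projection weighted by the number of shared attribute nodes, compute weighted degree as a simple stand-in for a centrality score, and append it to the tabular features. The customer ids, attribute sets and the `tenure_months` column are made up for illustration, and weighted degree is only one possible choice of feature.

```python
# Illustrative data only: customer ids, attributes and tabular values
# are assumptions, not taken from the thread's dataset.
from itertools import combinations

attributes = {
    "c1": {"Tech Support", "Phone Service", "Month To Month"},
    "c2": {"Phone Service", "Month To Month", "Electronic Check"},
    "c3": {"Device Protection", "Two Year", "Credit Card"},
    "c4": {"Phone Service", "Two Year", "Credit Card"},
}
tabular = {
    "c1": {"tenure_months": 2},
    "c2": {"tenure_months": 5},
    "c3": {"tenure_months": 40},
    "c4": {"tenure_months": 31},
}


def project_customers(attrs):
    """Customer-customer projection; weight = number of shared attribute nodes."""
    weights = {}
    for a, b in combinations(sorted(attrs), 2):
        shared = len(attrs[a] & attrs[b])
        if shared:
            weights[(a, b)] = shared
    return weights


def weighted_degree(weights):
    """Sum of edge weights per node, a very simple centrality measure."""
    degree = {}
    for (a, b), w in weights.items():
        degree[a] = degree.get(a, 0) + w
        degree[b] = degree.get(b, 0) + w
    return degree


projection = project_customers(attributes)
degree = weighted_degree(projection)

# One row per customer: tabular columns plus the graph-derived feature.
feature_rows = [
    {"customer": c, **tabular[c], "weighted_degree": degree.get(c, 0)}
    for c in sorted(attributes)
]

for row in feature_rows:
    print(row)
```

Rows like these could then be joined with the churn label and passed to any classifier, with variable selection deciding whether the graph feature earns its place.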

For an example of building an ML model with graph features, check out chapter 8 of the Graph Algorithms book.

You may also be interested in these papers that use graph based features for churn prediction:

Hi Alicia,

I watched your webinar, it was really interesting, thanks a lot.

So in the next step I'll try to run different graph algorithms on projections of the graph and analyze the results.
I already tried the Jaccard Similarity algorithm and got some interesting results.
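For readers unfamiliar with the measure, Jaccard similarity between two customers is the size of the intersection of their service sets divided by the size of the union. The sketch below computes it in plain Python on made-up customers; it illustrates the definition only and is not the Neo4j procedure call used above.

```python
# Illustrative data only: customer names and service sets are assumptions.
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


services = {
    "cust_a": {"Phone Service", "Tech Support", "Device Protection"},
    "cust_b": {"Phone Service", "Tech Support"},
    "cust_c": {"Device Protection"},
}

# Score every pair of customers and list the most similar pairs first.
pairs = sorted(
    ((jaccard(services[x], services[y]), x, y)
     for x, y in combinations(sorted(services), 2)),
    reverse=True,
)

for score, x, y in pairs:
    print(f"{x} ~ {y}: {score:.2f}")
```

A pair's score, or the share of a customer's most similar peers who have already churned, could serve as one more input feature for a churn classifier.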

Thanks for your help & your time.

Hi Moritz

I am working on a similar telecom churn dataset, but I am struggling to create the relationships and the prediction model. Were you able to build the prediction model? Could you please share the Cypher code that you used for building the graphs in your post and the prediction model?

Thanks in advance.

Regards,
MR