Neo4j

dimespi · ‎08-05-2019

Hello,

I'm doing some analysis on Call Detail Records (CDR). My dataset is similar to this one: https://neo4j.com/blog/neo4j-call-detail-records-analytics/

Here are the fields in my dataset:

source (operator)
called_number
calling_number
calling_date
country_code_from
country_code_to
usage
service_name (SMS, DATA, VOICE)
- SMS-OUTGOING
- SMS-OUTGOING-ROAMING
- SMS-INCOMING
- DATA-OUTGOING
- DATA-OUTGOING-ROAMING
- VOICE-OUTGOING
- VOICE-OUTGOING-ROAMING
- VOICE-INCOMING
- VOICE-INCOMING-ROAMING

If the service_name is SMS, the usage value will be set to 1.
If the service_name is DATA, the called_number and country_code_to will be empty.
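
A minimal sketch of how the two rules above could be checked before the records go into any model. The field names come from the list above; the `CdrRecord` class, the `rule_violations` function, and the field types (numeric `usage`, string dates, `None` or an empty string for an empty field) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CdrRecord:
    """One CDR row; types are assumed, not taken from the real dataset."""
    source: str
    calling_number: str
    called_number: Optional[str]
    calling_date: str
    country_code_from: str
    country_code_to: Optional[str]
    usage: float
    service_name: str  # e.g. "SMS-OUTGOING", "DATA-OUTGOING-ROAMING"


def rule_violations(rec: CdrRecord) -> List[str]:
    """Return a description of each rule the record breaks (empty if none)."""
    problems = []
    # The service family is the part before the first hyphen: SMS, DATA or VOICE.
    family = rec.service_name.split("-")[0]
    if family == "SMS" and rec.usage != 1:
        problems.append("SMS record with usage != 1")
    if family == "DATA":
        # None and "" both count as empty here.
        if rec.called_number:
            problems.append("DATA record with a called_number")
        if rec.country_code_to:
            problems.append("DATA record with a country_code_to")
    return problems
```

Records that break a rule can then be set aside or inspected separately, so that malformed rows are not mistaken for fraud signals later on.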

I'd like to apply machine learning algorithms to this data for fraud/anomaly detection and prediction. Which one would be best for my use case: K-means, Random Forest, Naive Bayes, or a time-series approach?

I found this:

I'm using py2neo and MLlib.

jennifer_reif · ‎08-06-2019

What kinds of fraud or anomalies are you looking for in this data set? Understanding a bit more about your use case would help me narrow down the best options.

Cheers,
Jennifer

rsagar4 · ‎09-06-2019

Hi dimespi, were you able to find any example code for CDR analytics? It would be great if you could share a link to an example. Also, do you have a sample dataset for this? Thanks in advance.

CDR analysis