Friday, March 9, 2012

Decision Trees

I am studying the behavior of 200,000 clients. Using decision trees, I would like to know whether my clients will abandon our service. I use a training set of 21,822 clients, and the predictable variable "aband" is a discrete variable that can be 0 or 1. In my training set I have 21,597 cases in which aband is 0 and 225 cases in which aband is 1. Looking at the classification matrix obtained using a testing set (unselected data) as the input table, I can see that my decision tree doesn't recognize the cases in which aband is 1. Here is the classification matrix:

Counts for Dati Training on [Aband]
Predicted    0 (Actual)    1 (Actual)
0                 21597           225
1                     0             0

What should I do?

Chiara

For this kind of problem, where the target state has only around 1% support, you are generally not going to get any outright predictions of Aband = 1. What you will get is a probability for each client that they will abandon. You can get this probability by using the expression "PredictProbability(Aband, 1)" in your prediction query. You should still see a positive lift when you use the lift chart to target the state "1", no? This chart sorts the results by probability of the target and reports the capture rate, i.e. the rate at which it finds actual abandonments. You can also find the clients "most likely" to abandon with a query such as

SELECT TOP 100 t.CustomerID, PredictProbability(Aband, 1)
FROM MyChurnModel
PREDICTION JOIN
...
ORDER BY PredictProbability(Aband, 1) DESC
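
To make the capture-rate idea concrete, here is a minimal Python sketch of what a lift chart computes once the probabilities have been exported from the prediction query. The function names and the ten example rows are hypothetical and are not taken from the model above. Note that every probability in the example is below 0.5, so an outright prediction would be 0 for every client, yet sorting by probability still surfaces the abandonments first.

```python
def capture_rate(scored, top_fraction):
    """Share of all actual abandonments found in the top-scored fraction.

    `scored` is a list of (probability_of_aband_1, actual_aband) pairs.
    """
    ranked = sorted(scored, key=lambda row: row[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    total_positives = sum(actual for _, actual in ranked)
    if total_positives == 0:
        return 0.0
    captured = sum(actual for _, actual in ranked[:cutoff])
    return captured / total_positives


def lift(scored, top_fraction):
    """Capture rate divided by the fraction of clients actually targeted."""
    cutoff = max(1, round(len(scored) * top_fraction))
    return capture_rate(scored, top_fraction) / (cutoff / len(scored))


# Hypothetical data: (predicted probability, actual aband) for ten clients.
scored = [
    (0.30, 1), (0.02, 0), (0.01, 0), (0.12, 1), (0.05, 0),
    (0.01, 0), (0.03, 0), (0.15, 0), (0.02, 0), (0.01, 0),
]

print(capture_rate(scored, 0.2))  # share of abandonments in the top 20%
print(lift(scored, 0.2))          # values above 1 mean better than random
```

A lift above 1 at a given fraction means that targeting the top-ranked clients finds more abandonments than contacting the same number of clients at random.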

My recommendation for this problem is to balance your data from the original 200,000 records so that you have a higher incidence of abandonment in the training set. There are instructions on how to do this at http://www.sqlserverdatamining.com/DMCommunity/TipsNTricks/2615.aspx (you can dump the results into a table for mining - you don't have to use the training transform). The main idea is to sample off some rows at the natural distribution rate for testing, and then create a balanced sample for the training set.
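
As a rough illustration of that idea, outside of SQL Server, the following Python sketch holds out a test set at the natural distribution and then downsamples the majority class in the remainder. It is not the procedure from the linked article. The function name, the 30% test fraction, the 50/50 target balance, and the generated rows are all assumptions made for the example.

```python
import random


def split_and_balance(rows, label_key="aband", test_fraction=0.3, seed=0):
    """Return (train, test): test keeps the natural class distribution,
    train is downsampled to equal numbers of 0 and 1 cases."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)

    # Step 1: sample off test rows at the natural distribution rate.
    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    remainder = shuffled[n_test:]

    # Step 2: build a balanced training sample from the remaining rows.
    positives = [r for r in remainder if r[label_key] == 1]
    negatives = [r for r in remainder if r[label_key] == 0]
    keep = min(len(positives), len(negatives))
    train = rng.sample(positives, keep) + rng.sample(negatives, keep)
    rng.shuffle(train)
    return train, test


# Hypothetical data: 20,000 clients with a 1% abandonment rate.
rows = [{"id": i, "aband": 1 if i % 100 == 0 else 0} for i in range(20000)]
train, test = split_and_balance(rows)
print(len(train), sum(r["aband"] for r in train))
print(len(test), sum(r["aband"] for r in test))
```

Because the training set no longer reflects the real abandonment rate, probabilities from a model trained on it are best used for ranking clients and should be evaluated against the untouched test set.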

Thanks a lot!

Chiara
