The Mechanics of Predicting Customer Churn Series: When the Business Follows a Subscription Model

August 2, 2018

Customer churn is a typical dynamic in any business: for one reason or another, a customer who has previously purchased from a company no longer purchases. However, to surface potential causes of churn that can inform mitigation activities, we need a more operational definition.

Churn can be defined in several different ways. If a business uses a subscription model (e.g., Netflix, Amazon Prime membership), churn can be defined as those customers who have cancelled their subscription. A subscription cancellation typically exists as an explicit field in a database (cancelled_subscription=True), or it may need to be derived in some way. In either case, there is a specific event, either explicitly captured or easily derived from existing data points, that provides the definition of churn.

However, if a business does not use a subscription model, the definition of churn must be derived based on a change in the customer’s transactional behavior (purchases) over a certain amount of time.

There are various methods that can be used to predict churn. When a subscription model is in use, we have a specific indicator available for our analysis: we can ‘label’ those who have cancelled their subscription, so the analysis is relatively straightforward. The first step is to take a sample of internal customer data and split it into two groups: those who have churned and those who have not. A number of Machine Learning models (e.g., Logistic Regression, Random Forest, or Naïve Bayes) can then be ‘trained’ to learn which ‘features’ are most predictive of churn. A feature represents a piece of information that we know about a customer. Examples of features include age, gender, geographic location, and marital status. A hypothetical analysis might indicate, for example, that for Netflix, geographic location is highly predictive of churn, and that zip codes along the Gulf coast are most highly correlated with it. A possible explanation could be that weather-related issues negatively impacted streaming services in those areas, causing many people to cancel their subscriptions.
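To make the training step concrete, here is a minimal sketch of a Logistic Regression fitted with plain gradient descent, using only the Python standard library. The two features (gulf_coast_zip, is_married), the twelve customers, and the learning rate and epoch count are all hypothetical values chosen for illustration; in practice a library implementation and real customer data would take their place.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, learning_rate=0.5, epochs=2000):
    """Fit one weight per feature plus a bias using batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            predicted = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = predicted - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

def churn_probability(weights, bias, x):
    """Predicted probability that a customer with features x churns."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical customers: [gulf_coast_zip (0/1), is_married (0/1)]
customers = [
    [1, 0], [1, 0], [1, 0],
    [1, 1], [1, 1], [1, 1],
    [0, 0], [0, 0], [0, 0],
    [0, 1], [0, 1], [0, 1],
]
# 1 = cancelled_subscription is True, 0 = still subscribed (made-up labels)
cancelled = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0]

weights, bias = train_logistic_regression(customers, cancelled)
p_gulf = churn_probability(weights, bias, [1, 0])
p_inland = churn_probability(weights, bias, [0, 1])
print("weights:", weights, "bias:", bias)
print("gulf coast, unmarried:", p_gulf, "| inland, married:", p_inland)
```

In this toy data the Gulf coast customers cancel more often, so the fitted weight on gulf_coast_zip comes out positive, which is the kind of signal the hypothetical Netflix analysis above describes.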

So far, we’ve described how to predict customer churn for businesses that have a subscription model, where the definition of “churn” is straightforward: a cancelled subscription equals a churned customer. Now let’s look at how to predict customer churn for businesses that do not rely on subscriptions.

The absence of an explicit churn ‘label’ in our data adds a level of complexity to the analysis: we need to develop a mechanism to define “churn” rather than inherit it directly from an existing data element. To this end, we will leverage information about customers’ transactional behaviors to provide a definition of churn that we can use for building our model.

The first and most important step in building a model that will accurately predict propensity (likelihood) for a customer to churn is to assign each customer a label indicating whether the customer has churned or not based on their historical transaction data. Since there is not a specific field in the data that indicates if a customer in the database is still a buyer or not, we need to focus on the customer’s purchasing behavior. The most recent six or twelve months of transactions (depending on how much historical data you have access to) should be left out of the initial analysis and model development process and used to test, validate and refine the accuracy of the predictions produced by an initial model.
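The hold-out step can be sketched as a simple split at a cutoff date. The function name split_holdout, the 180-day window (roughly six months), and the sample transactions are assumptions made for illustration, not part of any particular toolkit.

```python
from datetime import date, timedelta

def split_holdout(transactions, holdout_days=180):
    """Split (customer_id, purchase_date) pairs at a cutoff date.

    Everything after the cutoff is set aside for later evaluation.
    """
    latest = max(purchase_date for _, purchase_date in transactions)
    cutoff = latest - timedelta(days=holdout_days)
    history = [t for t in transactions if t[1] <= cutoff]
    holdout = [t for t in transactions if t[1] > cutoff]
    return history, holdout, cutoff

# Hypothetical transaction log
transactions = [
    ("A", date(2017, 1, 10)), ("A", date(2017, 3, 1)), ("A", date(2018, 2, 15)),
    ("B", date(2017, 2, 5)), ("B", date(2017, 6, 20)),
    ("C", date(2017, 11, 30)), ("C", date(2018, 5, 1)),
]

history, holdout, cutoff = split_holdout(transactions, holdout_days=180)
print("cutoff:", cutoff)
print("used for labeling and training:", len(history), "transactions")
print("set aside for evaluation:", len(holdout), "transactions")
```

Only the history portion feeds the churn definition and the model; the holdout portion is consulted afterwards to check whether customers predicted to churn actually stopped buying.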

The remaining transaction data (purchases) will tell each customer’s “story”, specifically the frequency of purchases and the time interval between purchases. Analyzing this behavior mathematically gives us a definition of churn for that particular customer. Additionally, we can generate new features describing each customer’s buying behavior that might be helpful in predicting the point in time at which they churn.

An example of such a feature might be the buying frequency (e.g. the customer buys once every 45 days). At the end of this process, the specific event that determines the label will be a statement about the frequency of purchases (e.g. “a customer who usually buys every 45 days has churned if there are no purchases during the last 45 days”). It is important to shift focus from a single company-wide churn definition to a definition at the individual customer level. Doing so makes it more likely that each customer is assigned the right label.

Once we have divided the customers into churned and not churned, we can begin training Machine Learning Classification Models (like Logistic Regression, Random Forest, etc.) and follow the same process as for businesses that follow a subscription model. To evaluate the accuracy of the model, we use the transaction data that we set aside in the first step of our analysis, which allows us to see whether the customers we predicted to churn (or not churn) have made any purchases. Finally, the selected model will give us the importance of each feature included in it (as a coefficient for Logistic Regression, or as an importance score for a model such as Random Forest), which we can use to determine which pieces of information about a customer, either given or derived, are most influential in predicting churn.

Now let’s focus on how to ‘tune’ your churn definition after building a preliminary model.

When building any Classification or Predictive model, there are always multiple iterations. In each one, we “tune” the model based on its performance on data held out from training. All of the steps that we take, from segmenting the data, to feature engineering, building the model, and then evaluating the model, are typically repeated several times. For our churn model, it is critical to retrospectively evaluate model performance at each phase of development and identify things that can be modified to improve model accuracy.

The accuracy of a model is simply the ratio of the correct predictions to the total number of cases that have been evaluated. However, to improve model accuracy we need to find where the model missed and why it missed – did it predict someone would churn, but they did not? Or, more importantly for our model, did it predict that someone would not churn, but they did? By rigorously interrogating our model, the data will tell us the missing parts of the story and suggest ways to improve our model.
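The accuracy ratio and the two kinds of misses can be computed with a few lines of code. The predictions and outcomes below are made-up values; in the workflow described here, the outcomes would come from the transactions that were set aside for evaluation.

```python
def confusion_counts(predicted, actual):
    """Count correct and incorrect predictions, where 1 = churned and 0 = stayed."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            counts["tp"] += 1   # predicted churn, customer churned
        elif p == 0 and a == 0:
            counts["tn"] += 1   # predicted stay, customer stayed
        elif p == 1 and a == 0:
            counts["fp"] += 1   # predicted churn, customer stayed
        else:
            counts["fn"] += 1   # predicted stay, customer churned
    return counts

def accuracy(counts):
    """Correct predictions divided by all evaluated cases."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())

# Hypothetical predictions and observed outcomes for eight customers
predicted = [1, 0, 0, 1, 0, 0, 1, 0]
actual    = [1, 0, 1, 0, 0, 0, 1, 1]

counts = confusion_counts(predicted, actual)
missed_churners = [i for i, (p, a) in enumerate(zip(predicted, actual)) if p == 0 and a == 1]

print(counts)
print("accuracy:", accuracy(counts))
print("missed churners (row indices):", missed_churners)
```

The list of missed churners is the group to investigate first, since these are the customers the model failed to flag while there was still time to act.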

In churn modeling the first thing we need to check is the misclassified customers, specifically the ones that we were not able to “catch” before they churned. These cases are critical since the purpose of predicting churn is to have this information while there’s still time to do something – customer retention is easier and less costly for a business than the acquisition of new customers.

When we circle back and evaluate how we built the model, we need to look for the unseen: the underlying reasons why customers the model scored as unlikely to churn churned anyway. Performing cluster analysis (e.g. K-Means, Hierarchical Clustering, DBSCAN, etc.) will help us find common patterns in this customer group. It is likely that we will see at least some of the reasons they churned. For example, they might be “irregular” buyers who place orders at wide and unpredictable time intervals. In that case, either the mean frequency alone (e.g. mean frequency = customer buys every 45 days) might not be enough to identify the churning point, or it should be calculated differently for these types of customers. Another scenario might be that these customers had bad product experiences or unsatisfactory service. For example, if a group of customers who bought the same product churned, it is likely that customers who recently bought it are at high risk of churning regardless of their normal buying frequency. Including data that captures this information in the model will improve its quality.
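The “irregular buyer” case can be illustrated without clustering by comparing the spread of a customer's purchase intervals with their mean. The sketch below is one possible way to calculate the churning point differently for such customers, not a prescribed method: the coefficient of variation as an irregularity measure, the multiplier k, and the sample intervals are all assumptions.

```python
from statistics import mean, pstdev

def gap_profile(gaps_in_days):
    """Summarize a customer's intervals between purchases."""
    avg = mean(gaps_in_days)
    spread = pstdev(gaps_in_days)
    # Coefficient of variation: spread relative to the mean interval.
    return {"mean": avg, "stdev": spread, "cv": spread / avg}

def silence_threshold(gaps_in_days, k=2.0):
    """Days of silence tolerated before labeling churn: mean + k * stdev."""
    profile = gap_profile(gaps_in_days)
    return profile["mean"] + k * profile["stdev"]

# Two hypothetical customers with the same mean interval of 45 days
regular_gaps = [44, 45, 46, 45]
irregular_gaps = [10, 120, 5, 45]

for name, gaps in [("regular", regular_gaps), ("irregular", irregular_gaps)]:
    profile = gap_profile(gaps)
    print(name, "mean:", profile["mean"], "cv:", round(profile["cv"], 2),
          "threshold:", round(silence_threshold(gaps), 1))
```

Both customers average one purchase every 45 days, but the irregular buyer's threshold is far wider, so a 60-day silence would count as churn for the regular buyer and not for the irregular one.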

On the other hand, there are customers who were predicted to leave but didn’t. Although the missed churners described above matter more for preventing revenue loss, these customers have shown that they are loyal to the company, and retaining loyal customers is essential. Therefore, a business would not want to target them with aggressive campaigns or too many emails. Doing so would be a poor allocation of resources and, even worse, these customers might not like frequent contact at that point and might choose to disengage (e.g. stop purchasing) in the future. Consequently, loyal customers might need a more flexible calculation of the buying frequency that our churn model relies on.

As we see, there are various aspects to be considered when we investigate customer purchase behavior. While much of this information is noise and complex models should be avoided, we must capture the most important elements in the churn definition.

All of these findings are useful input for model improvement. We go back to the very first step to modify the definition of churn for each customer and assign the zero and one labels again. Then, as in the first iteration, we train candidate models and evaluate them to select the best one. In short, constant observation and improvement are the key to learning about and predicting customers’ behavior. The more we know, the higher the probability of building a powerful predictive model.