Churn Prediction in Salesforce Simplified – An Action Guide

Use this guide to learn how machine learning can predict which customers are at risk of switching to a different vendor, and why. A real-life example will highlight the motives for employing machine learning to make churn predictions and the results that were achieved. Step-by-step instructions are provided for use at your organization, without the need for data scientists or programmers. Squark operates separately from Salesforce Einstein with any version of Salesforce.  No license for Einstein is required.

Introduction

We all know that retaining existing customers should be as important as earning brand-new ones. In recurring-revenue subscription businesses, churn rate (the percentage of existing customers that leave each period) is the single most important metric for determining long-term success. Think of it this way: if you are losing even 15% of your customers each year, that is nearly two months’ worth of annual revenue you will have to sell just to break even. Retention is typically far less costly than replacement because it avoids the marketing and sales costs of acquiring a new customer. Profits rise when churn drops.
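The arithmetic behind the 15% example can be sketched in a few lines. This is only an illustration: it assumes revenue is spread evenly across customers and across the twelve months of the year, and the function name is hypothetical.

```python
def months_of_revenue_lost(annual_churn_rate: float) -> float:
    """Return how many months of annual revenue are lost to churn.

    Assumes revenue is spread evenly across customers and months.
    """
    if not 0.0 <= annual_churn_rate <= 1.0:
        raise ValueError("annual_churn_rate must be between 0 and 1")
    return annual_churn_rate * 12


# 15% annual churn costs a little under two months of revenue.
lost = months_of_revenue_lost(0.15)
print(f"{lost:.1f} months of annual revenue must be replaced")
```

Under the same assumptions, cutting churn from 15% to 10% would reduce the revenue to be replaced from about 1.8 months to about 1.2 months.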

Are you taking the greatest advantage of your pre-primed business relationships to retain revenue streams? You are if you pay attention to churn and use retention tactics to keep the right kinds of customers in your book. Some customers are better than others, of course. If you have a way to identify churn risk for all of them, it is simple to aim resources at those that signal high lifetime value. Likewise, accounts that will never become profitable due to high costs of implementation and support are probably worth letting go.

Learning why customers leave is the essence of what machine learning does to enable churn predictions. Your existing data contains the story of why customers stay or leave. Squark automated machine learning detects patterns that reveal those stories so you can act on them. Squark not only labels each customer’s likelihood to churn, it also shows which factors are most predictive of churn. That means you can adjust personas in marketing and sales to attract customers who are less inclined to churn in the first place. Retention programs can also be designed and targeted to address the issues that, once resolved, could save a valued customer.

This guide shows a simple way to begin addressing churn more effectively using your current teams, existing data sources, and simple tools that do not require data science or programming knowledge.

A Case to Prove the Point

Let’s look at a real-life example of how one company uses existing information in CRM, marketing automation, and ecommerce systems to predict which customers are at risk of churn.

Scenario

As a telecommunications provider, the company had an interest in broadening its subscription base and winning market share from competitors. To achieve that goal, retaining subscribers once they had signed up was critical. Knowing which specific customers were in danger of churning would allow the company to aim special retention offers at the right people.

Predicament

With millions of customers, it was simply not possible to guess which might leave or why. With reports, charts, and traditional statistical analyses, the best the company could do was glean insight into general reasons for churn. Without knowing how those trends applied customer by customer, it was unable to target effectively. The option of promoting retention programs to everyone had serious downsides, including campaign costs, consumer fatigue, and revenue reduction from making good on retention offers for customers who would have stayed without them.

Solution

Using machine learning to determine automatically the “scent” of a customer who will probably churn is the breakthrough. AI algorithms have distinct advantages over conventional analyses for these kinds of tasks.

Unlike statistical methods, machine learning does not require any guessing at which variables will be predictive of behaviors. When customer records contain dozens or hundreds of features (fields or data columns), it is impossible to write custom code that evaluates the myriad complex relationships with any reasonable amount of time and effort. Questions change before answers can be produced. Only machine learning can handle the dynamics of high-velocity data.

AI algorithms look for patterns. That’s it—no presumptions of whether term, subscription level, location, or any other demographics are indicators of churn propensity. Machine learning looks at all the data to determine which are operative and which are unimportant in predicting the result.

Process

The telecom company posed a simple question to start: “Which customers are in danger of churning?” They had vast amounts of data in Salesforce, so the data aggregation phase of the initial project was simple: organize and export every customer record as a flat file using Salesforce reports. Each row in the data represented a customer and every column contained data about the customer, their transactions, behaviors, and responses. By using all available data, they solved a couple of problems. First, there was no guessing of predictive factors and no elimination of factors that might not have seemed predictive to human intuition. Second, they did not have to spend excess time preparing the data, because the machine learning system did it automatically.

The telecom company connected Squark to Salesforce in order to generate two datasets. One dataset had rows containing everything they knew about customers who had churned and customers who had not. The data included demographic, transactional, behavioral, product, and other first-party data about the customer and their history. Most importantly, one column contained a “yes” or “no” to indicate churn status. This was the target for prediction (called the dependent variable). This training data was used by AI to learn the characteristics of both groups. The second dataset had rows containing identical information about all existing customers, but with the churn column blank. This dataset, called production data, was the one for which they needed answers. Don’t worry if you have customers that appear in both datasets. It is the aggregate of all customers who have churned or not that will be used to build the model.
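The two datasets described above could be derived from a single labeled export along these lines. This is a sketch under assumptions, not Squark’s implementation: the column name "churned" is taken from the sample column list later in this guide, the records are made up, and "existing customer" is assumed to mean anyone not marked as churned.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def build_datasets(records: List[Record], target: str = "churned") -> Tuple[List[Record], List[Record]]:
    """Return (training, production) row lists from one labeled export."""
    training: List[Record] = []
    production: List[Record] = []
    for row in records:
        label = row.get(target, "").strip().lower()
        if label in ("yes", "no"):
            # Known outcome: the model can learn from this row.
            training.append(dict(row))
        if label != "yes":
            # Still a customer (assumption): score this row with the target blanked.
            production.append({**row, target: ""})
    return training, production


# Hypothetical records; a real export would carry many more columns.
customers = [
    {"customer id": "A1", "payment type": "card", "churned": "yes"},
    {"customer id": "A2", "payment type": "invoice", "churned": "no"},
    {"customer id": "A3", "payment type": "card", "churned": ""},
]
training_rows, production_rows = build_datasets(customers)
print(len(training_rows), "training rows;", len(production_rows), "production rows")
```

Customer A2 lands in both lists, which matches the point that overlap between the two datasets is not a problem.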

Now that Salesforce data was available via Squark’s connector, the next step was to select which column in the training data would be used as the target, or outcome, column to predict in the production data. In this case it was the column containing the churn status. Clicking “Start” sent Squark to work producing hundreds or thousands of models. Squark first prepared the training data to optimize its value for machine learning. Advanced data science automation called Feature Engineering and Feature Selection transformed, imputed, binned, flattened, and factored the data to maximize its utility for machine learning. Free-form string fields were parsed to determine potential keywords, which were converted to columns so that the impact of each keyword would be factored into the analysis of each row. Dates could be factored into separate year/month/day/hour columns. Sparse columns (those with few rows containing data) could have values imputed, and columns that were determined not to be valuable to prediction could be removed from the analysis, all completely automatically.
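To make two of these transformations concrete, here is a minimal sketch of date factoring and keyword columns. It is not Squark’s implementation. The "support notes" column, the keyword list, and the function names are hypothetical, and the keywords are supplied by hand here, whereas the text above describes them being discovered automatically.

```python
from datetime import datetime
from typing import Dict, List


def factor_date(column: str, value: str) -> Dict[str, int]:
    """Split an ISO 8601 timestamp into year, month, day, and hour columns."""
    moment = datetime.fromisoformat(value)
    return {
        f"{column} year": moment.year,
        f"{column} month": moment.month,
        f"{column} day": moment.day,
        f"{column} hour": moment.hour,
    }


def keyword_flags(column: str, text: str, keywords: List[str]) -> Dict[str, int]:
    """Turn free-form text into one 0/1 column per keyword."""
    words = set(text.lower().split())
    return {f"{column} has {kw}": int(kw in words) for kw in keywords}


# Hypothetical input row.
row = {
    "date joined": "2019-03-14T09:30:00",
    "support notes": "Customer asked to cancel premium cable",
}
engineered: Dict[str, int] = {}
engineered.update(factor_date("date joined", row["date joined"]))
engineered.update(keyword_flags("support notes", row["support notes"], ["cancel", "refund"]))
print(engineered)
```

Each original column becomes several numeric columns that a model can weigh separately, for example whether customers who joined in a particular month churn more often.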

Squark then created hundreds of models uniquely configured across multiple machine learning algorithms. Each custom model was created dynamically and automatically to fit the training data. All models were automatically validated and cross-validated against test data held back for that purpose, and ranked according to each model’s ability to predict accurately. Once Squark converged on the best model (and underlying algorithm), it used that one to score predictions in the production data. Output from the run contained the entire production dataset with a predicted outcome of “yes” or “no” added to the “churn” column in each row of the production file, along with the probability percentage for each customer. The entire process from connecting training data to receipt of the predictions table took minutes, not days or weeks.
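The idea of ranking candidates against held-back data can be sketched as follows. Everything here is illustrative: the three "models" are hand-written rules standing in for trained models, the held-back rows are made up, and plain accuracy is used as the ranking score for simplicity.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Model = Callable[[Row], str]

# Hypothetical held-back rows with known outcomes.
holdout: List[Tuple[Row, str]] = [
    ({"support calls": 5, "months active": 3}, "yes"),
    ({"support calls": 0, "months active": 40}, "no"),
    ({"support calls": 4, "months active": 30}, "yes"),
    ({"support calls": 1, "months active": 2}, "no"),
]

# Stand-ins for trained models: each maps a row to "yes" or "no".
candidates: Dict[str, Model] = {
    "calls rule": lambda r: "yes" if r["support calls"] >= 3 else "no",
    "tenure rule": lambda r: "yes" if r["months active"] < 6 else "no",
    "always no": lambda r: "no",
}


def accuracy(model: Model, rows: List[Tuple[Row, str]]) -> float:
    """Share of held-back rows the model labels correctly."""
    correct = sum(1 for features, actual in rows if model(features) == actual)
    return correct / len(rows)


# Rank candidates from best to worst on data they were not fitted to.
leaderboard = sorted(
    ((name, accuracy(model, holdout)) for name, model in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
best_name, best_score = leaderboard[0]
print(leaderboard)
```

Scoring on rows the models never saw is what keeps the ranking honest: a model that merely memorized the training data would do poorly here.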

Deployment

Output from Squark was available for use in Salesforce and contained actionable information: a list of customers who were at risk of churn, in order of probability. This data was used to create an audience segment for custom email campaigns and outbound calling targeted at customers more than 65% likely to churn.
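Building such a segment from scored output amounts to a filter and a sort. The sketch below assumes a "churn probability" column expressed as a fraction between 0 and 1; that column name and the sample rows are hypothetical.

```python
from typing import Dict, List, Union

ScoredRow = Dict[str, Union[str, float]]

# Hypothetical scored rows.
predictions: List[ScoredRow] = [
    {"customer id": "A1", "churn probability": 0.91},
    {"customer id": "A2", "churn probability": 0.40},
    {"customer id": "A3", "churn probability": 0.72},
    {"customer id": "A4", "churn probability": 0.65},
]


def at_risk_segment(rows: List[ScoredRow], threshold: float = 0.65) -> List[ScoredRow]:
    """Return customers whose churn probability exceeds the threshold, riskiest first."""
    selected = [r for r in rows if r["churn probability"] > threshold]
    return sorted(selected, key=lambda r: r["churn probability"], reverse=True)


segment = at_risk_segment(predictions)
print([r["customer id"] for r in segment])
```

Customer A4 sits exactly at 65% and is left out, because the campaign targeted customers more than 65% likely to churn.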

Results

It worked. On their very first campaign, the telecom company got double-digit increases in revenue retention compared to earlier campaigns. They emailed and called far fewer customers, yet saved more customer relationships. The company anticipated associated improvements in long-term customer satisfaction and average customer lifetime value, along with significantly reduced costs of sales and support. Retaining customers rather than winning back lapsed ones would also avoid de-installation, re-installation, and associated equipment handling costs.

How to Make It Work at Your Organization

Scenario

Pick a churn opportunity with a sizeable segment of your customer base, if not all of it. Segment by product, industry, region, account size, or whatever aligns with your most important business goals.

Establish the costs associated with outreach programs that alert customers to retention offers. Examples include direct mail, social media messages, directed advertisements, and email. Don’t forget about opportunity costs such as unopened email and unsubscribes.

Predicament

Identify the campaign type you wish to improve, and the benefits you expect to reap when you do. Use only existing campaign types for which you have historical performance data. Make sure the selected campaign is consistent with previous ones in every way except the targeted segment. Adding more moving parts such as new creative could make it more difficult to assess performance.

Solution

If you do not already have access to Squark, get in touch so we can arrange a free churn prediction. If you already use Squark, great! Let’s go.

Process

  1. Connect to Salesforce

Squark connects to your accounts, contacts, leads, and opportunities objects and finds the data needed. Squark will create two data sets. One is the training data: records for customers known to have churned or not. The other, the production data, will contain records for all customers in the targeted retention segment.

More data columns may or may not be more predictive, but err on the side of inclusion. Squark will figure out if columns can be removed without impacting predictive power, or if new columns should be engineered automatically to improve the results. Including a token, tag, or ID for the customer is a good idea, even though it won’t be predictive. Those could be useful for joining results to other data in your outbound content applications, for instance. Here are made-up examples of the kinds of information you may know about customers and their transactions:

Training Data Set

customer name
customer id
account number
city
date of birth
age
buyer type
buyer email
transaction type
billing preference
credit score
payment type
original lead source
campaign token 1
campaign token 2
campaign token 3
keyword 1
keyword 2
keyword 3
keyword 4
email opens
promo id
content version 1
content version 2
content version 3
content version 4
is buyer
email status
first purchase
buyer class
activity class
service date
first registration
support status
channel narrow
classification
date joined
order count lifetime
order value
total units
referrer utm
revenue lifetime
time to first purchase
first purchase amount
gender predicted
gender probability
is renewal
internet
phone
mobile
basic cable
premium cable 1
premium cable 2
premium cable 3
cord cutter
last purchase
buyer rank
partner name
discount offered
churned

Production Data Set

customer name
customer id
account number
city
date of birth
age
buyer type
buyer email
transaction type
billing preference
credit score
payment type
original lead source
campaign token 1
campaign token 2
campaign token 3
keyword 1
keyword 2
keyword 3
keyword 4
email opens
promo id
content version 1
content version 2
content version 3
content version 4
is buyer
email status
first purchase
buyer class
activity class
service date
first registration
support status
channel narrow
classification
date joined
order count lifetime
order value
total units
referrer utm
revenue lifetime
time to first purchase
first purchase amount
gender predicted
gender probability
is renewal
internet
phone
mobile
basic cable
premium cable 1
premium cable 2
premium cable 3
cord cutter
last purchase
buyer rank
partner name
discount offered

  2. Select the dependent variable

The dependent or target variable is the column that you want to predict. In the training file this will contain known outcomes, in this case “yes/no.”

  3. Select the independent variables and select Next

Squark gives you the opportunity to review the columns in your data sets and to remove them from the prediction calculation input simply by clicking on the column. This is useful for removing data that is non-predictive because it is unique to each row (such as customer account number) or that you wish to exclude for other reasons (such as gender). Squark suggests columns to remove by highlighting them in yellow. If you don’t know which columns you might want to include, don’t worry. During the feature engineering process the application will make these determinations for you.
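One simple way to spot columns that are unique to each row is to check whether any value ever repeats. The sketch below is a stand-alone heuristic with made-up rows and a hypothetical function name. It is not a description of how Squark chooses the columns it highlights.

```python
from typing import Dict, List


def unique_per_row_columns(rows: List[Dict[str, str]]) -> List[str]:
    """Return columns whose values never repeat, such as IDs and account numbers."""
    if len(rows) < 2:
        return []  # Uniqueness is meaningless with fewer than two rows.
    flagged = []
    for column in rows[0]:
        values = [row.get(column, "") for row in rows]
        if len(set(values)) == len(values):
            flagged.append(column)
    return flagged


# Hypothetical sample rows.
sample = [
    {"account number": "1001", "city": "Boston", "payment type": "card"},
    {"account number": "1002", "city": "Boston", "payment type": "invoice"},
    {"account number": "1003", "city": "Denver", "payment type": "card"},
]
print(unique_per_row_columns(sample))
```

On a small sample, other columns can look unique by chance, so a check like this is more trustworthy when run over the full data set.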

  4. See results

Results will include an output file with prediction of churn (or not) as well as probabilities for every record in the production file, something like this:

In addition, you will see a leaderboard table showing which AI algorithm was used, and its performance accuracy compared to others:

In this example the Stacked Ensemble did best, scoring nearly 85% on AUC (area under the ROC curve). AUC measures how reliably the model ranks customers who churn above those who do not, so it is not the same as the percentage of correct predictions. The leaderboard is further explained in our glossary. The good news is that most of the terms in the leaderboard are meaningful only to data scientists and serve to illustrate the validity of Squark’s results when data scientists are involved. Day-to-day use by business analysts will focus on AUC.
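AUC can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it directly from that definition by comparing every churner with every non-churner. The labels and scores are made up, and production tools typically use a faster rank-based calculation.

```python
from typing import List


def auc(labels: List[str], scores: List[float]) -> float:
    """Share of (churner, non-churner) pairs ranked correctly; ties count as half."""
    churners = [s for label, s in zip(labels, scores) if label == "yes"]
    stayers = [s for label, s in zip(labels, scores) if label == "no"]
    if not churners or not stayers:
        raise ValueError("AUC needs at least one 'yes' and one 'no' label")
    wins = 0.0
    for churn_score in churners:
        for stay_score in stayers:
            if churn_score > stay_score:
                wins += 1.0
            elif churn_score == stay_score:
                wins += 0.5
    return wins / (len(churners) * len(stayers))


# Hypothetical known outcomes and the churn probabilities a model assigned to them.
actual = ["yes", "no", "yes", "no", "no"]
predicted = [0.9, 0.3, 0.6, 0.7, 0.2]
print(f"AUC = {auc(actual, predicted):.3f}")
```

A score of 1.0 means every churner was ranked above every non-churner, while 0.5 is what random guessing would achieve.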

Variable Importance is also shown in the results, displaying which columns were most predictive of the outcome. As you can imagine, understanding which factors are most predictive of customer retention could be very useful in both segmenting and targeting messages.

Deployment

Now you have a table with predictions and probabilities that can feed the outreach program offers. You will establish a cut-off point at which you feel a “yes” is likely enough to yield results and send the offer only to customers above that point.
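One way to choose the cut-off is to estimate the net value of the campaign at several candidate points and pick the best. The sketch below is a simplified model with assumptions stated up front: every figure is hypothetical, each offer is assumed to cost the same, and an offer is assumed to retain a would-be churner at a fixed "save rate".

```python
from typing import List


def expected_net_value(
    probabilities: List[float],
    cutoff: float,
    value_if_saved: float,
    cost_per_offer: float,
    save_rate: float,
) -> float:
    """Expected value of retained customers minus the cost of the offers sent."""
    targeted = [p for p in probabilities if p >= cutoff]
    expected_saves = sum(p * save_rate for p in targeted)
    return expected_saves * value_if_saved - len(targeted) * cost_per_offer


# Hypothetical churn probabilities from the predictions table.
probabilities = [0.9, 0.8, 0.6, 0.3, 0.1]
cutoffs = [0.0, 0.25, 0.5, 0.75]

# Hypothetical economics: a saved customer is worth 200, an offer costs 40,
# and half of the would-be churners who receive an offer are retained.
best_cutoff = max(
    cutoffs,
    key=lambda c: expected_net_value(
        probabilities, c, value_if_saved=200.0, cost_per_offer=40.0, save_rate=0.5
    ),
)
print("Best cut-off among candidates:", best_cutoff)
```

With these made-up numbers, sending the offer to everyone is the least valuable option, which mirrors the earlier point about the downsides of promoting retention programs to all customers.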

Results

Monitor the results of your prediction-informed campaigns and compare the yield to previous campaigns. If the revenue improves by a significant margin and costs stay level or go down, you have clear data for ROI calculations. Make decisions on future campaigns on this basis and experiment by expanding to different questions. For example, Squark could help you predict:

  • Which one of two or more offers would be most likely to be accepted by a customer?
  • Which advertising message on which medium would attract the highest lifetime value customers?
  • What is the lifetime value forecast for each customer?
  • Which customers would buy additional services if offered them?
  • What is an accurate total revenue forecast for next period?

Conclusion

Taking advantage of the power of AI is simpler than you imagined. Squark makes the power of automated machine learning accessible to everyone, with no need to have a data science degree or know how to write programs. You can get started immediately with tabular data in familiar formats, or integrate closely with existing systems.