How Important Is Explainability?

Understanding the inner workings of ML algorithms can distract from realizing the benefits of good predictions.

Explainability Explained

With the rise of artificial intelligence has come skepticism. Mysteries of how AI works make questioning the “black box” natural. “Explainability” refers to being able to trace and follow the logic AI algorithms use to form their conclusions. In some cases—particularly in unsupervised learning—the answer is, “We don’t know.” How disconcerting is that? Whether or not the answers are valid, not being able to “show your work” engenders suspicions.

Supervised learning is different. Algorithms such as trees and linear regressions have clearly defined math that humans could follow and arrive at the same answers as automated machine learning (AutoML), if only they had time to work through millions of calculations. Nevertheless, unfamiliarity with the data science of supervised learning also causes doubts.

Explainability could be a concern when …

  • Making life-or-death decisions
  • Proving court-of-law-style certainty
  • Completely eradicating biases

Fortunately, most practical business decisions do not need to pass those tests.

How to Convince Decision Makers to Trust AutoML

First of all, emphasize the ROI of good predictions. In marketing, one of the most common use cases for AutoML, predictions only have to be a little better than a coin flip to deliver huge returns. Next, show the evidence:

  • Model accuracy is calculated by comparing AutoML predictions to known results on data held out from training. That held-out accuracy estimates how well the model will perform on new, unseen data (see the sketch after this list).
  • Squark shows lists of the factors that were most predictive, which explains enough of the model’s logic to inspire confidence.
  • Data features can easily be added and removed to test biases and understand predictive behaviors.
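
For instance, here is a minimal sketch of that evidence using scikit-learn: hold-out accuracy plus a ranked list of predictive factors. It is illustrative only, not Squark's code; the bundled data set and the random-forest choice are arbitrary stand-ins.

```python
# Minimal sketch of hold-out evaluation and factor ranking (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X, y = data.data, data.target

# Hold out 25% of the labeled rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Accuracy on the held-out rows estimates performance on new, unseen data.
print(f"Hold-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.2%}")

# Feature importances approximate a "most predictive factors" list.
ranked = sorted(zip(data.feature_names, model.feature_importances_),
                key=lambda pair: -pair[1])
for name, score in ranked[:5]:
    print(f"{name}: {score:.3f}")
```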

A Useful Analogy

Before using Google Maps to find the best route to your destination, do you investigate which AI algorithms it used and why it chose that exact route? Of course not. The reasons are simple:

  • The algorithms are not understandable unless you are a data scientist.
  • The results are usually very good.
  • Time wasted checking the process delays reaching your goal.

Squark displays algorithm performance information along with prediction results. It shows how multiple models were evaluated to make sure the best one was used. Think of the lower-ranked models as the gray, “12 minutes slower” routes and start your journey with confidence.

How are Supervised and Unsupervised Learning Different?

Showing the way vs. stumbling in the dark – there are applications for both.

Supervised

Supervised Learning shows AutoML algorithms sets of known outcomes from which to learn. Think of classroom drills, or giving a bloodhound the scent.

Supervised learning relies on training data whose columns (features) are labeled. The training data must also include known outcomes for the column you want to predict on fresh data.

Use Supervised Learning for…

  • Performance-based predictions
  • Scoring the likelihood that things will happen
  • Forecasting outcomes

Classifications where there are two (binary) or more (multi-class) possible outcomes are use cases for supervised learning. Regressions—predicting scalar numerical values such as forecasts—are also suited to supervised learning.
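
As a concrete sketch, both flavors take only a few lines with scikit-learn. This is illustrative only; the bundled data sets stand in for your own labeled training data.

```python
# Sketch: the two supervised-learning flavors described above (illustrative).
from sklearn.datasets import load_breast_cancer, load_diabetes
from sklearn.linear_model import LinearRegression, LogisticRegression

# Binary classification: the training data includes a known 0/1 outcome column.
X_cls, y_cls = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000).fit(X_cls, y_cls)
print("Predicted classes:", clf.predict(X_cls[:3]))

# Regression: predicting a scalar numerical value (a forecast-style target).
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_reg, y_reg)
print("Predicted values:", reg.predict(X_reg[:3]))
```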

Unsupervised

Unsupervised learning happens when AutoML algorithms are given unlabeled training data and must make sense of it without any instruction. Such machines “teach themselves” what result to produce.

Unsupervised learning algorithms look for structure in the training data, like finding which examples are similar to each other and grouping them into clusters.

Use Unsupervised Learning for…

  • Understanding co-occurrence
  • Detecting hidden data relationships
  • Extracting data

Clustering, market basket analyses, and anomaly detection are common use cases for unsupervised learning.
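
For example, here is a minimal clustering sketch with scikit-learn's KMeans. It is illustrative only; the generated blob data stands in for real, unlabeled records.

```python
# Minimal clustering sketch: unlabeled data in, group structure out.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 unlabeled points; we deliberately discard the true labels.
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# KMeans finds clusters with no outcome column to learn from.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
print("Cluster assignments (first 10 rows):", kmeans.labels_[:10])
```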

What Is A Confusion Matrix?

The most aptly named AI term is actually simple.

A Confusion Matrix is a table that shows how often an AI classifier gets confused predicting true and false conditions. Here is a simple example of a Confusion Matrix for a model that classifies whether a fruit is an orange or not, out of a sample of 166:

                        Predicted: Orange    Predicted: Not Orange
  Actual: Orange               105                      1
  Actual: Not Orange            10                     50

How Well Did My Model Do?

As you can see from the table, the classifier was pretty accurate overall. It was correct (true positives and true negatives) 155 times, or 93.37% of the time. It did very well at predicting when fruits were oranges – only one wrong (a false negative), or about 99% right. It was not as good at predicting when they were not oranges – 83.3% right and 16.7% wrong (false positives).

Confusion matrices are especially informative when considering the consequences of false negatives versus false positives in your use cases.
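
To make the arithmetic concrete, here is a short Python sketch that recomputes those percentages from the cell counts in the table above.

```python
# Recomputing the percentages above from the confusion-matrix cell counts.
tp, fn = 105, 1   # actual oranges: predicted orange / predicted not orange
tn, fp = 50, 10   # actual non-oranges: predicted not orange / predicted orange

total = tp + fn + tn + fp                                   # 166 fruits
print(f"Overall accuracy:        {(tp + tn) / total:.2%}")  # 93.37%
print(f"Accuracy on oranges:     {tp / (tp + fn):.2%}")     # ~99%
print(f"Accuracy on non-oranges: {tn / (tn + fp):.2%}")     # 83.33%
```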

Creating Marketing Value With Codeless AI

Earlier this year, Brett House of Nielsen and Judah Phillips from Squark presented at Analytics Nexus 2019. Here is a transcript of Brett’s remarks.

“Thank you, Judah. To quickly introduce myself, as Judah mentioned, I run product marketing and SaaS demand generation for the Nielsen DMP and the Nielsen Marketing Cloud. I’ve done this for the last few years, and what we’re looking at here in this particular slide is a specific use case on how we’re using Squark and predictive AI to better our ability to effectively generate revenue for our sales organization.

And this slide sort of sets the context for that. We’re looking at a B2B marketing demand funnel, from inquiry, which is sort of the initial conversion, all the way through closed deals. We really needed to understand, as a marketing organization, how to better predict what drives prospects and new audiences through the demand funnel. What brings an initial inquiry to the Nielsen Marketing Cloud or the Nielsen DMP to actually being accepted as a sales lead and closed as a deal, which we can then attribute back to our own activities?

It really comes down to three core criteria, which, very simply put, are understanding how we can predict WHO is most likely to become a sales qualified lead and a revenue point for the Nielsen Marketing Cloud, which comes down to personas. Who are we targeting? What is their title? What is the industry vertical that they are a part of? What is their function within their company? And which one of those attributes is most predictive of someone purchasing one of our products?

Number two is really WHAT. What kind of content are they engaging with from a B2B marketing perspective? We develop content that runs the gamut, from webinars, to owned events, to reports and white papers, to bylines and videos, to podcasts, and we’re tracking every touchpoint…

…across what you see as number three, WHERE these particular personas are interacting and engaging with our brand and with our product. And that’s really the channels of engagement. Media channels like pay-per-click advertising or display advertising; owned channels like nielsen.com product pages and solutions pages for our various products; social media, which could be earned content. And we’re tracking each one of these engagements through each one of these three criteria to understand who is most likely going to purchase a product from us and who is exhibiting purchase intent.

What Squark has really allowed us to do is to be more predictive of that flow from the top of the funnel, when someone downloads a white paper, or RSVPs to a webinar, or commits to an initial action, all the way through to the qualification phases which you see in the middle of the funnel, which are marketing qualified and sales qualified.

Marketing qualified we define as someone who fits two key metrics. One is an engagement metric, meaning how often have they engaged with us and where are they engaging with us? So, it’s really number two and number three – the what and the where. Whereas the fit metric, which is another way that we help predict purchase intent, is really number one – the who. Who is this person? What is their title? What is their function? And how does that help influence the purchase decision, and that path to purchase – all the way down to the bottom of the funnel?

At the end of the day, the whole purpose of this is to increase the size of the sales pipeline from a B2B marketing perspective, and it’s very reminiscent of what you see in B2C marketing as well. We look at every stage in that demand funnel. Our fundamental goal is really to build out the pipeline across all of those stages, because at the end of the day sales is a numbers game, and you want to ensure that you have enough people at the top of the funnel to feed those lower portions of the funnel, and to be able to better cater to those people’s needs as they interact with our brand. So, when they are coming to our product pages; when they are attending our events; when they are engaging with our content, we want to have a better understanding of where they are in the purchase cycle. Are they in the process of putting an RFP forth to other vendors? Can we get ahead of that curve, so that we are able to get in front of them before they release an RFP to the general public?

Squark is really what is allowing us to be more predictive – using a lot of data from a lot of disparate systems to enable us to understand who they are, what they like to see from a content perspective, and where they like to engage with us as a brand and as a product. And that helps us to optimize our media plan, optimize our media cross-channel strategy – whether it’s pay-per-click, or it’s owned email, or it’s what we are doing on nielsen.com. And optimize our audience segmentation so that we’re more effectively targeting the right people with the right content at the right time. And all of that, just like with B2C marketing, creates a better customer experience and enables us to more effectively and more inexpensively (most importantly) move people through that demand funnel, closer to that final, closed deal.

And AI is really what makes it faster; it makes it more accurate; and it helps us optimize our programs more effectively. Without that – and, believe me, I’ve worked in those circumstances without AI in the past – it’s a much more manual, much slower process, and your predictive capabilities are really diminished. Considering the amount of data and the number of media inputs we see out there, it’s very difficult, short of putting your finger in the wind, to predict what’s going to be most effective in driving ROI for your marketing organization.

That brings me to this: We’ve implemented some of these predictive capabilities to get a baseline from which to develop our personas, our content, and our channel strategy. Now we really need to prove the impact of this – to measure the ROI and connect the dots between the media touchpoints – the engagement and conversion touchpoints that we are experiencing by distributing content across various channels – to our actual marketing KPIs. As you saw in the earlier slide, those are connected to sales-accepted leads which, at the bottom of the funnel, are closed deals and won revenue.

There is a wall between these two things. It’s very easy to track to the point of conversion – someone fills out a form; someone downloads something; someone attends an event – but to connect that to what’s going on in your CRM (and in our case we use Salesforce); what’s going on in your marketing automation platform (we use Pardot) is another challenge altogether. So you really need a system that’s able to integrate these various data sets and various platforms so that you can look at it holistically, and connect the dots between everything you are doing on the media and content side to how those things are impacting your most important KPIs that you present to the executive leadership.

So, this sort of summarizes it best. It’s how we attribute credit to the right channels and connect sales wins or conversions with tactics that is faster, more responsive, and more intelligent than a small team would be able to do on its own. Even a large team with a large group of data scientists (and most marketing teams do not have the privilege of an army of data scientists) can’t really do things as fast as you need to be able to do them. To adjust creative. To adjust workflows in terms of the type of content we deliver and where we deliver it, based on consumers’ or customers’ past or current behavior. All of those things need quick responsiveness, and AI is what helps us get to that point.

So, most importantly, Squark has really allowed us to quantify the marketing life-cycle. What I’ve depicted here, using the demand funnel from earlier slides, are some of the data inputs going into this decisioning engine, like marketing conversion rates, sales accepted rates, win rates, or average deal size. All of these things are helping this AI engine to be more predictive of sales and revenue and our core KPIs as a marketing organization.

The results have been phenomenal. And I attribute this really to the ability to automate a lot of the optimization that we need from a personas perspective – who we’re targeting; a content perspective – what we’re targeting them with; and a channels perspective – what type of media, be it owned or earned, to engage with these consumers. By automating some of the decisioning around those three criteria, we’re better as a marketing organization. Better at predicting marketing qualified leads to sales qualified leads – the conversion point between those two. Which of our marketing leads are converting into the sales qualified category, and why and how, based on those three criteria? Which sales qualified leads are being accepted by the sales organization and followed up on for meetings? And finally, the ability to predict revenue, which obviously serves a few purposes – mainly our ability to forecast what we’re doing and the business results we, as a marketing organization, are able to drive. This gives us buy-in with the sales organization. It gives us better integration with the sales organization for bringing these programs to life and closing the leads that we are generating. Without that, the system falls down from a B2B marketing perspective.

And finally, driving return on investment. So, we look at every dollar we spend across our paid media, across our content creation, etc. and we’ve seen, since implementing Squark, an 8X increase in our return on investment. So that ability to tell compelling and true stories about how your marketing investment and time is generating pipeline and revenue is essential for demand generation. That’s exactly what we are able to do more effectively with AI powering a lot of our decision making.”

What Is Overfitting and How Do I Avoid It?

What Is Overfitting?

Telltale Super-Accuracy on Training Data

When machine learning models show exceptional accuracy on training data sets, but perform poorly on new, unseen data, they are guilty of overfitting. Overfitting happens when models “learn” from noise in data instead of from true signal patterns.

How to Avoid Overfitting

Detecting overfitting is the first step. Comparing accuracy against a portion of the training data that was set aside for testing will reveal when models are overfitting. Techniques to minimize overfitting include the following (a sketch in code follows the list):

  • Tuning Hyperparameters – Hyperparameters are settings that control how a machine learning algorithm learns; they are chosen before training rather than learned from the data. Adjusting them for different families of machine learning algorithms helps models perform well without overfitting.
  • Cross-Validation – Cross-validation splits the training data into additional train-test folds so hyperparameters can be tuned iteratively, without disturbing the initial hold-out test data.
  • Early Stopping – Machine learning training generally improves model performance with more iterations—up to a point. Comparing model performance at each iteration and stopping when accuracy no longer improves prevents overfitting.
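
Here is a minimal sketch of those three techniques using scikit-learn. It is illustrative only: Squark Seer applies these automatically, and the data set, algorithm, and parameter grid here are arbitrary choices, not its actual configuration.

```python
# Sketch of hyperparameter tuning, cross-validation, and early stopping.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Early stopping: halt boosting when an inner validation score stops improving.
model = GradientBoostingClassifier(
    n_estimators=500,
    validation_fraction=0.2,   # inner validation split for early stopping
    n_iter_no_change=10,       # stop after 10 rounds without improvement
    random_state=42,
)

# Hyperparameter tuning via cross-validation, without touching X_test.
search = GridSearchCV(
    model,
    param_grid={"max_depth": [2, 3, 4], "learning_rate": [0.05, 0.1]},
    cv=5,                      # 5-fold cross-validation on the training data
)
search.fit(X_train, y_train)

# Detecting overfitting: compare training accuracy to held-out accuracy.
best = search.best_estimator_
print(f"Train accuracy:    {best.score(X_train, y_train):.2%}")
print(f"Hold-out accuracy: {best.score(X_test, y_test):.2%}")
```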

Squark Seer automatically employs these and other approaches to minimize overfitting. As always, get in touch if you have questions about Overfitting or any other Machine Learning topic. We’re happy to help.

How Should Dates and Times be Modeled?

Factoring and Time Series

Why date and time features are important in models.

Many data sets contain date-time fields that we hope will provide predictive value in our models. But date-time fields in the form MM-DD-YYYY HH:MM:SS are essentially unique data points. In addition, the order in which events occur may have a bearing on outcomes.

Date Factoring

Date-time fields can be separated into component variables of Month, Date, Year, Time, Day of Month, Day of Week, and Day of Year. Pre-processing data sets to add columns for these individual variables may add predictive value when building models. (Squark automatically factors date-time fields before model building and ranking.)
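
As a sketch of what that pre-processing looks like, pandas can factor a date-time column in a few lines. This is illustrative only, not Squark's internal code, and the column name order_ts is made up.

```python
# Sketch of date factoring with pandas (illustrative only).
import pandas as pd

df = pd.DataFrame({"order_ts": pd.to_datetime([
    "2019-03-15 09:30:00", "2019-07-04 18:45:00", "2019-12-24 11:00:00",
])})

# Split the single date-time column into its component variables.
df["year"] = df["order_ts"].dt.year
df["month"] = df["order_ts"].dt.month
df["day_of_month"] = df["order_ts"].dt.day
df["day_of_week"] = df["order_ts"].dt.dayofweek   # Monday = 0
df["day_of_year"] = df["order_ts"].dt.dayofyear
df["hour"] = df["order_ts"].dt.hour

print(df)
```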

Time Series Forecasting

Models that consider the sequence in which events occur are called time series models. This technique is essential to account for factors such as seasonality, weather conditions, and economic indicators. Sales forecasts and marketing projections are classic use cases for time series forecasting.
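
As a simple sketch of how sequence can be encoded, lag and rolling features built with pandas give a model visibility into event order. The column names and values below are invented for illustration.

```python
# Sketch: lag and rolling features encode the order in which events occurred.
import pandas as pd

sales = pd.DataFrame({
    "month": pd.date_range("2018-01-01", periods=24, freq="MS"),
    "units": range(100, 124),
})

# Lag features let the model see recent history.
sales["units_last_month"] = sales["units"].shift(1)
sales["units_same_month_last_year"] = sales["units"].shift(12)  # seasonality

# A rolling average smooths short-term noise and exposes trend.
sales["units_3mo_avg"] = sales["units"].rolling(3).mean()

print(sales.tail())
```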

As always, get in touch if you have questions about using date-time in your predictions, or any other Machine Learning topic. We’re happy to help.

Date Factoring

Date factoring is a feature engineering technique that splits date-time data into its component parts. For instance, a date-time field with a format of MM-DD-YYYY HH:MM:SS can be separated into variables of Month, Date, Year, Time, Day of Month, Day of Week, and Day of Year. Pre-processing data sets to add columns for these individual variables may add predictive value when building models.

Where the sequence in which events occur is important, regression models that forecast values based solely on discrete date/time factors may not provide useful predictions. Sales forecasting or market projections are classic examples. See Time-series Forecasting.

Time Series Forecasting

Time series forecasting is a particular way of handling date-time information in model building. It takes into account the sequence in which events occur. This technique is essential when modeling regressions where factors such as seasonality, weather conditions, and economic indicators may be predictive of future outcomes. Consequently, sales forecasts and marketing projections are classic use cases for time series forecasting. Time series analysis utilizes algorithms that are specially tuned to predict using relative date-time information.