Closing More at Lower Cost With Accessible AI Predictive Power

Business leaders aren’t looking for more raw information. They’re drowning in data. Between 60% and 73% of all data within an enterprise goes unused for analytics, according to Forrester. What is in short supply is answers. Those responsible for revenue growth have to balance the value of the insights within that data against the cost of extracting them. Here are the options …

  • Reports, Charts & Graphs
  • Purpose-built Tools 
  • Custom Data Science (AI & Machine Learning)  
  • AutoML (Automated Machine Learning)

Reports, Charts & Graphs are the meat & potatoes for most sales organizations. While the attribution of the famous quote about managing only what you can measure is up for debate, the maxim is not. Identify a few key inputs and graph them over time to show where you’ve been and how some stats relate to one another. This may be more or less automated by BI systems and dashboards, but the result is a clear look in the rear-view mirror. You bump into unexpected things driving that way.

Purpose-built Tools were quick to identify the biggest problem with AI & machine learning. Even though most users knew of AI’s power, few knew how to apply it to grow top-line revenue. These solutions were quick to grab an algorithm and apply it to a specific use case like lead scoring, sales coaching, or identifying churn risk in your existing customer base. While these point products are often worth the investment, sales organizations are wary of yet another subscription to solve a problem that is rooted in data analysis. With AI becoming essential across the spectrum, the idea of buying multiple AI-embedded layers in the sales and marketing tech stack is frightening for several reasons:

  • Up to half of AI startups are actually not using any AI at all, according to a study that evaluated 2,830 startups.
  • AI embedded in apps is often opaque. “Take our word for it, these are your hottest opportunities” is a big leap if you can’t see why they were ranked that way.
  • Layers of AI may produce conflicting answers, with no way to vet or normalize them.

Custom Artificial Intelligence & Machine Learning have dominated the conversation lately. Unfortunately, many tools built by and for data scientists and programmers are impossible for business people—even data-savvy analysts—to use. Coders gonna code. While large businesses can afford to throw massive resources at this, CXOs and sales leaders should bear in mind the prohibitive expense of staffing the “sexiest job of the 21st century”. Peek under the veneer of the Google, Amazon, and Microsoft AI and machine learning offerings and you’ll quickly discover eye-glazing references to GitHub, Jupyter notebooks, and Python code.

AutoML is, fortunately, an alternative for applying machine learning without data scientists or their complicated tools of the trade. This new type of technology allows organizations to forgo much of the process by automating data preparation, feature engineering, and model creation. CROs, analysts, operations, and sales enablement leaders can arrive at insights faster and give their teams huge advantages.

Squark offers AutoML that can be applied across the entire sales operations and marketing decision-making process. Schedule a free assessment call with Squark to learn how we can help, or download our briefing, 14 Ways to Drive Sales Performance Using AutoML.

How AutoML Beats Scoring Formulas

AutoML’s ability to detect patterns and make predictions can outperform algebraic formulas and Boolean logic in common tasks.

Anyone who has written a formula in Excel or adjusted parameters in an online application knows how even the smallest change can produce dramatically different results. The power of mass calculation is exactly what puts algebraic and Boolean logic at the core of nearly every productivity tool you use. But there are costs and risks.

“I just blew up my whole workbook.” “That one little setting changed the whole forecast?” “You should have told me that was an important criterion when we wrote the code.” Sound familiar? Programmed logic is limited by our ability to understand the problem and represent it in reliable formulas. Precision and timeliness are critical, so hard-coded logic requires careful maintenance too.

AutoML works by letting data speak for itself. By finding patterns and applying them to new data, machine learning can obviate coded logic and go straight to the answers. Here is just one example, lead scoring:

Formulas
CRM and marketing automation systems have features to rank leads by creating a “score.” Sales teams use lead score to prioritize activities. But scores are generated by applying weights that users set manually. How many points for a whitepaper download? How do page visits and time-on-page bump the number? Do three clicks on email calls to action mean I have a hot prospect? In practice, it is nearly impossible to fine-tune scoring models well and quickly enough for them to be useful. (More on this scoring example in this blog post.)
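
To make that concrete, here is a minimal sketch of the kind of hand-weighted scoring formula these systems rely on. The field names and point values are hypothetical, and every one of them is a guess someone must maintain as buyer behavior changes:

    def lead_score(lead: dict) -> int:
        # Every weight below is a manual guess that goes stale over time.
        score = 0
        score += 10 if lead.get("downloaded_whitepaper") else 0
        score += 2 * lead.get("page_visits", 0)        # how many points per visit?
        score += 5 * lead.get("email_cta_clicks", 0)   # do three clicks mean a hot prospect?
        score += 15 if lead.get("time_on_page_sec", 0) > 120 else 0
        return score

    print(lead_score({"downloaded_whitepaper": True,
                      "page_visits": 4,
                      "email_cta_clicks": 3}))  # 33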

AutoML
By learning from patterns of known outcomes in existing data, AutoML can make predictions on new data. Lead tables contain tens or hundreds of columns of data, and AutoML discovers which are most predictive automatically. That means a prioritized list of leads can be generated in minutes based on the very latest trends with no coding. Squark AutoML even reports which variables were most important, so you learn your buyer persona as a nice side benefit.

The takeaway: Find opportunities where AutoML can free you from the shackles of programming logic.

Feature Engineering

“Features” are the properties or characteristics of something you want to predict. Machine learning predictions can often be improved by “engineering”—adjusting features that are already there or adding features that are missing. For instance, date/time fields may appear as nearly unique values that are not predictive. Breaking the single date/time feature into year, month, day, day of the week, and time of day may allow the machine learning model to reveal patterns otherwise hidden.

You can engineer features on your own data sets, but automated feature engineering is now part of advanced machine learning systems. The ability to spot opportunities to improve data set values, and to act on them automatically, delivers vastly improved performance without the tedium of doing it manually.

Common kinds of feature engineering include:

  • Expansion, as in the date/time example
  • Binning – grouping variables with minor variations into fewer values, as in putting all heights from 5’7.51” to 5’8.49” into category 5’8”
  • Imputing values – filling in missing values with plausible estimates such as a mean, median, or mode
  • Interactions – adding, subtracting, or multiplying features that interact
  • Removing unused or redundant features
  • Text vectoring – deriving commonalities from repeated terms in otherwise unique strings
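
As a rough illustration of the first two, here is a minimal pandas sketch of expansion and binning; the column names are hypothetical:

    import pandas as pd

    df = pd.DataFrame({"created": ["2024-01-15 09:30", "2024-06-02 17:45"],
                       "height_in": [67.6, 68.2]})

    # Expansion: split one nearly unique date/time column into parts a model can learn from
    ts = pd.to_datetime(df["created"])
    df["year"], df["month"] = ts.dt.year, ts.dt.month
    df["day_of_week"], df["hour"] = ts.dt.dayofweek, ts.dt.hour

    # Binning: collapse minor variations into fewer values (here, the nearest inch)
    df["height_bin"] = df["height_in"].round().astype(int)

    print(df)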

What Are Machine Learning Hyperparameters and How Are They Used?

Parameters are functions of training data. Hyperparameters are settings used to tune model algorithm performance.

In Automated Machine Learning (AutoML), data sets containing known outcomes are used to train models to make predictions. The actual values in training data sets never directly become parts of models. Instead, AutoML algorithms learn patterns in the features (columns) and instances (rows) of training data and express them as parameters that are the basis for the model’s predictions on new data. Parameters are always a function of the data itself, and are never set externally.

Hyperparameters are variables external to and not directly related to the training data. They are configuration variables that are used to optimize model performance. Think of them as instructions to the ML algorithms on how to approach model building. Each modeling algorithm can be set with hyperparameters appropriate to the particular classification or regression prediction problem.

Hyperparameter tuning in Squark is automatic. Squark makes multiple training passes, keeps track of the results of each trial run, and adjusts hyperparameters for subsequent runs. The progressive improvement in configuration values converges on the most accurate model faster.
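
For intuition, here is a minimal scikit-learn sketch of the same idea (an analogy, not Squark’s internals): max_depth is a hyperparameter set before training, the fitted tree’s split thresholds are the parameters learned from data, and the search loop tries several hyperparameter values and keeps the best:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    # max_depth is a hyperparameter: chosen before training, never learned from the data
    search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                          param_grid={"max_depth": [2, 4, 8, None]},
                          cv=5)
    search.fit(X, y)  # each fit learns parameters (split thresholds) under one setting

    print(search.best_params_, round(search.best_score_, 3))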

The takeaway: Squark uses hyperparameters to learn how to learn better as it works through each model—inventing shortcuts and best practices—the way people do when attacking problems.

What is Bias in Machine Learning?

Bias occurs when ML does not separate the true signal from the noise in training data.

Biases in AI systems make headlines for results such as favoring gender in hiring, recommending loans based on ethnicity, or recognizing faces differently based on race. Some of these cases were due to biases baked into the algorithms written by (human) data scientists, but the majority merely learned from data that was itself biased.

How do you know if your business predictions are biased? Testing against broader sets of known outcomes is the best way. Since you don’t necessarily know which factors may be introducing bias, examining the predictive importance placed on data features can help reveal them. Squark shows lists of Variable Importance for the models it generates. Click on the model name link in the Squark Leaderboard to see them. Different algorithms can produce different ranks for variable importance, which may lend insight.

Bias in Training Data
Selecting training data wisely is the best way to reduce bias. For instance, if the training data set you select is dominated by outcomes that you expect, it should be no surprise that the model will include confirmation bias.

Bias in Algorithms
Algorithmic bias occurs when model building takes too few training variables into account. In data sets with large numbers of features (columns), algorithms that can handle only fixed or limited numbers of training variables show high bias and result in underfitting. Certain algorithms such as Linear Regression, Linear Discriminant Analysis, and Logistic Regression are prone to high bias.
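
As a hedged illustration of that underfitting, compare a straight-line model with a more flexible one on data that has an obvious nonlinear pattern:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(300, 1))
    y = X[:, 0] ** 2 + rng.normal(0, 0.1, 300)  # quadratic signal plus noise

    # High-bias model: a straight line cannot capture the U shape, so it underfits
    print("linear R^2:", round(LinearRegression().fit(X, y).score(X, y), 2))

    # A more flexible model picks up the pattern
    print("forest R^2:", round(RandomForestRegressor(random_state=0).fit(X, y).score(X, y), 2))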

The takeaway: If you think your predictions may show bias, experiment. Go back to the variable selection and select/deselect suspicious columns. Iterate as many times as you need to understand your data. At that point you may decide to revise the training and production files to reflect reality with less of a “thumb on the scale.”

Why AutoML Beats Lead Scoring Formulas for B2B

CRM and Marketing Automation systems have offered lead scoring features for decades. The notion of using arithmetic to turn qualification criteria and behavioral data into a simple-to-consume number makes sense. Business development and sales teams always appreciate guidance on which leads to follow next to maximize productivity. In practice, few organizations do excellent lead scoring, and it isn’t their fault.

Traditional lead scoring first assumes that you understand all the moving parts that indicate buying intent. BANT-style rating of readiness is important. Responses to emails, website visits, and social media tell a story. External databases of purchase sentiment are valuable. How much does each contribute? That is really difficult to determine in advance, so you make guesses and refine the model based on experience.

Scoring models also require constant tweaking. Elements calculated by legacy scoring models are increasingly complicated to monitor during the customer journey. Information that you already track changes rapidly, and new data appears that wasn’t counted before. Since scoring model updates are largely a manual process, iterations are time-consuming and error-prone. Even if you understand the interplay of variables perfectly, writing Boolean logic to express them accurately is nearly impossible. Tracing scoring model improvements back to KPIs is also tricky, which makes prioritizing maintenance effort an even greater challenge.

Automated Machine Learning (AutoML) is much better at ranking leads because it takes a cold, unbiased look at all the data and builds accurate models no matter how many elements there are. There are no formulas to build and maintain. The data tells its own story. Acting on the latest data, AutoML automatically takes the newest information into account and disregards factors that no longer predict success. It isn’t magic or mystery. Here’s how it works…

To train itself, AutoML ingests sets of data that represent known outcomes, such as lead tables for the previous period. Lead tables typically show the basics—whether or not a lead converted to a different stage, became an opportunity, or closed won or lost. They may also contain many more values related variously to industry, company size, competitive status, time and date, inbound and outbound activity, or hundreds of other traits. Clicking a column in the table tells the AutoML system which variable you want to predict. Will the lead convert from MQL to SQL? Is the likely CLV forecast higher or lower than target? Which one of five possible messages will elicit a response? Think of any yes-no, in-out, or how-much question.

The next step is to upload the new data, the leads on which you want to make predictions. The AutoML system learns the patterns and builds scores of models, testing them against one another to find the best. Running the new leads through the winning model then happens quickly—minutes, typically. Output is a table with predictions appended, including probabilities for each record. Now you have a prioritized list of prospects that is richer than any scoring formula could provide.
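
Here is a minimal scikit-learn sketch of that flow, standing in for what an AutoML system automates behind the click. The file names and the “converted” column are hypothetical, and it assumes numeric features for brevity:

    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier

    # Last period's leads, with a known outcome column to learn from
    history = pd.read_csv("leads_last_period.csv")   # hypothetical file
    X = history.drop(columns=["converted"])
    model = GradientBoostingClassifier().fit(X, history["converted"])

    # New leads to score: append the predicted probability of conversion
    new_leads = pd.read_csv("leads_new.csv")         # hypothetical file
    new_leads["p_convert"] = model.predict_proba(new_leads[X.columns])[:, 1]
    print(new_leads.sort_values("p_convert", ascending=False).head())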

In addition, AutoML shows which variables were most important to its predictions. This ranked list of variable importance amounts to a description of your ideal prospect persona. You will confirm what you knew, and maybe discover parameters that surprise you. This level of actionable insight is invaluable in targeting outreach to accelerate your pipeline. The world is too complex to rely solely on algebra. Taking advantage of the remarkable ability of AutoML to see relationships across vast data stores is the way to go. Give AutoML a try and you may find that forgetting about scoring formulas altogether is your new top priority.

Automated Machine Learning (AutoML)

Automated Machine Learning (AutoML) refers to systems that build machine learning models with less manual coding than a data science programmer would need to build models from scratch.

At Squark, AutoML means absolutely no coding or scripting of any kind. This is the strongest definition of AutoML. All of the steps in making predictions with machine learning models – import of training and production data, variable identification, feature engineering, classification or regression algorithm selection, hyperparameter tuning, leaderboard explanation, variable importance listing, and export of the prediction data set – are handled through a SaaS, point-and-click interface.

Various other implementations of machine learning are dubbed AutoML but actually require extensive knowledge of data science and programming. For example, you may need to select an algorithm type, pick hyperparameter ranges, launch from a Jupyter notebook, know Python, or use other processes that are unfamiliar to business users.

Hyperparameter

Hyperparameters are variables external to, and not directly related to, the data sets of known outcomes that are used to train machine learning models. A hyperparameter is a configuration variable that is used to optimize model performance.

Automated Machine Learning (AutoML) systems such as Squark tune hyperparameters automatically. Data scientists who build models manually can write code that controls hyperparameters to seek ways to improve model performance.

Examples of hyperparameters are:

  • Learning rate and duration
  • Latent factors in matrix factorization
  • Leaves, or depth, of a tree
  • Hidden layers in a deep neural network
  • Clusters in a k-means clustering
  • The k in k-nearest-neighbors
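
To ground a few of these, here is how they appear as constructor arguments in scikit-learn, one common library (Squark sets such values automatically):

    from sklearn.cluster import KMeans
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    GradientBoostingClassifier(learning_rate=0.1, n_estimators=200)  # learning rate and duration
    DecisionTreeClassifier(max_depth=5, max_leaf_nodes=20)           # leaves, or depth, of a tree
    MLPClassifier(hidden_layer_sizes=(64, 32))                       # hidden layers in a neural network
    KMeans(n_clusters=8)                                             # clusters in k-means clustering
    KNeighborsClassifier(n_neighbors=5)                              # the k in k-nearest-neighbors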