Statistics and machine learning differ in method and purpose. Which is superior depends upon your goals. Statistics is a subset of mathematics that interprets relationships among variables in data sets. Statisticians make inferences and estimate values based solely on data collected
How Important Is Explainability? Understanding the inner workings of ML algorithms may distract from realizing benefits from good predictions. Explainability Explained With the rise of artificial intelligence has come skepticism. Mysteries of how AI works make questioning the “black box”
Showing the way vs. stumbling in the dark – there are applications for both. Supervised Supervised Learning shows AutoML algorithms sets of known outcomes from which to learn. Think of classroom drills, or giving a bloodhound the scent. Supervised learning
What Is A Confusion Matrix? The most aptly named AI term is actually simple. A Confusion Matrix is a table that shows how often an AI classifier gets confused predicting true and false conditions. Here is a simple example of
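To make the idea in the teaser above concrete, here is a minimal sketch of how the four cells of a confusion matrix could be tallied for a true/false classifier. The function name `confusion_matrix` and the sample labels are hypothetical, invented for illustration; they do not come from the article.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Tally the four possible outcomes of a binary (True/False) classifier."""
    # Each (actual, predicted) pair falls into exactly one of four cells.
    counts = Counter(zip(actual, predicted))
    return {
        "true_positive": counts[(True, True)],
        "false_negative": counts[(True, False)],
        "false_positive": counts[(False, True)],
        "true_negative": counts[(False, False)],
    }


# Hypothetical labels, made up for this sketch.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]

matrix = confusion_matrix(actual, predicted)
for cell, count in matrix.items():
    print(f"{cell}: {count}")
```

The two "false" cells are where the classifier got confused: it called a true condition false, or a false condition true.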
What Is Overfitting? Telltale Super-Accuracy on Training Data When machine learning models show exceptional accuracy on training data sets, but perform poorly on new, unseen data, they are guilty of overfitting. Overfitting happens when models “learn” from noise in data
Factoring and Time Series How date and time features are important in models. Many data sets contain date-time fields which we hope will provide predictive value in our models. But date-time fields in the form of MM-DD-YYYY HH:MM:SS are essentially unique
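One common way to get value out of a nearly unique timestamp is to break it into parts that repeat, such as month, day of week, and hour. The sketch below assumes timestamps formatted as MM-DD-YYYY HH:MM:SS; the function name `datetime_features`, the chosen parts, and the sample timestamp are illustrative assumptions rather than the article's own method.

```python
from datetime import datetime


def datetime_features(stamp, fmt="%m-%d-%Y %H:%M:%S"):
    """Split one timestamp string into repeating, categorical-style parts."""
    moment = datetime.strptime(stamp, fmt)
    return {
        "year": moment.year,
        "month": moment.month,
        # weekday() counts Monday as 0 and Sunday as 6.
        "day_of_week": moment.weekday(),
        "hour": moment.hour,
        "is_weekend": moment.weekday() >= 5,
    }


# Hypothetical timestamp, made up for this sketch.
features = datetime_features("03-15-2024 14:30:00")
print(features)
```

Many rows will share the same month, weekday, or hour, so a model has a chance to find patterns in those columns that it could not find in the raw timestamp.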
What Is Feature Engineering? Data sets can be made more predictive with a little help. “Features” are the properties or characteristics of something you want to predict. Machine learning predictions can often be improved by “engineering,” that is, adjusting features that are
There Is this “F1” Thing Why 50% probability isn’t always the prediction cut-off. Say you are classifying 100 examples of fruit and there are 99 oranges and one lime. If your model predicted all 100 are oranges, then it
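The fruit example above can be worked through in a few lines. This is a minimal sketch, assuming the lime is treated as the class of interest and that an undefined precision or recall (a zero denominator) is scored as 0; the function name `f1_score` is a hypothetical helper written here, not a library call.

```python
def f1_score(actual, predicted, positive):
    """Compute F1 for one class from paired actual and predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    # Assumption: score an undefined ratio as 0 rather than raising an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# The scenario from the teaser: 99 oranges, one lime, all predicted as oranges.
actual = ["orange"] * 99 + ["lime"]
predicted = ["orange"] * 100

accuracy = sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)
lime_f1 = f1_score(actual, predicted, positive="lime")

print(f"accuracy: {accuracy:.2f}")
print(f"F1 for lime: {lime_f1:.2f}")
```

The model is right 99 times out of 100, yet it never finds the lime, so its F1 score for that class is zero. That gap is why accuracy alone can mislead on lopsided data.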
Do your models seem too accurate? They might be. Feature leakage, a.k.a. data leakage or target leakage, causes predictive models to appear more accurate than they really are, ranging from overly optimistic to completely invalid. The cause is highly correlated
Your Data Does Not Have to Be Big In fact, certain algorithms work well with smaller datasets. Some models do require big datasets to deliver significant predictive power. But don’t assume that you need hundreds of feature columns or millions of