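Feature leakage, also known as data leakage or target leakage, makes predictive models appear more accurate than they really are. The resulting accuracy estimates range from overly optimistic to completely invalid. Leakage occurs when the training data contains information about the outcome you are trying to predict that would not be available at prediction time. It often shows up as a feature that is suspiciously highly correlated with the target.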
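To make the idea concrete, here is a minimal sketch of one simple leakage check: flag any feature whose correlation with the target is close to perfect. It is an illustration only, not Squark's implementation. The churn data, the column names, the `flag_leaky_features` helper, and the 0.95 threshold are all assumptions chosen for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def flag_leaky_features(features, target, threshold=0.95):
    """Return names of features whose |correlation| with the target meets the threshold."""
    return sorted(
        name for name, values in features.items()
        if abs(pearson(values, target)) >= threshold
    )

# Hypothetical churn data. `refund_issued` is recorded only after a customer
# churns, so it effectively restates the target.
churned = [1, 0, 0, 1, 0, 1, 0, 0]
features = {
    "monthly_spend": [20, 35, 50, 30, 25, 45, 40, 55],
    "support_tickets": [3, 1, 0, 2, 2, 1, 1, 3],
    "refund_issued": [1, 0, 0, 1, 0, 1, 0, 0],
}

leaky = flag_leaky_features(features, churned)
print(leaky)  # ['refund_issued']
```

A correlation threshold catches only the most obvious cases. A feature can leak the target without being linearly correlated with it, so a check like this complements, rather than replaces, asking whether each column would really be known at prediction time.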
If you are a Squark user, you’ll be happy to know that our AutoML identifies and removes highly correlated data before building models. Squark also uses cross-validation and holds back a validation data set. For each model, Squark always displays its accuracy and variable importance.
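For readers curious what holding back a validation set and cross-validating looks like in general, the sketch below splits row indices into a held-out set and k cross-validation folds. It is a generic illustration, not a description of Squark's internals. The function names, the 20% holdout fraction, and the choice of five folds are assumptions.

```python
import random

def split_holdout(n_rows, holdout_fraction=0.2, seed=0):
    """Shuffle row indices and set aside a fraction as a held-out validation set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n_rows * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]

def k_fold(indices, k=5):
    """Yield (train, test) index lists, using each of the k folds as the test set once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_fold = folds[i]
        train_folds = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_folds, test_fold

# 100 hypothetical rows: 20 are held back, the other 80 are cross-validated.
train_idx, holdout_idx = split_holdout(100)
splits = list(k_fold(train_idx, k=5))

for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} training rows, {len(test)} test rows")
```

The held-out rows never appear in any fold, so a score computed on them is an independent check on the cross-validated accuracy. A large gap between the two is one warning sign of leakage or overfitting.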
Squark is a no-code predictive analytics SaaS that automatically analyzes the business data you work with every day to predict customer outcomes, uncover new and unforeseen business opportunities, and mitigate risk. It is simple, easy-to-use AI software that requires neither data science nor technical expertise. In a matter of clicks, you can make confident, data-backed decisions about what your customers will do next, and understand why, to increase your business impact.