Do your models seem too accurate? They might be.

Feature leakage, also called data leakage or target leakage, makes predictive models appear more accurate than they really are. The results range from overly optimistic to completely invalid. Leakage happens when the training data contains information about the very outcome you are trying to predict, typically through a feature that is highly correlated with the target but would not be available at the time of prediction.
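
To make the idea concrete, here is a minimal pure-Python sketch on made-up churn data. The column names, the 30% churn rate, and the one-feature threshold "model" are all hypothetical choices for illustration. `cancellation_logged` plays the role of leaked information: it is only recorded after a customer churns, so it simply restates the target.

```python
import random


def make_data(n, seed=0):
    """Hypothetical churn data: rows of (days_inactive, cancellation_logged, churned)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        churned = int(rng.random() < 0.3)
        # Honest feature: known before the outcome, only loosely related to it.
        days_inactive = rng.gauss(40 if churned else 25, 15)
        # Leaky feature (hypothetical): logged only after a customer has already
        # churned, so it restates the target.
        cancellation_logged = float(churned)
        rows.append((days_inactive, cancellation_logged, churned))
    return rows


def accuracy(rows, col, t):
    """Share of rows where the rule 'predict churn when value >= t' is correct."""
    return sum((r[col] >= t) == bool(r[2]) for r in rows) / len(rows)


def fit_stump(rows, col):
    """Pick the threshold t on one column that maximizes training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({r[col] for r in rows}):
        acc = accuracy(rows, col, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


if __name__ == "__main__":
    rows = make_data(1000)
    train, test = rows[:700], rows[700:]  # score on rows the model never saw
    for name, col in [("days_inactive", 0), ("cancellation_logged", 1)]:
        t = fit_stump(train, col)
        print(f"{name}: test accuracy {accuracy(test, col, t):.2f}")
```

The leaky column scores a perfect 1.00 on the held-out rows, yet it would be useless in production, because the cancellation has not happened yet when a prediction is needed. The honest column scores noticeably lower, which is closer to what a real model would deliver.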

How to Minimize Feature Leakage:

  1. Remove data that could not be known at the time of prediction.
  2. Use cross-validation.
  3. If you suspect a variable is leaky, remove it and run the model again.
  4. Hold back a validation data set.
  5. Treat near-perfect model accuracy as a warning sign.
  6. Check variable importance for overly predictive features.
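
Several of these checks can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not Squark's implementation: the churn columns are made up, a nearest-centroid classifier stands in for a real model, correlation with the target is a crude stand-in for variable importance, and the 0.9 correlation and 0.98 accuracy cutoffs are arbitrary illustrative values.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between one feature column and the 0/1 target."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (t - my) for x, t in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((t - my) ** 2 for t in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def fit_nearest_centroid(X, y):
    """Toy classifier: standardize with training statistics only, then
    predict the class whose mean (centroid) is closest."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]

    def scale(row):
        return [(v - m) / s for v, m, s in zip(row, means, stds)]

    centroids = {}
    for label in set(y):
        pts = [scale(r) for r, t in zip(X, y) if t == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*pts)]

    def predict(row):
        z = scale(row)
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))

    return predict


def accuracy(predict, X, y):
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)


def cross_val_accuracy(X, y, k=5):
    """Step 2: mean accuracy over k folds; every k-th row forms one held-out fold."""
    scores = []
    for f in range(k):
        held = list(range(f, len(X), k))
        held_set = set(held)
        train = [i for i in range(len(X)) if i not in held_set]
        predict = fit_nearest_centroid([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(predict, [X[i] for i in held], [y[i] for i in held]))
    return statistics.fmean(scores)


def audit(X, y, names, corr_limit=0.9, acc_limit=0.98):
    """Steps 5 and 6: flag features that track the target almost perfectly,
    and flag cross-validated accuracy that looks too good to be true."""
    flagged = [n for j, n in enumerate(names)
               if abs(pearson([r[j] for r in X], y)) > corr_limit]
    cv_acc = cross_val_accuracy(X, y)
    return cv_acc, flagged, cv_acc > acc_limit


def make_data(n, seed=42):
    """Hypothetical churn data; cancellation_logged exists only after a churn."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        churned = int(rng.random() < 0.3)
        X.append([rng.gauss(40 if churned else 25, 15),  # days_inactive
                  rng.gauss(3 if churned else 2, 1.5),   # support_tickets
                  float(churned)])                       # cancellation_logged (leaky)
        y.append(churned)
    return X, y, ["days_inactive", "support_tickets", "cancellation_logged"]


def run_demo():
    X, y, names = make_data(1000)
    # Step 4: hold back a validation set that plays no part in the audit.
    dev_X, dev_y, val_X, val_y = X[:800], y[:800], X[800:], y[800:]
    cv_before, flagged, too_good = audit(dev_X, dev_y, names)

    # Step 3: drop the suspect features and run again.
    keep = [j for j, n in enumerate(names) if n not in flagged]

    def select(rows):
        return [[r[j] for j in keep] for r in rows]

    cv_after, _, _ = audit(select(dev_X), dev_y, [names[j] for j in keep])
    final = fit_nearest_centroid(select(dev_X), dev_y)
    return {"flagged": flagged, "too_good_before": too_good,
            "cv_before": cv_before, "cv_after": cv_after,
            "validation_acc": accuracy(final, select(val_X), val_y)}


if __name__ == "__main__":
    for key, value in run_demo().items():
        print(f"{key}: {value}")
```

On this made-up data the audit flags `cancellation_logged`, the kind of column step 1 says to remove because it could not be known at prediction time. Dropping it brings cross-validated accuracy down to a believable level, and the held-back validation rows provide a final check that the audit never touched.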

If you are a Squark user, you’ll be happy to know that our AutoML identifies and removes highly correlated data before building models. Squark also uses cross-validation and holds back a validation data set, and it always displays accuracy and variable importance for each model.

Squark is a no-code predictive analytics SaaS that automatically analyzes the business data you work with every day to predict customer outcomes, uncover new and unforeseen business opportunities, and mitigate risk. It is simple, easy-to-use AI software that requires neither data science nor technical expertise. In a matter of clicks, you can make confident, data-backed decisions about what your customers will do next and understand why, so you can increase your business impact.

Copyright © 2021 Squark. All Rights Reserved.