
21

How Should Dates and Times be Modeled?

Factoring and Time Series: how date and time features are important in models. Many data sets contain date-time fields that we hope will provide predictive value in our models. But date-time fields in the form MM-DD-YYYY HH:MM:SS are essentially unique data points. In addition, the order in which events occur may have a bearing on […]
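One common way to get value from a nearly unique timestamp is to split it into coarser parts that repeat across rows. The sketch below is only an illustration of that idea, not necessarily the approach the article takes; the function name `datetime_features`, the chosen parts, and the input format string are all assumptions.

```python
from datetime import datetime


def datetime_features(stamp: str, fmt: str = "%m-%d-%Y %H:%M:%S") -> dict:
    """Split one timestamp string into repeating, model-friendly parts.

    The format string is an assumption; adjust it to match your data.
    """
    dt = datetime.strptime(stamp, fmt)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "weekday": dt.weekday(),          # Monday = 0, Sunday = 6
        "is_weekend": dt.weekday() >= 5,  # Saturday or Sunday
    }


# Hypothetical example value, not taken from any real data set.
features = datetime_features("03-15-2024 14:30:00")
```

Each derived column takes only a handful of distinct values, so many rows share the same month, hour, or weekday, which the raw timestamp never does.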

22

What Is Feature Engineering?

Data sets can be made more predictive with a little help. “Features” are the properties or characteristics of something you want to predict. Machine learning predictions can often be improved by “engineering”: adjusting features that are already there or adding features that are missing. For instance, date/time fields may appear as […]

23

Why 50% Isn’t Always The Prediction Cut-off

There Is this “F1” Thing: why 50% probability isn’t always the prediction cut-off. Say you are classifying 100 examples of fruit and there are 99 oranges and one lime. If your model predicted all 100 are oranges, then it is 99% accurate. But the high accuracy veils the model’s inability to detect the difference […]
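The fruit example can be worked through in a few lines. This is a minimal sketch using only the numbers in the teaser (99 oranges, one lime, every prediction "orange"); the helper name `f1_for` is hypothetical, and the F1 score is computed with its standard definition as the harmonic mean of precision and recall.

```python
actual = ["orange"] * 99 + ["lime"]
predicted = ["orange"] * 100  # the model calls everything an orange

# Accuracy: share of predictions that match the true label.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def f1_for(label, actual, predicted):
    """F1 score for a single class, treating `label` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)
    fp = sum(a != label and p == label for a, p in pairs)
    fn = sum(a == label and p != label for a, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


lime_f1 = f1_for("lime", actual, predicted)

print(f"accuracy = {accuracy:.2f}")    # accuracy = 0.99
print(f"F1 for lime = {lime_f1:.2f}")  # F1 for lime = 0.00
```

Accuracy comes out at 99% while the F1 score for the lime class is zero, because the model never identifies the single lime.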

24

Feature Leakage – Causes and Remedies

Do your models seem too accurate? They might be. Feature leakage, a.k.a. data leakage or target leakage, causes predictive models to appear more accurate than they really are, ranging from overly optimistic to completely invalid. The cause is highly correlated data – where the training data contains information you are trying to predict. How to […]

25

Your Data Does Not Have to Be Big

Your Data Does Not Have to Be Big In fact, certain algorithms work well with smaller datasets. Some models do require big datasets to deliver significant predictive power. But don’t assume that you need hundreds of feature columns or millions of rows. We’ve seen surprisingly usable accuracy from as few as a hundred rows and a […]