F1 is a score between 0 (worst) and 1 (best) that shows how well a classification model performed on your dataset. It is a different check from accuracy: it measures how well the model distinguishes among classes. For instance, if you are classifying 100 wines, 99 red and one white, and your model predicts that all 100 are red, then it is 99% accurate. But the high accuracy veils the model’s inability to tell red wines from white.
F1 is particularly revealing when class frequencies are imbalanced, as in the wine example. F1 combines the model’s Precision and Recall:
Precision = How likely is a positive classification to be correct? = True Positives/(True Positives + False Positives)
Recall = How likely is the classifier to detect a positive? = True Positives/(True Positives + False Negatives)
F1 = 2 * ((Precision * Recall) / (Precision + Recall))
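As a rough illustration, the three formulas above can be applied to the wine example. This is only a sketch: it treats white as the positive class, and it returns 0 whenever a ratio would divide by zero, which is a common convention but an assumption here. The function name precision_recall_f1 is made up for this example.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from raw counts.

    Ratios with a zero denominator are reported as 0.0 (an assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * (precision * recall) / denom if denom else 0.0
    return precision, recall, f1


# Wine example with white as the positive class:
# 99 red, 1 white, and the model predicts "red" for all 100.
tp, fp, fn, tn = 0, 0, 1, 99

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(accuracy, f1)  # 0.99 0.0
```

Accuracy comes out at 99%, while F1 is 0 because the model never finds the one white wine.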
Max F1 is the probability cutoff at which the F1 score is highest, and it is used as the cut-off point for turning probabilities into predictions. When a row’s P1 (will occur) value is at or above the Max F1 cutoff, the outcome will be predicted to happen in the future. When a row’s P1 value is below the cutoff, the outcome will be predicted not to happen. Because the cutoff is chosen to maximize F1 rather than fixed in advance, it is not always 50% as you might expect.
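One way to picture how such a cutoff could be found is to try each candidate threshold and keep the one with the highest F1. The sketch below is illustrative only and is not necessarily how the product computes it. The P1 values and actual outcomes are made-up data, and the helper names f1_at_threshold and best_f1_threshold are hypothetical.

```python
def f1_at_threshold(p1_values, actuals, threshold):
    """F1 when rows with P1 >= threshold are predicted to occur (1)."""
    pairs = list(zip(p1_values, actuals))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    # 2*TP / (2*TP + FP + FN) is algebraically the same as
    # 2 * (Precision * Recall) / (Precision + Recall).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(p1_values, actuals):
    """Try every distinct P1 value as a cutoff; return (cutoff, f1) with the highest F1."""
    candidates = sorted(set(p1_values))
    best = max(candidates, key=lambda t: f1_at_threshold(p1_values, actuals, t))
    return best, f1_at_threshold(p1_values, actuals, best)


# Made-up example: predicted P1 per row and what actually happened (1 = occurred).
p1_values = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
actuals = [0, 0, 1, 0, 1, 1, 1]

cutoff, f1 = best_f1_threshold(p1_values, actuals)
print(cutoff, round(f1, 3))  # 0.3 0.889
```

With this data the best cutoff is 0.30, not 0.50: a 50% cutoff would miss the rows at 0.30 and 0.40 that did occur, which lowers Recall and therefore F1.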