F1 is a score between 1 (best) and 0 (worst) that shows how well a classification model performed on your dataset. Unlike accuracy, it measures how well the model distinguishes between classes. For instance, if you are classifying 100 wines, 99 red and one white, and your model predicts that all 100 are red, then it is 99% accurate. But the high accuracy hides the model’s inability to tell red wines from white.
F1 is particularly informative when class frequencies are imbalanced, as in the wine example. The F1 calculation combines the model’s Precision and Recall:
Precision = How likely is a positive classification to be correct? = True Positives/(True Positives + False Positives)
Recall = How likely is the classifier to detect a positive? = True Positives/(True Positives + False Negatives)
F1 = 2 * ((Precision * Recall) / (Precision + Recall))
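As a rough illustration, the three formulas above can be applied to the wine example. This is a minimal sketch: it assumes white is treated as the positive class, and it assumes a score of 0 whenever a denominator is zero (a common convention, not something the formulas themselves define). The function name `precision_recall_f1` is made up for this example.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from true/false positive and false negative counts."""
    # Assumption: when a denominator is zero, report 0.0 instead of raising an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * (precision * recall) / total if total else 0.0
    return precision, recall, f1


# Wine example: 99 red, 1 white, and the model predicts "red" for all 100.
# With white as the positive class (an assumption made for this sketch):
tp, fp, fn, tn = 0, 0, 1, 99

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under these assumptions accuracy stays high while F1 drops to zero, which is the gap the wine example describes.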
The Max F1 threshold is the probability cut-off that yields the highest F1 score. When a row’s P1 (will occur) value is at or above the Max F1 threshold, the outcome is predicted to happen. When a row’s P1 value is below the Max F1 threshold, the outcome is predicted not to happen. Because the cut-off is chosen to maximize F1 rather than fixed in advance, it is not always 50% as you might expect.
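One way to picture how such a cut-off could be found is to try each predicted probability as a candidate threshold and keep the one with the highest F1. The sketch below does that on a small made-up dataset. The P1 values, the labels, and the function names are all hypothetical, and this is not necessarily how any particular product computes its threshold.

```python
def f1_at_threshold(p1_values, actuals, threshold):
    """F1 when rows with P1 at or above `threshold` are predicted to occur."""
    tp = fp = fn = 0
    for p1, actual in zip(p1_values, actuals):
        predicted = p1 >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
    # 2*TP / (2*TP + FP + FN) is algebraically the same as
    # 2 * (Precision * Recall) / (Precision + Recall).
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def max_f1_threshold(p1_values, actuals):
    """Return (threshold, f1) for the candidate threshold with the highest F1."""
    candidates = sorted(set(p1_values))
    best = max(candidates, key=lambda t: f1_at_threshold(p1_values, actuals, t))
    return best, f1_at_threshold(p1_values, actuals, best)


# Hypothetical data: P1 per row and whether the outcome actually occurred.
p1_values = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.80, 0.90]
actuals = [0, 0, 0, 1, 0, 1, 0, 1]

best_threshold, best_f1 = max_f1_threshold(p1_values, actuals)
f1_at_half = f1_at_threshold(p1_values, actuals, 0.5)
print(best_threshold, best_f1, f1_at_half)
```

On this toy data the best cut-off falls below 0.5, and using 0.5 instead gives a lower F1.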
For well-calibrated probabilities, the optimal F1 threshold is approximately half the F1 score that it achieves. This gives you some intuition: because F1 cannot exceed 1, the optimal threshold will never be more than 0.5. If your F1 is 0.5 and the threshold is 0.5, you should expect to improve F1 by lowering the threshold. On the other hand, if the F1 were 0.5 and the threshold were 0.1, you should probably raise the threshold to improve F1.
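The rule of thumb above can be written as a small check. This is only a sketch of the heuristic, not a tuning procedure. The function name and the `tolerance` parameter are hypothetical, added so that thresholds already close to half the F1 score are left alone.

```python
def suggest_threshold_change(f1, threshold, tolerance=0.05):
    """Compare a threshold with half the F1 score and suggest a direction.

    `tolerance` is an arbitrary margin chosen for this sketch.
    """
    target = f1 / 2  # rule of thumb: optimal threshold is roughly F1 / 2
    if threshold > target + tolerance:
        return "lower"
    if threshold < target - tolerance:
        return "raise"
    return "keep"


# The two cases from the text, both with F1 = 0.5:
print(suggest_threshold_change(f1=0.5, threshold=0.5))
print(suggest_threshold_change(f1=0.5, threshold=0.1))
```

The suggestion is only a starting point, because changing the threshold also changes the F1 score it is being compared against.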
Remember that the Max F1 threshold is not the same as the model’s F1 score (found in the advanced view). They are related but different, as described above.