Machine Learning vs. Statistics

Statistics and machine learning differ in method and purpose. Which one is the better choice depends on your goals.

Statistics is a branch of mathematics that interprets relationships among variables in data sets. Statisticians make inferences and estimate values based solely on data collected during a specific period, a backward-looking view. Model building must take into account how the data were collected and how the underlying populations are distributed. Statistics is useful where assumptions and probabilities must be mathematically auditable, such as when publishing a scientific paper on experimental observations.
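As an illustration of an auditable statistical estimate, the Python sketch below computes a 95% confidence interval for a population mean. The function name `mean_confidence_interval` and the measurements are hypothetical. The interval rests on two stated assumptions: the observations are independent, and the sample is large enough for the normal approximation. For small samples, a Student's t critical value would normally replace the normal one.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (mean, lower, upper) for the population mean.

    Hypothetical helper. Assumes independent observations and a sample
    large enough for the normal approximation to the sampling
    distribution of the mean.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution
    # (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * std_err, mean + z * std_err


# Hypothetical measurements, for illustration only.
measurements = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4, 5.0, 5.1]
mean, low, high = mean_confidence_interval(measurements)
print(f"mean={mean:.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Every input to the interval (the data, the confidence level, the distributional assumption) is explicit, which is what lets a reviewer audit the result.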

Machine learning (specifically, supervised learning) is a branch of computer science that uses past data to predict future outcomes. This forward-looking approach trains models on data sets with known outcomes, then measures their accuracy on test sets held back from the training data. The held-back test set provides an estimate of how accurate predictions on future data will be, provided the future data resemble the data the model was tested on. Machine learning excels when data sets contain large numbers of variables and records.
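The hold-back process can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a production workflow. The records are synthetic (y = 3x + 2 plus Gaussian noise). A simple least-squares line stands in for any supervised learner. The helpers `train_test_split`, `fit_line`, and `mean_absolute_error` are hypothetical and written with the standard library only; this `train_test_split` is not scikit-learn's function of the same name.

```python
import random
import statistics


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold back a fraction for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # Returns (training records, held-back test records).
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(train):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    x_mean, y_mean = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in train)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def mean_absolute_error(model, test):
    """Average absolute gap between predicted and known outcomes."""
    slope, intercept = model
    return statistics.fmean([abs(y - (slope * x + intercept)) for x, y in test])


# Synthetic labeled records (x, known outcome y): y = 3x + 2 plus noise.
rng = random.Random(42)
records = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in range(100)]

train, test = train_test_split(records)
model = fit_line(train)  # the model never sees the held-back test records
held_out_mae = mean_absolute_error(model, test)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}")
print(f"held-out MAE={held_out_mae:.2f}")
```

The error measured on the 20 held-back records serves as the estimate of future accuracy. It carries over to new records only if they come from the same distribution as the test records.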

Conclusion: Use statistics for “court of law” explanations of what happened in the past. Use machine learning to make record-by-record predictions of future outcomes.