Data Mining is the process of discovering patterns, correlations, and anomalies in data.
Mining is not the best analogy for the processes referred to as Data Mining, even though we call the places where data is stored bases, warehouses, and lakes. The goal of Data Mining is not to extract raw data but to identify characteristics within data sets that can be used to make decisions and predictions.
Think of Data Mining as applying statistics to make past events recorded in data easier for humans to understand. By making assumptions and testing them, analysts can generate insights that help them make decisions or predict general behavior in the future. Because it works with data whose variables are already known and fixed, Data Mining by itself cannot predict specific behavior for new data.
Data Mining Processes
Here are some of the commonly used terms for tasks in data mining:
- Anomaly Detection – Identifying records that differ enough from the rest to be checked as errors or outliers.
- Dependency Modelling – Identifying relationships among variables, such as market basket analysis for items frequently bought together.
- Clustering – Identifying characteristics of groups of records that are more similar to each other than to other groups.
- Classification – Assigning records to predefined categories, typically by estimating the probability that a record belongs to each category.
- Regression – Estimating the relationship between a dependent variable and one or more independent variables.
- Summarization – Producing a more compact description of a data set, such as reports and graphical representations.
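To make a few of these tasks concrete, here is a minimal sketch in standard-library Python covering anomaly detection, dependency modelling, regression, and summarization on tiny made-up inputs. The function names, the z-score threshold of 2.0, and the simple pair-counting approach to market basket analysis are illustrative assumptions, not a standard Data Mining API; clustering and classification are left out to keep it short.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, stdev

# Illustrative helpers with hypothetical names; not a standard data-mining library.

def zscore_outliers(values, threshold=2.0):
    """Anomaly detection: flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

def frequent_pairs(baskets, min_count=2):
    """Dependency modelling: item pairs that appear together in at least `min_count` baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort so ("bread", "milk") and ("milk", "bread") count as the same pair.
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def summarize(values):
    """Summarization: reduce a column of numbers to a few descriptive statistics."""
    return {"count": len(values), "min": min(values), "max": max(values), "mean": mean(values)}

orders = [20, 22, 19, 21, 23, 20, 95]
baskets = [["bread", "milk"], ["bread", "butter", "milk"], ["beer", "chips"], ["milk", "bread", "eggs"]]
print(zscore_outliers(orders))    # [95]
print(frequent_pairs(baskets))    # {('bread', 'milk'): 3}
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (2.0, 1.0)
print(summarize(orders))
```

Each helper only describes data that already exists; none of them is trained once and then applied to new records, which is the distinction the rest of this article draws.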
Data Mining is good for preparing data and for understanding which variables may be useful for predictions. However, the time and human analytical effort needed to query, join, parse, and process large data sets make Data Mining ill-suited to predictive analysis in production.
Machine Learning to the Rescue
Automated Machine Learning (AutoML) automatically forms and tests assumptions, iterating over candidate models until it finds one that captures the patterns in the data, with minimal human intervention. This means there is no need to hand-code rules for every possible relationship in the data. Results arrive quickly, even for large data sets. Best of all, the trained models can be applied to fresh data automatically, which is the essence of prediction.
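The core loop that AutoML automates can be illustrated with a deliberately tiny sketch: fit several candidate models, score each on held-out data, keep the best, and apply it to a new record. This is plain model selection in standard-library Python, not how any particular AutoML product works. The candidate set (a mean baseline, a least-squares line, and a 1-nearest-neighbour lookup), the 25% hold-out split, and mean squared error as the score are all assumptions chosen for illustration; real AutoML systems also search over preprocessing steps and hyperparameters.

```python
from statistics import mean

# Hypothetical toy "AutoML": each candidate fits on (xs, ys) and returns a predictor function.

def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m  # baseline: always predict the training mean

def fit_linear(xs, ys):
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_nearest_neighbour(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]  # copy y of closest known x

CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "1-nn": fit_nearest_neighbour}

def mse(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def auto_select(xs, ys, holdout=0.25):
    """Score every candidate on a validation split, then refit the winner on all data."""
    # Simple ordered split; real tools usually shuffle or cross-validate.
    split = int(len(xs) * (1 - holdout))
    train_x, train_y, val_x, val_y = xs[:split], ys[:split], xs[split:], ys[split:]
    scores = {name: mse(fit(train_x, train_y), val_x, val_y) for name, fit in CANDIDATES.items()}
    best = min(scores, key=scores.get)
    return best, CANDIDATES[best](xs, ys), scores

xs = list(range(1, 13))
ys = [3 * x + 2 + (0.5 if x % 2 else -0.5) for x in xs]  # roughly y = 3x + 2
best, model, scores = auto_select(xs, ys)
print(best, round(model(20), 2))  # apply the chosen model to a fresh input
```

With data that roughly follows y = 3x + 2, the linear candidate wins the comparison, and the refitted model predicts a value close to 62 for the unseen input x = 20. No rule for that relationship was written by hand; the loop found it by trying and scoring alternatives.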
The take-away: Data mining is useful to gain insights and to prepare data for predictive analytics, including AutoML. Machine Learning uses data patterns to predict future outcomes for new records.