SMOTE

SMOTE stands for Synthetic Minority Over-sampling Technique. Oversampling is a technique used to manage class imbalance in data sets. Data set imbalance occurs when the category you are targeting is very rare in the population, or where the data might simply be difficult to collect. SMOTE is helpful when the class you want to analyze is under-represented.

SMOTE works by generating new instances from existing minority cases that you supply as input. SMOTE does not change the number of majority cases.

New instances are not just copies of existing minority class instances. SMOTE synthesizes new minority instances between existing (real) minority instances. The algorithm takes samples of the feature space for each target class and its nearest neighbors, and generates new examples combining the features of the target case with features of its neighbors. This approach increases the features available to each class and makes the samples more general.