Area Under the Curve or AUC

AUC (used in binary classification only) evaluates how well a model distinguishes the positive class from the negative class across all classification thresholds. An AUC of 1 indicates a perfect classifier, while an AUC of 0.5 indicates a classifier whose performance is no better than random guessing.
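AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative example. A minimal sketch of that rank-based interpretation, using made-up labels and scores:

```python
def auc(labels, scores):
    """Probability that a randomly chosen positive example is scored
    higher than a randomly chosen negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A model that ranks every positive above every negative scores 1.0; a model that scores everything identically scores 0.5, matching random guessing.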

Artificial Intelligence (AI)

Computer systems patterned after human intelligence in their ability to learn and recognize so that previously unseen information can be acted on in ways that produce useful results. The foundations of AI include logic, mathematics, probability, decision theory, neuroscience, and linguistics.

Artificial Neural Networks (ANNs)

Algorithms loosely modeled after the human brain, with layers of connected elements that send information to each other in the way human neurons interact.
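The "layers of connected elements" can be sketched as a fully connected layer: each neuron takes a weighted sum of its inputs and passes it through an activation function, and layers feed their outputs forward to the next. The weights below are arbitrary illustrative values, not a trained network:

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron,
    squashed by a sigmoid activation."""
    return [1 / (1 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
            for ws, b in zip(weights, biases)]

# Two inputs -> a hidden layer of two neurons -> one output neuron.
hidden = dense([1.0, 0.5], [[0.4, -0.2], [0.3, 0.9]], [0.0, -0.1])
output = dense(hidden, [[0.7, -0.5]], [0.2])
```

Training a real network consists of adjusting those weights and biases so the outputs match known answers; the forward pass itself is just this chain of layer evaluations.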

Big Data

Data sets that are so large or complex that traditional data processing applications are inadequate to deal with them.

Black Box Algorithms

Algorithms with output and decision-making processes that cannot readily be explained by developers or the computer itself.


Classification

Classification algorithms enable machines to assign a category to a data point based on a model built from training data. Binary classification predicts “yes-no” or “in-out” for each row when there are only two choices. Multiclass, or multinomial, classification predicts the probability that a data point belongs to one of three or more classes.
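One of the simplest ways to assign a category from training data is nearest-centroid classification: summarize each class by the mean of its training points, then label a new point with the closest class. The centroids and point below are made up for illustration:

```python
def nearest_centroid(point, centroids):
    """Assign the label whose training centroid is closest
    (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(point, centroids[label]))

# Centroids would come from averaging labeled training data.
centroids = {"spam": (0.9, 0.8), "ham": (0.1, 0.2)}
print(nearest_centroid((0.85, 0.7), centroids))  # spam
```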


Clustering

Clustering algorithms let machines group data points or items into clusters of similar characteristics, without labeled training data.
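A classic clustering method is k-means (Lloyd's algorithm): repeatedly assign each point to its nearest center, then move each center to the mean of its assigned points. A minimal sketch on made-up two-dimensional points:

```python
def kmeans(points, centers, iters=10):
    """Lloyd's algorithm: alternate assignment and center updates."""
    for _ in range(iters):
        groups = {i: [] for i in range(len(centers))}
        for p in points:
            nearest = min(groups, key=lambda i: sum((a - b) ** 2
                                                    for a, b in zip(p, centers[i])))
            groups[nearest].append(p)
        centers = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centers[i]
                   for i, g in groups.items()]
    return centers, groups

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, groups = kmeans(points, centers=[(0.0, 0.0), (9.0, 9.0)])
```

Unlike classification, no labels are given: the two groups emerge purely from the geometry of the data.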


Coefficients

Coefficients indicate the relationship of independent variables to the dependent variable in a model. A positive coefficient shows that as the independent variable increases, so does the dependent variable. A negative coefficient indicates that as the independent variable increases, the dependent variable decreases.
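For a one-variable linear model, the least-squares coefficient is the covariance of x and y divided by the variance of x; its sign directly encodes the direction of the relationship. A sketch with made-up data:

```python
def slope(xs, ys):
    """Least-squares coefficient of a one-variable linear model:
    covariance(x, y) / variance(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

print(slope([1, 2, 3], [2, 4, 6]))  # 2.0  (positive: y rises with x)
print(slope([1, 2, 3], [6, 4, 2]))  # -2.0 (negative: y falls as x rises)
```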

Computer Vision

Use of AI to examine and interpret images in order to identify or recognize their contents, similar to the way humans see.

Confusion Matrix

A Confusion Matrix is a table depicting the performance of a classification model in terms of true positives, false positives, false negatives, and true negatives. It is so named because it shows how often the model confuses the two labels. The matrix is typically generated by comparing the model's predictions against the true labels of held-out data (for example, a test set or cross-validation folds).
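For binary labels, building the matrix amounts to counting the four agreement/disagreement cases between actual and predicted labels. A minimal sketch, with made-up labels where 1 is the positive class:

```python
def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = positive)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for y, p in zip(actual, predicted):
        key = ("T" if y == p else "F") + ("P" if p == 1 else "N")
        counts[key] += 1
    return counts

actual    = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))
# {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

Metrics such as precision (TP / (TP + FP)) and recall (TP / (TP + FN)) are read directly off these four counts.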