Area Under the Curve (AUC)

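AUC (binary classification only) measures how well a binary classification model distinguishes between the positive and negative classes. It is the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate across all classification thresholds. An AUC of 1 indicates a perfect classifier, while an AUC of 0.5 indicates a poor classifier whose performance is no better than random guessing.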
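As a rough illustration, AUC can be computed without drawing the ROC curve by using an equivalent interpretation: the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. The sketch below assumes labels are 0/1 and that a higher score means "more likely positive"; the function name `auc` is hypothetical, and the pairwise loop favors clarity over speed.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative: correct ordering
            elif p == n:
                wins += 0.5  # tie: half credit
    return wins / (len(positives) * len(negatives))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

Here three of the four positive-negative pairs are ordered correctly, giving 0.75. A model that gives every example the same score earns half credit on every pair and lands at 0.5, the "random guessing" baseline.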

Artificial Intelligence (AI)

Computer systems modeled on human intelligence in their ability to learn and recognize patterns, so that they can act on previously unseen information in ways that produce useful results. The foundations of AI include logic, mathematics, probability, decision theory, neuroscience, and linguistics.

Artificial Neural Networks (ANNs)

Algorithms loosely modeled after the human brain, with layers of connected elements that send information to each other in the way human neurons interact.
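A minimal sketch of the "layers of connected elements" idea, assuming fully connected layers with a sigmoid activation: each neuron computes a weighted sum of its inputs, adds a bias, and passes the result on to every neuron in the next layer. The weights below are arbitrary placeholders; in a real network they are learned during training, which this sketch does not cover.

```python
import math
from typing import List, Sequence, Tuple

# One layer = (weight matrix with one row per neuron, one bias per neuron).
Layer = Tuple[List[List[float]], List[float]]


def sigmoid(x: float) -> float:
    """Squash a neuron's weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs: Sequence[float], layer: Layer) -> List[float]:
    """Each neuron: weighted sum of all inputs, plus its bias, then activation."""
    weights, biases = layer
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def network_forward(inputs: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Send information layer by layer: outputs of one layer feed the next."""
    activations = list(inputs)
    for layer in layers:
        activations = layer_forward(activations, layer)
    return activations


# Hypothetical 2-input network: a hidden layer of 2 neurons, then 1 output neuron.
hidden: Layer = ([[0.5, -0.25], [1.0, 1.0]], [0.0, -3.0])
output: Layer = ([[1.0, -1.0]], [0.0])
print(network_forward([1.0, 2.0], [hidden, output]))  # → [0.5]
```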

Automated Machine Learning (AutoML)

Automated Machine Learning (AutoML) refers to systems that build machine learning models with less manual coding than a data science programmer would need to build models from scratch.

At Squark, AutoML means absolutely no coding or scripting of any kind. This is the strongest definition of AutoML. Every step in making predictions with machine learning models is performed through a SaaS, point-and-click interface: import of training and production data, variable identification, feature engineering, classification or regression algorithm selection, hyperparameter tuning, leaderboard explanation, variable importance listing, and export of the prediction data set.

Various other implementations of machine learning are dubbed AutoML but actually require extensive knowledge of data science and programming. For example, you may need to select the algorithm type, pick hyperparameter ranges, launch from a Jupyter notebook, know Python, or follow other unfamiliar processes.
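The core loop behind automated model selection can be sketched in a few lines: generate candidate models (here, by sweeping one hyperparameter over a few values), score each on held-out validation data, and rank the results into a leaderboard. This is a toy illustration, not Squark's implementation; the threshold "model," the accuracy metric, and all function names are hypothetical stand-ins for real algorithms and metrics.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model type: maps one numeric feature to a 0/1 class label.
Model = Callable[[float], int]


def make_threshold_model(threshold: float) -> Model:
    """Toy classifier with a single hyperparameter: its decision threshold."""
    return lambda x: 1 if x >= threshold else 0


def accuracy(model: Model, xs: Sequence[float], ys: Sequence[int]) -> float:
    """Fraction of validation rows the model labels correctly."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def leaderboard(candidates: Dict[str, Model],
                xs: Sequence[float], ys: Sequence[int]) -> List[Tuple[str, float]]:
    """Score every candidate on validation data and rank them best-first."""
    scored = [(name, accuracy(model, xs, ys)) for name, model in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Sweep the hyperparameter automatically instead of hand-picking it.
candidates = {f"threshold={t}": make_threshold_model(t) for t in (0.2, 0.5, 0.8)}
val_x = [0.1, 0.3, 0.6, 0.9]
val_y = [0, 0, 1, 1]
for name, score in leaderboard(candidates, val_x, val_y):
    print(name, score)
```

Real AutoML systems search over many algorithm families and hyperparameters at once, but the pattern of "generate, score, rank" is the same.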

Bias

Bias is the tendency of a model to learn from some variables and not others. Some bias is essential, since machine learning must make predictions based on the data features that are more predictive than others.

High bias occurs when model training uses too few variables, due either to limited features in the training data or to restrictions on the number of variables an algorithm is able to consider. High bias results in underfitting.

Low bias is desirable, but it must be traded off against variance in algorithm performance.
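To illustrate how using too few variables leads to underfitting, the sketch below compares a model that ignores the input entirely (it always predicts the mean of the training targets) with an ordinary least-squares line that does use the input. The data set is a hypothetical, perfectly linear toy example chosen so the gap is easy to see; the function names are also hypothetical.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]


def fit_mean_model(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """High-bias model: considers no input variables, always predicts the mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear_model(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least-squares line: y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model: Model, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean squared error of the model's predictions on the training data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0]  # hypothetical data: y = 2x exactly
print(mse(fit_mean_model(xs, ys), xs, ys))    # → 8.0 (underfits)
print(mse(fit_linear_model(xs, ys), xs, ys))  # → 0.0
```

The mean model stays wrong even on the data it was trained on, which is the signature of underfitting; a model flexible enough to drive training error to zero, by contrast, risks the high variance mentioned above.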

Big Data

Data sets that are so large or complex that traditional data processing applications are inadequate to deal with them.

Black Box Algorithms

Algorithms whose outputs and decision-making processes cannot readily be explained, either by their developers or by the computer itself.

Classification

Classification in statistical or machine learning models describes the relationship between a dependent variable (outcome variable) and independent variables (features) in a data set when the dependent variable takes discrete values (integers, enumerations, strings, text vectors, etc.), as opposed to continuous real numbers.

Machine learning classification algorithms assign categories to data set members based on models built from training data. Binary classification models predict “yes-no” or “in-out” for each row when the dependent variable has only two possible values (classes). Multiclass, or multinomial, classification models predict the probability that a data set member belongs to each of three or more classes.
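For multinomial classification, one common approach (assumed here, not prescribed by this glossary) is to convert a model's raw per-class scores into probabilities with the softmax function and then predict the most probable class. The scores and class names below are made up for illustration, and `predict_class` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores: Sequence[float],
                  classes: Sequence[str]) -> Tuple[str, Dict[str, float]]:
    """Return the most probable class plus the probability of every class."""
    probs = softmax(scores)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best], dict(zip(classes, probs))


label, probabilities = predict_class([2.0, 1.0, 0.1], ["red", "green", "blue"])
print(label)          # → red
print(probabilities)
```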

Clustering

Clustering algorithms let machines organize data points or items into groups whose members share similar characteristics.
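k-means is one widely used clustering algorithm (named here as an example; the glossary does not specify one). The sketch alternates between assigning each 2-D point to its nearest centroid and moving each centroid to the mean of its assigned points, stopping when the centroids no longer change. The initial centroids are passed in explicitly to keep the result deterministic; production implementations usually choose them randomly or with a scheme such as k-means++.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))


def kmeans(points: Sequence[Point], init: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    centroids = list(init)
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest centroid.
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its old centroid).
        updated = [
            (sum(q[0] for q in g) / len(g), sum(q[1] for q in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:
            break  # converged: assignments will not change again
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(pts, init=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```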

Coefficients

Coefficients indicate the relationship of independent variables to the dependent variable in a model. Positive coefficients show that as the independent variable increases, so does the dependent variable. Negative coefficients indicate that as the independent variable increases, the dependent variable decreases.
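A small sketch of reading a coefficient's sign, assuming a single independent variable and an ordinary least-squares fit. The function name `ols_slope` and both toy data sets are hypothetical.

```python
from typing import Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares coefficient of x when predicting y from x alone."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


xs = [1.0, 2.0, 3.0, 4.0]
rising = [52.0, 54.0, 56.0, 58.0]  # y goes up as x goes up
falling = [10.0, 8.5, 7.0, 5.5]    # y goes down as x goes up
print(ols_slope(xs, rising))   # → 2.0  (positive coefficient)
print(ols_slope(xs, falling))  # → -1.5 (negative coefficient)
```

In both cases the magnitude gives the expected change in the dependent variable per unit increase in the independent variable, and the sign gives the direction.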