This is a glossary of Machine Learning terms commonly used in the industry. We will add more terms related to machine learning, data science, and artificial intelligence over time. Meanwhile, if you would like to suggest additional terms, please let us know through the message form below.
Accuracy is used to evaluate classification models. It is defined as the percentage of predictions that are correct. Mathematically, it is represented as: Accuracy = (Number of correct predictions / Total number of predictions) × 100.
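As a quick illustrative sketch (scikit-learn is an assumption here; a plain loop over labels works just as well), accuracy can be computed from true labels and predictions:

```python
from sklearn.metrics import accuracy_score

# Hypothetical ground-truth labels and model predictions
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

# Fraction of predictions that match the true labels (multiply by 100 for a percentage)
print(accuracy_score(y_true, y_pred))  # 0.8333...
```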
In Machine Learning, an algorithm is a procedure applied to the data to create a machine learning model. e.g., Linear Regression, Decision Trees.
The process of assigning labels to unlabeled data. For example, in a handwritten digit recognition task, annotation means assigning the label 8 to an image of the digit 8.
ANNs are machine learning algorithms inspired by the biological neural networks that constitute animal brains.
An aspect of an instance. In structured data stored in a tabular format, the columns represent attributes. For example, suppose we want to estimate today's atmospheric temperature, and for that we record the atmospheric pressure, wind speed, and other essential properties. These properties are known as attributes.
Area Under ROC Curve represents the classification model's aggregate performance across all classification thresholds. The ROC curve plots the variation in the true positive rate with respect to the false positive rate.
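For illustration, a minimal sketch (scikit-learn and the labels/scores below are assumptions) of computing AUC from predicted probabilities:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8]

# Aggregate ranking quality across all classification thresholds
print(roc_auc_score(y_true, y_scores))  # 0.75
```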
Bias helps generalize results by making the model more or less sensitive to particular features or data points. It is considered a systematic error in a machine learning model caused by incorrect assumptions in the ML process.
The error caused by an algorithm's tendency to consistently learn the wrong thing because it does not take all of the information in the data into account.
The left-side diagram below shows points scattered all around the circle's center, hence having a lower bias. But there is a high bias in the right diagram, as the scatter happens only in a particular direction.
Low Bias vs. High Bias
Classification is a problem statement in machine learning where models try to predict the output category. There can be two types of classification: binary classification (two output categories) and multi-class classification (more than two categories).
It is the limiting value based on which a classification decision is made. Suppose a machine learning model predicts the presence of a cat in an image with a confidence of X%, and we set the criterion that the prediction is valid only if the confidence is greater than 60%. Then 60% is the classification threshold.
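A minimal sketch of applying a threshold to a model's confidence scores (the 60% value and the scores below are illustrative):

```python
# Hypothetical predicted probabilities that an image contains a cat
confidences = [0.92, 0.41, 0.63, 0.58]

threshold = 0.60  # classification threshold

# A prediction counts as "cat" only when the confidence exceeds the threshold
labels = ["cat" if c > threshold else "not cat" for c in confidences]
print(labels)  # ['cat', 'not cat', 'cat', 'not cat']
```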
A type of unsupervised learning where the model groups the input data into different buckets based on some inherent data features. Generally, clusters consist of items having similar characteristics. The most commonly used clustering algorithms are K-Means, Hierarchical Clustering, and Affinity Clustering.
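A minimal K-Means sketch (scikit-learn is assumed; the data points are made up):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two rough groups
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Group the points into 2 clusters based on their inherent similarity
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)  # e.g., [1 1 1 0 0 0]
```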
A metric for performance measurement of a machine learning classification problem where the output can be two or more classes. It groups the predictions into four categories: True Positives, True Negatives, False Positives, and False Negatives.
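A small sketch (scikit-learn assumed; labels are illustrative) showing the four categories for a binary problem:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))  # [[2 1]
                                         #  [1 2]]
```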
A state during the training of a machine learning model when the change in loss between consecutive epochs becomes very small. More specifically, if the change in the value of the loss function is negligible, the model can be said to have reached a minimum and its position will not change much further, i.e., it has converged.
A subfield of machine learning that deals with algorithms based on Artificial Neural Networks and is capable of understanding the temporal and spatial dependencies. It is also known as deep structured learning.
Dimension in machine learning means the number of features that have been used as Inputs for the machine learning algorithms.
A type of regulariser that is used to prevent over-fitting by dropping out hidden or visible units while training neural networks.
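An illustrative PyTorch sketch (the layer sizes and the 0.5 drop probability are arbitrary assumptions):

```python
import torch.nn as nn

# A small network where Dropout randomly zeroes 50% of the hidden units
# during training, which helps prevent over-fitting
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # active in model.train(), disabled in model.eval()
    nn.Linear(32, 1),
)
```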
1 Epoch = 1 complete pass over the entire training dataset.
A type of estimation beyond the original observation range.
Mathematically it is calculated as:
A feature is an attribute together with its value (what is finally used for training). Temperature is an attribute, and Temperature = 25°C is a feature.
A feature vector lists all the features fed to the ML model.
The point at which the loss function reaches its minimum over the entire domain. It is the smallest overall value of the function across its entire range.
Layers in-between the Input and Output Layers in a neural network are hidden layers.
A parameter whose value is used to control the learning process. E.g., The number of hidden layers in a Neural Network.
A sample, i.e., a row of feature values in the dataset. It is also called an observation.
It means each random variable of the sample has the same probability distribution, and all are mutually independent.
The output data used in training a supervised learning model. E.g., to train a cat-classifier model, we need to prepare a dataset in which we label each image as cat or not cat.
A tuning parameter in an optimization problem that determines the step size at each update while moving towards a minimum (global or local) of the loss function.
In simple terms, Loss = (Actual value) − (Predicted value). It is the same as error; hence, the lower the loss value, the better the model (unless it is overfitted).
A point at which the loss function value is the minimum within a local region: the function value there is smaller than at nearby points but possibly greater than at distant points.
A computer science field that gives computers the ability to learn without being explicitly programmed.
Model is the output of any ML algorithm run on the data. It is a data structure that stores weight and bias matrices containing the learned parameters.
Machine learning algorithms inspired by the biological neural networks that constitute animal brains.
Rescaling feature values so that the dataset values lie within a standard range. It improves computation speed.
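A minimal min-max scaling sketch (one common form of normalization; the values are made up):

```python
# Hypothetical raw feature values
values = [10.0, 20.0, 35.0, 50.0]

lo, hi = min(values), max(values)

# Rescale every value to the standard range [0, 1]
normalized = [(v - lo) / (hi - lo) for v in values]
print(normalized)  # [0.0, 0.25, 0.625, 1.0]
```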
Additional meaningless information present in the data.
The accuracy that can be achieved by always predicting the most frequent class in a classification problem.
A sample, i.e., a row of feature values in the dataset. It is also called an instance.
Methods that change the values of parameters so that the loss reaches a minimum. They are used to solve optimization problems by minimizing the cost function. E.g., Gradient Descent.
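A minimal gradient descent sketch on a toy loss L(w) = (w − 3)², where the loss, learning rate, and starting point are made up for illustration:

```python
# Minimize L(w) = (w - 3)**2, whose gradient is 2 * (w - 3)
w = 0.0             # initial parameter value
learning_rate = 0.1

for step in range(100):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient  # step towards the minimum

print(round(w, 4))  # close to 3.0, the minimizer of the loss
```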
Data Samples that differ significantly from other observations.
A situation when the model training error becomes significantly less than the model testing error. In this case, the model performs very well on training data but poorly on test data.
Variables whose value we learn from training any machine learning model. e.g., Weights of neural networks.
Precision tries to answer the question: what proportion of positive predictions is actually correct? Precision = TP / (TP + FP).
Recall tries to answer the question: what proportion of actual positives is identified correctly? Recall = TP / (TP + FN).
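A small sketch (scikit-learn assumed; labels are illustrative) computing both metrics:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 2 / 3
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 2 / 3
```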
A type of machine learning in which prediction output is continuous.
A technique that is used to combat the problem of overfitting.
A subset of machine learning in which an agent learns by acting in an environment and maximizing the reward it receives for its actions.
A graph of the True Positive Rate vs. the False Positive Rate, used to check a classification model's performance at different classification thresholds.
Training a machine learning model under the supervision of a labeled dataset.
Data samples used to check the generalisability of the machine learning model. These samples are unseen by the model during training.
Dataset used in training the machine learning model.
A method in which a machine learning algorithm picks up the weights of an already trained model and fine-tunes them as per the problem's requirements.
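A hedged PyTorch/torchvision sketch (the ResNet-18 backbone, the weights argument, and the 2-class head are assumptions for illustration, not a prescribed recipe):

```python
import torch.nn as nn
from torchvision import models

# Load a model with weights already trained on ImageNet
model = models.resnet18(weights="DEFAULT")

# Freeze the pretrained backbone so only the new head is fine-tuned
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head for our hypothetical 2-class problem
model.fc = nn.Linear(model.fc.in_features, 2)
```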
Same as Recall.
Same as False Positive.
Same as False Negative.
A situation in which the machine learning model does not learn the variation present in the data.
For an ANN, if a model is trained on inputs in the range (a, b), it would be expected to perform well only on test data that lies within (a, b).
A class of machine learning in which training is based upon an unlabelled dataset. E.g., Dimensionality Reduction, Clustering.
A dataset used to validate the model during training by checking the generalisability of the tuned parameters.
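A minimal sketch (scikit-learn assumed; the 80/20 split and toy data are arbitrary) of holding out a validation set:

```python
from sklearn.model_selection import train_test_split

X = list(range(10))      # toy features
y = [i % 2 for i in X]   # toy labels

# Hold out 20% of the data as a validation set, unseen during training
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_val))  # 8 2
```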
An error arising from sensitivity to small fluctuations in the training dataset. It can be of two types, low variance and high variance, as illustrated below:
In the left diagram below, the predictions are hitting the bull's eye while testing the model; hence it has low variance. In the right diagram, the predictions are scattered and fail to converge while testing; hence it has high variance.
Low Variance vs. High Variance
A learnable parameter in machine learning.
Standardization is also known as Z-score normalization.
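A minimal z-score sketch (the values are made up): each value is shifted by the mean and divided by the standard deviation:

```python
import statistics

values = [10.0, 20.0, 30.0, 40.0]

mean = statistics.mean(values)   # 25.0
std = statistics.pstdev(values)  # population standard deviation

# Standardized values have zero mean and unit variance
standardized = [(v - mean) / std for v in values]
print([round(z, 3) for z in standardized])  # [-1.342, -0.447, 0.447, 1.342]
```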
Additional Reference to Explore: https://developers.google.com/machine-learning/glossary