Machine Learning models can be broadly grouped into three categories based on the nature of the algorithm used to build them: Statistical Machine Learning, Artificial Neural Networks, and Deep Learning. Traditional Machine Learning approaches often struggled to learn complex non-linear relationships in the data, and ML engineers wanted an approach that could tackle these challenges.
That's when Artificial Neural Networks (ANNs), also called Neural Networks, were developed, drawing inspiration from the human brain. While the comparison between ANNs and the human brain is superficial, it helps us understand ANNs more easily. An ANN is a supervised learning algorithm capable of solving classification and regression tasks. This article lays the foundation for diving deeper into this field.
After going through this blog, we will be able to understand the following things:
Let's start the journey toward diving deeper into the domain of Machine Learning.
The human brain is a complex system, and how it works is still not fully understood by medical science. It learns and adapts through its own past experiences or those of others. Neurons, the brain's basic building blocks, play a crucial role in this process. Billions of such neurons form connections inside our brains and store the learned experiences as memories. When our sensory organs, such as our eyes, skin, and ears, encounter similar situations, the brain uses those memories and responds similarly.
An example of this is learning to drive a car. Our brains experience various situations on the road and learn how to respond. Once learned, the brain uses signals from our eyes, nose, and ears to control the vehicle's various parts. When we encounter new situations, the brain adapts and modifies what it has stored. We can correlate this scenario with learning a mapping function between input and output signals and responding whenever similar information arrives.
Neural networks mimic this process: these algorithms use interconnected neurons to learn complex non-linear functions that map input data to output data.
If we define neural networks in plain English, we can say:
“Artificial Neural Networks are user-defined nested mathematical functions with user-induced variables that can be modified in a systematic trial-and-error basis to arrive at the closest mathematical relationship between given pair of input and output.”
Let's understand each of the terms used in the above definition, and then we will define it in our own terms.
In any field of machine learning, machines try to fit mathematical functions to input and output pairs. But there are infinitely many candidate functions to search through when looking for the best fit for the input and output sets. Our ML algorithm may take forever to find the perfect function. Hence, it raises the question of how machines decide which function should be the mapped one.
Machines need help here. We need to provide some starting point to machines by defining a mathematical function that includes adjustable parameters. With this, the machine's search becomes limited, allowing for more efficient and effective learning.
To understand this better, recall what we did while developing the Linear Regression model. We needed to define precisely which degree of polynomial should be fitted to the given dataset. For example, to fit a third-degree polynomial, we started with this function: Y_predicted = θ3*X³ + θ2*X² + θ1*X + θ0. Machines then try to find the optimal values of the θs.
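As a minimal sketch of this idea, the cubic function can be written in Python so that its form is fixed by us while the θ values stay adjustable. The name `predict_cubic`, the ordering of `theta`, and the sample values are assumptions made purely for illustration.

```python
def predict_cubic(x, theta):
    """Evaluate theta3*x^3 + theta2*x^2 + theta1*x + theta0.

    theta is ordered [theta0, theta1, theta2, theta3] (an assumed convention).
    """
    theta0, theta1, theta2, theta3 = theta
    return theta3 * x**3 + theta2 * x**2 + theta1 * x + theta0


# Two hypothetical parameter settings: same function form, different curves.
guess_a = [0.0, 1.0, 0.0, 0.0]  # behaves like y = x
guess_b = [1.0, 0.0, 0.0, 2.0]  # behaves like y = 2x^3 + 1

y_a = predict_cubic(2.0, guess_a)  # 2.0
y_b = predict_cubic(2.0, guess_b)  # 17.0
```

The machine never changes the body of `predict_cubic`; it only searches over the four numbers in `theta`.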
User-induced variables are the θs in the above equation. ML models attempt to find the optimal values for these parameters so that the final function is as similar as possible to the actual function. But what's the benefit of tuning the parameters?
Neural networks are supervised learning algorithms, meaning the algorithm has access to the true output while training. By tuning the parameters, ML algorithms try to bring their predictions as close as possible to the true values.
These parameters, also known as weights, assign importance (weightage) to the input features. For example, in the function Output = 2*Input + 3, the value 2 represents the weight applied to the input. If the input has multiple features, machines will set the importance of each feature. In simple terms, if the input vector is multi-dimensional, the weight vector will also be multi-dimensional.
These weights are trainable parameters and are modified based on the input and output samples during the learning process.
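To make the role of weights concrete, here is a small sketch of a weighted sum, first for the single-feature function Output = 2*Input + 3 and then for a three-feature input. The helper name `weighted_sum` and the three-feature weights are hypothetical, not taken from any library.

```python
def weighted_sum(features, weights, bias):
    """Return sum(w_i * x_i) + bias for one input sample."""
    if len(features) != len(weights):
        raise ValueError("each feature needs exactly one weight")
    return sum(w * x for w, x in zip(weights, features)) + bias


# Single feature: Output = 2*Input + 3 with Input = 5
single = weighted_sum([5.0], [2.0], 3.0)  # 13.0

# Three features with hypothetical weights: the third feature matters most,
# and the second pushes the output down.
multi = weighted_sum([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], 0.0)  # 4.5
```

The weight list always has the same length as the feature list, which is the "multi-dimensional weight vector" described above.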
Machine Learning models represent the relationship between input and output pairs as mathematical functions. These functions have two main components:
In the diagram below, there is a single neuron in the hidden layer, but in practice, there can be multiple neurons in one or multiple hidden layers between the input and output layers. Schematically, the input of the first hidden layer is the input layer, and the output of the first hidden layer serves as the input for the second hidden layer. The final output from the output layer is the desired output from the neural network.
So, if a function exists between the input layer and the first hidden layer, its output is used as the input to the function between the first and second hidden layers, and so on. Viewed as a whole, a Neural Network looks like a complex composite function, which is nothing but a nested mathematical function: fn(…f2(f1(x))…), where f1 is the function applied first.
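The nesting can be sketched in a few lines of Python. The three layer functions below, including their weights and the use of tanh, are assumptions chosen only to show how the output of one function becomes the input of the next.

```python
import math


def layer1(x):
    # Function between the input layer and the first hidden layer (assumed weights).
    return math.tanh(0.5 * x + 0.1)


def layer2(h):
    # Function between the first and second hidden layers (assumed weights).
    return math.tanh(-1.2 * h + 0.3)


def output_layer(h):
    # Function producing the final output (assumed weights).
    return 2.0 * h - 0.5


def network(x):
    # The whole network is one nested call: the innermost function runs first.
    return output_layer(layer2(layer1(x)))


def compose(*functions):
    """Chain functions so that the first one listed is applied first."""
    def composed(x):
        for f in functions:
            x = f(x)
        return x
    return composed


same_network = compose(layer1, layer2, output_layer)
```

Both `network` and `same_network` compute the same value for any input; the second form just makes it easy to add or remove layers.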
The error function measures the difference between the ML model's predicted and true values. When we average this error over all the data samples, it becomes the cost function. The goal of the ML algorithm is to minimize this cost function by adjusting the values of user-induced variables.
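As a rough illustration, the sketch below uses squared error as the error function and its average as the cost function. Squared error is an assumption here; it is one common choice, and the sample values are made up.

```python
def squared_error(y_true, y_pred):
    """Error for a single sample (squared difference, an assumed choice)."""
    return (y_true - y_pred) ** 2


def cost(y_true_list, y_pred_list):
    """Cost: the per-sample error averaged over all samples."""
    errors = [squared_error(t, p) for t, p in zip(y_true_list, y_pred_list)]
    return sum(errors) / len(errors)


true_values = [3.0, 5.0, 7.0]
predictions = [2.0, 5.0, 9.0]

# Per-sample errors are 1, 0, and 4, so the cost is 5/3.
mse = cost(true_values, predictions)
```

A perfect model would give a cost of zero; training tries to push this number down by changing the parameters that produced `predictions`.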
In neural networks, this cost function can have multiple minima. Out of all these minima, we would ideally like to reach the global minimum of the cost function. Adjusting the parameters gradually moves the cost function toward a minimum, although this is not guaranteed to be the global one.
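The effect of multiple minima can be seen on a toy one-parameter cost. The function below is a made-up example with two valleys, not a real neural network cost, and the update rule shown is plain gradient descent, used here as one common way of adjusting a parameter gradually.

```python
def f(w):
    # Hypothetical cost with two valleys.
    return w**4 - 3 * w**2 + w


def grad(w):
    # Derivative of f with respect to w.
    return 4 * w**3 - 6 * w + 1


def descend(w, learning_rate=0.01, steps=1000):
    """Repeatedly take a small step against the gradient."""
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


w_from_left = descend(-2.0)
w_from_right = descend(2.0)
```

The run starting at -2 settles near w ≈ -1.30, the lower valley, while the run starting at 2 settles near w ≈ 1.13, a higher local minimum. Which minimum is reached depends on where the search starts.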
If the user-defined mathematical function cannot capture the complex patterns in the data, a different form of the mathematical function may be used, or the algorithm's complexity can be increased. This adjustment is done systematically so that the model learns what is required from the data.
One question may come to mind: why do machines not learn the exact values for the parameters so that their predictions exactly equal the true values?
Our objective in machine learning is to give predictions as close as possible to the actual values. But sometimes, given the constraints of computing power and time, we settle for the closest function rather than continuing to try to drive the error to zero.
For example, a user-defined mathematical function may be aX + b, where X is the input variable and [a and b] are user-induced variables that the machine can adjust while finding the best mathematical relationship. However, suppose the actual dataset follows the mathematical function 2X + 3. In that case, the machine may only be able to find a mathematical relationship close to the actual one, such as 1.9X + 3.2, based on the given samples and the number of iterations it was allowed to perform. This result is not an exact match, but it is the closest approximation the machine could find based on the given conditions.
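This scenario can be reproduced with a short sketch. It assumes noisy samples generated from 2X + 3, a mean squared error cost, and plain gradient descent as the adjustment rule; the function names, noise level, learning rate, and step counts are all illustrative choices.

```python
import random


def mse(a, b, xs, ys):
    """Mean squared error of the line a*x + b on the samples."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.05, steps=500):
    """Adjust a and b step by step to reduce the mean squared error."""
    a, b = 0.0, 0.0  # arbitrary starting point
    n = len(xs)
    for _ in range(steps):
        residuals = [a * x + b - y for x, y in zip(xs, ys)]
        grad_a = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


random.seed(0)  # fixed seed so the run is repeatable
xs = [i * 0.1 for i in range(51)]
ys = [2 * x + 3 + random.gauss(0, 0.3) for x in xs]  # true function plus noise

a_few, b_few = fit_line(xs, ys, steps=5)      # very limited iterations
a_many, b_many = fit_line(xs, ys, steps=500)  # more iterations
```

With more iterations, `a_many` and `b_many` land close to 2 and 3 but not exactly on them, because the samples are noisy and finite. With only a few iterations the fit is worse, which mirrors the point about time and compute limits.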
Graphs are very popular in Data Structures and Algorithms (DSA). If we observe a feedforward Neural Network closely, it is nothing but a particular type of graph in which every node points toward another node in an acyclic manner. In DSA, this type of graph is popularly known as a DAG (Directed Acyclic Graph). When we define Neural Networks in our programs, we need to define each node of the graph to form a directed acyclic graph.
A directed graph is one where every connection between nodes (vertices) points from one node to another. It is acyclic when following these directions never leads back to a node already visited. This property also holds in Neural Networks, as data samples move from the input layer toward the output layer.
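A tiny network can be written down as a graph to check this property. The sketch below assumes a hypothetical network with two inputs, two hidden neurons, and one output, stored as an adjacency dictionary, and uses Kahn's algorithm to order the nodes or report a cycle.

```python
from collections import deque

# Hypothetical network: each key lists the nodes it sends its output to.
edges = {
    "x1": ["h1", "h2"],
    "x2": ["h1", "h2"],
    "h1": ["y"],
    "h2": ["y"],
    "y": [],
}


def topological_order(graph):
    """Kahn's algorithm: return nodes in dependency order, or raise on a cycle."""
    in_degree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1

    # Nodes with no incoming edges (the inputs) can be processed first.
    ready = deque(node for node, degree in in_degree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in graph[node]:
            in_degree[target] -= 1
            if in_degree[target] == 0:
                ready.append(target)

    if len(order) != len(graph):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


order = topological_order(edges)  # ['x1', 'x2', 'h1', 'h2', 'y']
```

The resulting order is also the order in which values can be computed: inputs first, then hidden neurons, then the output.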
With all the above information, we should now understand what is present in a Neural Network. Now let's learn how exactly this nesting works.
When designing the structure of Neural Networks, it is crucial to consider the following:
Some of the key benefits of using Neural Networks are:
While Neural Networks have many advantages, there are also some drawbacks. Some of them are:
Neural Networks are particularly well suited for those problem statements where the datasets exhibit high levels of non-linearity. For example:
An Artificial Neural Network is a supervised learning algorithm that solves classification and regression problems. It is considered a fundamental concept in Deep Learning, and we learned the basics in this article. We learned about the information required to design an ANN and its advantages and disadvantages. In upcoming blogs, we will dive deeper into the components of ANN.
Enjoy Learning!