Neural networks are a highly sought-after topic in the software industry today. In a previous article, we discussed the fundamentals of neural networks (NNs). However, understanding the components that make up a neural network is crucial for gaining a comprehensive understanding of the concept. In this article, we will delve deeper into the anatomy of a neural network and examine the importance of each element, including input layers, hidden layers, output layers, neurons, connections, weights, biases, activation functions, and cost functions.
Before moving any further, let's first look at the schematic diagram of an NN.
The input layer is the first layer of any Neural Network and represents the input data to the network. Each neuron, represented as small circular nodes (x1, x2, …, xn) in the diagram above, corresponds to one feature of the dataset. For example, in a house price prediction model, the input layer may have neurons for the house's size, distance from the railway station, and distance from the market. Understanding the input layer and its role in the neural network is crucial for designing and training efficient models.
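To make this concrete, here is a minimal sketch of how the house-price features from the example above would form an input vector. The feature names and values are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical features for one house, matching the example above:
# size (sq. ft.), distance from the railway station (km),
# distance from the market (km).
x = np.array([1450.0, 2.3, 0.8])

# The input layer has one neuron per feature, so n = 3 here.
n_input_neurons = x.shape[0]
print(n_input_neurons)  # 3
```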
The output layer of a Neural Network represents the final predictions generated by the network. The number of neurons in this layer corresponds to the number of outputs desired for a given input. In a regression problem, where a single output value is expected, there will be one neuron in the output layer. However, in classification tasks, where multiple output classes are possible, there will be multiple neurons, one for each class. For example, in a handwritten digit recognition task, there will be 10 neurons corresponding to the 10 possible classes (0-9).
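As a sketch of both cases: a regression network ends in a single neuron, while the digit-recognition example ends in 10 neurons whose raw scores are typically converted to class probabilities. Using softmax for that conversion is a common choice, assumed here for illustration:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Regression: one output neuron produces a single predicted value.
regression_output = np.array([231500.0])  # e.g., a predicted house price

# Classification: 10 output neurons, one per digit class (0-9).
logits = np.random.randn(10)        # raw scores from the output layer
probabilities = softmax(logits)     # converted to class probabilities
print(probabilities.sum())          # ~1.0
```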
Sometimes, the input and output pairs can have complex relationships, and to decode these relations, hidden layers exist between the input and output layers. Hidden layers also contain neurons, and every neuron connects to every neuron in the adjacent layers. For example, neurons in hidden layer 1 will be connected to every neuron in the input layer and in hidden layer 2.
Why not use immensely deeper networks to learn all the complex relations in the data? There is a tradeoff between accuracy and latency. As the number of layers and nodes increases, we might achieve better accuracy, but that will cost us more computation, more power, and, ultimately, more money. Hence, while designing any NN, we must consider finding the answers to the following:

1. How many hidden layers should the network have?
2. How many neurons should each layer contain?
3. Which neuron connections should we keep, and which should we drop?
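One quick way to feel the computational side of this tradeoff is to count the trainable parameters of a fully connected network. This is a minimal sketch; the layer sizes below are arbitrary examples:

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes: e.g., [3, 16, 16, 1] means 3 inputs, two hidden
    layers of 16 neurons each, and 1 output neuron.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one bias per non-input neuron
    return total

print(count_parameters([3, 16, 16, 1]))    # 353
print(count_parameters([3, 128, 128, 1]))  # 17153 -- far more computation
```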
The third question is interesting: we sometimes drop some neuron connections to improve generalization and reduce the problem of overfitting. Overfitting is a problem in ML where a model memorizes everything present in the training data sample and then performs significantly worse on unseen data, i.e., the test dataset.
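Dropping connections is commonly implemented with dropout, which randomly disables a fraction of neurons during training. Below is a minimal sketch of the "inverted" dropout variant; the drop rate of 0.5 is an illustrative choice, not a recommendation:

```python
import numpy as np

def dropout(activations, drop_rate=0.5, training=True):
    """Randomly zero out neurons during training (inverted dropout)."""
    if not training:
        return activations  # use all neurons at inference time
    keep_prob = 1.0 - drop_rate
    mask = np.random.rand(*activations.shape) < keep_prob
    # Scale the survivors so the expected activation stays the same.
    return activations * mask / keep_prob

h = np.array([0.4, 1.2, 0.7, 0.9])
print(dropout(h, drop_rate=0.5))
```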
Neurons play a crucial role in the functioning of a Neural Network, as they constitute every layer, including the Input, Output, and Hidden layers. Similar to the nucleus of brain cells, each neuron, except those in the Input layer, contains a bias parameter that the Neural Network learns and adjusts during the training process. These bias values are typically initialized with random numbers, and the Neural Network fine-tunes them to minimize the difference between the computed and actual output.
The connections between neurons in a neural network are crucial for the learning process. Each neuron in one layer is connected to every neuron in the adjacent layers, and each connection carries a weight value that determines its importance. The weight values are the trainable parameters that the neural network learns by iterating over the training dataset. Optimizing these weight values is crucial for the overall performance of the network and is a key aspect of the learning process.
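Putting weights and biases together, each neuron computes a weighted sum of its inputs plus its bias. A minimal sketch for a single neuron (the weight and bias values are arbitrary):

```python
import numpy as np

x = np.array([1450.0, 2.3, 0.8])   # inputs from the previous layer
w = np.array([0.02, -1.5, -0.7])   # one weight per incoming connection
b = 4.0                            # the neuron's bias

# Pre-activation: weighted sum of inputs plus bias.
z = np.dot(w, x) + b
print(z)  # 28.99
```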
Fully connected neural networks, a common type of feedforward network, are a popular choice for basic neural network architecture. These networks connect every neuron in one layer to every neuron in the adjacent layers, making them highly versatile across various datasets. In deep learning, other connection patterns, such as those in convolutional neural networks, are also commonly used.
The weight matrix is a crucial aspect of neural networks: the weights of each layer are arranged in a matrix, with the biases collected in an accompanying vector (or folded into the matrix as an extra column), and together they form the learnable parameters of the network. The weight matrix maps a layer's input to its output, and keeping the parameters in matrix form makes them efficient to compute with and update during training. Understanding the weight matrix and its role in neural networks is essential for designing and optimizing machine learning models.
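As a sketch, here is how one fully connected layer uses its weight matrix and bias vector to map 3 inputs to 4 hidden neurons in a single matrix-vector product (the layer sizes and random initialization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1450.0, 2.3, 0.8])  # 3 input features
W = rng.normal(size=(4, 3))       # weight matrix: 4 neurons x 3 inputs
b = rng.normal(size=4)            # one bias per hidden neuron

# One matrix-vector product computes all 4 weighted sums at once.
z = W @ x + b
print(z.shape)  # (4,)
```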
In our brain, Neurons get activated based on the signals received through various sensory organs. As these signals can be related to different tasks, different neurons activate and provide the required responses. Similar to this, in Neural Networks, we have activation functions for Neurons of every layer. We define the activation function for a layer, and all the neurons in that layer follow that same activation function.
Neurons in every layer receive the input from the previous layer multiplied by the weight values. Based on the relationship we want to maintain between the weighted input and the neuron's corresponding output, we can divide activation functions into two types:

1. Linear activation functions, where the output is directly proportional to the weighted input.
2. Non-linear activation functions (such as sigmoid, tanh, and ReLU), which let the network model non-linear relationships.
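For illustration, here are minimal NumPy versions of a linear activation and two common non-linear ones; which one suits a given layer depends on the task:

```python
import numpy as np

def linear(z):
    return z                         # output proportional to the input

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes values into (0, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negative inputs

z = np.array([-2.0, 0.0, 3.0])
print(linear(z), sigmoid(z), relu(z))
```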
We will dive deeply into these activation functions in our subsequent blogs.
Loss and cost functions are among the most vital components of any machine learning algorithm. Machines only understand numbers, so we must express our objectives as numbers. If the objective we frame numerically does not represent what we want the machine to learn, the fault is ours when it fails to learn. In our loss and cost functions in ML blog, we discussed how we define a regression problem using cost functions such as MAE, MSE, and RMSE. Similarly, we define classification problems using binary cross-entropy or categorical cross-entropy, depending on the number of categories in the data.
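As a quick reference, here are minimal NumPy versions of two of these cost functions: MSE for regression and binary cross-entropy for two-class problems. The clipping constant is a common numerical-stability trick, not part of the mathematical definition:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for two-class classification."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))           # 0.25
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))
```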
Machines try to optimize these costs by changing the parameter values until they find the ones for which the cost becomes minimum. Some of the most common things we need to keep in mind while designing a cost function for an NN:

1. It should faithfully represent what we want the machine to learn.
2. It should be continuous and differentiable, so that gradient-based optimizers can minimize it.
Optimizing the cost function in machine learning and neural networks is a crucial task, as trying all possible values for the weight matrix would take an excessive amount of time, even with supercomputers. To aid in this process, various optimization algorithms are used, such as Gradient Descent, Gradient Descent with momentum, Stochastic Gradient Descent, and Adam (Adaptive Moment Estimation). Understanding these optimization algorithms is a common topic in machine learning interviews, so we will delve into each of them in separate articles. Note that while the cost function is evaluated at the output layer, the optimizer updates the weights and biases of every layer by propagating the cost's gradient backward through the network (backpropagation).
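To give a flavor of how these optimizers work, here is a minimal sketch of plain gradient descent on a toy one-parameter problem; the cost function and learning rate are illustrative only:

```python
# Minimize cost(w) = (w - 3)^2 with plain gradient descent.
def gradient(w):
    return 2.0 * (w - 3.0)   # derivative of (w - 3)^2

w = 0.0              # initial parameter value
learning_rate = 0.1  # hyperparameter controlling the step size

for _ in range(100):
    w -= learning_rate * gradient(w)  # step against the gradient

print(round(w, 4))  # ~3.0, the cost-minimizing value
```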
When working with Neural Networks, there are two types of parameters to consider: trainable parameters and hyperparameters.
Hyperparameters (Untrainable Parameters): When building an Artificial Neural Network (ANN), it is important to set certain fixed values, known as hyperparameters, which are then fine-tuned through experimentation to achieve the lowest possible cost value. Understanding and optimizing these hyperparameters plays a crucial role in effectively training and utilizing ANNs for a wide range of Machine Learning tasks, and helps to ensure the best possible performance and accuracy in predictions.
What's the difference between trainable and untrainable parameters?
When training an Artificial Neural Network (ANN), it is important to understand the difference between trainable parameters and hyperparameters. Trainable parameters, such as weights and biases, are continuously updated during an experiment to minimize the cost function. Hyperparameters, such as the number of hidden layers or the learning rate, remain fixed within one experiment and are varied systematically across several experiments to find the overall minimum cost. For example, the learning rate, which controls the magnitude of updates to the trainable parameters, is kept constant during one experiment and adjusted across multiple experiments to find the best value for the specific dataset and problem. Hyperparameters are also of two types:

1. Architecture hyperparameters, which define the structure of the network, such as the number of hidden layers and the number of neurons per layer.
2. Training hyperparameters, which control the learning process, such as the learning rate, the number of epochs, and the batch size.
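As a sketch, one experiment might collect both kinds of hyperparameters in a single configuration. All values below are illustrative placeholders, not recommendations:

```python
# Illustrative hyperparameter configuration for one experiment.
config = {
    # Architecture hyperparameters
    "hidden_layers": 2,
    "neurons_per_layer": 16,
    "activation": "relu",
    # Training hyperparameters
    "learning_rate": 0.01,
    "epochs": 50,
    "batch_size": 32,
}
```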
Updating the weight matrix after every single sample makes training noisy and sensitive to outliers. Hence, there is a concept of batch learning, where we update the weight matrix values based on the cost averaged over the samples in a batch. This averaging dampens the impact of outliers present in the dataset. Some of the standard batch sizes we see in general experiments are 64, 32, 16, etc.
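Here is a minimal sketch of batch learning on a toy linear model, averaging the gradient of the MSE cost over each mini-batch; the data is randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))           # 64 samples, 3 features
y = X @ np.array([2.0, -1.0, 0.5])     # toy targets from a known rule

w = np.zeros(3)                        # trainable weights
learning_rate, batch_size = 0.1, 16    # hyperparameters

for epoch in range(20):
    for start in range(0, len(X), batch_size):
        Xb = X[start:start + batch_size]
        yb = y[start:start + batch_size]
        error = Xb @ w - yb
        # Gradient of the MSE cost, averaged over the batch.
        grad = 2.0 * Xb.T @ error / len(Xb)
        w -= learning_rate * grad

print(np.round(w, 2))  # approaches [2.0, -1.0, 0.5]
```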
There can be other hyperparameters, which we will see as we progress toward deep learning algorithms.
In this article, we have examined the inner workings of Artificial Neural Networks (ANNs) in depth. We have broken down all the components that make up this popular Machine Learning technique, including the Input Layer, Hidden Layer, Output Layer, Weight Matrix, Cost Function, Parameters, and Hyperparameters. We hope you enjoyed reading the article.