Neural Networks: Not So Grey Matter
Artificial Neural Networks (ANNs) are inspired by the human brain and are considered the basis of recent advances in Artificial Intelligence such as voice recognition, self-driving cars, expert systems, and chatbots.
Before jumping into Artificial Neural Networks, let's first try to understand the different components of a neuron and how they work in our brain.
Neurons are the computational units of our brain. They are responsible for receiving input from external stimuli and formulating our response, for example an instruction to our muscles.
Neurons are connected to one another through structures called ‘synapses’. A neuron receives inputs from other neurons through its ‘dendrites’. These inputs are processed in the ‘soma’ according to their importance, and the result travels along the ‘axon’ to be sent on to another neuron.
Artificial Neural Networks:
An ANN works in a similar way: artificial neurons process their inputs and send the output to other neurons for further processing.
Let's take an example to understand it better. Suppose we have multiple inputs x1, x2, x3 and x4. Based on their importance, they are multiplied by their weights β1, β2, β3 and β4 and summed together. We then add a value called the bias, b:
z = β1*x1 + β2*x2 + β3*x3 + β4*x4 + b
This equation looks very similar to the linear regression equation. In a neuron, we introduce non-linearity to the result (z) by applying a function to it, known as the ‘activation function’.
Activation functions are what make a neural network non-linear. Without them, a neural network reduces to a linear model, however many layers it has, much like ‘linear regression’.
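To make the computation above concrete, here is a minimal sketch of one artificial neuron: a weighted sum plus bias, followed by a sigmoid activation. The input values, weights and bias are made-up numbers, and the function name `neuron_output` is hypothetical, chosen only for this illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Return z (weighted sum plus bias) and the activated output."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return z, sigmoid(z)

# Illustrative values only (assumptions, not taken from the article)
x = [1.0, 2.0, 3.0, 4.0]        # x1..x4
beta = [0.5, -0.25, 0.1, 0.0]   # β1..β4
b = 0.2                         # bias

z, a = neuron_output(x, beta, b)
print("z =", z)
print("activation =", a)
```

Without the call to `sigmoid`, the neuron would return z itself, which is exactly the linear regression form.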
A single artificial neuron is shown below:
There are many types of activation function, such as sigmoid, tanh, the rectified linear unit (ReLU), and Swish.
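The four activation functions named above can be sketched in a few lines of plain Python. Swish is written here as z * sigmoid(z), that is, with its scaling parameter fixed to 1, which is an assumption made for simplicity.

```python
import math

def sigmoid(z):
    """Output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Output in (-1, 1), centred on zero."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)

def swish(z):
    """Swish with scaling parameter 1 (assumption): z * sigmoid(z)."""
    return z * sigmoid(z)

# Compare the functions on a few sample inputs
for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), swish(z))
```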
Moving forward, we will focus on how an ANN learns, but if you are interested in knowing more about activation functions, refer to this link.
Layers of ANN:
Our brain can’t perform a task with just a single neuron, which is why it has billions of neurons forming a network. Similarly, we can’t use a single neuron to solve a complex business problem. We need multiple decision layers, each consisting of more than one neuron, to carry out a complex task.
An ANN consists of three types of layer:
- Input Layer: As the name suggests, it is responsible for feeding the input to the network. The number of neurons in the input layer equals the number of inputs we give to the network. No computation is performed in the input layer; it only passes the information on to the hidden layers.
- Hidden Layer: Any layer between the input and output layers is called a hidden layer. It is responsible for processing the inputs received from the input layer and finding patterns in the dataset to produce the output. There can be any number of hidden layers; how many depends on the nature of the problem we are trying to solve.
- Output Layer: The hidden layer sends its output to the output layer once all processing of the inputs has been done. The number of neurons in the output layer depends on the problem we are trying to solve. For example, in a regression problem that predicts a single continuous value, the output layer has one neuron.
A simple ANN is shown below:
As we can see in the above image, each neuron in one layer interacts with all the neurons in the next layer. However, neurons in the same layer never interact with each other.
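One consequence of this fully connected structure is that the number of weights grows with the product of adjacent layer sizes. The sketch below counts weights and biases for a hypothetical network with 3 inputs, two hidden layers of 4 neurons and 1 output. It assumes one bias per neuron in each receiving layer, which is a common convention rather than something the article specifies.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, input layer first.
    """
    weights = 0
    biases = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights += n_in * n_out   # every neuron feeds every neuron in the next layer
        biases += n_out           # one bias per receiving neuron (assumption)
    return weights, biases

# Hypothetical layer sizes: input, hidden, hidden, output
print(count_parameters([3, 4, 4, 1]))  # (32, 9)
```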
I hope we now have some understanding of the different components of a neural network and how it differs from statistical learning techniques like linear regression.
As mentioned above, we need to assign weights to our neural network. The network learns by adjusting these weights so that the difference between the predicted output and the actual output is as small as possible.
The process of sending signals between neurons is known as propagation. In an ANN, forward and backward propagation together let us optimize the weights and reduce the error.
- Forward Propagation: As shown above (Fig 1.1), we have 3 inputs, so the input layer also has three neurons. Each input is multiplied by its corresponding weight, a bias is added, and the result is passed to a hidden layer, where the activation function is applied.
We don’t know in advance which features to give importance to, or by how much, so we start by randomly initializing the weights between the input layer → first hidden layer. Once the activation function has been applied in the first hidden layer, its outputs are multiplied by another set of randomly initialized weights, a bias is added, and the activation function is applied again between the 1st → 2nd hidden layer. The same process is carried out between the second hidden layer → output layer, as shown below:
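The forward pass described above can be sketched in plain Python as follows. The layer sizes (3 inputs, two hidden layers of 4 neurons, 1 output), the sigmoid activation, the uniform initialization range and the choice to leave the output neuron linear are all assumptions made for this illustration; `init_layer` and `forward` are hypothetical helper names.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Randomly initialize weights (one row per receiving neuron) and biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases

def forward(x, layers):
    """Pass x through every layer: weighted sum plus bias, then activation.

    The output layer is left linear here, a common choice for regression.
    """
    activation = x
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = z if is_output else [sigmoid(v) for v in z]
    return activation

rng = random.Random(0)      # fixed seed so the run is repeatable
sizes = [3, 4, 4, 1]        # input, two hidden layers, output (assumed sizes)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

prediction = forward([0.5, -1.2, 3.0], layers)
print(prediction)           # a list with one value, one per output neuron
```

Because the weights are random, the prediction at this stage is arbitrary; the learning steps that follow are what make it useful.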
Once we have the output, we define one more function, called the cost function, which measures the performance of our neural network. We can use the Mean Absolute Error (MAE), defined as the mean of the absolute differences between the actual outputs y_i and the predicted outputs ŷ_i over n examples:
MAE = (1/n) * Σ |y_i - ŷ_i|
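As a small worked example of this cost function, the sketch below computes the mean absolute error for three made-up pairs of actual and predicted values. The function name `mean_absolute_error` and the numbers are illustrative assumptions.

```python
def mean_absolute_error(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only
actual = [3.0, 5.0, 2.0]
predicted = [2.5, 5.0, 4.0]

# Absolute errors are 0.5, 0.0 and 2.0, so the mean is 2.5 / 3
print(mean_absolute_error(actual, predicted))
```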
Once we have our cost function, our objective is to minimize it. To do so, we need to optimize the weights and biases that we randomly initialized at the start.
To optimize the weights and biases, we use backward propagation.
- Backward Propagation: Propagating from the output layer back to the input layer, and updating all the weights in between to minimize the cost (and hence the error), is known as backward propagation.
We can use gradient descent to find optimal values for the randomly initialized weights, so that the error is minimized.
Let's consider an example to understand this in detail. Imagine we are at the top of a hill and want to reach the lowest point. There can be more than one way down, but the quickest is to follow the steepest side of the hill. Similarly, we can represent the cost function as a plot of cost against weight, as shown below:
As shown in the above plot, the solid dark point is the randomly initialized weight. To minimize the cost function, we need to move this point downwards. Gradients (partial derivatives of the cost j with respect to the weight β, written ∂j/∂β) tell us how to move the weight from one point to the next.
A gradient is a derivative, which is the slope of the tangent line at the current point. Its sign tells us in which direction the cost increases, so by repeatedly stepping in the opposite direction we can reach the lowest point, where the cost is minimum.
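To see the gradient as a slope, the sketch below estimates it numerically for a toy cost function, (β - 3)², which is an assumption chosen because its minimum at β = 3 is easy to verify by hand. The helper `numerical_gradient` uses a central difference; it is an illustration, not how neural network libraries compute gradients in practice.

```python
def cost(beta):
    """Toy cost function with its minimum at beta = 3 (assumption)."""
    return (beta - 3.0) ** 2

def numerical_gradient(f, beta, h=1e-5):
    """Approximate the slope of f at beta with a central difference."""
    return (f(beta + h) - f(beta - h)) / (2.0 * h)

for beta in (0.0, 3.0, 5.0):
    slope = numerical_gradient(cost, beta)
    print("beta =", beta, "slope =", round(slope, 6))
```

Left of the minimum the slope is negative, so moving against it increases β; right of the minimum the slope is positive, so moving against it decreases β. At the minimum the slope is zero.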
With backward propagation, we need to update the weight of every neuron between the output layer → input layer, using the gradient descent algorithm to find the optimal weights and the minimum cost.
We update the old weight using the following rule:
β = β - α*∂j/∂β
Updated_weight = weight - α*gradient
α is known as the learning rate. It controls the size of each step.
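The update rule above can be tried out on a single weight. This sketch again assumes a toy cost (β - 3)², whose gradient is 2(β - 3); the starting weight, learning rate and number of steps are arbitrary illustrative choices.

```python
def cost(beta):
    """Toy cost function with its minimum at beta = 3 (assumption)."""
    return (beta - 3.0) ** 2

def gradient(beta):
    """Derivative of the toy cost with respect to beta."""
    return 2.0 * (beta - 3.0)

def gradient_descent(beta, alpha, steps):
    """Repeatedly apply beta = beta - alpha * gradient and record each value."""
    history = [beta]
    for _ in range(steps):
        beta = beta - alpha * gradient(beta)
        history.append(beta)
    return history

history = gradient_descent(beta=0.0, alpha=0.1, steps=50)
print("first steps:", history[:4])
print("final weight:", history[-1])
print("final cost:", cost(history[-1]))
```

With a learning rate of 0.1 the weight moves steadily towards 3. A much larger learning rate can overshoot the minimum, and a much smaller one needs many more steps.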
This overall process of updating the weights by propagating back from the output layer to the input layer of the network is called backpropagation.
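Putting the forward pass, the gradients and the update rule together, here is a minimal sketch of backpropagation on the smallest possible network: one input, one sigmoid hidden neuron and one linear output neuron. Note that it uses a squared-error cost, 0.5 * (ŷ - y)², instead of the mean absolute error mentioned earlier, because its derivative is simpler to write out. The data, starting weights, learning rate and names such as `backward_step` are all assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Return the hidden activation and the prediction for one input x."""
    h = sigmoid(params["w1"] * x + params["b1"])   # hidden neuron
    y_hat = params["w2"] * h + params["b2"]        # linear output neuron
    return h, y_hat

def total_cost(data, params):
    """Sum of squared-error costs 0.5 * (y_hat - y)^2 over the dataset."""
    return sum(0.5 * (forward(x, params)[1] - y) ** 2 for x, y in data)

def backward_step(x, y, params, alpha):
    """One forward pass, one backward pass, one gradient descent update."""
    h, y_hat = forward(x, params)
    # Backward pass: chain rule from the output layer towards the input layer
    d_y_hat = y_hat - y                              # d(cost)/d(y_hat)
    d_z = d_y_hat * params["w2"] * h * (1.0 - h)     # back through the sigmoid
    grads = {
        "w2": d_y_hat * h,
        "b2": d_y_hat,
        "w1": d_z * x,
        "b1": d_z,
    }
    # Update rule: weight = weight - alpha * gradient
    for name in params:
        params[name] -= alpha * grads[name]

# Made-up training data and starting weights
data = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
params = {"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}

cost_before = total_cost(data, params)
for _ in range(2000):
    for x, y in data:
        backward_step(x, y, params, alpha=0.1)
cost_after = total_cost(data, params)

print("cost before training:", cost_before)
print("cost after training:", cost_after)
```

The gradient for the hidden-layer weight w1 reuses `d_y_hat`, which was computed at the output. That reuse of values from later layers is what gives backpropagation its name.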
The purpose of this article is to understand the fundamentals of deep neural networks step-by-step.
Below are some frequently used terms:
- Forward Pass: Propagating forward from the input layer to the output layer.
- Backward Pass: Propagating back from the output layer to the input layer.
- Epoch: One epoch is one forward and one backward pass over all training examples. The number of epochs is the number of times the neural network sees the whole training data.
- Batch Size: The number of training samples used in one forward and one backward pass.
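The relationship between epochs and batch size can be sketched with a simple loop. The dataset of 10 samples, the batch size of 4 and the helper name `iterate_minibatches` are assumptions for this illustration; the forward and backward passes themselves are only indicated by a comment.

```python
def iterate_minibatches(samples, batch_size):
    """Yield consecutive slices of the training data, batch_size at a time."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

samples = list(range(10))   # stand-in for 10 training examples
batch_size = 4
epochs = 2

for epoch in range(epochs):
    for step, batch in enumerate(iterate_minibatches(samples, batch_size)):
        # A real training loop would run one forward and one backward pass here
        print("epoch", epoch, "step", step, "batch", batch)
```

With 10 samples and a batch size of 4, each epoch takes three steps, and the last batch holds only the two remaining samples.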