Let’s dive right in.
“Convolution”, “pooling”, “ReLU” and “fully connected”: these are some of the basic building blocks of deep learning, and they are the only ones I’m going to explain here.
In layman’s terms, convolution means applying filters to an image. In traditional object classification algorithms, we extract hand-picked features from the image, such as edges or other characteristic parts of the object, and train a classifier on them. In deep learning, these features are learned automatically by the algorithm itself: the values inside the filters are parameters that the network learns during training. For example, in a 3×3 convolution with stride 1, we take the 3×3 filter, multiply it element-wise with the patch formed by the image’s first 3 rows and 3 columns, and sum the products to get a single output value. Then we move the filter by one pixel and repeat the same operation, sliding it across the whole image. What we are left with is a new image with new dimensions (without padding, a 3×3 filter with stride 1 shrinks each side by 2 pixels). This is what we call a feature map. We generally apply more than one filter, which gives us more than one feature map.
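To make the sliding-window arithmetic concrete, here is a minimal pure-Python sketch of a convolution with no padding. The function name `conv2d`, the toy 5×5 image and the vertical-edge filter are illustrative assumptions and do not come from any library. Like deep learning frameworks, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out_h = (ih - kh) // stride + 1
    out_w = (iw - kw) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride
            # Multiply the filter element-wise with the current patch and sum.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[top + i][left + j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
edge_filter = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

fmap = conv2d(image, edge_filter)
print(fmap)  # [[3, 3, 0], [3, 3, 0], [3, 3, 0]]
```

The 5×5 input becomes a 3×3 feature map, which is the change in dimensions described above. The large values mark the positions where the filter overlaps the dark-to-bright edge.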
The most common kind of pooling in deep learning is max-pooling. Take 3×3 max-pooling as an example: we take a 3×3 patch of the image (or feature map) and keep only its maximum value. As with convolution, we slide this window across the whole image, taking the maximum value each time. This also shrinks the dimensions of the image. It is usually done to reduce the amount of computation and the number of parameters needed in the later layers, and it also makes the network somewhat less sensitive to small shifts in the input.
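The same sliding idea gives a small max-pooling sketch. The name `max_pool` and the default stride of 3 (non-overlapping windows) are assumptions made for illustration. The text above does not fix a stride, and some networks use overlapping windows, for example 3×3 with stride 2.

```python
def max_pool(feature_map, size=3, stride=3):
    """Keep the maximum of each size x size window, moving `stride` pixels at a time."""
    h, w = len(feature_map), len(feature_map[0])
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    pooled = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Toy 6x6 feature map holding the values 0..35 row by row.
fm = [[r * 6 + c for c in range(6)] for r in range(6)]
pooled = max_pool(fm)
print(pooled)  # [[14, 17], [32, 35]]
```

The 6×6 feature map shrinks to 2×2, and each output keeps only the largest value of its window.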
Together, a convolution and a max-pooling operation are usually counted as one hidden layer.
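We also apply a rectified linear unit (ReLU), f(x) = max(0, x), to each element of the feature map. This adds non-linearity to the network. Because its gradient is exactly 1 for every positive input, it also helps prevent the vanishing gradients that saturating functions such as sigmoid and tanh cause during backpropagation.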
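Below is a sketch of ReLU applied element-wise to a feature map. It is followed by a deliberately simplified gradient comparison that multiplies only the activation derivatives across 10 layers and ignores the weights, which real backpropagation also multiplies in. The helper names are hypothetical.

```python
import math


def relu(x):
    """Rectified linear unit: pass positive values, zero out negative ones."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 chosen at x == 0)."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    """Derivative of the sigmoid; its largest possible value is 0.25, at x == 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


feature_map = [[-2.0, 0.5], [3.0, -0.1]]
activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # [[0.0, 0.5], [3.0, 0.0]]

# Simplified: product of activation derivatives through 10 layers (weights ignored).
print(relu_grad(2.0) ** 10)     # 1.0
print(sigmoid_grad(0.0) ** 10)  # 9.5367431640625e-07, even in sigmoid's best case
```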
Many hidden layers are stacked together, depending on our needs. Simple networks may have only one or two hidden layers, while ImageNet-winning architectures such as ResNet have around 150 layers (152 in its deepest ImageNet version).
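Every convolution and pooling step changes the spatial size, so it helps to track shapes when stacking hidden layers. This sketch uses the standard output-size formula floor((n + 2p - k) / s) + 1, where n is the input size, k the window size, s the stride and p the padding. The 32×32 input and the choice of 5×5 convolutions followed by 2×2 pooling are hypothetical.

```python
def output_size(n, kernel, stride=1, padding=0):
    """Side length after a kernel x kernel window slides over an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical stack of two hidden layers, each a 5x5 convolution (stride 1)
# followed by 2x2 max-pooling (stride 2).
hidden_layers = [
    ("conv 5x5", 5, 1),
    ("max-pool 2x2", 2, 2),
    ("conv 5x5", 5, 1),
    ("max-pool 2x2", 2, 2),
]

size = 32  # assumed 32x32 input image
sizes = [size]
for name, kernel, stride in hidden_layers:
    size = output_size(size, kernel, stride)
    sizes.append(size)
    print(f"after {name}: {size}x{size}")
```

The 32×32 input ends up as 5×5 maps after two hidden layers. With 1 pixel of padding, a 3×3 convolution keeps the size unchanged (`output_size(32, 3, 1, 1)` is 32).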
The output of these hidden layers is flattened into a single vector and connected to a fully connected layer. A fully connected layer is one where every element of the previous layer is connected to every neuron of the current layer. Naturally, these layers account for a large share of the network’s parameters, but they are what combine the extracted features into a final decision. Usually we use a couple of fully connected layers, with the number of neurons in the last layer equal to the number of classes in our images.
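Here is a minimal sketch of flattening and of a fully connected layer, along with a parameter count that shows why these layers are expensive. The function names, the toy weights, and the layer sizes used for the count (a 7×7×512 input feeding 4,096 neurons, compared with a 3×3 convolution from 512 to 512 channels) are hypothetical values chosen for illustration.

```python
def flatten(feature_maps):
    """Turn a list of 2D feature maps into one long vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(inputs, weights, biases):
    """Each output neuron sums *every* input times its own weight, plus a bias."""
    return [
        sum(w * x for w, x in zip(w_row, inputs)) + b
        for w_row, b in zip(weights, biases)
    ]


def fc_params(n_in, n_out):
    # One weight per connection plus one bias per neuron.
    return n_in * n_out + n_out


def conv_params(k, c_in, c_out):
    # Filter weights are shared across all positions, plus one bias per filter.
    return k * k * c_in * c_out + c_out


# Tiny forward pass: two 2x2 feature maps -> 8 inputs -> 2 output neurons.
maps = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
x = flatten(maps)  # [1, 2, 3, 4, 0, 1, 1, 0]
weights = [[1] * 8, [1, -1, 1, -1, 1, -1, 1, -1]]
scores = fully_connected(x, weights, biases=[0, 0])
print(scores)  # [12, -2]

print(fc_params(7 * 7 * 512, 4096))  # 102764544
print(conv_params(3, 512, 512))      # 2359808
```

In this example, the fully connected layer needs more than 40 times as many parameters as the convolution, because a convolution reuses the same small filter at every position.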
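In the last layer, we generally apply softmax regression (or another multi-class classifier) to the features extracted by the previous layers. Softmax turns the final scores into a probability for each class. And we are done: once the weights have been trained with backpropagation, our classifier is ready.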
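The following softmax sketch converts the last layer’s raw scores into probabilities that sum to 1, and the predicted class is the one with the highest probability. The three scores are made-up values for a hypothetical 3-class problem.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.1]  # hypothetical outputs of the last fully connected layer
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs], predicted_class)  # [0.659, 0.242, 0.099] 0
```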
This is one of the first and most important convolutional networks. As explained above, it is a series of convolution and max-pooling (sub-sampling) layers followed by fully connected layers.
In practice, we do not implement all these convolution and pooling operations ourselves. We use higher-level libraries such as TensorFlow or Caffe, which implement them efficiently and can run them on GPUs.