The World through Convolutional Neural Networks' Eyes

Convolutional Neural Networks (CNNs) are a particular kind of Artificial Neural Network. Based on biological processes, CNNs have been applied mainly in digital image processing and intelligent image analysis. In this article, I cover the main differences between Artificial Neural Networks and Convolutional Neural Networks. Join me to see the world through CNNs' eyes.

Bianka Tallita Passos | August 5, 2021

If you keep up with technology trends, you have certainly at least heard about artificial intelligence and machine learning. But how do Convolutional Neural Networks (CNNs) fit into this context?

CNNs are deep learning algorithms, and deep learning is a subfield of machine learning. More specifically, CNNs are a kind of artificial neural network inspired by how the visual cortex works in the brain. They are mainly used to classify, identify, and detect elements in images.

I am about to explain this in detail.

What is a Convolutional Neural Network?

Goodfellow, Bengio, and Courville (2016) describe Convolutional Neural Networks as similar to Artificial Neural Networks, but with more layers and operations. In a Convolutional Network, each layer is responsible for extracting specific information from the input data. The information flows through the network layer by layer: the output of each layer is provided as input to the next.
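
To make this layer-to-layer flow concrete, here is a minimal Python sketch. The "layers" are plain stand-in functions that I made up for illustration, not real CNN layers; the only point is that each output becomes the next input.

```python
def forward(layers, x):
    """Pass x through each layer in order; each output feeds the next layer."""
    for layer in layers:
        x = layer(x)
    return x

# Hypothetical stand-in "layers": simple functions, not real CNN layers.
layers = [
    lambda v: v * 2,   # first layer doubles the input
    lambda v: v + 1,   # second layer adds one to the first layer's output
    lambda v: v ** 2,  # third layer squares the second layer's output
]

result = forward(layers, 3)  # (3 * 2 + 1) ** 2 = 49
```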

CNNs can mimic part of human visual cognition by identifying faces, individuals, and other elements in an image. Convolutional Networks can therefore be used, for example, to classify and recognize objects in images.

Convolutional Neural Networks Layers

CNNs can be divided into two main sections (modules): convolution and classification.

The convolution module is responsible for extracting the elements that describe an image’s content. The classification module is responsible for classifying the extracted data obtained by the convolution module. The classification results are delivered to network outputs.

As we've seen, the main difference between Convolutional Neural Networks and standard Artificial Neural Networks is the number of intermediary layers (or hidden layers). In short, each Convolutional Network layer is responsible for a subtask. The principal layers that compose a CNN are:

- Convolution layer: responsible for extracting and mapping image contents by transforming them into data. This is done with small blocks, known as filters, that capture information from sub-blocks of the image;

- Pooling: the pooling layer takes the blocks that contain the information extracted by the convolution layer. It reduces the information by summarizing the image sub-block data into a single value and feeds it forward to a fully connected layer;

- Fully connected layer: where classification of the information extracted by the previous layers begins. The fully connected layer flattens the sub-block containing the extracted data; in other words, the block is transformed into a single row containing all the extracted information.
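
The three layers above can be sketched with NumPy as follows. This is an illustrative sketch under my own assumptions, not the implementation used by any particular library: the image is a single-channel 6x6 array of random values, the filter and weights are random rather than learned, pooling is 2x2 max pooling, and the three output classes are hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the filter over the image (no padding, stride 1).

    Like most deep learning libraries, this computes a cross-correlation:
    the filter is not flipped before it is applied.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter by one image sub-block and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Summarize each size x size sub-block by its largest value."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def fully_connected(features, weights, bias):
    """Flatten the block into a single row, then apply a weighted sum."""
    return features.flatten() @ weights + bias

rng = np.random.default_rng(0)
image = rng.random((6, 6))               # toy single-channel image
kernel = rng.random((3, 3))              # random (untrained) 3x3 filter

feature_map = conv2d(image, kernel)      # shape (4, 4)
pooled = max_pool(feature_map)           # shape (2, 2)
weights = rng.random((pooled.size, 3))   # 4 inputs, 3 hypothetical classes
bias = np.zeros(3)
scores = fully_connected(pooled, weights, bias)  # shape (3,)
```

In a trained network the filter and weight values would be learned from data; here they only show how the shapes shrink from image to feature map to pooled block to class scores.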

In addition to the convolution, pooling, and fully connected layers, there are two other fundamental elements composing a Convolutional Neural Network: the dropout layer and the activation function.

The dropout layer is responsible for reducing network overfitting. Overfitting occurs when a network memorizes the training data and cannot apply the learned relations to new data. In that situation the trained network is said to be unable to generalize, so it fails when placed in production.
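
As a rough sketch of the idea, dropout randomly switches off a fraction of the activations during training so the network cannot rely on any single one. The version below is the common "inverted dropout" variant, which is my choice for illustration; the function name, the 0.5 rate, and the toy activations are all assumptions.

```python
import numpy as np

def dropout(activations, rate, rng, training=True):
    """Zero out roughly `rate` of the activations during training.

    Surviving values are scaled by 1 / (1 - rate) so the expected sum
    stays the same. At inference time the input passes through unchanged.
    """
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob  # True means "keep"
    return activations * mask / keep_prob

rng = np.random.default_rng(42)
activations = np.ones((4, 5))

dropped = dropout(activations, rate=0.5, rng=rng)  # entries are 0.0 or 2.0
unchanged = dropout(activations, rate=0.5, rng=rng, training=False)
```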

The activation function is central to how the network learns: by deciding which neurons are activated, it shapes the relationships the network can model between variables.
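
Two widely used activation functions are ReLU and softmax. The article does not say which functions a given network uses, so treat the sketch below as an assumption-laden example with made-up input values.

```python
import numpy as np

def relu(x):
    """Keep positive values and set negative ones to zero (neuron 'off')."""
    return np.maximum(0.0, x)

def softmax(x):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = x - np.max(x)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

pre_activations = np.array([-2.0, 0.0, 3.0])  # hypothetical neuron outputs

activated = relu(pre_activations)         # array([0., 0., 3.])
probabilities = softmax(pre_activations)  # largest score gets the largest probability
```

ReLU is typically placed after convolution layers, while softmax is a common choice at the network output for classification.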

What does the Convolutional Neural Network see?

You, like me, may be wondering what Convolutional Networks see when they apply the filters, which happens in the convolution layer.

We already know that each layer of the network is responsible for extracting certain image characteristics. We also know that the image is broken into sub-blocks and that the filters allow feature extraction from those blocks. But what are these filters, and how does this process work?

The filters (Figure 1) are matrices that allow CNNs to recognize image patterns, such as edges, shapes, textures, curves, horizontal and vertical lines, corners, colors, and parts of a particular image object.

Figure 1. Convolutional Layer filters

Source: Keras (2016).
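
To show how a filter matrix picks out one kind of pattern, here is a small NumPy sketch using the classic Sobel kernels. These are hand-crafted edge detectors, not filters learned by a CNN, and the toy image (dark left half, bright right half) is my own assumption; learned filters work the same way but their values come from training.

```python
import numpy as np

def filter_response(image, kernel):
    """Apply a filter to every sub-block of the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel kernels: one responds to vertical edges, the other to horizontal edges.
vertical_edge_filter = np.array([[-1.0, 0.0, 1.0],
                                 [-2.0, 0.0, 2.0],
                                 [-1.0, 0.0, 1.0]])
horizontal_edge_filter = vertical_edge_filter.T

# Toy 4x6 image: left three columns dark (0), right three columns bright (1).
image = np.zeros((4, 6))
image[:, 3:] = 1.0

vertical_response = filter_response(image, vertical_edge_filter)
# each row is [0, 4, 4, 0]: strong response only around the dark-to-bright boundary
horizontal_response = filter_response(image, horizontal_edge_filter)
# all zeros: the image contains no horizontal edge
```

The same image produces a strong signal under one filter and nothing under the other, which is how different filters come to specialize in different patterns.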

The deeper the network layer, the more complex the patterns its filters detect. At the deepest layers, the filters can respond to whole objects such as dogs, cats, and birds (Figure 2).

Figure 2. Objects seen by the convolutional neural network

Source: Keras (2016).
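
One way to see why deeper layers can respond to larger structures is to compute how much of the input image a single value in each layer depends on (often called its receptive field). The sketch below is my own illustration, not taken from the Keras article; the layer stack is hypothetical and the function name is made up.

```python
def receptive_fields(layers):
    """Return the receptive field size, in input pixels, after each layer.

    layers: list of (kernel_size, stride) tuples, ordered from first to last.
    """
    size = 1  # a single input pixel "sees" only itself
    jump = 1  # distance, in input pixels, between neighboring outputs
    sizes = []
    for kernel_size, stride in layers:
        size += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(size)
    return sizes

# Hypothetical stack: 3x3 conv, 2x2 pooling, 3x3 conv, 2x2 pooling, 3x3 conv.
stack = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
fields = receptive_fields(stack)  # [3, 4, 8, 10, 18]
```

In this example a value in the first layer depends on a 3x3 patch of the image, while a value in the last layer depends on an 18x18 patch, enough room for larger shapes rather than just edges.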

Conclusions

In this post, we looked at the differences between Artificial Neural Networks and Convolutional Neural Networks. We saw the world through CNNs' eyes and learned how they work.

There are many Convolutional Neural Network architectures used to solve digital image processing and computer vision problems. Each architecture has its own characteristics, which make it more suitable for some tasks than others. It is therefore up to a specialist in the field to determine the most appropriate architecture for each case, and also to decide whether a deep learning based method is needed at all to provide the desired solution.

After defining the appropriate method, it is fundamental to develop a good image dataset to train the networks and perform accuracy evaluations. It is important to observe that any digital image processing solution begins by defining the image dataset.

References:

GOODFELLOW, I.; BENGIO, Y.; COURVILLE, A. Deep Learning. MIT Press, 2016.
KERAS. How convolutional neural networks see the world, 2016. Accessed on 15th June 2021.

Bianka Tallita Passos
Software Engineer | MSc in Computer Vision. Artificial Intelligence enthusiast. I enjoy learning and sharing knowledge.