
Types of Artificial Neural Networks

In our real-world example, we used a “feed-forward neural network” to recognise handwritten numbers. This is probably the most basic form of an NN. In reality, however, there are hundreds of mathematical operations (beyond addition and multiplication) used to compute steps in a neural network, many different ways to arrange the layers, and many mathematical approaches to training the network.

And as mentioned, in most cases a specific type of neural network (or a reasonable combination of several architectures) is necessary for the task at hand. Therefore, in this article we want to introduce some of the more commonly used NN architectures and shed some light on their most common use cases.

Feed Forward Neural Networks / Fully Connected Neural Networks

This is practically the “bread-and-butter NN”. It is usually found as part of larger architectures, often in the transition from one part of the architecture to another. In our previous article we explained in detail how it works, how it is constructed and how it is applied.

Used by itself, without other, more complex alternatives, it is usually well suited to simpler problems. Nowadays it is often crucial for connecting blocks in more complex architectures, or at the end of such an architecture, where it extracts a final result from the “preliminary work” of the specialized components.
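To make this concrete, here is a minimal sketch of a fully connected forward pass in plain NumPy. All sizes, weights and names are illustrative assumptions, not values from the article's handwriting example:

```python
import numpy as np

def relu(x):
    # non-linearity applied after the hidden layer's weighted sum
    return np.maximum(0.0, x)

def feed_forward(x, W1, b1, W2, b2):
    # hidden layer: weighted sum of all inputs, then a non-linearity
    h = relu(x @ W1 + b1)
    # output layer: another weighted sum produces the final values
    return h @ W2 + b2

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))            # one input with 4 features
W1 = rng.normal(size=(4, 8)); b1 = np.zeros(8)   # 4 -> 8 hidden units
W2 = rng.normal(size=(8, 3)); b2 = np.zeros(3)   # 8 -> 3 outputs

y = feed_forward(x, W1, b1, W2, b2)
print(y.shape)  # (1, 3)
```

In a real application the weights would of course be learned by training, not drawn at random.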

Convolutional Neural Networks (CNN): Image/Voice recognition by convolution

CNNs are the hotshots when it comes to image and speech recognition. They give much better results than the simple feed-forward networks from our previous article. Convolutional neural networks are (very roughly) inspired by structures in the visual cortex of vertebrates. Mathematically speaking, they use the so-called convolution operation for their calculations. Electrical engineers will feel at home here: CNNs are basically trainable filters in 1D, 2D or 3D.

With a CNN, the original structure of the input (mostly images) is preserved, i.e. the values are still perceived as a 2D arrangement of pixels (far left). Instead of simple addition and multiplication, a mathematical operation called “convolution” is performed (represented by the small red rectangles, 2nd from left). Each convolution creates a new variation of the image, filtering out certain parts and amplifying others.
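A rough illustration of what a single convolution does: a small kernel is slid over the image, and each output pixel is the weighted sum of the pixels under the kernel. This is a hand-rolled sketch with valid padding and stride 1; the edge-detector kernel is an illustrative assumption (in a real CNN the kernel values would be learned):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel over the image (valid padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted sum of the patch under the kernel
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
edge_kernel = np.array([[1.0, -1.0]])             # crude horizontal edge detector
filtered = conv2d(image, edge_kernel)
print(filtered.shape)  # (5, 4)
```

Because this toy image increases by 1 from left to right in every row, the “edge detector” outputs a constant -1 everywhere: it responds to horizontal change, exactly as a learned filter would respond to the patterns it was trained on.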

Recurrent Neural Networks (RNN): Sequences and LSTM

Recurrent neural networks can process sequence data. The best-known example of this class of neural networks is the long short-term memory (LSTM) network. RNNs are used very often when data has a variable, open-ended length (movies, text, audio recordings, stock market prices). Most of the time they are combined with another network type. For example, a CNN that can handle images can operate together with an RNN on movies.

LSTMs are much more complex than feed-forward NNs. The input to an LSTM block consists not only of the current data point (green, marked with “x”), but also of the output of an LSTM block from the previous step (blue). The internal structure allows an LSTM to learn which previously seen information should contribute to the newly formed output (“o”, in pink). This way a kind of memory is simulated: the network can process information from the distant past as well as from the directly preceding inputs. Hence the name “long short-term memory”.
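The gating just described can be sketched as a single LSTM step in NumPy. This is a textbook-style cell with random, untrained weights; all sizes and names are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: gates decide what to forget, store, and output."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # all four gate pre-activations at once
    f = sigmoid(z[0:n])               # forget gate: what to drop from memory
    i = sigmoid(z[n:2 * n])           # input gate: what to store
    o = sigmoid(z[2 * n:3 * n])       # output gate: what to expose
    g = np.tanh(z[3 * n:4 * n])       # candidate cell update
    c = f * c_prev + i * g            # new cell state ("long-term" memory)
    h = o * np.tanh(c)                # new hidden state ("short-term" output)
    return h, c

rng = np.random.default_rng(1)
n_in, n_hid = 3, 5
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid); c = np.zeros(n_hid)
for x in rng.normal(size=(6, n_in)):  # feed a sequence of 6 inputs, step by step
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (5,)
```

Note how the hidden state `h` and cell state `c` are carried from one step to the next: that loop is what makes the network “recurrent”.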

Recurrent neural networks should not be confused with recursive neural networks. The latter exist as well, but at the moment they are rather an academic curiosity and function quite differently than recurrent neural networks.

Autoencoder: Unsupervised learning without Output data

Autoencoders are a class of neural networks that do not need fixed labels for learning, which makes them particularly suitable for unsupervised learning. An autoencoder is less a distinct network type than a specific way to build and arrange neural networks; in general, any kind of neural network can be turned into an autoencoder. The advantage of autoencoders is that they do not need “target data”, so a lot of pre-processing work is saved. The disadvantage is that it is much harder for them to learn anything, and there is no guarantee that the learned model will be useful.

Diagram of an autoencoder. The special feature of an autoencoder structure is that the layers of the NN become very narrow in the middle and then widen again, so that the output is once more as large as the input. This forces the autoencoder to compress the input data on its own, i.e. to remove unnecessary information. Hence the name: the autoencoder encodes information automatically.
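The bottleneck idea can be sketched as a tiny forward pass: an input of width 8 is squeezed into a 2-dimensional code and then reconstructed back to width 8. The weights here are random and untrained; training would minimise the reconstruction error computed at the end:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def autoencoder(x, W_enc, W_dec):
    """Encoder squeezes the input into a narrow code; decoder reconstructs it."""
    code = relu(x @ W_enc)   # 8 -> 2 dimensions: the bottleneck
    recon = code @ W_dec     # 2 -> 8 dimensions: back to input size
    return code, recon

rng = np.random.default_rng(2)
x = rng.normal(size=(1, 8))
W_enc = rng.normal(size=(8, 2))   # narrow middle of width 2
W_dec = rng.normal(size=(2, 8))

code, recon = autoencoder(x, W_enc, W_dec)
loss = np.mean((x - recon) ** 2)  # training minimises this reconstruction error
print(code.shape, recon.shape)    # (1, 2) (1, 8)
```

The key point is that the loss compares the output with the input itself, so no separate labels are ever needed.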

Transformer: Text recognition by Attention Layer

Transformers are still fairly new and the latest big thing when it comes to text processing. They are built from so-called attention layers, which allow the network to understand which parts of the input refer to each other. In language, this means sentence fragments referring to each other and complex syntax. It quickly becomes evident why transformers are an enormous improvement for the field of text comprehension (and possibly text generation!).

A new layer type, called “Attention”, allows Transformers to selectively correlate inputs.

Exemplary transformer architecture. Again, you can see at first glance that the architecture is much more complex than a simple NN. The feed-forward NN already familiar to us is just one of many components here (in blue). This also shows very nicely how far we have moved away from biological models: in contrast to e.g. CNNs, transformers are not inspired by nature; they are all about linear algebra, no longer about “neurons”. Source: Vaswani et al., “Attention Is All You Need”.
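The mechanism at the core of such an attention layer can be sketched as scaled dot-product attention, the basic variant from the Vaswani et al. paper. The matrices here are random stand-ins for the learned queries, keys and values:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each position mixes in the values of
    the positions it attends to most strongly."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly positions relate
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of the value vectors

rng = np.random.default_rng(3)
seq_len, d = 4, 8                        # 4 tokens, 8-dimensional embeddings
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The weight matrix is what lets the network relate any input position to any other, which is exactly the “which parts refer to each other” capability described above.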

Neural Networks: Often a combination of different architectures

Ultimately, however, one can say that an architecture rarely comes alone. Most state-of-the-art neural networks combine several different techniques in layers, so one usually speaks of layer types instead of network types. For example, one can combine several CNN layers, a fully connected layer and an LSTM layer, maybe even in a way that makes the whole construct work as an autoencoder. What is important here: the networks do not grow. The structure is fixed by a programmer and then trained. The network cannot determine that one layer is superfluous, nor can it “optimize” itself by removing that layer.
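A toy illustration of such a combination: a convolution layer whose output is flattened and handed to a fully connected layer, mirroring how layer types are chained in practice. Everything here (sizes, random weights) is an illustrative assumption:

```python
import numpy as np

def conv2d(image, kernel):
    """Minimal 2D convolution (valid padding, stride 1)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(4)
image = rng.normal(size=(6, 6))
kernel = rng.normal(size=(3, 3))

features = np.maximum(0.0, conv2d(image, kernel))  # CNN layer + ReLU: (4, 4)
flat = features.reshape(-1)                        # hand over to the dense layer
W = rng.normal(size=(flat.size, 10))
logits = flat @ W                                  # fully connected layer: 10 outputs
print(logits.shape)  # (10,)
```

Note that the layer sequence is fixed in the code before any training happens, just as the paragraph above describes.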

This was merely a small selection of examples; there are hundreds, if not thousands, of other types of neural networks. But the selection presented here covers by far the most common types used in practice at the moment. Even if you are not aware of it, you have probably already come into contact with each of these types of deep learning system several times.