
Understanding the basics of LSTM units


Long Short-Term Memory (LSTM) is one of the most successful recurrent neural networks in modern real-world applications because of its clever use of gates to keep or discard long- and short-term information in its memory.

It was introduced in the long short-term memory paper by Hochreiter & Schmidhuber (1997) and later refined by Gers et al. (2000), who added the forget gate, letting the cell reset its own state and thereby handle input streams of varying length. An LSTM unit, as seen in figure 2, differs from a standard RNN unit (figure 1) in several ways.

Figure 1: Recurrent hidden unit. Based on graphic from (Goodfellow et al. 2016, p. 378)

LINK TO ARTICLE ABOUT RNN

Figure 2: Long Short-Term Memory unit. Based on graphic from (Goodfellow et al. 2016, p. 409)

LSTM mitigates the issue a standard RNN has with long-term dependencies by maintaining separate long-term (cell state) and short-term (hidden state) inputs and outputs.

Figure 2 shows a graphical representation of an LSTM unit. The unit contains several neural network layers (dark gray boxes), each labelled with its activation function (σ or tanh). The arrows represent the connections, and the pointwise operations (addition, multiplication) are shown with their respective mathematical signs. An LSTM unit has several outputs: one output h to the layer ahead of it, and the pair h and c passed on to the next LSTM unit in the temporal dimension t.
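To make this flow concrete, here is a minimal NumPy sketch of one forward step through a unit like the one in figure 2. The gate names (f, i, o), the candidate state g, and the weight layout are my own labels for illustration, not taken from the figure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a single LSTM unit.

    x_t    : input vector at time step t
    h_prev : short-term (hidden) state from the previous step
    c_prev : long-term (cell) state from the previous step
    W, U, b: dicts with weights/biases for the forget (f), input (i),
             output (o) gates and the candidate cell state (g)
    """
    # Forget gate: decides which parts of the old cell state to discard
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    # Input gate: decides which parts of the new candidate to keep
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    # Candidate cell state (the tanh layer in figure 2)
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])
    # Output gate: decides what to expose as the new hidden state
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])

    # Pointwise operations: update the long-term memory ...
    c_t = f_t * c_prev + i_t * g_t
    # ... and derive the new short-term output from it
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```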

The job of an LSTM unit is to decide what information to remember and what to forget. One can view the LSTM as a sequence of steps within the cell. The steps below explain what happens inside the LSTM unit in figure 2.

There are too many mathematical notations to write directly as text on Medium, so here is a screenshot
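The screenshot itself does not carry over into plain text; in common notation (which may differ cosmetically from the screenshot), the steps are:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(long-term memory update)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(short-term output)}
\end{aligned}
$$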

Even though LSTM networks are very successful for time series applications, they still suffer from the vanishing and exploding gradient problems explained in LINK TO RNN ARTICLE. This means the problem must still be addressed when building an LSTM network.
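Exploding gradients in particular are usually handled with gradient clipping. Below is a minimal PyTorch sketch of one training step; the model, sizes, and learning rate are placeholders chosen for illustration, not taken from this article.

```python
import torch
import torch.nn as nn

# Hypothetical single-layer LSTM model (sizes chosen for illustration only)
lstm = nn.LSTM(input_size=8, hidden_size=32, batch_first=True)
readout = nn.Linear(32, 1)
params = list(lstm.parameters()) + list(readout.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(x, y):
    """One training step with gradient clipping to curb exploding gradients."""
    optimizer.zero_grad()
    out, _ = lstm(x)              # out: (batch, time, hidden)
    pred = readout(out[:, -1])    # predict from the last time step
    loss = loss_fn(pred, y)
    loss.backward()
    # Rescale gradients so their global norm never exceeds 1.0
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
    optimizer.step()
    return loss.item()

# Example call with random data of the assumed shapes
x = torch.randn(16, 20, 8)   # batch of 16 sequences, 20 steps, 8 features
y = torch.randn(16, 1)
print(train_step(x, y))
```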

References

Gers, F. A., Schmidhuber, J. & Cummins, F. (2000), 'Learning to forget: Continual prediction with LSTM', Neural Computation 12(10), 2451–2471.

Hochreiter, S. & Schmidhuber, J. (1997), 'Long short-term memory', Neural Computation 9(8), 1735–1780.

Goodfellow, I., Bengio, Y. & Courville, A. (2016), Deep Learning, MIT Press.


Optimize My Day Job

A programming amateur from Denmark who tries to make his and your life easier with code.