6.3 Long Short‐Term Memory Networks
To solve the problem of losing remote information, researchers proposed long short‐term memory (LSTM) networks. The idea of LSTM was introduced in Hochreiter and Schmidhuber [19], but it was applied to recurrent networks much later. The basic structure of LSTM is shown in Figure 9. It addresses the vanishing-gradient problem by introducing a second hidden state $c_t$, called the cell state.
Since the original LSTM model was introduced, many variants have been proposed. The forget gate was introduced in Gers et al. [20]; it has proven effective and is now standard in most LSTM architectures. The forward pass of an LSTM with a forget gate can be divided into two steps. In the first step, the following values are calculated:
$$
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right)\\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right)\\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right)\\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right)
\end{aligned}
\tag{12}
$$
where the $W_\ast$ and $U_\ast$ are weight matrices, the $b_\ast$ are bias vectors, and $\sigma(\cdot)$ is the sigmoid function; $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates, and $\tilde{c}_t$ is the candidate cell state.
The two hidden states $c_t$ and $h_t$ are then calculated by

$$
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
\tag{13}
$$

$$
h_t = o_t \odot \tanh(c_t)
\tag{14}
$$
where $\odot$ represents the elementwise product. In Equation (13), the first term multiplies $c_{t-1}$ with the forget gate $f_t$, controlling what information in the previous cell state can be passed to the current cell state. As for the second term, $\tilde{c}_t$ stores the information passed from $x_t$ and $h_{t-1}$, and $i_t$ controls how much information from the current step is preserved in the cell state. In Equation (14), the hidden state $h_t$ depends on the current cell state $c_t$ and the output gate $o_t$, which decides how much information from the current cell state is passed to the hidden state $h_t$.
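To make the two-step forward pass concrete, the following is a minimal NumPy sketch of a single LSTM step implementing Equations (12)–(14). The parameter names (W_f, U_f, b_f, and so on), the dictionary layout, and the toy dimensions are illustrative assumptions, not part of the original text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM forward step following Equations (12)-(14).

    x_t    : input vector at time t, shape (d_in,)
    h_prev : previous hidden state h_{t-1}, shape (d_h,)
    c_prev : previous cell state c_{t-1}, shape (d_h,)
    params : dict holding weight matrices W_*, U_* and bias vectors b_*
    """
    # Step 1 (Equation 12): forget, input, output gates and candidate cell state
    f_t = sigmoid(params["W_f"] @ x_t + params["U_f"] @ h_prev + params["b_f"])
    i_t = sigmoid(params["W_i"] @ x_t + params["U_i"] @ h_prev + params["b_i"])
    o_t = sigmoid(params["W_o"] @ x_t + params["U_o"] @ h_prev + params["b_o"])
    c_tilde = np.tanh(params["W_c"] @ x_t + params["U_c"] @ h_prev + params["b_c"])

    # Step 2 (Equations 13-14): update the cell state and the hidden state
    c_t = f_t * c_prev + i_t * c_tilde   # elementwise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

# Tiny usage example with randomly initialized parameters
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = {}
for gate in ("f", "i", "o", "c"):
    params[f"W_{gate}"] = rng.normal(size=(d_h, d_in))
    params[f"U_{gate}"] = rng.normal(size=(d_h, d_h))
    params[f"b_{gate}"] = np.zeros(d_h)
h_t, c_t = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), params)
```

Applying this step repeatedly over a sequence, with $h_t$ and $c_t$ fed back in at the next step, gives the full recurrent forward pass whose gradient behavior is analyzed below.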
Figure 9 Architecture of long short‐term memory network (LSTM).
In LSTM, if the loss $\ell_T$ is evaluated at time step $T$, the gradient with respect to the cell state $c_t$, calculated via backpropagation through time, can be written as

$$
\frac{\partial \ell_T}{\partial c_t}
= \frac{\partial \ell_T}{\partial c_T}\prod_{k=t+1}^{T}\frac{\partial c_k}{\partial c_{k-1}}
= \frac{\partial \ell_T}{\partial c_T}\prod_{k=t+1}^{T}\bigl(\operatorname{diag}(f_k) + R_k\bigr)
\tag{15}
$$
where $R_k$ represents the other terms in the partial derivative calculation. Since the sigmoid function is used when calculating the values of $f_k$, they will be close to either 0 or 1. When $f_k$ is close to 1, the gradient does not vanish, and when it is close to 0, the previous information is not useful for the current state and should be forgotten.
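As a rough numerical illustration of this argument (a sketch that ignores the $R_k$ terms in Equation (15)), the gradient flowing from $c_T$ back to $c_t$ is scaled elementwise by the product of the intervening forget-gate values, so gates held near 1 preserve it while gates near 0 suppress it:

```python
import numpy as np

# Ignoring the R_k terms in Equation (15), the gradient reaching c_t from a
# loss at step T is scaled (elementwise) by the product of the forget-gate
# values f_{t+1}, ..., f_T along the path.
T = 100
f_open = np.full(T, 0.99)    # gates near 1: the old cell state is retained
f_closed = np.full(T, 0.10)  # gates near 0: the old cell state is forgotten

print(np.prod(f_open))    # ~0.37   -> the gradient survives over 100 steps
print(np.prod(f_closed))  # ~1e-100 -> the gradient (and old information) vanishes
```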