RNN

Vanilla RNN

  • How do we capture context? Use the notion of recurrence: the current state depends on the previous state.
  • Two outputs at each step: a prediction and a state (the hidden state of the recurrent network, passed on to the next step)

Learning goals

  1. Let r = input vector dimension, s = hidden state dimension, and t = output dimension.
$$
U_{s\times r},\quad W_{s\times s},\quad V_{t\times s}\\
s_i = f(U x_i + W s_{i-1} + b), \qquad y_i = V s_i
$$
  2. BPTT: Backpropagation Through Time, a slight variation of standard backpropagation in which gradients flow backward through the unrolled sequence.
    Due to vanishing/exploding gradients, it is sensitive to sequence length. For text data we therefore set a maximum length: inputs longer than that are truncated, and shorter ones are padded with zeros (see the sketch below).
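A minimal NumPy sketch of the forward pass and the truncate/pad preprocessing described above; the dimensions, the random parameters, and the `max_len` value are illustrative assumptions, not a trained model.

```python
import numpy as np

r, s, t = 4, 8, 3                 # input, hidden, and output dimensions (illustrative)
rng = np.random.default_rng(0)
U = rng.normal(size=(s, r))       # input-to-hidden weights
W = rng.normal(size=(s, s))       # hidden-to-hidden (transition) weights
V = rng.normal(size=(t, s))       # hidden-to-output weights
b = np.zeros(s)

def rnn_forward(xs, max_len=10):
    # Truncate sequences longer than max_len, pad shorter ones with zero vectors.
    xs = list(xs)[:max_len]
    xs += [np.zeros(r)] * (max_len - len(xs))
    s_prev = np.zeros(s)
    outputs = []
    for x in xs:
        s_prev = np.tanh(U @ x + W @ s_prev + b)  # s_i = f(U x_i + W s_{i-1} + b)
        outputs.append(V @ s_prev)                # y_i = V s_i
    return outputs, s_prev                        # the two outputs: predictions and final state

ys, final_state = rnn_forward([rng.normal(size=r) for _ in range(6)])
```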

LSTM

  • Why not a vanilla RNN? Repeatedly applying the transition matrix necessarily weakens the signal; we need a structure that can leave some dimensions unchanged over many steps.
  • Augment with gate units
    • Input Gate
    • Forget Gate
    • Output Gate
    Each gate has its own parameters $U_{s\times r}$, $W_{s\times s}$, $b_{s\times 1}$:

$$
\begin{align}
i &= \sigma(W_i s_{i-1} + U_i x_i + b_i)\\
f &= \sigma(W_f s_{i-1} + U_f x_i + b_f)\\
o &= \sigma(W_o s_{i-1} + U_o x_i + b_o)\\
g &= \tanh(W_g s_{i-1} + U_g x_i + b_g)\\
c_i &= f \odot c_{i-1} + i \odot g\\
s_i &= o \odot \tanh(c_i)
\end{align}
$$
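A minimal sketch of a single LSTM step following the gate equations above. The `params` layout (one `(W, U, b)` triple per gate) is an assumption for illustration, not a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, s_prev, c_prev, params):
    """One LSTM step: x is the input, s_prev the previous hidden state,
    c_prev the previous cell state; params maps gate name -> (W, U, b)."""
    def gate(name, activation):
        W, U, b = params[name]
        return activation(W @ s_prev + U @ x + b)

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", np.tanh)    # candidate cell update
    c = f * c_prev + i * g    # keep part of the old memory, write some new content
    s = o * np.tanh(c)        # expose part of the memory as the new hidden state
    return s, c
```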

GRU

  • Removes the cell state; only the hidden state remains
  • Augments the hidden-state update with two gates:
    • Reset gate $r$
    • Update gate $z$
  • Performs similarly to the LSTM, with less computation
$$
\begin{align}
r &= \sigma(W_r s_{i-1} + U_r x_i + b_r)\\
z &= \sigma(W_z s_{i-1} + U_z x_i + b_z)\\
g &= \tanh(W_g (r \odot s_{i-1}) + U_g x_i + b_g)\\
s_i &= (1 - z) \odot s_{i-1} + z \odot g
\end{align}
$$
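A matching sketch of one GRU step under the same assumed `params` layout; note that there is no separate cell state, only the hidden state itself is updated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, s_prev, params):
    """One GRU step: params maps gate name -> (W, U, b); no cell state."""
    Wr, Ur, br = params["r"]
    Wz, Uz, bz = params["z"]
    Wg, Ug, bg = params["g"]
    r = sigmoid(Wr @ s_prev + Ur @ x + br)         # reset gate
    z = sigmoid(Wz @ s_prev + Uz @ x + bz)         # update gate
    g = np.tanh(Wg @ (r * s_prev) + Ug @ x + bg)   # candidate state
    return (1.0 - z) * s_prev + z * g              # blend old state and candidate
```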

Comparison

RNN vs LSTM

  • RNN:
    • Only a hidden state; no separate long-term (cell-state) memory
    • No gates
  • LSTM:
    • Has an explicit memory (the cell state)
    • Has input, forget, and output gates
    • Can be used for sequence-to-sequence learning, and copes better with long sequences

RNN vs GRU

  • RNN:
    • Only a hidden state; no gates
    • Can be used for sequence-to-sequence learning
  • GRU:
    • Has reset and update gates, but no separate cell state
    • Can be used for sequence-to-sequence learning
    • Performs similarly to the LSTM with less computation

RNN vs Convolutional Neural Network

  • RNN:
    • Designed for sequential data; can be used for sequence-to-sequence learning
  • Convolutional Neural Network:
    • Designed for grid-like data such as images
    • Can be used for image recognition
    • Can be used for video recognition

Additional topics

Beam search is introduced to address the weakness of greedy inference. In seq2seq, the decoder predicts the output sequence one word at a time; if it commits to a single wrong word, we may end up with a completely wrong sequence.
So beam search keeps multiple hypotheses (partial sentences) in parallel and then checks which full sentence is most likely.
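A small beam-search sketch over a decoder that is only assumed here: `next_log_probs(prefix)` is a hypothetical function returning a log-probability for every vocabulary word given the current prefix, and `eos` marks the end of a sentence.

```python
import heapq

def beam_search(next_log_probs, vocab, eos, beam_width=3, max_len=20):
    beams = [(0.0, [])]                          # (cumulative log-prob, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq and seq[-1] == eos:           # finished hypothesis: keep it unchanged
                candidates.append((score, seq))
                continue
            for tok, lp in zip(vocab, next_log_probs(seq)):
                candidates.append((score + lp, seq + [tok]))
        # keep only the beam_width most likely hypotheses so far
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        if all(seq and seq[-1] == eos for _, seq in beams):
            break
    return max(beams, key=lambda c: c[0])        # most likely full sentence
```

Greedy inference is the special case `beam_width=1`.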

Attention

Look at how close the encoder's vector for each source-language word is to the current state of our decoder, and use that similarity to weight the source words when producing the next output. This works well when the word order differs between the two languages.
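A dot-product attention sketch under assumed names: `enc_states` holds one encoder vector per source word, `dec_state` is the current decoder state, and the softmax weights say which source words to look at.

```python
import numpy as np

def attention(dec_state, enc_states):
    scores = enc_states @ dec_state            # similarity of each source word to the decoder state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over source positions
    context = weights @ enc_states             # weighted sum of encoder states
    return context, weights
```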
