RNN
Tags: Deep Learning, IBM, Week4
Categories: IBM Machine Learning
Updated:
RNN
Vanilla RNN
- How do we capture context? Use the notion of recurrence: the hidden state from the previous step is fed back in at every step.
- Two outputs at each step: the prediction and the state (the hidden state of the recurrent network)
Learning goals
- Let r = input vector dimension, s = hidden state dimension, and t = output dimension.
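As a sketch (notation mine, since the notes only fix the dimensions; I use $n$ as the step index because $t$ already names the output dimension), the standard vanilla RNN update is:

$$
h_n = \tanh(W_{xh}\, x_n + W_{hh}\, h_{n-1} + b_h), \qquad y_n = W_{hy}\, h_n + b_y
$$

with $x_n \in \mathbb{R}^{r}$, $h_n \in \mathbb{R}^{s}$, $y_n \in \mathbb{R}^{t}$, so $W_{xh} \in \mathbb{R}^{s \times r}$, $W_{hh} \in \mathbb{R}^{s \times s}$, $W_{hy} \in \mathbb{R}^{t \times s}$. The state $h_n$ is what carries the context forward; the prediction $y_n$ is read off from it.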
BPTT: Backpropagation through time is a slight variation of ordinary backpropagation: the network is unrolled across the time steps and gradients are accumulated over them.
Because of vanishing/exploding gradients, it is sensitive to the length of the sequence. For text data we therefore set a maximum length: inputs longer than that are truncated, and shorter ones are padded with zeros.
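A minimal sketch of that preprocessing step, assuming integer-encoded token sequences and a chosen `max_len` (plain Python, padding value 0):

```python
def pad_or_truncate(sequences, max_len, pad_value=0):
    """Force every sequence to length max_len: cut long ones, zero-pad short ones."""
    fixed = []
    for seq in sequences:
        if len(seq) > max_len:
            fixed.append(seq[:max_len])                              # truncate
        else:
            fixed.append(seq + [pad_value] * (max_len - len(seq)))   # pad with zeros
    return fixed

# pad_or_truncate([[5, 2, 9], [7]], max_len=2)  ->  [[5, 2], [7, 0]]
```

Keras ships the same idea as `tf.keras.preprocessing.sequence.pad_sequences`.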
LSTM
- Why not a vanilla RNN? Repeated multiplication by the transition matrix necessarily weakens the signal; we need a structure that can leave some dimensions unchanged over many steps.
Augment with gate units
- Input Gate
- Forget Gate
- Output Gate
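A sketch of the standard LSTM cell equations (notation mine; $\sigma$ is the sigmoid, $\odot$ is the element-wise product), showing how the three gates modulate a separate cell state $c_n$:

$$
\begin{aligned}
i_n &= \sigma(W_i x_n + U_i h_{n-1} + b_i) && \text{input gate} \\
f_n &= \sigma(W_f x_n + U_f h_{n-1} + b_f) && \text{forget gate} \\
o_n &= \sigma(W_o x_n + U_o h_{n-1} + b_o) && \text{output gate} \\
\tilde{c}_n &= \tanh(W_c x_n + U_c h_{n-1} + b_c) \\
c_n &= f_n \odot c_{n-1} + i_n \odot \tilde{c}_n \\
h_n &= o_n \odot \tanh(c_n)
\end{aligned}
$$

Where the forget gate is near 1 and the input gate near 0, that dimension of $c_n$ is carried forward almost unchanged, which is exactly the property the vanilla RNN lacks.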
GRU
- Removes the separate cell state; the hidden state alone carries the memory
Augment with reset and update gates
- Reset gate
- Update gate
- Performs similarly to the LSTM, with less computation
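For comparison, a sketch of the standard GRU equations (notation mine): only an update gate $z_n$ and a reset gate $r_n$, and no separate cell state.

$$
\begin{aligned}
z_n &= \sigma(W_z x_n + U_z h_{n-1} + b_z) && \text{update gate} \\
r_n &= \sigma(W_r x_n + U_r h_{n-1} + b_r) && \text{reset gate} \\
\tilde{h}_n &= \tanh(W_h x_n + U_h (r_n \odot h_{n-1}) + b_h) \\
h_n &= (1 - z_n) \odot h_{n-1} + z_n \odot \tilde{h}_n
\end{aligned}
$$

(Some references swap which branch $z_n$ gates.) With one fewer gate and no cell state, the GRU has fewer parameters per unit than the LSTM.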
Comparison
RNN vs LSTM
- RNN:
  - Does not have a separate memory (cell) state
  - No gates
- LSTM:
  - Has a memory cell state
  - Has gates (input, forget, output)
  - Can be used for sequence-to-sequence learning
RNN vs GRU
- RNN:
  - No gates
  - The hidden state is its only memory
  - Can be used for sequence-to-sequence learning
- GRU:
  - Has gates (reset and update)
  - No separate cell state (unlike the LSTM)
  - Can be used for sequence-to-sequence learning
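As a minimal sketch of how interchangeable these cells are in practice, assuming `tf.keras` and a toy text-classification setup (the vocabulary size, sequence length, and unit count below are made up):

```python
import tensorflow as tf

VOCAB, MAX_LEN, UNITS = 10_000, 100, 64   # hypothetical sizes

def build(cell):
    """Same model skeleton; only the recurrent layer changes."""
    inputs = tf.keras.Input(shape=(MAX_LEN,))
    x = tf.keras.layers.Embedding(VOCAB, 32)(inputs)
    x = cell(UNITS)(x)                     # SimpleRNN / LSTM / GRU
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

rnn_model  = build(tf.keras.layers.SimpleRNN)   # no gates, no cell state
lstm_model = build(tf.keras.layers.LSTM)        # three gates plus a cell state
gru_model  = build(tf.keras.layers.GRU)         # two gates, no separate cell state

# For the same number of units the GRU has fewer parameters than the LSTM.
for name, m in [("RNN", rnn_model), ("LSTM", lstm_model), ("GRU", gru_model)]:
    print(name, m.count_params())
```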
RNN vs Convolutional Neural Network
- RNN:
  - Can be used for sequence-to-sequence learning
- Convolutional Neural Network:
  - Can be used for image recognition
  - Can be used for video recognition
Additional topics
Beam Search
Beam Search was introduced to address the weakness of greedy inference. In seq2seq, the decoder predicts the sequence of words one at a time, so if it produces one wrong word we may end up with a completely wrong sequence. Beam Search instead keeps multiple different hypotheses (partial sequences) and then checks which full sentence is most likely.
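A toy sketch of the idea, assuming a hypothetical `next_token_log_probs(prefix)` that returns `(token, log_prob)` pairs from the decoder for the next position:

```python
import heapq

def beam_search(next_token_log_probs, beam_width=3, max_len=20, eos=0):
    """Keep the beam_width most likely partial sequences instead of one greedy choice."""
    beams = [(0.0, [])]                        # (total log-probability, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq and seq[-1] == eos:         # finished hypotheses are carried over as-is
                candidates.append((score, seq))
                continue
            for token, logp in next_token_log_probs(seq):
                candidates.append((score + logp, seq + [token]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beams, key=lambda c: c[0])      # most likely full sentence found
```

With `beam_width=1` this reduces to greedy inference; a wider beam lets a word that looks slightly worse now survive if it leads to a more likely full sentence.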
Attention
Look at how close the encoder vector for each source-language word is to the current decoder state, and weight the source words accordingly. This works well when the word order differs between the two languages.
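A minimal sketch of that score-and-weight step using dot-product attention, assuming NumPy, one decoder state, and a matrix of encoder states (one row per source-language word):

```python
import numpy as np

def attend(decoder_state, encoder_states):
    """Weight each source-word vector by its similarity to the current decoder state."""
    scores = encoder_states @ decoder_state        # dot-product similarity per source word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over source positions
    context = weights @ encoder_states             # weighted sum of encoder vectors
    return weights, context

# e.g. 4 source words, hidden size 8
enc = np.random.randn(4, 8)
dec = np.random.randn(8)
weights, context = attend(dec, enc)
print(weights.round(2), context.shape)             # weights sum to 1; context is (8,)
```

Because the weights are recomputed at every decoder step, the model can look at whichever source position is relevant regardless of where it sits in the sentence, which is why this helps when word order differs.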