Rico's Nerd Cluster

"Before leaving this world, everything is a process."

Deep Learning - Bert

Introduction BERT (Bidirectional Encoder Representations from Transformers) is great for tasks like question answering, NER (Named Entity Recognition), sentence classification, etc. BERT is not a transla...
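For a quick taste of the encoder-only usage this post describes, here is a minimal sketch using the Hugging Face `transformers` library; the `bert-base-uncased` checkpoint and the example sentence are assumptions for illustration, not taken from the post itself.

```python
# Minimal sketch: loading a pretrained BERT checkpoint and letting its
# bidirectional context fill in a masked token. Checkpoint name and example
# sentence are illustrative assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("The capital of France is [MASK]."):
    # Each candidate carries the predicted token and its probability.
    print(candidate["token_str"], candidate["score"])
```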

Deep Learning - Neural Machine Translation

Hands-On Attention Project

Introduction And Data Preparation The goal of the project is to experiment with date translation, i.e., turning human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25"). We need to truncate data...
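A minimal sketch of what such (human-readable, machine-readable) training pairs could look like, built with only the standard library; the exact formats and helper names are assumptions for illustration, not the post's actual dataset.

```python
# Minimal sketch: generating (human-readable, machine-readable) date pairs.
# The "%d %B %Y" format drops the ordinal suffix ("25th") for simplicity;
# formats and names here are illustrative assumptions.
import random
from datetime import date, timedelta

def random_date_pair():
    d = date(1990, 1, 1) + timedelta(days=random.randint(0, 15000))
    human = d.strftime("%d %B %Y")   # e.g. "25 June 2009"
    machine = d.isoformat()          # e.g. "2009-06-25"
    return human, machine

for human, machine in (random_date_pair() for _ in range(5)):
    print(f"{human!r:>20} -> {machine!r}")
```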

Deep Learning - Speech Recognition Hands On

GRU-Based Trigger Word Detection

Trigger Word Detection Goal: when we say the word "activate", we hear a chime. Data: the data was recorded at various venues such as libraries, cafes, restaurants, homes, and offices. It has a positive...
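One plausible GRU-based architecture for this kind of frame-level trigger detection, sketched in Keras; the input shape and layer sizes are illustrative assumptions, not necessarily what the post trains.

```python
# Sketch of a GRU-based trigger word detector.
# Input: a spectrogram of shape (time_steps, n_freq); output: a per-time-step
# probability that the trigger word just ended. Sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_trigger_model(time_steps=5511, n_freq=101):
    inputs = layers.Input(shape=(time_steps, n_freq))
    # 1-D convolution downsamples the time axis and extracts local features.
    x = layers.Conv1D(196, kernel_size=15, strides=4, activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)
    # Two GRU layers keep the full sequence so every time step gets a label.
    x = layers.GRU(128, return_sequences=True)(x)
    x = layers.Dropout(0.2)(x)
    x = layers.GRU(128, return_sequences=True)(x)
    x = layers.Dropout(0.2)(x)
    # Per-time-step sigmoid: "did the trigger word just finish here?"
    outputs = layers.TimeDistributed(layers.Dense(1, activation="sigmoid"))(x)
    return models.Model(inputs, outputs)

model = build_trigger_model()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```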

Deep Learning - Speech Recognition

Audio Signal Processing, Spectrogram

Overview In speech recognition, scientists initially thought that phonemes, the individual sounds in words (like the "g" and "v" in "give"), were the best way to represent spoken words. This was becaus...
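Since the excerpt is about representing audio as spectrograms rather than phonemes, here is a minimal SciPy sketch of computing one; the synthetic chirp signal and window sizes are assumptions for demonstration only.

```python
# Minimal sketch: turning a raw waveform into a spectrogram with SciPy.
# A real pipeline would load recorded audio instead of the synthetic chirp here.
import numpy as np
from scipy import signal

fs = 16000                                  # sample rate (Hz)
t = np.arange(0, 2.0, 1.0 / fs)             # 2 seconds of audio
waveform = signal.chirp(t, f0=200.0, f1=4000.0, t1=2.0)  # frequency sweep

# Short-time Fourier analysis: a (frequency bins x time frames) power matrix.
freqs, times, Sxx = signal.spectrogram(waveform, fs=fs, nperseg=400, noverlap=200)
print(Sxx.shape)  # (n_freq_bins, n_time_frames)
```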

Deep Learning - Transformer Series 5 - Transformer Hands On

Hands-On Transformer Training and Validation

Tasks and Data It’s common practice to pad input sequences to MAX_SENTENCE_LENGTH. Therefore, the input is always [batch_size, max_sentence_length], with NUM_KEYS = NUM_QUERIES = max_sentence_leng...
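A minimal PyTorch sketch of that padding step, where every batch element is padded to the fixed length and a padding mask records which positions hold real tokens; names like PAD_ID and the toy token ids are assumptions.

```python
# Minimal sketch: padding variable-length token id sequences to a fixed
# MAX_SENTENCE_LENGTH and building the matching padding mask.
# PAD_ID, MAX_SENTENCE_LENGTH, and the toy sequences are illustrative assumptions.
import torch

PAD_ID = 0
MAX_SENTENCE_LENGTH = 8

def pad_batch(sequences):
    batch = torch.full((len(sequences), MAX_SENTENCE_LENGTH), PAD_ID, dtype=torch.long)
    for i, seq in enumerate(sequences):
        length = min(len(seq), MAX_SENTENCE_LENGTH)
        batch[i, :length] = torch.tensor(seq[:length], dtype=torch.long)
    # True where the position holds a real token, False where it is padding.
    mask = batch != PAD_ID
    return batch, mask

tokens, mask = pad_batch([[5, 9, 2], [7, 3, 3, 8, 1]])
print(tokens.shape)  # torch.Size([2, 8]) -> [batch_size, max_sentence_length]
print(mask)
```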

Deep Learning - Transformer Series 4 - Transformer All Together

Encoder, Decoder

Overview We’ve seen that RNNs and CNNs have longer maximum path lengths. CNNs can have better computational complexity for long sequences, but overall, self-attention is the best for deep architect...
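For reference, the figures this comparison usually leans on, as tabulated in "Attention Is All You Need" (Vaswani et al., 2017), can be summarized as follows (n = sequence length, d = model width, k = kernel size):

```latex
% Per-layer cost, sequential operations, and maximum path length for each layer type.
\begin{tabular}{lccc}
Layer type      & Complexity per layer     & Sequential ops & Max path length \\
Self-attention  & $O(n^2 \cdot d)$         & $O(1)$         & $O(1)$          \\
Recurrent (RNN) & $O(n \cdot d^2)$         & $O(n)$         & $O(n)$          \\
Convolutional   & $O(k \cdot n \cdot d^2)$ & $O(1)$         & $O(\log_k n)$   \\
\end{tabular}
```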

Deep Learning - Transformer Series 3 - Multi-Head and Self Attention

Multi-Head Attention, Self Attention, Comparison of Self Attention Against CNN, RNN

Multi-Head Attention To learn a richer set of behaviors, we can run multiple attention heads jointly on the same set of queries, keys, and values. Specifically, we are able to capture variou...
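A minimal PyTorch sketch of running several heads jointly over one shared set of queries, keys, and values, using the built-in nn.MultiheadAttention; all the dimensions below are illustrative assumptions.

```python
# Minimal sketch: multi-head attention over a shared set of queries, keys, values.
# embed_dim, num_heads, and the sequence lengths are illustrative assumptions.
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 8
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

batch_size, num_queries, num_keys = 2, 10, 10
queries = torch.randn(batch_size, num_queries, embed_dim)
keys = torch.randn(batch_size, num_keys, embed_dim)
values = torch.randn(batch_size, num_keys, embed_dim)

# Each of the 8 heads attends in its own learned subspace of size embed_dim / num_heads;
# the per-head outputs are concatenated and projected back to embed_dim.
output, weights = attention(queries, keys, values)
print(output.shape)   # torch.Size([2, 10, 64])
print(weights.shape)  # torch.Size([2, 10, 10]) -- averaged over heads by default
```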

Deep Learning - Transformer Series 2 Vanilla Attention Mechanism

Attention Intuition, Query-Key-Value, Bahdanau Attention, Scaled-Dot Attention

Attention Intuition Imagine we are sitting in a room. We have a red cup of coffee and a notebook in front of us. When we first sit down, the red cup stands out. So it attracts our attention “invo...

Deep Learning - Transformer Series 1 - Embedding Pre-Processing

Positional Encoding, Padding Mask, Look-ahead Mask, Tokenization

What is Positional Encoding In natural language processing, it’s common to have 1 sentence ("I love ice cream") -> tokens ("I", "love", "ice", "cream") -> embedding(100, 104, 203, 301) ->...
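A minimal NumPy sketch of the sinusoidal positional encoding that gets added to the token embeddings in that pipeline; max_len and d_model are illustrative assumptions.

```python
# Minimal sketch: sinusoidal positional encoding, one vector per token position.
# max_len and d_model are illustrative assumptions.
import numpy as np

def positional_encoding(max_len, d_model):
    positions = np.arange(max_len)[:, None]   # (max_len, 1)
    dims = np.arange(d_model)[None, :]        # (1, d_model)
    # Each dimension pair (2i, 2i+1) uses a sine/cosine of a different wavelength.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((max_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])
    encoding[:, 1::2] = np.cos(angles[:, 1::2])
    return encoding

pe = positional_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16) -- added elementwise to the token embeddings
```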