Rico's Nerd Cluster

"Before leaving this world, everything is a process."

Deep Learning - Speech Recognition

Audio Signal Processing, Spectrogram

Overview In speech recognition, scientists initially thought that phonemes, the individual sounds in words (like the “g” and “v” in “give”), were the best way to represent spoken words. This was becaus...

Deep Learning - Neural Machine Translation

Hands-On Attention Project

Introduction And Data Preparation The goal of this project is to experiment with date translation, i.e., converting human-readable dates (“25th of June, 2009”) into machine-readable dates (“2009-06-25”). We need to truncate data...
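The excerpt is truncated, but the preprocessing it describes can be sketched roughly as below; the helper name `encode_date`, the `<pad>`/`<unk>` vocabulary, and the 30-character cap are hypothetical stand-ins, not the project's actual code:

```python
import string

HUMAN_LEN = 30  # assumed maximum length of the human-readable date

def encode_date(text, vocab, max_len=HUMAN_LEN):
    """Lower-case, truncate to max_len characters, then pad with <pad>."""
    text = text.lower()[:max_len]
    ids = [vocab.get(ch, vocab["<unk>"]) for ch in text]
    ids += [vocab["<pad>"]] * (max_len - len(ids))
    return ids

vocab = {"<pad>": 0, "<unk>": 1}
for ch in string.ascii_lowercase + string.digits + " ,-/":
    vocab.setdefault(ch, len(vocab))

print(encode_date("25th of June, 2009", vocab))  # fixed-length id sequence
```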

Deep Learning - Transformer Series 5 - Transformer Hands On

Hands-On Transformer Training and Validation

Tasks and Data It’s common practice to pad input sequences to MAX_SENTENCE_LENGTH, so the input is always [batch_size, max_sentence_length] and NUM_KEYS = NUM_QUERIES = max_sentence_leng...
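A minimal sketch of that padding convention, assuming a pad id of 0 (the helper `pad_batch` is illustrative, not from the post):

```python
import numpy as np

PAD_ID = 0               # assumed id of the padding token
MAX_SENTENCE_LENGTH = 8  # assumed cap for this sketch

def pad_batch(sequences):
    """Pad every token-id sequence to MAX_SENTENCE_LENGTH so the batch is rectangular."""
    batch = np.full((len(sequences), MAX_SENTENCE_LENGTH), PAD_ID, dtype=np.int64)
    for i, ids in enumerate(sequences):
        ids = ids[:MAX_SENTENCE_LENGTH]  # truncate anything too long
        batch[i, :len(ids)] = ids
    return batch

batch = pad_batch([[5, 9, 2], [7, 3, 3, 8, 1]])
pad_mask = batch != PAD_ID  # True at real tokens; this becomes the padding mask
print(batch.shape)          # (2, 8) == [batch_size, max_sentence_length]
```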

Deep Learning - Transformer Series 4 - Transformer All Together

Encoder, Decoder

Overview We’ve seen that RNNs and CNNs have a longer maximum path length than self-attention. CNNs can have better computational complexity for long sequences, but overall, self-attention is the best for deep architect...
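For reference, the comparison usually cited for this claim (e.g., in the d2l.ai chapter on self-attention, with sequence length n, hidden dimension d, and kernel size k) is:

- CNN: per-layer complexity O(k·n·d²), maximum path length O(n/k)
- RNN: per-layer complexity O(n·d²), maximum path length O(n)
- Self-attention: per-layer complexity O(n²·d), maximum path length O(1)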

Deep Learning - Transformer Series 3 - Multi-Head and Self Attention

Multi-Head Attention, Self Attention, Comparison of Self Attention Against CNN, RNN

Multi-Head Attention To learn a richer set of behaviors, we can run multiple attention heads jointly on the same set of queries, keys, and values. Specifically, we are able to capture variou...
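A toy NumPy sketch of the idea, assuming hypothetical shapes (seq_len = 4, d_model = 8) and random, untrained projection weights:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, num_heads, rng):
    """Each head gets its own Q/K/V projections of the same input X."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (0.1 * rng.standard_normal((d_model, d_head)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(d_head)   # scaled dot-product attention
        heads.append(softmax(scores) @ V)
    return np.concatenate(heads, axis=-1)    # concatenate head outputs

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
print(multi_head_attention(X, num_heads=2, rng=rng).shape)  # (4, 8)
```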

Deep Learning - Transformer Series 2 - Vanilla Attention Mechanism

Attention Intuition, Query-Key-Value, Bahdanau Attention, Scaled-Dot Attention

Attention Intuition Imagine we are sitting in a room. We have a red cup of coffee and a notebook in front of us. When we first sit down, the red cup stands out, so it attracts our attention “invo...
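For reference, the scaled-dot attention named in the subtitle is conventionally written as follows, where d_k is the key dimension and the division by √d_k keeps the dot products from saturating the softmax:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
```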

Deep Learning - Transformer Series 1 - Embedding Pre-Processing

Positional Encoding, Padding Mask, Look-ahead Mask, Tokenization

What is Positional Encoding In natural language processing, it’s common to have 1 sentence ("I love ice cream") -> tokens ("I", "love", "ice", "cream") -> embeddings (100, 104, 203, 301) ->...
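The sinusoidal encoding from the original Transformer paper is one common choice for this step; a minimal sketch, with the seq_len and d_model values picked only for illustration:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: even dimensions get sin, odd dimensions get cos."""
    pos = np.arange(seq_len)[:, None]    # [seq_len, 1]
    i = np.arange(d_model)[None, :]      # [1, d_model]
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

# Added element-wise to the token embeddings before the first encoder layer.
print(positional_encoding(4, 6))
```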

Deep Learning - Sequence to Sequence Models

seq2seq, encoder-decoder architecture, beam search, BLEU score

Sequence to Sequence Models: The Encoder - Decoder Architecture Machine Translation Early sequence models use two RNN/LSTM cells to create an encoder-decoder architecture for machine translation. ...
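A minimal PyTorch sketch of that two-cell encoder-decoder; the vocabulary sizes and hidden width are hypothetical, and teacher forcing (feeding the ground-truth target into the decoder) is assumed for training:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Encoder LSTM compresses the source; decoder LSTM unrolls the target."""
    def __init__(self, src_vocab=100, tgt_vocab=100, hidden=32):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))    # final (h, c) summarizes source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                      # per-step logits over target vocab

model = Seq2Seq()
logits = model(torch.randint(0, 100, (2, 7)), torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])
```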

Deep Learning - Word Emojifier Using Dense and LSTM Layers

Emojifier

Introduction When using word vectors, you’ll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate a...
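One way to see this generalization is the dense baseline from the Coursera Emojify assignment: average the sentence's word vectors, then apply a single softmax layer, so any word whose embedding lands nearby gets the same emoji. A rough sketch with made-up 4-d embeddings (real GloVe vectors are 50-300 dimensional):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def emojify_dense(sentence, word_vecs, W, b):
    """Average the word vectors, then classify with one dense softmax layer.
    The LSTM variant feeds the token sequence instead of the average."""
    avg = np.mean([word_vecs[w] for w in sentence.lower().split()], axis=0)
    return softmax(W @ avg + b)

rng = np.random.default_rng(0)
word_vecs = {w: rng.standard_normal(4) for w in ["i", "love", "you"]}
W, b = rng.standard_normal((3, 4)), np.zeros(3)  # 3 emoji classes, untrained weights
print(emojify_dense("I love you", word_vecs, W, b))
```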

Deep Learning - Hands-On Embedding Similarity

Similarity and Debiasing

This blog post is a summary of the Coursera course on Sequence Models. Embedding Similarity and Debiasing Since embeddings are very computationally expensive to train, most ML practitioners will load a ...
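The workhorse for the similarity part is cosine similarity; a minimal sketch with toy vectors (real pre-trained embeddings such as GloVe are 50-300 dimensional):

```python
import numpy as np

def cosine_similarity(u, v):
    """cos(theta) = u.v / (|u||v|); 1 means same direction, -1 means opposite."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-d "embeddings" for illustration only.
king = np.array([0.9, 0.1, 0.4])
queen = np.array([0.8, 0.2, 0.5])
print(cosine_similarity(king, queen))
```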