Rico's Nerd Cluster

"Before leaving this world, everything is a process."

Deep Learning - Transformer Series 4 - Transformer All Together

Encoder, Decoder

Overview We’ve seen that RNNs and CNNs have a longer maximum path length. A CNN could have better computational complexity for long sequences, but overall, self-attention is the best for deep architect...
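For reference, the standard per-layer figures (from the original Transformer paper, "Attention Is All You Need", which this comparison appears to follow; $n$ is sequence length, $d$ the representation dimension, $k$ the convolution kernel width):

$$
\begin{aligned}
\text{Self-attention:} &\quad O(n^2 \cdot d) \text{ per layer}, &\quad O(1) \text{ maximum path length} \\
\text{Recurrent:} &\quad O(n \cdot d^2) \text{ per layer}, &\quad O(n) \text{ maximum path length} \\
\text{Convolutional:} &\quad O(k \cdot n \cdot d^2) \text{ per layer}, &\quad O(\log_k n) \text{ maximum path length}
\end{aligned}
$$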

Deep Learning - Transformer Series 3 - Multi-Head and Self Attention

Multi-Head Attention, Self Attention, Comparison of Self Attention Against CNN, RNN

Multi-Head Attention To learn a richer set of behaviors, we can instantiate multiple attention heads jointly on the same set of queries, keys, and values. Specifically, we are able to capture variou...
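A minimal sketch of that idea, using PyTorch's built-in nn.MultiheadAttention rather than the post's own implementation, with illustrative tensor sizes:

```python
import torch
import torch.nn as nn

# Illustrative sizes: batch of 2, sequence length 5, embedding dim 16, 4 heads.
embed_dim, num_heads = 16, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(2, 5, embed_dim)       # the same tensor serves as query, key, and value
out, attn_weights = mha(x, x, x)       # each head applies its own learned Q/K/V projections
print(out.shape, attn_weights.shape)   # torch.Size([2, 5, 16]) torch.Size([2, 5, 5])
```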

Deep Learning - Transformer Series 2 - Vanilla Attention Mechanism

Attention Intuition, Query-Key-Value, Bahdanau Attention, Scaled-Dot Attention

Attention Intuition Imagine we are sitting in a room with a red cup of coffee and a notebook in front of us. When we first sit down, the red cup stands out, so it attracts our attention “invo...

Deep Learning - Transformer Series 1 - Embedding Pre-Processing

Positional Encoding, Padding Mask, Look-ahead Mask, Tokenization

What is Positional Encoding In natural language processing, it’s common to have 1 sentence ("I love ice cream") -> tokens ("I", "love", "ice", "cream") -> embeddings (100, 104, 203, 301) ->...
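A minimal sketch of that sentence -> tokens -> ids -> embeddings pipeline, with a sinusoidal positional encoding added on top (the token ids, table, and dimensions here are illustrative stand-ins, not the post's actual values):

```python
import numpy as np

sentence = "I love ice cream"
tokens = sentence.split()                                   # ["I", "love", "ice", "cream"]
vocab = {"I": 100, "love": 104, "ice": 203, "cream": 301}   # illustrative token ids
ids = [vocab[t] for t in tokens]

d_model, max_id = 8, 512
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(max_id, d_model))        # stand-in for a learned table
embeddings = embedding_table[ids]                           # (4, d_model)

# Sinusoidal positional encoding:
# PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))
pos = np.arange(len(tokens))[:, None]
i = np.arange(0, d_model, 2)[None, :]
angle = pos / np.power(10000.0, i / d_model)
pe = np.zeros((len(tokens), d_model))
pe[:, 0::2] = np.sin(angle)
pe[:, 1::2] = np.cos(angle)

x = embeddings + pe                                         # what actually enters the Transformer
```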

Deep Learning - Sequence to Sequence Models

seq2seq, encoder-decoder architecture, beam search, BLEU score

Sequence to Sequence Models: The Encoder - Decoder Architecture Machine Translation Early sequence models use two RNN/LSTM cells to create an encoder-decoder architecture for machine translation. ...
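A minimal sketch of that two-cell encoder-decoder idea (the GRU choice and layer sizes are assumptions for illustration, not the post's exact setup): the encoder's final hidden state seeds the decoder.

```python
import torch
import torch.nn as nn

src_vocab, tgt_vocab, emb, hid = 1000, 1200, 32, 64   # illustrative sizes

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(src_vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)
    def forward(self, src):                  # src: (batch, src_len)
        _, h = self.rnn(self.emb(src))
        return h                             # final hidden state summarizes the source

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(tgt_vocab, emb)
        self.rnn = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)
    def forward(self, tgt, h):               # tgt: (batch, tgt_len), h: encoder state
        o, h = self.rnn(self.emb(tgt), h)
        return self.out(o), h                # per-step vocabulary logits

enc, dec = Encoder(), Decoder()
h = enc(torch.randint(0, src_vocab, (2, 7)))
logits, _ = dec(torch.randint(0, tgt_vocab, (2, 5)), h)
print(logits.shape)                          # torch.Size([2, 5, 1200])
```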

Deep Learning - Word Emojifier Using Dense and LSTM Layers

Emojifier

Introduction When using word vectors, you’ll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate a...
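A rough sketch of why that generalization happens (the 2-d vectors, labels, and linear fit below are made up purely for illustration): words with nearby embeddings get mapped to the same emoji even if only one of them appeared in training.

```python
import numpy as np

# Toy 2-d "embeddings" (made up): similar words sit close together.
vec = {
    "love":  np.array([0.9, 0.1]),
    "adore": np.array([0.85, 0.15]),   # never appears in the training sentences below
    "ball":  np.array([0.1, 0.9]),
    "play":  np.array([0.15, 0.85]),
}

def sentence_vec(s):
    # Average the word vectors of the sentence.
    return np.mean([vec[w] for w in s.split()], axis=0)

# Pretend we fit a linear classifier on just two sentences.
X = np.stack([sentence_vec("love love"), sentence_vec("play ball")])
y = np.array([0, 1])                   # 0 = heart emoji, 1 = baseball emoji
w = np.linalg.lstsq(np.c_[X, np.ones(2)], y, rcond=None)[0]

def predict(s):
    score = np.r_[sentence_vec(s), 1.0] @ w
    return "baseball" if score > 0.5 else "heart"

print(predict("adore adore"))          # "heart", even though "adore" was never in training
```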

Deep Learning - Hands-On Embedding Similarity

Similarity and Debiasing

This blog post is a summary of the Coursera course on Sequence Models. Embedding Similarity and Debiasing: because embeddings are very computationally expensive to train, most ML practitioners will load a ...
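Cosine similarity is the workhorse for comparing such loaded embeddings; a minimal sketch (with made-up vectors standing in for real pre-trained GloVe embeddings):

```python
import numpy as np

def cosine_similarity(u, v):
    # cos(theta) = (u . v) / (||u|| * ||v||)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Stand-ins for pre-trained embeddings (real ones would be loaded, e.g. 50-d GloVe vectors).
king  = np.array([0.80, 0.65, 0.10])
queen = np.array([0.75, 0.70, 0.15])
apple = np.array([0.10, 0.20, 0.90])

print(cosine_similarity(king, queen))   # close to 1: related words
print(cosine_similarity(king, apple))   # much smaller: unrelated words
```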

Deep Learning - PyTorch Versioning And Memory Allocation

In-Place and Out-of-Place Matrix Ops, Gradient Checkpointing

PyTorch Versioning Is Necessary Because We Have In-Place and Out-of-Place Matrix Ops Takeaways: - x.add_()/x.multiply_() do in-place addition/multiplication and update the gradient. - x+something a...
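A short sketch of the distinction (the error raised below is PyTorch's standard version-counter check; the exact message wording varies by version):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                     # out-of-place: produces a brand-new tensor
z = y ** 2                    # backward of pow needs y's original values

y_plus = y + 1                # out-of-place again: new storage
print(y_plus.data_ptr() != y.data_ptr())   # True

y.add_(1.0)                   # in-place: same storage, bumps y's version counter
try:
    z.sum().backward()
except RuntimeError as err:   # autograd detects that a saved tensor was modified in place
    print("version-counter error:", err)
```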

Deep Learning - Hands-On Dinosaur Name Generator Using RNN

Character-Level Dinosaur Name Generation Build a character-level text generation model using an RNN. The vocabulary looks like: { 0: '\n', 1: 'a', 2: 'b', 3: 'c...
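A minimal sketch of how such a character vocabulary is typically built (the file name dinos.txt is an assumption borrowed from the classic assignment, not necessarily this post's):

```python
# Build char <-> index maps from the raw names file (assumed to be "dinos.txt").
with open("dinos.txt") as f:
    data = f.read().lower()

chars = sorted(set(data))                     # includes '\n', used as the end-of-name token
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}
print(ix_to_char)                             # {0: '\n', 1: 'a', 2: 'b', 3: 'c', ...}
```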

Deep Learning - Word Embeddings, Word2Vec

Word Representation

Word Representation A feature of a vocabulary word is a vector element that represents an attribute, such as the concept of “fruits”, “humans”, or more abstract concepts like “dry products”, etc. One iss...
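A tiny, made-up illustration of the contrast: one-hot vectors treat every pair of distinct words as equally unrelated, while featurized vectors (rows = words, columns = attributes such as "fruit" or "human") expose the similarity.

```python
import numpy as np

words = ["apple", "orange", "king"]

# One-hot: every pair of distinct words has dot product 0, so no notion of similarity.
one_hot = np.eye(len(words))
print(one_hot[0] @ one_hot[1])        # 0.0 for apple vs orange

# Featurized (made-up attribute values): columns = [fruit, human, royalty]
features = np.array([
    [0.95, 0.00, 0.01],               # apple
    [0.97, 0.00, 0.00],               # orange
    [0.01, 0.93, 0.95],               # king
])
print(features[0] @ features[1])      # large: apple and orange share the "fruit" attribute
print(features[0] @ features[2])      # near zero: apple and king share little
```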