Rico's Nerd Cluster

“Before we leave this world, everything is a process.”

Deep Learning - Transformer Series 1 - Embedding Pre-Processing

Positional Encoding, Padding Mask, Look-ahead Mask, Tokenization

What is Positional Encoding In natural language processing, it’s common to go from a sentence ("I love ice cream") -> tokens ("I", "love", "ice", "cream") -> embeddings (100, 104, 203, 301) ->...
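
As a rough sketch of that pipeline plus the standard sinusoidal positional encoding from "Attention Is All You Need"; the toy token ids and the 8-dimensional embedding table below are made up for illustration, not taken from the post:

```python
import numpy as np

# Toy vocabulary lookup; the ids loosely mirror the excerpt's example.
sentence = "I love ice cream"
tokens = sentence.split()                      # ["I", "love", "ice", "cream"]
token_ids = {"I": 100, "love": 104, "ice": 203, "cream": 301}
ids = np.array([token_ids[t] for t in tokens])

d_model = 8                                    # toy embedding dimension
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(512, d_model))
embeddings = embedding_table[ids]              # (seq_len, d_model)

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: sin on even dims, cos on odd dims."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

x = embeddings + positional_encoding(len(ids), d_model)  # input to the encoder
```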

Deep Learning - Sequence to Sequence Models

seq2seq, encoder-decoder architecture, beam search, BLEU score

Sequence to Sequence Models: The Encoder - Decoder Architecture Machine Translation Early sequence models use two RNN/LSTM cells to create an encoder-decoder architecture for machine translation. ...
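
A minimal sketch of that two-cell encoder-decoder in PyTorch; the layer and vocabulary sizes below are arbitrary placeholders, not the post's actual model:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """One LSTM reads the source sentence; a second LSTM generates the
    target sequence starting from the encoder's final (h, c) state."""
    def __init__(self, src_vocab, tgt_vocab, emb=128, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.src_emb(src_ids))   # keep only (h, c)
        out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.proj(out)                            # logits per target step

model = Seq2Seq(src_vocab=8000, tgt_vocab=6000)
logits = model(torch.randint(0, 8000, (2, 10)), torch.randint(0, 6000, (2, 12)))
print(logits.shape)  # torch.Size([2, 12, 6000])
```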

Deep Learning - Word Emojifier Using Dense and LSTM Layers

Emojifier

Introduction When using word vectors, you’ll see that even if your training set explicitly relates only a few words to a particular emoji, your algorithm will be able to generalize and associate a...
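 
A minimal sketch of the baseline Emojifier idea the excerpt describes: average the pretrained word vectors of a sentence and feed the average to a softmax classifier over emoji labels. The random vectors below only stand in for the 50-d GloVe vectors the course actually uses:

```python
import numpy as np

def sentence_to_avg(sentence, word_to_vec):
    """Average the pretrained word vectors of the words in a sentence."""
    words = sentence.lower().split()
    return np.mean([word_to_vec[w] for w in words], axis=0)

# Tiny fake embedding table just to make the sketch runnable.
rng = np.random.default_rng(0)
word_to_vec = {w: rng.normal(size=50) for w in "i love you food is amazing".split()}

avg = sentence_to_avg("I love you", word_to_vec)   # (50,)
W = rng.normal(size=(5, 50))                       # 5 emoji classes
b = np.zeros(5)
z = W @ avg + b
probs = np.exp(z) / np.exp(z).sum()                # softmax over emoji labels
print(probs.argmax())                              # predicted emoji index
```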

Deep Learning - Hands-On Embedding Similarity

Similarity and Debiasing

This blog post is a summary of the Coursera course on Sequence Models. Embedding Similarity and Debiasing Because embeddings are very computationally expensive to train, most ML practitioners will load a ...
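
A small sketch of the two operations the excerpt points at: cosine similarity between word vectors, and neutralization of a vector against a bias direction. The random vectors are placeholders for real pretrained embeddings:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def neutralize(e, g):
    """Debiasing step: remove the projection of e onto a bias direction g,
    e.g. g = vec('woman') - vec('man')."""
    return e - (np.dot(e, g) / np.dot(g, g)) * g

# Placeholder vectors; in practice these are loaded from GloVe/word2vec files.
rng = np.random.default_rng(0)
king, queen, apple = (rng.normal(size=50) for _ in range(3))

print(cosine_similarity(king, queen))
print(cosine_similarity(king, apple))
```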

Deep Learning - PyTorch Versioning And Memory Allocation

In-Place and Out-of-Place Matrix Ops, Gradient Checkpointing

PyTorch Versioning Is Necessary Because We Have In-Place and Out-of-Place Matrix Ops Takeaways: - x.add_()/multiply_() performs in-place addition/multiplication and updates the gradient. - x+something a...
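
A short illustration of the difference; `_version` is PyTorch's internal version counter and is used here only for demonstration:

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                      # out-of-place: allocates a new tensor
print(y._version)              # 0

y.add_(1)                      # in-place: mutates y and bumps its version counter
print(y._version)              # 1

# If a tensor saved for backward is modified in place afterwards,
# autograd's version check fails at backward time:
z = x * 2
loss = (z ** 2).sum()          # backward needs the saved value of z
z.add_(1)                      # in-place modification after it was saved
try:
    loss.backward()
except RuntimeError as e:
    print("version counter mismatch:", e)
```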

Deep Learning - Hands-On Dinosaur Name Generator Using RNN

Character-Level Dinosaur Name Generation Build a character-level text generation model using an RNN. The vocabulary looks like: { 0: '\n', 1: 'a', 2: 'b', 3: 'c...
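
A sketch of how such a character-level vocabulary is typically built; `dinos.txt` is a hypothetical file of dinosaur names, one per line:

```python
# Index 0 is '\n', which doubles as the end-of-name token,
# followed by the lowercase letters found in the dataset.
with open("dinos.txt") as f:          # hypothetical dataset file
    data = f.read().lower()

chars = sorted(set(data))             # ['\n', 'a', 'b', ..., 'z']
char_to_ix = {c: i for i, c in enumerate(chars)}
ix_to_char = {i: c for i, c in enumerate(chars)}
print(ix_to_char)                     # {0: '\n', 1: 'a', 2: 'b', 3: 'c', ...}
```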

Deep Learning - Word Embeddings, Word2Vec

Word Representation

Word Representation A feature in a word vector is an element that represents an attribute, such as the concept of “fruits”, “humans”, or more abstract concepts like “dry products”, etc. One iss...
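
For illustration, a one-hot encoding next to a featurized representation where each dimension is such an attribute; the feature values below are made up, not learned:

```python
import numpy as np

vocab = ["apple", "orange", "man", "woman", "king", "queen"]

# One-hot: each word gets its own axis, so every pair of distinct
# words is equally "far apart".
one_hot = np.eye(len(vocab))

# Featurized: each dimension is an interpretable attribute, so related
# words end up close together.
#                     fruit  human  royal
features = np.array([
    [0.95,  0.00, 0.00],   # apple
    [0.97,  0.00, 0.01],   # orange
    [0.00,  0.95, 0.01],   # man
    [0.00,  0.96, 0.02],   # woman
    [0.01,  0.93, 0.95],   # king
    [0.01,  0.94, 0.96],   # queen
])
```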

Deep Learning - RNN Part 3 LSTM, Bi-Directional RNN, Deep RNN

LSTM LSTM came out in 1997, and GRU is a simplification of it. In LSTM, we have the “forget gate” $\Gamma_f$, the output gate $\Gamma_o$, and the update gate $\Gamma_u$. We do NOT have $\Gamma_r$ ...
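
For reference, the LSTM equations in the Coursera-style gate notation the excerpt uses:

```latex
\begin{aligned}
\Gamma_f &= \sigma(W_f[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_f) && \text{forget gate}\\
\Gamma_u &= \sigma(W_u[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_u) && \text{update gate}\\
\Gamma_o &= \sigma(W_o[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_o) && \text{output gate}\\
\tilde{c}^{\langle t\rangle} &= \tanh(W_c[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_c)\\
c^{\langle t\rangle} &= \Gamma_u \odot \tilde{c}^{\langle t\rangle} + \Gamma_f \odot c^{\langle t-1\rangle}\\
a^{\langle t\rangle} &= \Gamma_o \odot \tanh(c^{\langle t\rangle})
\end{aligned}
```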

Deep Learning - RNN Part 2 GRU

Vanishing Gradients of RNN, GRU

The Vanishing Gradient Problem of RNN RNNs don’t handle long-range dependencies well. One example is in speech recognition: “The cat which ate, slept, played and had a good day … , was full” ...
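
For reference, the full GRU equations in the same notation; when $\Gamma_u \approx 0$ the cell state $c^{\langle t\rangle}$ is carried forward almost unchanged, which is what lets the network remember across the long gap that “cat” was singular:

```latex
\begin{aligned}
\Gamma_u &= \sigma(W_u[c^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_u) && \text{update gate}\\
\Gamma_r &= \sigma(W_r[c^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_r) && \text{relevance gate}\\
\tilde{c}^{\langle t\rangle} &= \tanh(W_c[\Gamma_r \odot c^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_c)\\
c^{\langle t\rangle} &= \Gamma_u \odot \tilde{c}^{\langle t\rangle} + (1 - \Gamma_u) \odot c^{\langle t-1\rangle}
\end{aligned}
```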

Deep Learning - RNN

Sequence Models, RNN Architectures

Sequence Models Some common applications of sequence models include: DNA sequencing, audio clips, sentiment classification, etc. Another example is name indexing, where names in news for a past period of time wil...