Rico's Nerd Cluster

"Before leaving the world, everything is a process."

Deep Learning - TensorFlow Basics

Nothing Fancy, Just A Basic TF Network

Basic Operations: Immutable (tf.constant) vs Variable (tf.Variable); notice the different capitalization. Max Operations: tf.math.reduce_max() finds the max along certain dimension(s)...
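A minimal sketch of these basics, assuming a standard TensorFlow 2.x install (the example tensors are made up for illustration):

```python
import tensorflow as tf

# Immutable tensor: its values cannot be reassigned after creation.
c = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# Mutable tensor: its values can be updated in place, e.g. during training.
v = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
v.assign_add(tf.ones_like(v))          # v is now [[2., 3.], [4., 5.]]

# Reduce along a chosen dimension: axis=0 -> column max, axis=1 -> row max.
print(tf.math.reduce_max(c))           # 4.0 (global max)
print(tf.math.reduce_max(c, axis=0))   # [3. 4.]
print(tf.math.reduce_max(c, axis=1))   # [2. 4.]
```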

Deep Learning - Softmax And Cross Entropy Loss

Softmax, Cross Entropy Loss, and MLE

Softmax: When we build a classifier for cat classification, at the end of training, it’s necessary to find the most likely classes for given inputs. The raw unnormalized ...
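A minimal NumPy sketch of softmax turning raw, unnormalized scores into class probabilities (the three-class logits are made-up values for illustration):

```python
import numpy as np

def softmax(logits):
    """Convert raw, unnormalized scores (logits) into class probabilities."""
    # Subtracting the max is a standard numerical-stability trick; it does not
    # change the result because softmax is invariant to constant shifts.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / np.sum(exp)

logits = np.array([2.0, 1.0, 0.1])     # e.g. scores for [cat, dog, bird]
probs = softmax(logits)
print(probs, probs.sum())              # ~[0.659, 0.242, 0.099], sums to 1
```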

Deep Learning - Hyper Parameter Tuning

It finally comes down to how much compute we have, actually...

How To Sample For Single Parameter Tuning: Generally, we need to try different sets of parameters to find the best-performing one. In terms of the number of layers, it could be a linear search: Defin...
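A minimal sketch of two common sampling strategies, with illustrative ranges and candidate counts that are assumptions rather than the post's actual values: linear/uniform sampling for a small integer hyperparameter such as the number of layers, and log-scale sampling for one that spans orders of magnitude such as the learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Number of layers: a small integer range, so a linear (uniform) search is fine.
n_layers_candidates = rng.integers(low=2, high=10, size=5)

# Learning rate: spans several orders of magnitude, so sample on a log scale
# (uniform in the exponent), e.g. between 1e-4 and 1e-1.
exponents = rng.uniform(low=-4, high=-1, size=5)
lr_candidates = 10.0 ** exponents

print(n_layers_candidates)   # 5 candidate layer counts between 2 and 9
print(lr_candidates)         # 5 learning rates scattered between 1e-4 and 1e-1
```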

Deep Learning - Layer Normalization

Normalization For Sequential Data

Layer Normalization: Batch normalization has two main constraints: when the batch size becomes smaller, it performs poorly; and nowadays we tend to have higher data resolution, especially in large NLP tra...
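A minimal NumPy sketch of layer normalization, which normalizes each sample over its own feature dimension and therefore does not depend on the batch size (the toy batch below is made up):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each sample over its feature dimension (last axis),
    independently of the other samples in the batch."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 30.0]])     # batch of 2 samples, 3 features each
y = layer_norm(x)
# Each row now has ~zero mean and ~unit variance, even with batch size 1.
print(y.mean(axis=-1), y.var(axis=-1))
```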

Deep Learning - Batch Normalization (BN)

Internal Covariate Shift

Batch Normalization: Among the many pitfalls of ML, statistical stability is always high on the list. Model training is random: the initialization, and even the common optimizers (SGD, Adam, etc.), are stoc...
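A minimal NumPy sketch of the training-time batch-norm transform, normalizing each feature over the mini-batch and then applying the learnable scale and shift (gamma and beta are simply initialized to 1 and 0 here for illustration):

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch (axis 0), then apply
    the learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)              # per-feature batch mean
    var = x.var(axis=0)                # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(32, 4) * 5 + 3     # mini-batch of 32 samples, 4 features
gamma, beta = np.ones(4), np.zeros(4)
y = batch_norm_train(x, gamma, beta)
print(y.mean(axis=0), y.var(axis=0))   # ~0 and ~1 per feature
```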

Deep Learning - Optimizations Part 1

Momentum, RMSProp, Adam, AdamW, Learning Rate Decay, Local Minima, Gradient Clipping

Introduction: Deep learning is still highly empirical; it works well where there is a lot of data, but its theory is not set in stone (at least not yet). So use the optimization techniq...
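As one example of the techniques named above, here is a minimal sketch of an SGD-with-momentum step, using the exponentially-weighted-average convention v = beta * v + (1 - beta) * grad (some references instead fold the (1 - beta) factor into the learning rate); all values are illustrative:

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    """One SGD-with-momentum update: the velocity is an exponentially
    weighted average of past gradients, which damps oscillations."""
    velocity = beta * velocity + (1.0 - beta) * grad
    w = w - lr * velocity
    return w, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
grad = np.array([0.5, -0.5])           # pretend gradient from one mini-batch
w, v = sgd_momentum_step(w, grad, v)
print(w, v)
```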

Deep Learning - Exploding And Vanishing Gradients

When in doubt, be courageous, try things out, and see what happens! - James Dellinger

Why Exploding & Vanishing Gradients Happen: In a very deep network, the output of each layer might diminish or explode. This is mainly because layer outputs are products of $W_1 W_2 \dots x$ (ignoring act...
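A quick numeric sketch of that product effect: repeatedly multiplying by weights whose scale is consistently above or below 1 makes the signal blow up or shrink toward zero (the depth, width, and scales below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)             # an arbitrary 8-dimensional input

for scale, label in [(1.5, "exploding"), (0.5, "vanishing")]:
    h = x.copy()
    for _ in range(50):                 # 50 "layers", activations ignored
        W = scale * np.eye(8)           # weights consistently >1 or <1 in scale
        h = W @ h
    # 1.5**50 is ~6e8 while 0.5**50 is ~9e-16, so the norm either
    # explodes or all but vanishes after 50 layers.
    print(label, np.linalg.norm(h))
```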

Deep Learning - Overfitting

Bias, Variance, Overfitting, Regularization, Dropout

A Nice Quote 💡: Before we delve in, I’d like to share a quote from James Dellinger that really hits home: I think the journey we took here showed us that this knee-jerk response of feeling of intimidat...

Deep Learning - Batch Gradient Descent

Batch Gradient Descent, Mini-Batch

A Neuron And Batch Gradient Descent: A neuron has multiple inputs and a single output. First it computes the weighted sum of all inputs, then feeds it into an “activation function”. Below, the activat...
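A minimal sketch of such a neuron with a sigmoid activation (the inputs, weights, and bias are made-up numbers):

```python
import numpy as np

def neuron(x, w, b):
    """A single neuron: weighted sum of inputs plus bias, fed through
    an activation function (sigmoid here)."""
    z = np.dot(w, x) + b               # weighted sum
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid activation

x = np.array([0.5, -1.0, 2.0])         # three inputs
w = np.array([0.1, 0.4, -0.2])         # one weight per input
b = 0.05
print(neuron(x, w, b))                 # a single scalar output in (0, 1)
```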

Deep Learning - Activation and Loss Functions

Sigmoid, ReLU, GELU, Tanh, Mean Squared Error, Mean Absolute Error, Cross Entropy Loss, Hinge Loss, Huber Loss, IoU Loss, Dice Loss, Focal Loss

Activation Functions: Early papers found that the Rectified Linear Unit (ReLU) is consistently faster than Sigmoid because of its larger derivatives and its non-zero derivative in the positive region. Howeve...
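A quick sketch comparing the two derivatives: sigmoid's gradient is at most 0.25 and saturates toward zero for large |z|, while ReLU's gradient is exactly 1 everywhere in the positive region (the sample points are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)               # at most 0.25, and ~0 for large |z|

def d_relu(z):
    return (z > 0).astype(float)       # exactly 1 for all positive inputs

z = np.array([-5.0, -1.0, 0.5, 5.0])
print(d_sigmoid(z))                    # small everywhere, nearly 0 at the tails
print(d_relu(z))                       # 0 for negatives, 1 for positives
```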