Rico's Nerd Cluster

"Before leaving this world, everything is a process"

Deep Learning - PyTorch Versioning And Memory Allocation

In-Place and Out-of-Place Matrix Ops, Gradient Checkpointing

PyTorch Versioning Is Necessary Because We Have In-Place and Out-of-Place Matrix Ops Takeaways: - x.add_() and x.multiply_() perform in-place addition and multiplication, and update the gradient. - x+something a...

Deep Learning - Hands-On Dinosaur Name Generator Using RNN

Character-Level Dinosaur Name Generation Build a character-level text generation model using an RNN. The vocabulary looks like: { 0: '\n', 1: 'a', 2: 'b', 3: 'c...
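A vocabulary of this shape can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the character set is the newline plus the lowercase letters a-z. The names `ix_to_char`, `char_to_ix`, `encode`, and `decode` are hypothetical and are not taken from the post.

```python
import string

# Assumed character set: newline (end-of-name marker) plus lowercase a-z.
chars = sorted(set("\n" + string.ascii_lowercase))

# Index -> character, matching the shape shown above: {0: '\n', 1: 'a', ...}
ix_to_char = {i: ch for i, ch in enumerate(chars)}
# Character -> index, the inverse lookup used when encoding training names.
char_to_ix = {ch: i for i, ch in ix_to_char.items()}


def encode(name):
    """Turn a name into a list of indices, with a trailing newline marker."""
    return [char_to_ix[c] for c in name.lower() + "\n"]


def decode(indices):
    """Turn a list of indices back into a string."""
    return "".join(ix_to_char[i] for i in indices)


encoded = encode("abc")
print(encoded)          # indices for 'a', 'b', 'c', then the newline marker
print(repr(decode(encoded)))
```

Sorting puts the newline first because its code point is lower than any letter, which is why it lands at index 0.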

Deep Learning - Word Embeddings, Word2Vec

Word Representation

Word Representation A feature of a word in the vocabulary is a vector element that represents an attribute, such as the concept of “fruits”, “humans”, or more abstract concepts like “dry products”, etc. One iss...

Deep Learning - RNN Part 3 LSTM, Bi-Directional RNN, Deep RNN

LSTM LSTM came out in 1997, and GRU is a simplification of it. In LSTM, we have the forget gate $\Gamma_f$, the output gate $\Gamma_o$, and the update gate $\Gamma_u$. We do NOT have $\Gamma_r$ ...

Deep Learning - RNN Part 2 GRU

Vanishing Gradients of RNN, GRU

The Vanishing Gradient Problem of RNN RNNs do not handle long-range dependencies well. One example is in speech recognition, “The cat which ate, slept, played and had a good day … , was full” ...

Deep Learning - RNN

Sequence Models, RNN Architectures

Sequence Models Some common applications of sequence models include DNA sequencing, audio clips, sentiment classification, etc. Another example is name indexing, where names in news for a past period of time wil...

Deep Learning - PyTorch Model Training

Checkpointing, Op Determinism, 🤗 HuggingFace Trainer

Checkpointing Checkpointing is a technique to trade compute for memory during training. Instead of storing all intermediate activations (layer outputs) for backprop, which consumes a lot of memor...

Deep Learning - Ensemble

Ensemble

Ensemble An ensemble is a group of models (a.k.a. base learners or weak learners) that are trained and combined to achieve better predictions, increased stability, and improved generalization compared to...
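One common way to combine base learners is majority voting over their predicted labels. The sketch below is only an illustration of that idea in plain Python; the function name `majority_vote` and the three hard-coded prediction lists are hypothetical stand-ins for real trained models, and the post itself may combine models differently.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    predictions: a list with one entry per model; each entry is a list of
    predicted labels, and all entries have the same length.
    """
    combined = []
    # zip(*predictions) groups the votes of all models for each sample.
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical binary predictions from three base learners on four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

combined = majority_vote([model_a, model_b, model_c])
print(combined)
```

Using an odd number of models with binary labels avoids ties, so each sample always has a clear winner.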

Deep Learning - Neural Style Transfer

What Conv Net Is Learning

What Do Conv Nets Learn? For ease of explanation, I will use an example below where a conv layer has 3 input channels and 5 output channels. As a recap: this layer has 3 filters, each filter...

Deep Learning - Face Recognition

Siamese Network, Deep Face

Introduction Face verification (easier) vs face recognition (harder) Face verification takes an input image, a name, and an ID. Then, it outputs whether the image corresponds to the ID. Face recogniti...