Rico's Nerd Cluster

「Before leaving this world, everything is a process.」

Deep Learning - PyTorch Basics

Neural Network Model Components, Common Operations

Data Type Conversions, Common Data Types: torch.arange(start, end, step) can take either float or int values. torch.range(start, end, step) is deprecated because its signature is dif...
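A minimal sketch of the behavior that excerpt describes, assuming a standard PyTorch install (the post's own code is truncated above):

```python
import torch

# torch.arange accepts int or float arguments; the end value is exclusive.
print(torch.arange(0, 5, 1))        # tensor([0, 1, 2, 3, 4]), int64
print(torch.arange(0.0, 5.0, 1.5))  # tensor([0.0000, 1.5000, 3.0000, 4.5000]), float32

# torch.range is deprecated partly because, unlike arange, it includes the end
# value, so the two signatures behave differently for the same arguments.
```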

Deep Learning - TensorFlow Basics

Nothing Fancy, Just A Basic TF Network

Basic Operations, Max Operations: immutable (tf.constant) vs. variable (tf.Variable); notice the different capitalization. tf.math.reduce_max(): finds the max along certain dimension(s)...
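A rough sketch of those basics, assuming TensorFlow 2.x (not the post's exact snippet):

```python
import tensorflow as tf

c = tf.constant([[1.0, 5.0], [3.0, 2.0]])   # immutable tensor
v = tf.Variable([[1.0, 5.0], [3.0, 2.0]])   # mutable; note the capital V

print(tf.math.reduce_max(c))                # max over all elements: 5.0
print(tf.math.reduce_max(c, axis=0))        # column-wise max: [3.0, 5.0]
print(tf.math.reduce_max(c, axis=1))        # row-wise max: [5.0, 3.0]

v.assign_add(tf.ones_like(v))               # only Variables can be updated in place
```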

Deep Learning - Softmax And Cross Entropy Loss

Softmax, Cross Entropy Loss, and MLE

Softmax: When we build a classifier for cat classification, at the end of training it's necessary to find the most likely classes for given inputs. The raw unnormalized ...
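For reference, a small NumPy sketch (my own illustration, not the post's code) of turning raw unnormalized scores into probabilities with softmax, plus the cross-entropy loss for a one-hot label:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; the result is mathematically unchanged.
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw, unnormalized scores for 3 classes
probs = softmax(logits)
print(probs, probs.sum())            # probabilities that sum to 1

# Cross-entropy loss when the true class is class 0 (one-hot label [1, 0, 0]):
print(-np.log(probs[0]))
```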

Deep Learning - Hyper Parameter Tuning

It finally comes down to how much compute we have, actually...

How To Sample For Single Parameter Tuning: Generally, we need to try different sets of parameters to find the best-performing one. In terms of the number of layers, it could be a linear search: Defin...
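One common way to do that sampling, sketched here as an assumption about where the post is headed: integer-valued parameters such as the number of layers can be searched linearly, while scale parameters such as the learning rate are usually sampled log-uniformly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear search over a small integer range, e.g. the number of layers.
num_layers_candidates = list(range(2, 7))      # 2, 3, 4, 5, 6

# Log-uniform sampling for the learning rate: pick the exponent uniformly,
# then exponentiate, so 1e-4 and 1e-1 are equally likely to be explored.
learning_rates = 10.0 ** rng.uniform(-4, -1, size=5)

print(num_layers_candidates)
print(learning_rates)
```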

Deep Learning - Layer Normalization

Normalization For Sequential Data

Layer Normalization: Batch normalization has two main constraints. When the batch size becomes small, it performs poorly. Nowadays, we tend to have higher data resolution, especially in large NLP tra...
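A minimal PyTorch sketch of layer normalization, which normalizes over the feature dimension of each sample and is therefore insensitive to batch size (my own example):

```python
import torch
import torch.nn as nn

x = torch.randn(4, 10)                   # (batch, features); batch size can be tiny
ln = nn.LayerNorm(10)                    # learnable gain and bias over the last dim
y = ln(x)

# Each sample is normalized over its own features.
print(y.mean(dim=-1))                    # per-row mean, close to 0
print(y.var(dim=-1, unbiased=False))     # per-row variance, close to 1
```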

Deep Learning - Batch Normalization (BN)

Internal Covariate Shift

Batch Normalization: Among the many pitfalls of ML, statistical stability is always high on the list. Model training is random: the initialization, and even the common optimizers (SGD, Adam, etc.), are stoc...
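For contrast with the layer-norm entry above, a sketch of the canonical batch-norm computation, whose statistics come from the batch dimension (not the post's code):

```python
import torch
import torch.nn as nn

x = torch.randn(32, 10)                  # statistics are computed across the 32 samples
bn = nn.BatchNorm1d(10)
y = bn(x)                                # (x - batch_mean) / sqrt(batch_var + eps) * gamma + beta

print(y.mean(dim=0))                     # per-feature mean, close to 0
print(y.var(dim=0, unbiased=False))      # per-feature variance, close to 1
```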

Deep Learning - Optimizations Part 1

Momentum, RMSProp, Adam, AdamW, Learning Rate Decay, Local Minima, Gradient Clipping

Introduction: Deep learning is still highly empirical. It works well where there is a lot of data, but its theories are not set in stone (at least not yet). So use the below optimization techniq...

Deep Learning - Exploding And Vanishing Gradients

When in doubt, be courageous, try things out, and see what happens! - James Dellinger

Why Exploding & Vanishing Gradients Happen: In a very deep network, the output of each layer might diminish or explode. This is mainly because layer outputs are products of $W_1 W_2 \dots x$ (ignoring act...
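A toy numerical illustration of that product-of-weights effect (my own example, not from the post): repeatedly scaling by a factor below or above 1 makes the output vanish or blow up.

```python
import torch

x = torch.ones(4)
for scale in (0.5, 1.5):
    out = x.clone()
    # 50 "layers" whose weight matrix is scale * identity, with no activation.
    for _ in range(50):
        out = scale * torch.eye(4) @ out
    print(scale, out.norm().item())      # 0.5 -> ~1e-15 (vanishes), 1.5 -> ~1e9 (explodes)
```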

Deep Learning - Overfitting

Bias, Variance, Overfitting, Regularization, Dropout

A Nice Quote 💡 Before we delve in, I'd like to share a quote from James Dellinger that really hits home: I think the journey we took here showed us that this knee-jerk response of feeling of intimidat...

Deep Learning - Batch Gradient Descent

Batch Gradient Descent, Mini-Batch

A Neuron And Batch Gradient Descent: A neuron has multiple inputs and a single output. First it computes the weighted sum of all inputs, then feeds it into an "activation function". Below, the activat...
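A single neuron as described there, sketched in plain PyTorch (the sigmoid is just one possible choice of activation; the numbers are made up):

```python
import torch

x = torch.tensor([0.5, -1.0, 2.0])   # inputs
w = torch.tensor([0.1, 0.4, -0.3])   # one weight per input
b = torch.tensor(0.2)                # bias

z = w @ x + b                        # weighted sum of all inputs
a = torch.sigmoid(z)                 # activation function -> single output
print(z.item(), a.item())
```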