Rico's Nerd Cluster

「Before leaving this world, everything is a process.」

Deep Learning - CNN Basics

Filters, Padding, Convolution and Its Back Propagation, Receptive Field

Filters Filters (aka kernels) are “pattern detectors”. Each filter is a small matrix that you slide across an image, multiplying it against the pixel values it covers (convolution). They can detect edges, corners, a...
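
A quick sketch of my own (not from the post; the kernel values and image are assumptions) of what sliding a filter across an image and multiplying means, using a Sobel-like vertical-edge kernel in plain NumPy:

    import numpy as np

    def conv2d(image, kernel):
        # Valid-mode "convolution" as DL libraries use the term (really cross-correlation,
        # no kernel flip): slide the kernel and sum elementwise products at every position.
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.zeros((6, 6)); image[:, 3:] = 1.0   # left half dark, right half bright
    vertical_edge = np.array([[1, 0, -1],
                              [1, 0, -1],
                              [1, 0, -1]], dtype=float)
    print(conv2d(image, vertical_edge))            # strong response where the edge sits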

Deep Learning - Start Easy, Things I Learned From Training Small Neural Nets

Basic Torch Network With Some Notes on Syntax

Introduction To gain some insight into how hyperparameters impact training, I created a simple neural network in PyTorch to learn 2D input data. Specifically, I’m interested in exploring the...
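
As a hedged illustration of that kind of setup (the dataset, sizes, and training loop below are my own guesses, not the post's code), a tiny PyTorch MLP fit to 2D inputs looks like:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy 2D dataset: label is 1 inside the unit circle, 0 outside.
    X = torch.rand(512, 2) * 4 - 2
    y = (X.pow(2).sum(dim=1) < 1.0).float().unsqueeze(1)

    model = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.BCEWithLogitsLoss()

    for epoch in range(200):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    print(loss.item())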

Deep Learning - PyTorch Basics

Neural Network Model Components, Common Operations

Data Type Conversions Common Data Types: torch.arange(start, stop, step) can take either float or int values; torch.range(start, stop, step) is deprecated because its signature is dif...
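
For context, a small sketch of my own (assuming current PyTorch behavior): torch.arange excludes the end point just like Python's range, which is exactly the signature mismatch that got torch.range deprecated:

    import torch

    print(torch.arange(0, 5, 1))        # tensor([0, 1, 2, 3, 4]) -> int64, end point excluded
    print(torch.arange(0.0, 5.0, 0.5))  # float steps work too; dtype becomes floating point
    # torch.range(0, 5, 1) is deprecated: unlike Python's range, it *includes* the end point.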

Deep Learning - TensorFlow Basics

Nothing Fancy, Just A Basic TF Network

Basic Operations Max Operations: immutable (tf.constant) vs variable (tf.Variable), note the different capitalization. tf.math.reduce_max(): find the max along certain dimension(s). ...
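
A minimal sketch of those two points (my own example values, not the post's):

    import tensorflow as tf

    c = tf.constant([[1., 5., 3.],
                     [2., 4., 6.]])          # immutable tensor
    v = tf.Variable(c)                       # mutable, trainable by default

    print(tf.math.reduce_max(c))             # 6.0  (max over all elements)
    print(tf.math.reduce_max(c, axis=0))     # [2. 5. 6.]  (max along the batch axis, per column)
    v.assign_add(tf.ones_like(v))            # in-place updates only work on Variables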

Deep Learning - Softmax And Cross Entropy Loss

Softmax, Cross Entropy Loss, and MLE

Softmax When we build a cat classifier, at the end of training we need to find the most likely classes for given inputs. The raw unnormalized ...
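
Softmax is what turns those raw unnormalized scores (logits) into a probability distribution. A minimal sketch, with the usual max-subtraction trick for numerical stability:

    import numpy as np

    def softmax(logits):
        # Subtract the max first so exp() never overflows; the output is unchanged
        # because softmax is invariant to adding a constant to every logit.
        z = logits - np.max(logits)
        e = np.exp(z)
        return e / e.sum()

    probs = softmax(np.array([2.0, 1.0, 0.1]))
    print(probs)             # roughly [0.66, 0.24, 0.10], sums to 1
    print(-np.log(probs[0])) # cross-entropy loss if class 0 is the true label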

Deep Learning - Hyper Parameter Tuning

It finally comes down to how much compute we have, actually...

How To Sample For Single Parameter Tuning Generally, we need to try different sets of parameters to find the best-performing one. For the number of layers, it could be a linear search: Defin...
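
A sketch of the usual sampling advice (my own example, not the post's code): scale-sensitive parameters like the learning rate are sampled uniformly on a log scale, while an integer parameter like the number of layers can just be swept linearly:

    import numpy as np

    rng = np.random.default_rng(0)

    # Learning rate: sample the *exponent* uniformly, so 1e-4 and 1e-1 scales are equally likely.
    exponents = rng.uniform(-4, -1, size=5)
    learning_rates = 10.0 ** exponents

    # Number of layers: a plain linear sweep is fine.
    num_layers = np.arange(2, 7)

    print(learning_rates, num_layers)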

Deep Learning - Layer Normalization

Normalization For Sequential Data

Layer Normalization Batch normalization has two main constraints: when the batch size becomes small, it performs poorly; and nowadays we tend to have higher data resolution, especially in large NLP tra...
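
A hedged sketch of the contrast (my own example shapes): LayerNorm normalizes across the feature dimension of each individual sample, so its statistics do not depend on the batch size at all:

    import torch
    import torch.nn as nn

    x = torch.randn(4, 10, 32)            # (batch, sequence length, features)
    ln = nn.LayerNorm(32)                 # normalize each token's 32 features independently
    out = ln(x)
    # Per-token mean ~0 and variance ~1, regardless of how many samples are in the batch.
    print(out.mean(dim=-1).abs().max(), out.var(dim=-1, unbiased=False).mean())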

Deep Learning - Batch Normalization (BN)

Internal Covariate Shift

Batch Normalization Among the many pitfalls of ML, statistical stability is always high on the list. Model training is random: the initialization, and even the common optimizers (SGD, Adam, etc.), are stoc...
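
For reference, a minimal sketch (mine, not the post's) of what batch normalization computes at training time: per-feature statistics across the batch dimension, followed by a learnable scale and shift:

    import torch

    def batch_norm_train(x, gamma, beta, eps=1e-5):
        # x: (batch, features). Statistics are taken *across the batch*,
        # which is why very small batches give noisy, unreliable estimates.
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        x_hat = (x - mean) / torch.sqrt(var + eps)
        return gamma * x_hat + beta

    x = torch.randn(8, 4)
    print(batch_norm_train(x, gamma=torch.ones(4), beta=torch.zeros(4)))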

Deep Learning - Optimizations Part 1

Momentum, RMSProp, Adam, AdamW, Learning Rate Decay, Local Minima, Gradient Clipping

Introduction Deep learning is still highly empirical: it works well when there’s a lot of data, but its theory is not set in stone (at least not yet). So use the below optimization techniq...
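
A hedged sketch of how the pieces named in the title fit together in PyTorch (the model and data here are placeholders of my own): AdamW, a learning-rate decay schedule, and gradient clipping:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)                                   # placeholder model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

    for step in range(30):
        x, y = torch.randn(32, 10), torch.randn(32, 1)         # placeholder batch
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
        optimizer.step()
        scheduler.step()                                       # learning rate decay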

Deep Learning - Exploding And Vanishing Gradients

When in doubt, be courageous, try things out, and see what happens! - James Dellinger

Why Exploding & Vanishing Gradients Happen In a very deep network, output of each layer might diminish / explodes. This is mainly because layer outputs are products of $W_1W_2…x$ (ignoring act...