Rico's Nerd Cluster

“Before leaving this world, everything is a process.”

Deep Learning - Optimizations Part 1

Momentum, RMSProp, Adam, AdamW, Learning Rate Decay, Local Minima, Gradient Clipping

Introduction Deep learning is still highly empirical: it works well when there is a lot of data, but its theories are not set in stone (at least not yet). So use the optimization techniq...
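As a preview of the update rules the post walks through, here is a minimal NumPy sketch (my own illustration, not the post's code) of SGD with momentum and Adam; the hyperparameter defaults are common choices, not values taken from the post:

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, beta=0.9):
    """SGD with momentum: v keeps an exponential moving average of gradients."""
    v = beta * v + (1 - beta) * grad
    return w - lr * v, v

def adam(w, grad, m, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum (m) plus RMSProp-style scaling (s), with bias correction.
    t is the 1-indexed step count."""
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for early steps
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```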

Deep Learning - Exploding And Vanishing Gradients

When in doubt, be courageous, try things out, and see what happens! - James Dellinger

Why Exploding & Vanishing Gradients Happen In a very deep network, the output of each layer might diminish or explode. This is mainly because layer outputs are products of $W_1 W_2 \dots x$ (ignoring act...
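A quick way to see this numerically (a toy sketch of my own, not from the post): repeatedly multiplying by weights slightly smaller or larger than identity drives the signal toward zero or toward overflow:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)

for scale in (0.5, 1.5):              # weights slightly "small" vs slightly "large"
    out = x.copy()
    for _ in range(50):               # 50 layers, no activation
        W = scale * np.eye(3)
        out = W @ out
    print(scale, np.linalg.norm(out)) # 0.5 -> vanishes (~1e-15), 1.5 -> explodes (~1e8)
```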

Deep Learning - Overfitting

Bias, Variance, Overfitting, Regularization, Dropout

A Nice Quote 💡 Before we delve in, I’d like to share a quote from James Dellinger that really hits home: I think the journey we took here showed us that this knee-jerk response of feeling of intimidat...

Deep Learning - Batch Gradient Descent

Batch Gradient Descent, Mini-Batch

A Neuron And Batch Gradient Descent A neuron has multiple inputs and a single output. First it computes the weighted sum of all inputs, then feeds it into an “activation function”. Below, the activat...
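For concreteness, a single neuron's forward pass fits in a few lines of NumPy (my own sketch; the names are made up for this illustration):

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """A single neuron: weighted sum of the inputs, then an activation."""
    z = np.dot(w, x) + b        # weighted sum
    return activation(z)

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.1, -0.2, 0.3])
print(neuron(x, w, b=0.5))
```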

Deep Learning - Loss Functions

Mean Squared Error, Mean Absolute Error, Hinge Loss, Huber Loss, L1 Loss, Cross Entropy Loss, NLL Loss, Sparse Cross Entropy, IoU Loss, Dice Loss, Focal Loss, Cauchy Robust Kernel

Regression Losses Mean Squared Error (MSE) \[\text{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2\] Disadvantages: Sensitive to outliers because errors are squared. Assumes Gau...
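To make the outlier sensitivity concrete, here is a small NumPy sketch (mine, not the post's) comparing MSE with the more robust MAE on data containing one outlier:

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

y     = np.array([1.0, 2.0, 3.0, 100.0])   # last point is an outlier
y_hat = np.array([1.1, 1.9, 3.2, 3.0])

print(mse(y, y_hat))   # dominated by the squared outlier error
print(mae(y, y_hat))   # grows only linearly with the outlier
```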

Deep Learning - Activation

Sigmoid, ReLU, GELU, Tanh

Activation Functions Early work observed that the Rectified Linear Unit (ReLU) often trains faster than sigmoid-like activations because it avoids saturation for positive inputs and has a simple g...
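For reference, the activations the post compares can be written directly in NumPy; this is a sketch of my own, with GELU given via its common tanh approximation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # saturates for large |x|

def relu(x):
    return np.maximum(0.0, x)         # no saturation for x > 0; gradient is 0 or 1

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))
```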

Deep Learning - Auto Differentiator From Scratch

Auto Diff Is The Dark Magic Of All Dark Magics Of Deep Learning

Introduction Gradients here refer to scalar-to-matrix gradients. We need to accumulate gradients for mini-batch training. Elementwise multiplication gradients: for $A \odot B = C$, $\partial C / \partial A_{ij}$...
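As an illustration of the accumulate-on-backward idea, here is a minimal scalar sketch (in the spirit of micrograd-style autodiff, not the post's implementation):

```python
class Value:
    """Minimal reverse-mode autodiff node (a toy sketch)."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0              # accumulated with +=, so reuse sums gradients
        self._parents = parents
        self._backward = lambda: None

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad  += other.data * out.grad   # d(ab)/da = b
            other.grad += self.data  * out.grad   # d(ab)/db = a
        out._backward = _backward
        return out

    def backward(self):
        self.grad = 1.0
        topo, seen = [], set()        # topological order via DFS
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)
        build(self)
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b
c.backward()
print(a.grad, b.grad)   # 3.0 2.0
```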

Deep Learning - Gradient Checking

First Step To Debugging A Neural Net

How To Do Gradient Checking In calculus, we all learned that a derivative is defined as: \[\begin{gather*} f'(\theta) = \lim_{\epsilon \to 0} \frac{f(\theta + \epsilon) - f(\theta)}{\epsilon} \end{gather*}\] ...
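In practice the check is usually done with the centered difference, which has $O(\epsilon^2)$ error. A minimal NumPy sketch (my own, assuming a scalar-valued f):

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-7):
    """Centered-difference estimate of df/dtheta, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e.flat[i] = eps
        grad.flat[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad

f = lambda t: np.sum(t ** 2)        # analytic gradient: 2t
theta = np.array([1.0, -2.0, 3.0])
print(numerical_grad(f, theta))     # ~[2, -4, 6]
```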

Deep Learning - Introduction

Why Do We Even Need Deep Neural Nets? Data Partition, ML Ops, Data Normalization

Why Do We Need Deep Learning Any bounded continuous function can be approximated by an arbitrarily wide single hidden layer. Why? The idea is roughly that linear combinations of activation function...
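A hand-constructed sketch of this idea (my own, not from the post): a single hidden layer of ReLUs realizes any piecewise-linear function, which can track $\sin(x)$ on a bounded interval:

```python
import numpy as np

# One ReLU "hinge" per knot; each hidden weight is the change in slope there.
knots = np.linspace(0.0, 2.0 * np.pi, 20)
vals = np.sin(knots)
slopes = np.diff(vals) / np.diff(knots)
coefs = np.concatenate(([slopes[0]], np.diff(slopes)))  # slope changes at knots

def net(x):
    # hidden layer: one ReLU unit per knot; output: their weighted sum
    hidden = np.maximum(0.0, x[:, None] - knots[:-1][None, :])
    return vals[0] + hidden @ coefs

x = np.linspace(0.0, 2.0 * np.pi, 200)
print(np.max(np.abs(net(x) - np.sin(x))))  # small, and shrinks with more knots
```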

Computer Vision - Pinhole Camera Model

This Blog Shows How A Small Magic Peephole Captures The World

Introduction Cameras are intriguing. There have been many different types with different kinds of lenses (such as fisheye and wide-angle lenses). However, the most original (and simplest) form of cam...
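The core of the pinhole model is one line of math: a 3D point $(X, Y, Z)$ projects to $(fX/Z, fY/Z)$ on the image plane. A tiny NumPy sketch (mine, not the post's code):

```python
import numpy as np

def project(points, f=1.0):
    """Pinhole projection: (X, Y, Z) -> (f*X/Z, f*Y/Z) on the image plane."""
    points = np.asarray(points, dtype=float)
    return f * points[:, :2] / points[:, 2:3]

pts = np.array([[1.0, 2.0, 4.0],
                [0.5, 0.5, 2.0]])
print(project(pts, f=0.01))   # farther points land closer to the image center
```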