Deep Learning - Common Oopsies

Underflow

Posted by Rico's Nerd Cluster on May 17, 2022

Underflow

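  • torch.softmax(X, dim=-1) can return exactly zero for some entries due to underflow: when an entry of X is far below the largest one, its exponential is too small to represent. Taking the log of that output then gives -inf.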
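
To make the underflow concrete, here is a minimal sketch. The logit values are made up for illustration, and switching to `torch.log_softmax` is one common workaround, not necessarily the only one.

```python
import torch

# One logit sits far below the other, so its exponential underflows in float32.
logits = torch.tensor([0.0, -200.0])

probs = torch.softmax(logits, dim=0)
print(probs)             # the second probability is exactly 0
print(torch.log(probs))  # log(0) gives -inf

# log_softmax works in log space and never forms the underflowed probability.
log_probs = torch.log_softmax(logits, dim=0)
print(log_probs)         # both entries stay finite
```

If the next step is a log (a cross-entropy or NLL loss, for example), computing the log-probabilities directly avoids ever materializing the zero.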

Sizing

  • Be careful with the last batch when you initialize any tensor whose shape depends on the batch size. The last batch can be smaller than BATCH_SIZE when the dataset size is not a multiple of it, so read the size from the batch itself instead of assuming BATCH_SIZE.
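
A small sketch of the short last batch, assuming a toy dataset of 10 samples and a hypothetical per-batch tensor `hidden` with 8 features (both are illustrative, not from the original setup):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 4
# Toy dataset with 10 samples, so the last batch holds only 2.
dataset = TensorDataset(torch.arange(10, dtype=torch.float32).unsqueeze(1))

loader = DataLoader(dataset, batch_size=BATCH_SIZE)
batch_sizes = []
for (x,) in loader:
    # Size the per-batch tensor from the batch itself, not from BATCH_SIZE.
    hidden = torch.zeros(x.size(0), 8)
    batch_sizes.append(hidden.size(0))
print(batch_sizes)  # [4, 4, 2]

# Alternative: ask the DataLoader to drop the short final batch.
full_loader = DataLoader(dataset, batch_size=BATCH_SIZE, drop_last=True)
full_only = [x.size(0) for (x,) in full_loader]
print(full_only)  # [4, 4]
```

`drop_last=True` keeps every batch the same size at the cost of skipping a few samples per epoch, which is usually acceptable for training but not for evaluation.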

Weight Manipulation

Weight Copying Without torch.no_grad()

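Copy weights into a module inside torch.no_grad(). The copy overwrites the parameters directly, in place, and we don't want autograd to track it. Without the context manager, PyTorch raises a RuntimeError, because an in-place operation on a leaf tensor that requires grad is not allowed.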

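with torch.no_grad():
    # Stack the Q, K, V projection weights row-wise into one packed matrix.
    in_proj_weight = torch.cat(
        [my_mha.Wq.weight, my_mha.Wk.weight, my_mha.Wv.weight], dim=0
    )
    # In-place copy into the torch module's parameter; not tracked by autograd.
    torch_mha.in_proj_weight.copy_(in_proj_weight)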
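
For a version that runs on its own, here is a sketch under a few assumptions: the three `nn.Linear` layers are hypothetical stand-ins for `my_mha.Wq`, `my_mha.Wk`, and `my_mha.Wv`, `torch_mha` is taken to be an `nn.MultiheadAttention`, and the sizes are arbitrary. It also shows what happens when the `torch.no_grad()` is left out.

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 8, 2  # arbitrary sizes for illustration

# Hypothetical stand-ins for my_mha.Wq / my_mha.Wk / my_mha.Wv.
Wq = nn.Linear(embed_dim, embed_dim)
Wk = nn.Linear(embed_dim, embed_dim)
Wv = nn.Linear(embed_dim, embed_dim)
torch_mha = nn.MultiheadAttention(embed_dim, num_heads)

# Packed weights, shape (3 * embed_dim, embed_dim).
stacked = torch.cat([Wq.weight, Wk.weight, Wv.weight], dim=0)

# Without no_grad(), the in-place copy into a leaf parameter is rejected.
try:
    torch_mha.in_proj_weight.copy_(stacked)
    raised = False
except RuntimeError as err:
    raised = True
    print("copy without no_grad() failed:", err)

# Inside no_grad(), the copy goes through and is not tracked by autograd.
with torch.no_grad():
    torch_mha.in_proj_weight.copy_(stacked)

copied = torch.equal(torch_mha.in_proj_weight, stacked)
print(copied)
```

The parameter still has `requires_grad=True` afterwards, so training continues normally from the copied values.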