Deep Learning - Common Oopsies - Rico 贾若童's Blog

Underflow

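Entries of `torch.softmax(X, dim=...)` can come out as exactly zero because of underflow: if a logit is far enough below the largest one in its row, its exponential is smaller than the smallest representable float and rounds to 0.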
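A minimal sketch of how this bites, assuming float32 logits with a large gap (the values are illustrative). The zero itself is harmless, but taking `torch.log` of it afterwards gives `-inf`, which then propagates into losses and gradients. `torch.log_softmax` computes the log-probabilities directly and avoids the problem.

```python
import torch

# Two float32 logits 200 apart: exp(-200) ~ 1e-87 is far below the
# smallest float32 subnormal (~1.4e-45), so it underflows to 0.
logits = torch.tensor([0.0, -200.0])

probs = torch.softmax(logits, dim=0)
print(probs)  # second entry is exactly 0.0

# log of an underflowed probability is -inf
naive_log_probs = torch.log(probs)
print(naive_log_probs)

# log_softmax works in log space, so the second entry stays finite (-200)
stable_log_probs = torch.log_softmax(logits, dim=0)
print(stable_log_probs)
```

This is why losses such as `nn.CrossEntropyLoss` take raw logits rather than softmax probabilities.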

Sizing

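Be careful with the last batch if you initialize any tensor whose shape depends on the batch size. The last batch can be smaller than the `BATCH_SIZE` you defined, because it is truncated whenever the dataset size is not a multiple of `BATCH_SIZE` (with a PyTorch `DataLoader`, this happens unless `drop_last=True` is set).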
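A minimal sketch, assuming a toy dataset of 10 samples and `BATCH_SIZE = 4`. The safe pattern is to read the size from the batch itself (`x.shape[0]`) instead of the constant; `hidden` is a hypothetical per-batch tensor, e.g. an RNN's initial hidden state.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 4
dataset = TensorDataset(torch.randn(10, 3))  # 10 samples: not a multiple of BATCH_SIZE

loader = DataLoader(dataset, batch_size=BATCH_SIZE)  # drop_last defaults to False
batch_sizes = []
for (x,) in loader:
    # torch.zeros(BATCH_SIZE, 8) would have the wrong shape for the last batch.
    # Derive the size from the batch instead.
    hidden = torch.zeros(x.shape[0], 8)
    batch_sizes.append(x.shape[0])
print(batch_sizes)  # [4, 4, 2]

# Alternative: drop the incomplete final batch entirely
loader_drop = DataLoader(dataset, batch_size=BATCH_SIZE, drop_last=True)
drop_sizes = [x.shape[0] for (x,) in loader_drop]
print(drop_sizes)  # [4, 4]
```

`drop_last=True` is convenient for training, but it silently skips the leftover samples, so it is usually not what you want for evaluation.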

Weight Manipulation

Weight Copying Without `torch.no_grad()`

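Wrap the copy in `torch.no_grad()`, because we are updating the parameters directly and don't want gradient tracking. Parameters are leaf tensors with `requires_grad=True`, so without it the in-place `copy_` raises a `RuntimeError`.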
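A minimal sketch of the failure, using two hypothetical `nn.Linear` layers (`src`, `dst`) in place of real models: the same `copy_` call is rejected outside `torch.no_grad()` and succeeds inside it, and the parameter stays trainable afterwards.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
src = nn.Linear(4, 4, bias=False)  # hypothetical source layer
dst = nn.Linear(4, 4, bias=False)  # hypothetical destination layer

# Without no_grad: dst.weight is a leaf that requires grad, so autograd
# refuses the in-place update.
try:
    dst.weight.copy_(src.weight)
    raised = False
except RuntimeError as e:
    raised = True
    print("RuntimeError:", e)

# With no_grad: the copy is a plain tensor update outside the graph.
with torch.no_grad():
    dst.weight.copy_(src.weight)

print(torch.equal(dst.weight, src.weight))  # True
print(dst.weight.requires_grad)  # True: still trainable
```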

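with torch.no_grad():
    # nn.MultiheadAttention packs the Q, K, V projections into one
    # (3 * embed_dim, embed_dim) matrix, stacked in that order along dim 0
    in_proj_weight = torch.cat(
        [my_mha.Wq.weight, my_mha.Wk.weight, my_mha.Wv.weight], dim=0
    )
    # In-place copy into the parameter; allowed because autograd is off here
    torch_mha.in_proj_weight.copy_(in_proj_weight)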
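To check that the packing order is right, here is a minimal sketch in which `Wq`, `Wk`, `Wv` are hypothetical stand-ins for `my_mha.Wq` / `Wk` / `Wv`. It also copies the biases, assuming the custom layers have them (if they don't, `torch_mha` should be built with `bias=False`). Applying the packed projection once and splitting the result into thirds should reproduce each separate layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim, num_heads = 8, 2

# Hypothetical stand-ins for my_mha.Wq / my_mha.Wk / my_mha.Wv
Wq = nn.Linear(embed_dim, embed_dim)
Wk = nn.Linear(embed_dim, embed_dim)
Wv = nn.Linear(embed_dim, embed_dim)
torch_mha = nn.MultiheadAttention(embed_dim, num_heads)

with torch.no_grad():
    # Q, K, V stacked in that order along dim 0 (weights and biases)
    torch_mha.in_proj_weight.copy_(torch.cat([Wq.weight, Wk.weight, Wv.weight], dim=0))
    torch_mha.in_proj_bias.copy_(torch.cat([Wq.bias, Wk.bias, Wv.bias], dim=0))

x = torch.randn(5, embed_dim)
with torch.no_grad():
    packed = F.linear(x, torch_mha.in_proj_weight, torch_mha.in_proj_bias)
    q, k, v = packed.chunk(3, dim=-1)  # split back into Q, K, V
    ok = all(
        torch.allclose(a, b, atol=1e-5)
        for a, b in [(q, Wq(x)), (k, Wk(x)), (v, Wv(x))]
    )
print(ok)  # True
```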