Data Type Conversions
Common Data Types
- `torch.arange(start, stop, step)` can take either float or int values.
- `torch.range(start, stop, step)` is deprecated because its signature differs from that of Python's `range()`.
- `torch.tensor(int, dtype=torch.float32)`: we can't pass an int directly into `torch.sqrt()`; we must convert it into a tensor first.
    - Note, we are using the function `torch.tensor()`, not the class `torch.Tensor()`.
    - Or alternatively, use `math.sqrt()`.
- To invert a bool mask: `~key_padding_mask`.
- Datatype check: `tensor.dtype`, not `type(tensor)`.
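A minimal sketch of these dtype notes (the values are arbitrary examples):

```
import math
import torch

x = torch.arange(0, 5, 0.5)                    # float steps are fine
t = torch.tensor(9, dtype=torch.float32)       # wrap the int before torch.sqrt()
print(torch.sqrt(t), math.sqrt(9))             # tensor(3.) 3.0

key_padding_mask = torch.tensor([True, False, True])
print(~key_padding_mask)                       # tensor([False,  True, False])
print(t.dtype)                                 # torch.float32; use .dtype, not type()
```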
Conversions Between a NumPy Array and Its Torch Tensor
import numpy as np
import torch

np_array = np.random.rand(3, 4)  # float64 by default
torch_tensor = torch.from_numpy(np_array)
# convert from float64 to float32
torch_tensor_float32 = torch_tensor.float()
# Create the tensor on CUDA
torch_tensor_gpu = torch_tensor.to('cuda')
# This is production-friendly
torch_tensor = torch_tensor.to('cuda' if torch.cuda.is_available() else 'cpu')
# Explicitly create a CPU-based tensor
tensor = torch_tensor.cpu()
# back to numpy array:
np_array = tensor.detach().numpy()
Float to Bool
- If an `np_array` is of `float64`, convert the resulting tensor to other datatypes with calls like `torch_tensor.float()`.
- Datatypes: if we need to convert a matrix into an int/bool type when seeing errors like `"bitwise_and_cuda" not implemented for 'Float'`, we can do `matrix.bool()`:
predicted_test = torch.where(outputs_test > 0.4, 1, 0).bool()
local_correct = (predicted_test & labels_test).sum().item()
Device Related
- It's crucial to add `.to(X.device)` in a custom model, so that tensors created inside it land on the same device as the input.
- In places like `torch.autocast(device_type=device_type, dtype=torch.float16)`, we need to pass a string in.
    - Solution: `device_type = str(device)`
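A hedged sketch combining both notes; the model and shapes here are made up, and on CPU autocast generally prefers `bfloat16`:

```
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class AddPositions(torch.nn.Module):
    def forward(self, X):
        # tensors created inside forward() must follow the input's device
        pos = torch.arange(X.shape[1]).to(X.device)
        return X + pos

device_type = str(device)  # autocast wants a string like 'cuda' or 'cpu'
amp_dtype = torch.float16 if device.type == 'cuda' else torch.bfloat16
with torch.autocast(device_type=device_type, dtype=amp_dtype):
    out = AddPositions().to(device)(torch.randn(2, 4, device=device))
```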
Neural Network Model Components
Data Loading
train_sampler = RandomSampler(train_data)
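For context, a minimal sketch of how this sampler is typically handed to a `DataLoader` (the dataset below is a made-up `TensorDataset`):

```
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

train_data = TensorDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))
train_sampler = RandomSampler(train_data)
train_loader = DataLoader(train_data, sampler=train_sampler, batch_size=16)
for X, y in train_loader:
    pass  # each epoch visits every sample in a fresh random order
```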
Make a conv-batch-relu module that optionally includes components
layers = [nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16)]
layers.append(nn.ReLU())  # append the optional component if necessary
block = nn.Sequential(*layers)
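A sketch of one way to wrap this up as a small factory with optional pieces (the function name and defaults are made up):

```
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, use_bn=True, use_relu=True):
    # build the layer list conditionally, then hand it to nn.Sequential
    layers = [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)]
    if use_bn:
        layers.append(nn.BatchNorm2d(out_ch))
    if use_relu:
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

block = conv_bn_relu(3, 16, use_bn=False)  # conv + relu only
```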
- `nn.Sequential()` is a sequential container that takes in modules. It has its own `forward()` that passes the input to the first module and then chains each module's output into the next.
- `nn.Sequential()` does not support extra input args as in `seq_layers(X, other_args)`. In that case, use `nn.ModuleList()` and manually iterate through the layers:
# in __init__():
self.encoder_layers = torch.nn.ModuleList([
    EncoderLayer(embedding_dim=self.embedding_dim, num_heads=num_heads, dropout_rate=dropout_rate)
    for _ in range(encoder_layer_num)
])

# in forward():
for encoder_layer in self.encoder_layers:
    X = encoder_layer(X, attn_mask=attn_mask, key_padding_mask=key_padding_mask)
- Softmax layer:
import torch
import torch.nn as nn

# Example softmax layer
softmax_layer = nn.Softmax(dim=1)

# Example input (logits)
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [1.0, 3.0, 0.1]])

# Apply the softmax layer
softmax_output = softmax_layer(logits)
print(softmax_output)
- Focal Loss building blocks: `nn.functional.softmax()` and `Tensor.gather()`
import torch

# (m, class_num, h, w)
model_preds = torch.Tensor([[
    # two channels
    [[0.1, 0.4, 0.2]],
    [[0.3, 0.6, 0.7]],
]])
# label (m, h, w), only 1 correct class
targets = torch.Tensor([
    [[0, 1, 0]]
]).long()

probs = torch.nn.functional.softmax(model_preds, dim=1)
# See
# tensor([[[[0.4502, 0.4502, 0.3775]],
#          [[0.5498, 0.5498, 0.6225]]]])
print(probs)
# See
# tensor([[[[0.4502, 0.5498, 0.3775]]]])
probs.gather(1, targets.unsqueeze(1))
- `softmax` creates a softmax across these two channels.
- `tensor.gather(dim, indices)` here will select the softmax values at the locations indicated in `targets`. `targets` cleverly stores indices of the one-hot vector as class labels; a hedged focal-loss sketch built from these two pieces follows this list.
- `LazyLinear` dims are initialized during the first forward pass
- `optimizer.zero_grad()` should always come before the backward pass
- `with torch.autograd.set_detect_anomaly(True):` can be used to print a stack trace when an error occurs in the backward pass
- indexing: `arr_2d[:, 0] = arr_1d`
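As referenced above, the gathered probabilities are exactly the `p_t` term a focal loss needs. A sketch assuming the standard formulation `FL = -(1 - p_t)^gamma * log(p_t)` with a hypothetical `gamma`:

```
import torch

# same shapes as the snippet above: (m, class_num, h, w) and (m, h, w)
model_preds = torch.Tensor([[[[0.1, 0.4, 0.2]], [[0.3, 0.6, 0.7]]]])
targets = torch.Tensor([[[0, 1, 0]]]).long()

gamma = 2.0  # hypothetical focusing parameter
probs = torch.nn.functional.softmax(model_preds, dim=1)
# p_t: probability assigned to the true class at each location, shape (m, 1, h, w)
p_t = probs.gather(1, targets.unsqueeze(1)).clamp(min=1e-8)
focal_loss = (-(1.0 - p_t) ** gamma * p_t.log()).mean()
print(focal_loss)
```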
Common Operations
Math Operations
- `torch.bmm(input, mat2)`: Batch Matrix Multiplication
    - If `input` is a `(b, n, m)` tensor and `mat2` is a `(b, m, p)` tensor, `out` will be a `(b, n, p)` tensor: `out_i = input_i @ mat2_i`.
- `tensor.numel()` calculates the total number of elements, e.g. it returns `batch_size * height * width` for a 3D tensor.
- `torch.manual_seed(42)` sets a seed in the RNG for both CPU and CUDA.
- `torch.var(unbiased=False)` calculates the biased variance. It's useful in batch norm calculations (a quick check follows the masking snippet below).
- A tensor's singleton dimensions are the dimensions with only 1 element.
- Masking:
a = torch.ones((4))
mask = torch.tensor([1,0,0,1]).bool()
a = a.masked_fill(mask, float("-inf"))
a # see tensor([-inf, 1., 1., -inf])
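A quick check of the `torch.var(unbiased=False)` note above, comparing it against the batch-norm style mean of squared deviations (values are arbitrary):

```
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
biased = torch.var(x, unbiased=False)      # divides by N, as batch norm does
manual = ((x - x.mean()) ** 2).mean()
print(biased, manual, x.numel())           # tensor(1.2500) tensor(1.2500) 4
```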
- `NaN` comparison: `torch.allclose()` does not handle `NaN` (two `NaN`s never compare equal by default). The helper below replaces `NaN` with a sentinel before comparing:
def allclose_replace_nan(tensor1, tensor2, rtol=1e-05, atol=1e-08, sentinel=0.0):
    tensor1_replaced = torch.where(torch.isnan(tensor1), torch.full_like(tensor1, sentinel), tensor1)
    tensor2_replaced = torch.where(torch.isnan(tensor2), torch.full_like(tensor2, sentinel), tensor2)
    return torch.allclose(tensor1_replaced, tensor2_replaced, rtol=rtol, atol=atol)
Reshaping
import torch
tensor_a = torch.randn(2, 3, 4) # Tensor with some shape
tensor_b = torch.randn(6, 4) # Another tensor with a different shape
# Reshape tensor_b to match the shape of tensor_a
reshaped_tensor = tensor_b.reshape(tensor_a.shape)
# reshape using -1, which means "inferring size"
tensor_a = torch.randn(2, 3, 4)
tensor_a.reshape(3, -1).shape # 3, 8
tensor_b.reshape(-1).shape # see 24.
- `-1` means "inferring size".
- Checking for unique values:
print(torch.unique(target))
- There's no difference between `tensor.size()` and `tensor.shape`.
- Transpose
    - `tensor.transpose(dim1, dim2)`: swaps the two dims in a matrix. Equivalent to applying `permute()` if we only re-arrange two dims there, e.g. `a.transpose(1, 3)`.
    - `tensor.t()` can only transpose a 2D matrix:
a = torch.rand(2, 3)
print(a.shape)
print(a.t().shape)
- Note that after `transpose()`, the data did not change but the `strides` and `shape` changed. `Tensor.view()` requires contiguous data, so before `view()` one needs to call `contiguous()`:
a1 = a.transpose(1, 3).contiguous().view(4, 3 * 32 * 32)  # e.g. for a of shape (4, 3, 32, 32)
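A small sketch of the stride/contiguity behaviour described above (the shape is a made-up example):

```
import torch

a = torch.randn(4, 3, 32, 32)
b = a.transpose(1, 3)               # no data copy; only shape and strides change
print(a.stride(), b.stride())       # (3072, 1024, 32, 1) vs (3072, 1, 32, 1024)
print(b.is_contiguous())            # False, so b.view(...) would raise an error
c = b.contiguous().view(4, 3 * 32 * 32)
print(c.shape)                      # torch.Size([4, 3072])
```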
- Broadcasting
    - Dimensions with `1` can be expanded implicitly through broadcasting. E.g., for matrix addition, `(3, 1, 4) + (1, 2, 4) = (3, 2, 4)`. Here is how it works:
[
    [[a, b, c, d]],
    [[e, f, g, h]],
    [[i, j, k, l]]
]
+
[
    [
        [m, n, o, p],
        [q, r, s, t]
    ]
]
=
[
    [
        [a, b, c, d] + [m, n, o, p],
        [a, b, c, d] + [q, r, s, t]
    ],
    [
        [e, f, g, h] + [m, n, o, p],
        [e, f, g, h] + [q, r, s, t]
    ],
    [
        [i, j, k, l] + [m, n, o, p],
        [i, j, k, l] + [q, r, s, t]
    ]
]
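A quick runnable check of the shapes above:

```
import torch

x = torch.randn(3, 1, 4)
y = torch.randn(1, 2, 4)
print((x + y).shape)  # torch.Size([3, 2, 4]): the size-1 dims are expanded implicitly
```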
Misc
- Printing full tensors
torch.set_printoptions(profile="full") # Set print options to 'full'
print(predicted_test)
- Model summary: there are two methods
print(model)  # prints the model definition
- torchsummary only supports passing a `float()` input tensor into the model to trace the model structure
- It's better to use the `torchinfo` package, due to this issue: `summary(model, input_size, batch_dim=batch_dim)`
    - `input_size` should NOT contain the batch size; `torchinfo` will unsqueeze it at `batch_dim`.
    - One can pass in input data directly as well.
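A hedged usage sketch of `torchinfo` (the model and input size are made up; check the package docs for the exact options):

```
import torch.nn as nn
from torchinfo import summary

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(), nn.LazyLinear(10))
# input_size excludes the batch dimension; torchinfo unsqueezes it at batch_dim
summary(model, input_size=(3, 32, 32), batch_dim=0)
```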
Advanced Topics
In a custom module, write code for training mode and eval mode
class MyDummy(torch.nn.Module):
    def forward(self):
        if self.training:
            ...
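For context, `self.training` is flipped by the standard mode switches; a minimal made-up illustration:

```
import torch

class MyDummy(torch.nn.Module):
    def forward(self, x):
        # behave differently depending on the current mode
        return x if self.training else x * 0

m = MyDummy()
m.train()
print(m(torch.ones(2)))   # tensor([1., 1.])
m.eval()
print(m(torch.ones(2)))   # tensor([0., 0.])
```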
To make variables learnable parameters
import torch

class MyModule(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.my_param = torch.nn.Parameter(torch.Tensor([1, 2, 3]))

    def forward(self, x):
        return x * self.my_param

m = MyModule()
print(m.my_param, m.my_param.requires_grad, m.my_param.data)
m(torch.Tensor([1, 2, 3]))
`torch.nn.Parameter` represents a learnable parameter in a neural network. It auto-registers the parameter in the module's parameter list, which is then used when building the computational graph for gradient descent.
Register Buffers
A buffer:
- Is NOT a parameter of the module, so it cannot be learned and no gradient is computed for it.
- A buffer's value is saved and loaded with the module's state dict, so it can persist between runs.
- Once `model.to()` is called, registered buffers are moved along with the rest of the module.
self.register_buffer('running_mean', torch.zeros(num_features))
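A minimal sketch of a module with a registered buffer (the running-mean update rule here is a made-up illustration, not the real BatchNorm logic):

```
import torch

class RunningMean(torch.nn.Module):
    def __init__(self, num_features):
        super().__init__()
        # not a Parameter: no gradients, but saved in state_dict and moved by .to()
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, x):
        if self.training:
            self.running_mean = 0.9 * self.running_mean + 0.1 * x.mean(dim=0)
        return x - self.running_mean

m = RunningMean(4)
m(torch.randn(8, 4))
print(m.state_dict())        # contains 'running_mean'
print(list(m.parameters()))  # empty: buffers are not parameters
```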