Data Type Conversions
Common Data Types
- `torch.arange(start, stop, step)` can take either float or int values.
- `torch.range(start, stop, step)` is deprecated because its signature is different from that of Python's `range()`.
- `torch.tensor(int, dtype=torch.float32)`: we can't pass an int right into `torch.sqrt()`; we must transform it into a tensor first.
  - Note, we are using the function `torch.tensor()`, not the class `torch.Tensor()`.
  - Or alternatively, use `math.sqrt()`.
- To invert a bool mask: `~key_padding_mask`.
- Datatype check: `tensor.dtype`, not `type(tensor)`. (A combined sketch of these calls follows this list.)
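A quick combined sketch of the calls above; the values are illustrative:

```
import math
import torch

torch.arange(0.0, 1.0, 0.25)              # float steps: tensor([0.0000, 0.2500, 0.5000, 0.7500])
x = torch.tensor(9, dtype=torch.float32)  # wrap the int in a tensor first
torch.sqrt(x)                             # tensor(3.)
math.sqrt(9)                              # 3.0, the plain-Python alternative
mask = torch.tensor([True, False])
~mask                                     # tensor([False, True])
print(x.dtype)                            # torch.float32 — not type(x)
```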
Conversions Between a NumPy Array and a Torch Tensor
```
import numpy as np
import torch

np_array = np.random.rand(2, 3)  # numpy defaults to float64
torch_tensor = torch.from_numpy(np_array)
# convert from float64 to float32
torch_tensor_float32 = torch_tensor.float()
# Create the tensor on CUDA
torch_tensor_gpu = torch_tensor.to('cuda')
# This is production-friendly
torch_tensor = torch_tensor.to('cuda' if torch.cuda.is_available() else 'cpu')
# Explicitly creating a cpu-based tensor
tensor = torch_tensor.cpu()
# back to numpy array:
np_array = tensor.detach().numpy()
```
Float to Bool
- If an `np_array` is of `float64`, convert the resulting tensor to other datatypes, e.g. with `torch_tensor.float()`.
- Datatypes: if we need to convert a matrix to int/bool when seeing errors like `"bitwise_and_cuda" not implemented for 'Float'`, we can do `matrix.bool()`:
```
# Threshold at 0.4, then cast to bool so bitwise & works
predicted_test = torch.where(outputs_test > 0.4, 1, 0).bool()
local_correct = (predicted_test & labels_test).sum().item()
```
Device Related
- It's crucial to add `.to(X.device)` to tensors created inside a custom model (a sketch follows below).
- In places like `torch.autocast(device_type=device_type, dtype=torch.float16)`, we need to pass a string in.
  - Solution: `device_type = str(device)`
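A minimal sketch of the `.to(X.device)` pattern; the `AddNoise` module and its noise tensor are illustrative, not from the original:

```
import torch

class AddNoise(torch.nn.Module):
    # Hypothetical module that creates a new tensor inside forward()
    def forward(self, X):
        # Put the new tensor on the same device as the input, so the
        # module works unchanged on both CPU and CUDA
        noise = torch.randn(X.shape).to(X.device)
        return X + noise

out = AddNoise()(torch.ones(2, 3))
```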
Neural Network Model Components
Data Loading
```
from torch.utils.data import RandomSampler

train_sampler = RandomSampler(train_data)
```
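For context, a hedged end-to-end sketch; the `TensorDataset` and the batch size are made-up stand-ins:

```
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset

# Hypothetical dataset: 100 samples of 4 features with binary labels
train_data = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
train_sampler = RandomSampler(train_data)
# A sampler replaces shuffle=True (the two are mutually exclusive)
train_loader = DataLoader(train_data, sampler=train_sampler, batch_size=16)
for X, y in train_loader:
    pass  # training step goes here
```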
Make a conv-batch-relu module that optionally has components:
```
import torch.nn as nn

layers = [nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16)]
layers.append(nn.ReLU())  # append optional components only if necessary
block = nn.Sequential(*layers)
```
- `nn.Sequential()` is a sequential container that takes in modules. It has a `forward()` function that passes the input to the first module, and then the chain starts: each module's output feeds the next.
- `nn.Sequential()` does not support extra input args as in `seq_layers(X, other_args)`. In that case, use `nn.ModuleList()` and manually iterate through the layers:
```
encoder_layers = torch.nn.ModuleList([
    EncoderLayer(embedding_dim=self.embedding_dim, num_heads=num_heads, dropout_rate=dropout_rate)
    for _ in range(encoder_layer_num)
])

# ModuleList has no forward(); iterate manually to pass the extra args
for encoder_layer in self.encoder_layers:
    X = encoder_layer(X, attn_mask=attn_mask, key_padding_mask=key_padding_mask)
```
- Softmax layer:
```
import torch
import torch.nn as nn

# Example softmax layer
softmax_layer = nn.Softmax(dim=1)

# Example input (logits)
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [1.0, 3.0, 0.1]])

# Apply the softmax layer
softmax_output = softmax_layer(logits)
print(softmax_output)
```
- Focal Loss: built from `torch.nn.functional.softmax()` and `Tensor.gather()` (a full sketch follows the explanation below).
```
import torch

# (m, class_num, h, w)
model_preds = torch.Tensor([[
    # two channels, i.e. two classes
    [[0.1, 0.4, 0.2]],
    [[0.3, 0.6, 0.7]],
]])
# labels (m, h, w), only 1 correct class per location
targets = torch.Tensor([
    [[0, 1, 0]]
]).long()
probs = torch.nn.functional.softmax(model_preds, dim=1)
# See
# tensor([[[[0.4502, 0.4502, 0.3775]],
#          [[0.5498, 0.5498, 0.6225]]]])
print(probs)
# See
# tensor([[[[0.4502, 0.5498, 0.3775]]]])
print(probs.gather(1, targets.unsqueeze(1)))
```
- `softmax` creates a softmax across the two class channels (`dim=1`).
- `tensor.gather(dim, indices)` here selects the softmax values at the locations indicated in `targets`. `targets` cleverly stores the indices of the one-hot vector as class labels.
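Putting the two pieces together, a minimal focal-loss sketch; `gamma` and the mean reduction are illustrative choices, not a reference implementation:

```
import torch

def focal_loss(model_preds, targets, gamma=2.0):
    # model_preds: (m, class_num, h, w); targets: (m, h, w) class indices
    probs = torch.nn.functional.softmax(model_preds, dim=1)
    # p_t: probability of the correct class per location, (m, 1, h, w)
    p_t = probs.gather(1, targets.unsqueeze(1))
    # Down-weight easy examples where p_t is already close to 1
    return (-(1 - p_t) ** gamma * torch.log(p_t)).mean()
```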
- `LazyLinear` dims are initialized during the first forward pass (see the sketch after this list).
- `optimizer.zero_grad()` should always come before the backward pass.
- `with torch.autograd.set_detect_anomaly(True):` can be used to print a stack trace of the forward op that produced a failing gradient.
- Indexing: `arr_2d[:, 0] = arr_1d` assigns a 1-D array into the first column.
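A quick `LazyLinear` sketch; the shapes are illustrative:

```
import torch
import torch.nn as nn

layer = nn.LazyLinear(out_features=8)  # in_features not specified yet
x = torch.randn(4, 20)
y = layer(x)                  # first pass materializes the weight
print(layer.weight.shape)     # torch.Size([8, 20])
```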
Common Operations
Math Operations
- `torch.bmm(input, mat2)`: Batch Matrix Multiplication. If `input` is a `(b, n, m)` tensor and `mat2` is a `(b, m, p)` tensor, `out` will be a `(b, n, p)` tensor:

```
out_i = input_i @ mat2_i
```
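For instance (shapes are illustrative):

```
import torch

a = torch.randn(4, 2, 3)      # (b, n, m)
b = torch.randn(4, 3, 5)      # (b, m, p)
print(torch.bmm(a, b).shape)  # torch.Size([4, 2, 5])
```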
- `tensor.numel()` calculates the total number of elements; for an image batch it returns `batch_size * height * width`.
- `torch.manual_seed(42)` sets a seed in the PRNG for both CPU and CUDA.
- `torch.var(unbiased=False)` calculates the biased variance. It's useful in batch norm calculation.
- A tensor's singleton dimensions are the dimensions with only 1 element.
- Masking:
```
a = torch.ones(4)
mask = torch.tensor([1, 0, 0, 1]).bool()
# Fill positions where the mask is True
a = a.masked_fill(mask, float("-inf"))
a  # see tensor([-inf, 1., 1., -inf])
```
- `NaN` comparison: `torch.allclose()` does not treat `NaN` values as equal by default. The helper below replaces `NaN` with a sentinel before comparing:
```
def allclose_replace_nan(tensor1, tensor2, rtol=1e-05, atol=1e-08, sentinel=0.0):
    # Replace NaNs with a sentinel so NaN positions compare as equal
    tensor1_replaced = torch.where(torch.isnan(tensor1), torch.full_like(tensor1, sentinel), tensor1)
    tensor2_replaced = torch.where(torch.isnan(tensor2), torch.full_like(tensor2, sentinel), tensor2)
    return torch.allclose(tensor1_replaced, tensor2_replaced, rtol=rtol, atol=atol)
```
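A usage sketch; note that PyTorch's built-in `torch.allclose(..., equal_nan=True)` covers the common case where `NaN` should equal `NaN`:

```
import torch

t1 = torch.tensor([1.0, float("nan")])
t2 = torch.tensor([1.0, float("nan")])
print(allclose_replace_nan(t1, t2))            # True
print(torch.allclose(t1, t2, equal_nan=True))  # True, built-in alternative
```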
Reshaping
```
import torch

tensor_a = torch.randn(2, 3, 4)  # Tensor with some shape
tensor_b = torch.randn(6, 4)     # Another tensor with a different shape

# Reshape tensor_b to match the shape of tensor_a (same element count)
reshaped_tensor = tensor_b.reshape(tensor_a.shape)

# reshape using -1, which means "inferring size"
tensor_a = torch.randn(2, 3, 4)
tensor_a.reshape(3, -1).shape  # torch.Size([3, 8])
tensor_b.reshape(-1).shape     # torch.Size([24])
```
- `-1` means "inferring size".
- Checking for unique values:
```
print(torch.unique(target))
```
- There's no difference between `tensor.size()` and `tensor.shape`.
- Transpose
  - `tensor.transpose(dim1, dim2)`: swaps the two dims in a tensor. Equivalent to applying `permute()` if we only rearrange two dims there:

```
a.transpose(1, 3)
```
  - `tensor.t()` can only transpose a 2D matrix:

```
a = torch.rand(2, 3)
print(a.shape)      # torch.Size([2, 3])
print(a.t().shape)  # torch.Size([3, 2])
```
  - Note that after `transpose()`, the data did not change, but the `strides` and `shape` changed. `Tensor.view()` requires contiguous data, so before `view()` one needs to call `contiguous()`:

```
a1 = a.transpose(1, 3).contiguous().view(4, 3 * 32 * 32)
```
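A quick check of the strides claim; the shape is illustrative:

```
import torch

a = torch.rand(4, 3, 32, 32)
b = a.transpose(1, 3)
print(a.stride())         # (3072, 1024, 32, 1)
print(b.stride())         # (3072, 1, 32, 1024) — same data, permuted strides
print(b.is_contiguous())  # False, hence .contiguous() before .view()
```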
- Broadcasting
  - Dimensions with `1` can be expanded implicitly through broadcasting. E.g., for matrix addition, `(3, 1, 4) + (1, 2, 4) = (3, 2, 4)`. Here is how it works:
```
[
    [[a, b, c, d]],
    [[e, f, g, h]],
    [[i, j, k, l]]
]
+
[
    [
        [m, n, o, p],
        [q, r, s, t]
    ]
]
=
[
    [
        [a, b, c, d] + [m, n, o, p],
        [a, b, c, d] + [q, r, s, t]
    ],
    [
        [e, f, g, h] + [m, n, o, p],
        [e, f, g, h] + [q, r, s, t]
    ],
    [
        [i, j, k, l] + [m, n, o, p],
        [i, j, k, l] + [q, r, s, t]
    ]
]
```
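A one-line verification of the broadcast shape:

```
import torch

x = torch.randn(3, 1, 4)
y = torch.randn(1, 2, 4)
print((x + y).shape)  # torch.Size([3, 2, 4])
```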
Misc
- Printing full tensors
```
torch.set_printoptions(profile="full")  # Set print options to 'full'
print(predicted_test)
```
- Model summary: there are two methods.
  - `print(model)` prints your model definition.
  - `torchsummary` only supports passing an input tensor of `float()` into the model, then it traces the model structure.
  - It's better to use the `torchinfo` package, due to this issue. In `summary(model, input_size, batch_dim=batch_dim)`, `input_size` should NOT contain the batch size; `torchinfo` will unsqueeze it at `batch_dim`. One can pass in input data directly as well. (A sketch follows.)
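A minimal sketch, assuming `torchinfo` is installed; the toy model is illustrative:

```
import torch.nn as nn
from torchinfo import summary

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten())
# input_size has no batch dimension; torchinfo unsqueezes one at batch_dim
summary(model, input_size=(3, 32, 32), batch_dim=0)
```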
Advanced Topics
In a custom module, write code that branches between training mode and eval mode:
```
import torch

class MyDummy(torch.nn.Module):
    def forward(self):
        # self.training is toggled by model.train() / model.eval()
        if self.training:
            ...
```
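Toggling is done on the module itself:

```
m = MyDummy()
m.train()  # self.training == True (the default)
m.eval()   # self.training == False
```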
To make variables learnable parameters
```
import torch

class MyModule(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.my_param = torch.nn.Parameter(torch.Tensor([1, 2, 3]))

    def forward(self, x):
        return x * self.my_param

m = MyModule()
print(m.my_param, m.my_param.requires_grad, m.my_param.data)
m(torch.Tensor([1, 2, 3]))
```
`torch.nn.Parameter` represents a learnable parameter in a neural network. Assigning a `torch.nn.Parameter` as a module attribute auto-registers it in the module's parameter list; it is then used in computational-graph building for gradient descent.
Register Buffers
A buffer:

- Is NOT a parameter of a module, so it cannot be learned, and no gradient is computed for it.
- Has its value saved and loaded with the module's state dictionary, so a buffer can persist between runs.
- Is moved over as part of the module once `model.to()` is called.
```
self.register_buffer('running_mean', torch.zeros(num_features))
```
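A minimal sketch of a module with a buffer; the `RunningMean` module and its update rule are illustrative:

```
import torch

class RunningMean(torch.nn.Module):
    # Hypothetical module: tracks a running mean of its inputs
    def __init__(self, num_features):
        super().__init__()
        # In state_dict and moved by .to(), but never learned
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, x):
        # Exponential moving-average update of the buffer
        self.running_mean = 0.9 * self.running_mean + 0.1 * x.mean(dim=0)
        return x - self.running_mean

m = RunningMean(4)
m(torch.randn(8, 4))
print('running_mean' in m.state_dict())  # True
```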