Pickle
A Python-native serialization format for saving objects to disk and loading them back later. Common use cases include models, dictionaries, lists, pandas DataFrames, preprocessing objects, and intermediate results. It is Python-specific and not human-readable. Because unpickling can execute arbitrary code, only load pickle files from trusted sources.
```python
import pickle

data = {"name": "Rico", "value": 42}

# Save to .pkl
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

# Load from .pkl
with open("data.pkl", "rb") as f:
    loaded_data = pickle.load(f)

print(loaded_data)
```
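Pickle can also serialize to a bytes object in memory via `pickle.dumps` / `pickle.loads`, which is handy for caching or passing objects between processes. A minimal sketch (the object here is illustrative):

```python
import pickle

# Any picklable object: nested containers, tuples, custom classes, etc.
obj = {"weights": [0.1, 0.2, 0.3], "labels": ("cat", "dog")}

# Serialize to bytes (no file needed)
blob = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize back to an equal (but distinct) object
restored = pickle.loads(blob)
print(restored == obj)  # True
```

`pickle.HIGHEST_PROTOCOL` selects the most recent (most compact) protocol version the running interpreter supports.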
.pt / .pth
PyTorch’s checkpoint format for storing model weights. You can save weights only:
```python
import torch

# Save weights only
torch.save(model.state_dict(), "model.pt")

# Load weights only
model.load_state_dict(torch.load("model.pt"))
model.eval()
```
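To make the save/load cycle concrete, here is a self-contained round trip with a toy model; the single `nn.Linear` layer and the temp-file path are placeholders standing in for a real network and checkpoint location:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Toy model standing in for a real network
model = nn.Linear(4, 2)

# Save weights only
path = os.path.join(tempfile.mkdtemp(), "model.pt")
torch.save(model.state_dict(), path)

# Load into a freshly constructed model with the same architecture
fresh = nn.Linear(4, 2)
fresh.load_state_dict(torch.load(path))
fresh.eval()

# The restored weights match the originals exactly
print(torch.equal(model.weight, fresh.weight))  # True
```

Note that `load_state_dict` requires a model object with matching architecture to already exist; the `.pt` file stores only the tensors, not the module structure.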
Or a full checkpoint:
```python
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "epoch": epoch,
    "loss": loss,
}, "checkpoint.pt")
```
`model_state_dict`: all learned weights and biases, stored as `{layer_name: tensor}`.

`optimizer_state_dict`: internal optimizer state (momentum buffers, moving averages, learning rate, step counters), enabling exact resumption of training. Stored as:
```python
{
    "state": {...},
    "param_groups": [...]
}
```
`epoch` and `loss` are stored as an integer and a float, respectively.
To resume training, load the checkpoint and restore each component:
```python
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
epoch = checkpoint["epoch"]
loss = checkpoint["loss"]
```
.onnx (Open Neural Network Exchange)
A portable neural network format designed for deployment outside of training frameworks. Models can be exported from PyTorch, TensorFlow, and other frameworks, then executed with ONNX Runtime and similar engines for fast, hardware-optimized, cross-platform inference.
Export from PyTorch:
```python
# dummy_input: an example tensor with the model's expected input shape (used for tracing)
torch.onnx.export(model, dummy_input, "model.onnx")
```
Run inference:
```python
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": x})  # x: a NumPy array matching the input shape
```
ONNX Graph
An ONNX graph represents the neural network as a computational graph: nodes are connected by edges that show how data flows from input to output.
ONNX is commonly used for C++ deployment on edge devices. One such graph typically includes:
| Component | Meaning |
|---|---|
| Nodes | Operations (Conv, MatMul, ReLU, etc.) |
| Edges / tensors | Data flowing between nodes |
| Inputs | Model inputs |
| Outputs | Model outputs |
| Initializers | Learned weights (parameters) |
For example:
```
y = ReLU(Wx + b)
```
becomes
```
Input (x)
    ↓
MatMul (W * x)
    ↓
Add (+ b)
    ↓
ReLU
    ↓
Output (y)
```
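The same graph can be evaluated node by node. Each line below corresponds to one node in the diagram; the shapes and values are illustrative:

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

# Input (x) and parameters (initializers, in ONNX terms)
x = np.array([1.0, -2.0, 3.0])
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]])
b = np.array([-1.0, 1.0])

h1 = W @ x      # MatMul node
h2 = h1 + b     # Add node
y = relu(h2)    # ReLU node
print(y)        # Output (y) → [0. 2.]
```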
Each box is a node in the graph. Graph engines can optimize the graph: for example, Conv → BatchNorm → ReLU may be fused into a single FusedConv node, which is much faster. A graph also lets runtimes detect independent operations and run them in parallel.
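Fusion works because consecutive affine operations can be folded into one. For a BatchNorm following a linear (or convolutional) layer, the per-channel scale and shift fold into the preceding weights offline. A minimal sketch with a 1-D linear layer; all values are randomly generated and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4)); b = rng.normal(size=3)       # linear layer
gamma = rng.normal(size=3); beta = rng.normal(size=3)     # BN scale / shift
mean = rng.normal(size=3); var = rng.random(3) + 0.1      # BN running stats
eps = 1e-5
x = rng.normal(size=4)

# Unfused: linear layer followed by batch-norm, two passes
z = W @ x + b
unfused = gamma * (z - mean) / np.sqrt(var + eps) + beta

# Fused: fold the BN scale/shift into the weights once, offline
s = gamma / np.sqrt(var + eps)
W_f = s[:, None] * W
b_f = s * (b - mean) + beta
fused = W_f @ x + b_f

print(np.allclose(unfused, fused))  # True
```

The fused form does one matrix multiply instead of a multiply plus a normalization pass, which is exactly the saving a graph optimizer exploits.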
Caution: Not All Components Are ONNX-Exportable
Not every part of a model can be traced into an ONNX graph. The table below summarizes exportability for a typical point-cloud compression model:
| Component | ONNX-exportable? | Reason |
|---|---|---|
| `model.decoder` | ✅ Yes | Pure Conv1d/Conv2d/reshape, traced cleanly |
| `model.pre_conv` | ✅ Yes | Conv1d + GroupNorm + ReLU |
| `model.latent_xyzs_synthesis` | ✅ Yes | Conv1d stack |
| `model.encoder` | ❌ No | Uses pointops.furthestsampling + knnquery_heap: CUDA custom ops, not traceable |
| `model.feats_eblock.compress/decompress` | ❌ No | compressai range coder: pure Python entropy coding, not a torch graph |
| `model.feats_eblock.forward` | ⚠️ With mock | Can be traced if `__round__` is mocked, as visualize_model.py already does |
Hard blocker: the entropy coder. `EntropyBottleneck.compress()` / `.decompress()` are Python-level range coders that produce byte strings, not tensors. There is no torch graph to export, so they cannot be represented in ONNX.
The realistic deployment split is to keep the encoder and entropy coding in Python/CUDA, and export only the decoder side to ONNX:
```
┌─ decode side ─────────────────────────────────────┐
│ feats_eblock.decompress() → latent_feats (native) │
│ decoder → reconstructed output (ONNX)             │
└───────────────────────────────────────────────────┘
```
Q: Could a C++ reimplementation of compress() be ONNX-exported?
It doesn’t need to be. If compress() / decompress() are reimplemented natively in C++, they are called directly from C++ code — completely outside the ONNX graph. The ONNX model only needs to cover the neural network computations (i.e., the decoder). The entropy coder lives alongside it as a separate C++ component, not inside the graph.
Within the encoder, `EntropyBottleneck` is a pure Python loop with learnable parameters that represent a cumulative distribution function (CDF).
```
EntropyBottleneck
├── forward()    → tensor ops on CDF params → ✅ ONNX (training path only)
├── compress()   → range encoder loop       → ❌ not a torch graph
└── decompress() → range decoder loop       → ❌ not a torch graph
    └── uses _quantized_cdf (learned, stored) → these weights must travel
        with the C++ entropy coder
```
So you can export the CDF tables as raw tensors, not as an ONNX graph:
```python
torch.save({
    "quantized_cdf": model.feats_eblock._quantized_cdf,
    "cdf_lengths": model.feats_eblock._cdf_lengths,
    "offsets": model.feats_eblock._offset,
}, "feats_eblock_cdf.pt")
```
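If the C++ side should not depend on LibTorch, the same tables can instead be written as raw binary blobs plus a small JSON header describing their shapes. A sketch using NumPy, where the array contents and file names are dummies standing in for the learned tables:

```python
import json

import numpy as np

# Dummy stand-ins for _quantized_cdf, _cdf_lengths, _offset
quantized_cdf = np.arange(12, dtype=np.int32).reshape(3, 4)
cdf_lengths = np.array([4, 4, 4], dtype=np.int32)
offsets = np.array([-1, 0, 1], dtype=np.int32)

# Raw native-endian int32 blobs, one per table
quantized_cdf.tofile("quantized_cdf.bin")
cdf_lengths.tofile("cdf_lengths.bin")
offsets.tofile("offsets.bin")

# JSON sidecar so the C++ loader knows how to reshape the flat blob
with open("cdf_meta.json", "w") as f:
    json.dump({"cdf_shape": list(quantized_cdf.shape)}, f)
```

The C++ loader then reads each `.bin` with a plain `fread` into an `int32_t` buffer and uses the JSON metadata to reconstruct the 2-D table.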
Then in C++ you load those tables and feed them into a C++ range coder — ryg-rans or the one compressai itself ships in compressai/lib/.
C++ decoder-side skeleton:
```cpp
#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

// ── 1. Load ONNX decoder ────────────────────────────────────────────────────
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "decoder");
Ort::SessionOptions opts;
opts.SetIntraOpNumThreads(1);
Ort::Session session(env, "decoder.onnx", opts);

// ── 2. Entropy decode (C++ range coder) ─────────────────────────────────────
// byte_strings + CDF params (loaded from a .json or .bin sidecar) → latent_feats
// e.g. using a C++ arithmetic coding library or a hand-ported compressai coder
std::vector<float> latent_feats = entropy_decode(byte_strings, cdf_params);
std::vector<float> latent_xyzs  = decode_xyzs(xyz_byte_strings);

// ── 3. Run ONNX decoder ─────────────────────────────────────────────────────
auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
std::array<int64_t, 3> feats_shape{1, C, N};
std::array<int64_t, 3> xyzs_shape{1, 3, N};
Ort::Value inputs[] = {
    Ort::Value::CreateTensor<float>(memory_info,
                                    latent_feats.data(), latent_feats.size(),
                                    feats_shape.data(), feats_shape.size()),
    Ort::Value::CreateTensor<float>(memory_info,
                                    latent_xyzs.data(), latent_xyzs.size(),
                                    xyzs_shape.data(), xyzs_shape.size()),
};
const char* input_names[] = {"latent_feats", "latent_xyzs"};
const char* output_names[] = {"reconstructed"};
auto outputs = session.Run(Ort::RunOptions{},
                           input_names, inputs, 2,
                           output_names, 1);
float* reconstructed = outputs[0].GetTensorMutableData<float>();
```
The CDF parameters learned during training (used by the entropy coder) are not part of the ONNX graph — they are exported separately (e.g., as a .json or .bin sidecar file) and loaded by the C++ entropy decoder at runtime.
.json
Used for model metadata, label mappings, experiment configurations, and similar structured data.
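For example, saving label mappings and run configuration alongside a model; the file name and keys here are illustrative:

```python
import json

meta = {
    "model": "pointcloud_compressor",
    "labels": {"0": "car", "1": "pedestrian"},
    "config": {"lr": 1e-4, "epochs": 50},
}

# Human-readable, language-agnostic: any tool can read this back
with open("meta.json", "w") as f:
    json.dump(meta, f, indent=2)

with open("meta.json") as f:
    loaded = json.load(f)
print(loaded == meta)  # True
```

Note that JSON keys are always strings: if you use integer class IDs as dict keys, `json.dump` converts them to strings, so the round trip is only exact when keys are strings to begin with (as above).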