[ML] Data Formats

Pickle, pth, onnx

Posted by Rico's Nerd Cluster on January 23, 2026

Pickle

A Python serialization format for saving objects to disk and loading them back later. Common use cases include models, dictionaries, lists, pandas DataFrames, preprocessing objects, and intermediate results. It is Python-specific and not human-readable.

import pickle

data = {"name": "Rico", "value": 42}

# Save to .pkl
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)

# Load from .pkl
with open("data.pkl", "rb") as f:
    loaded_data = pickle.load(f)

print(loaded_data)
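
To make the "Python-specific" point concrete, here is a small sketch comparing pickle with JSON on types that only Python has. The `record` dictionary is made up for illustration; everything else is standard library.

```python
import json
import pickle

# Made-up record containing a tuple and a set
record = {"shape": (3, 224, 224), "seen_ids": {7, 42}}

# Pickle restores the exact Python types
restored = pickle.loads(pickle.dumps(record))
print(restored == record)        # True
print(type(restored["shape"]))   # <class 'tuple'>

# JSON has no set type, so serialization fails
try:
    json.dumps(record)
except TypeError as err:
    print("json failed:", err)
```

The flip side is that unpickling can execute arbitrary code, so it is only safe for files you trust.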

.pt / .pth

PyTorch’s checkpoint format for storing model weights. You can save weights only:

import torch

# Save weights only
torch.save(model.state_dict(), "model.pt")

# Load weights only
model.load_state_dict(torch.load("model.pt"))
model.eval()

Or a full checkpoint:

torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "epoch": epoch,
    "loss": loss,
}, "checkpoint.pt")
  • model_state_dict: all learned weights and biases, stored as {layer_name: tensor}.
  • optimizer_state_dict: internal optimizer state (momentum buffers, moving averages, learning rate, step counters), enabling exact resumption of training. Stored as:
{
    "state": {...},
    "param_groups": [...]
}
  • epoch and loss are stored as an integer and a float, respectively.

To resume training, load the checkpoint and restore each component:

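checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1  # assumes the saved epoch had finished
model.train()                          # back to training mode before resuming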

.onnx (Open Neural Network Exchange)

A portable neural network format designed for deployment outside of training frameworks. Models can be exported from frameworks such as PyTorch and TensorFlow and run with ONNX Runtime and other engines, which provide fast, hardware-optimized, cross-platform inference.

Export from PyTorch:

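# Name the graph input so it matches the key used at inference time
torch.onnx.export(
    model, dummy_input, "model.onnx",
    input_names=["input"], output_names=["output"],
)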

Run inference:

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
# x: NumPy array matching the dummy_input shape and dtype; "input" is the exported input name
outputs = session.run(None, {"input": x})

ONNX Graph

An ONNX graph represents the neural network as a computational graph. Nodes are connected to show how data flows from input to output.

ONNX is commonly used for C++ deployment on edge devices. One such graph typically includes:

| Component | Meaning |
|---|---|
| Nodes | Operations (Conv, MatMul, ReLU, etc.) |
| Edges / tensors | Data flowing between nodes |
| Inputs | Model inputs |
| Outputs | Model outputs |
| Initializers | Learned weights (parameters) |

For example:

y = ReLU(Wx + b)

becomes

Input (x)
   ↓
MatMul (W * x)
   ↓
Add (+ b)
   ↓
ReLU
   ↓
Output (y)

Each box is a node in the graph. Graph engines can optimize the graph: for example, Conv → BatchNorm → ReLU may be fused into a single FusedConv node, which is typically much faster. A graph also lets the runtime detect independent operations and run them in parallel.
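
To see why a Conv → BatchNorm → ReLU fusion is valid, here is a rough sketch that folds BatchNorm into the preceding layer for a single scalar channel. All numbers are made up, and a plain multiply-add stands in for the convolution; real runtimes apply the same algebra per output channel, and their actual fusion passes are more involved.

```python
import math

# Made-up parameters for one channel (illustrative values only)
w, b = 0.8, 0.1                                           # "conv" weight and bias
gamma, beta, mean, var, eps = 1.5, -0.2, 0.3, 4.0, 1e-5   # BatchNorm, inference mode


def unfused(x):
    y = w * x + b                                          # Conv
    y = gamma * (y - mean) / math.sqrt(var + eps) + beta   # BatchNorm
    return max(y, 0.0)                                     # ReLU


# Fold BatchNorm into the conv parameters once, ahead of time
scale = gamma / math.sqrt(var + eps)
w_fused = w * scale
b_fused = (b - mean) * scale + beta


def fused(x):
    return max(w_fused * x + b_fused, 0.0)  # one multiply-add, then ReLU


for x in (-2.0, 0.0, 3.5):
    assert math.isclose(unfused(x), fused(x), abs_tol=1e-12)
print("fused and unfused outputs match")
```

The fused version does the BatchNorm arithmetic once at optimization time instead of on every inference call, which is where the speedup comes from.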

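
As a minimal sketch of how a runtime could spot independent operations, the toy graph below stores nodes as plain dictionaries and groups them into levels. It mirrors the Nodes / Inputs / Initializers idea but is not the `onnx` API; the graph itself (two MatMul branches joined by an Add) and the helper `schedule_levels` are made up for illustration.

```python
# Toy graph: each node lists the tensor names it reads and the one it writes.
graph_inputs = {"x"}
initializers = {"W1", "W2"}
nodes = [
    {"op": "MatMul", "inputs": ["x", "W1"], "output": "a"},
    {"op": "MatMul", "inputs": ["x", "W2"], "output": "b"},
    {"op": "Add",    "inputs": ["a", "b"],  "output": "c"},
    {"op": "Relu",   "inputs": ["c"],       "output": "y"},
]


def schedule_levels(nodes, available):
    """Group node outputs into levels; nodes in one level do not depend on each other."""
    level_of = {name: -1 for name in available}  # inputs and weights are ready up front
    levels = []
    for node in nodes:  # nodes are assumed to be listed in topological order
        level = 1 + max(level_of[name] for name in node["inputs"])
        level_of[node["output"]] = level
        while len(levels) <= level:
            levels.append([])
        levels[level].append(node["output"])
    return levels


levels = schedule_levels(nodes, graph_inputs | initializers)
print(levels)  # [['a', 'b'], ['c'], ['y']]
```

The two MatMul nodes land in the same level, so a runtime would be free to execute them concurrently before the Add.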

Caution: Not All Components Are ONNX-Exportable

Not every part of a model can be traced into an ONNX graph. The table below summarizes exportability for a typical point-cloud compression model:

| Component | ONNX-exportable? | Reason |
|---|---|---|
| model.decoder | ✅ Yes | Pure Conv1d/Conv2d/reshape, traced cleanly |
| model.pre_conv | ✅ Yes | Conv1d + GroupNorm + ReLU |
| model.latent_xyzs_synthesis | ✅ Yes | Conv1d stack |
| model.encoder | ❌ No | Uses pointops.furthestsampling + knnquery_heap: CUDA custom ops, not traceable |
| model.feats_eblock.compress/decompress | ❌ No | compressai range coder: pure Python entropy coding, not a torch graph |
| model.feats_eblock.forward | ⚠️ With mock | Can be traced if __round__ is mocked, as visualize_model.py already does |

Hard blocker — the entropy coder. EntropyBottleneck.compress() / .decompress() are Python-level range coders that produce byte strings, not tensors. There is no torch graph to export, so they cannot be represented in ONNX.

The realistic deployment split is to keep the encoder and entropy coding in Python/CUDA, and export only the decoder side to ONNX:

┌─ decode ─────────────────────────────────────────────────┐
│  feats_eblock.decompress() → latent_feats  (range coder) │
│  decoder → reconstructed output            (ONNX)        │
└──────────────────────────────────────────────────────────┘

Q: Could a C++ reimplementation of compress() be ONNX-exported?

A: It doesn’t need to be. If compress() / decompress() are reimplemented natively in C++, they are called directly from C++ code, completely outside the ONNX graph. The ONNX model only needs to cover the neural network computations (i.e., the decoder). The entropy coder lives alongside it as a separate C++ component, not inside the graph.

Within the encoder, EntropyBottleneck's compress() and decompress() run as Python-level loops, driven by learned parameters that represent a cumulative distribution function (CDF).

EntropyBottleneck
├── forward()           tensor ops on CDF params    ONNX  (training path only)
├── compress()          range encoder loop           not a torch graph
└── decompress()        range decoder loop           not a torch graph
     └── uses _quantized_cdf  (learned, stored)      these weights must travel
                                                       with the C++ entropy coder
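
As a rough picture of what a table like `_quantized_cdf` holds, the sketch below turns a made-up probability table into an integer CDF and looks up a symbol from a cumulative value, which is the core step a range decoder repeats. It is a simplified stand-in written for this post, not compressai's actual quantization or coder; `PRECISION`, `pmf`, and `symbol_for` are all assumptions.

```python
from bisect import bisect_right

PRECISION = 16                      # assumed bit width for the integer CDF
pmf = [0.5, 0.25, 0.125, 0.125]     # made-up symbol probabilities

# Integer CDF: cdf[i] is the scaled probability mass of all symbols below i
total = 1 << PRECISION
cdf = [0]
for p in pmf:
    cdf.append(cdf[-1] + round(p * total))
cdf[-1] = total                     # absorb any rounding error in the last bin


def symbol_for(cum_value):
    """Return the symbol whose [cdf[s], cdf[s + 1]) interval contains cum_value."""
    return bisect_right(cdf, cum_value) - 1


print(cdf)                 # [0, 32768, 49152, 57344, 65536]
print(symbol_for(0))       # 0
print(symbol_for(40000))   # 1
print(symbol_for(65535))   # 3
```

Because the table is just a list of integers, it can be shipped to any language that has a matching coder.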

So the CDF tables can be exported as raw tensors rather than as part of an ONNX graph:

# Assumes feats_eblock is a compressai EntropyBottleneck, whose length buffer is named _cdf_length
torch.save({
    "quantized_cdf":     model.feats_eblock._quantized_cdf,
    "cdf_lengths":       model.feats_eblock._cdf_length,
    "offsets":           model.feats_eblock._offset,
}, "feats_eblock_cdf.pt")

Then in C++ you load those tables and feed them into a C++ range coder — ryg-rans or the one compressai itself ships in compressai/lib/.
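
A `.pt` file is generally easiest to read from C++ through LibTorch, so one option is to also write the same tables to a JSON sidecar that any C++ JSON parser can load. In this sketch the nested lists are made-up stand-ins for what `tensor.tolist()` would return on the real tables, and the file name is just an assumption.

```python
import json
import tempfile
from pathlib import Path

# Stand-ins for e.g. model.feats_eblock._quantized_cdf.tolist() (made-up values)
tables = {
    "quantized_cdf": [[0, 32768, 49152, 65536], [0, 16384, 65536, 0]],
    "cdf_lengths": [4, 3],
    "offsets": [-1, 0],
}

# Hypothetical output location; a temp directory keeps the example side-effect free
out_path = Path(tempfile.gettempdir()) / "feats_eblock_cdf.json"
out_path.write_text(json.dumps(tables))

# Read it back to confirm the round trip is lossless for integer tables
loaded = json.loads(out_path.read_text())
print(loaded == tables)  # True
```

JSON is fine for small integer tables like these; for large tables a flat binary file would be more compact.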

C++ decoder-side skeleton:

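#include <onnxruntime_cxx_api.h>
#include <array>
#include <vector>

// Skeleton only: C, N, byte_strings, xyz_byte_strings, cdf_params,
// entropy_decode() and decode_xyzs() are placeholders supplied by the caller.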

// ── 1. Load ONNX decoder ────────────────────────────────────────────────────
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "decoder");
Ort::SessionOptions opts;
opts.SetIntraOpNumThreads(1);
Ort::Session session(env, "decoder.onnx", opts);

// ── 2. Entropy decode (C++ range coder) ─────────────────────────────────────
// byte_strings + CDF params (loaded from a .json or .bin sidecar) → latent_feats
// e.g. using a C++ arithmetic coding library or a hand-ported compressai coder
std::vector<float> latent_feats = entropy_decode(byte_strings, cdf_params);
std::vector<float> latent_xyzs  = decode_xyzs(xyz_byte_strings);

// ── 3. Run ONNX decoder ──────────────────────────────────────────────────────
auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);

std::array<int64_t, 3> feats_shape{1, C, N};
std::array<int64_t, 3> xyzs_shape{1, 3, N};

Ort::Value inputs[] = {
    Ort::Value::CreateTensor<float>(memory_info,
        latent_feats.data(), latent_feats.size(),
        feats_shape.data(), feats_shape.size()),
    Ort::Value::CreateTensor<float>(memory_info,
        latent_xyzs.data(), latent_xyzs.size(),
        xyzs_shape.data(), xyzs_shape.size()),
};

const char* input_names[]  = {"latent_feats", "latent_xyzs"};
const char* output_names[] = {"reconstructed"};

auto outputs = session.Run(Ort::RunOptions{},
    input_names, inputs, 2,
    output_names, 1);

float* reconstructed = outputs[0].GetTensorMutableData<float>();

The CDF parameters learned during training (used by the entropy coder) are not part of the ONNX graph — they are exported separately (e.g., as a .json or .bin sidecar file) and loaded by the C++ entropy decoder at runtime.

.json

Used for model metadata, label mappings, experiment configurations, and similar structured data.
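
A minimal sketch of a metadata file with a label mapping, using only the standard library. The field names and values are hypothetical; the one detail worth noting is that JSON object keys are always strings, so integer class ids need converting back after loading.

```python
import json

# Hypothetical metadata for illustration
metadata = {
    "model_name": "example-classifier",
    "input_shape": [1, 3, 224, 224],
    "labels": {0: "cat", 1: "dog"},
}

text = json.dumps(metadata, indent=2)   # human-readable, unlike pickle
loaded = json.loads(text)

# JSON object keys are always strings, so restore the integer class ids
labels = {int(k): v for k, v in loaded["labels"].items()}
print(labels)  # {0: 'cat', 1: 'dog'}
```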