[ML] D-PCC Decoder Layers

icosahedron, losses

Posted by Rico's Nerd Cluster on February 6, 2026

Decoder

We need multiple upsampling blocks instead of one large block. That is, instead of upsampling x27 in one shot, we do x3 -> x3 -> x3. Why?

  1. upsampling x27 in a single step might require large gradient changes during training, which can introduce numerical instabilities.
  2. it would require the input features to capture extremely fine details.
  3. the receptive field is the same, but more intermediate non-linearity between combinations of input elements is learned.

This is similar to sub-pixel convolution in super-resolution:

```
2x -> 2x -> 2x
```

Instead of doing it in one shot:

```
8x
```

Icosahedron

An icosahedron is a regular solid with 20 triangular faces and 12 vertices. In this project, icosahedron2sphere(level) uses it to generate nearly uniformly distributed directions on a sphere — these serve as candidate upsampling directions when reconstructing point clouds.

A regular icosahedron has all edges of equal length. Up to rotation and scale, this holds if and only if its 12 vertices are:

\[(0, \pm 1, \pm \varphi), \quad (\pm 1, \pm \varphi, 0), \quad (\pm \varphi, 0, \pm 1)\]

where $\varphi$ is the golden ratio:

\[\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618\]

Using any other value would produce unequal edge lengths.

icosahedron2sphere(level) works as follows:

  1. Project the icosahedron’s 12 vertices onto a unit sphere.
  2. If level > 1, subdivide each triangular face by inserting a new vertex at the midpoint of each edge, then project those new vertices back onto the sphere.
  3. Return the resulting directions, which are nearly uniformly distributed over the sphere.
Icosahedron (20 faces, 12 vertices)

Vertices projected onto sphere — Level 1

The 12 base vertices are not perfectly uniform, but each subdivision level makes the distribution increasingly uniform. This stage returns:

  • vertices as 3D coordinates
  • triangles as index triplets into the vertex coordinates above

Below is Level 2 — the midpoints of all edges are added and re-projected:

Uniform directions — Level 2 angular projection

Uniform directions — Level 2 mesh quiver visualization
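The three steps above can be sketched in NumPy as follows. This is a hypothetical re-implementation for illustration, not the project's actual icosahedron2sphere; the vertex ordering and the level convention (here, level counts subdivision rounds, so level=0 gives the 12 base vertices) are assumptions:

```python
import numpy as np

def icosahedron2sphere(level):
    """Near-uniform unit directions from a subdivided icosahedron.

    Hypothetical sketch: `level` counts subdivision rounds here; the
    project's own convention may be offset by one.
    """
    t = (1 + 5 ** 0.5) / 2  # golden ratio
    verts = [(-1, t, 0), (1, t, 0), (-1, -t, 0), (1, -t, 0),
             (0, -1, t), (0, 1, t), (0, -1, -t), (0, 1, -t),
             (t, 0, -1), (t, 0, 1), (-t, 0, -1), (-t, 0, 1)]
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    # Step 1: project the 12 base vertices onto the unit sphere.
    verts = [np.array(v) / np.linalg.norm(v) for v in verts]

    cache = {}
    def midpoint(i, j):
        # Insert the midpoint of edge (i, j) once, re-projected onto the sphere.
        key = (min(i, j), max(i, j))
        if key not in cache:
            m = verts[i] + verts[j]
            verts.append(m / np.linalg.norm(m))
            cache[key] = len(verts) - 1
        return cache[key]

    # Step 2: split each triangular face into 4 per subdivision round.
    for _ in range(level):
        new_faces = []
        for a, b, c in faces:
            ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
            new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
        faces = new_faces

    # Step 3: return vertex coordinates and triangle vertex indices.
    return np.stack(verts), np.array(faces)

verts, tris = icosahedron2sphere(1)
print(verts.shape, tris.shape)  # (42, 3) (80, 3)
```

One subdivision round turns 20 faces into 80 and adds one midpoint per edge, so the 12 vertices grow to 12 + 30 = 42.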

Sub-Pixel / Sub-Point Convolution

Assume we want to upsample an input by an upsampling factor r:

```
H x W x C -> rH x rW x C
```

Traditionally, this is done by interpolation. However, interpolated features tend to be near-duplicates of their neighbors: they differ only by small variations, so the resulting points end up clustered instead of spread out.

Periodic shuffle is:

```
H x W x C -> H x W x (C*r^2) ---shuffle---> rH x rW x C
```

You simply interpret the same data in a different shape.

Why does it work? Expanding the channels and shuffling lets each of the r^2 sub-pixel positions learn its own filters, so the feature space has more variation than interpolated copies, and gradient descent can optimize the parameters for each position independently.
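The shuffle itself is a pure reinterpretation of memory, as a minimal NumPy sketch shows. The channel grouping order below is an assumption for illustration; real implementations such as PyTorch's pixel_shuffle operate on NCHW layout:

```python
import numpy as np

def periodic_shuffle(x, r):
    """Rearrange (H, W, C*r^2) -> (rH, rW, C) without changing any values."""
    H, W, Cr2 = x.shape
    C = Cr2 // (r * r)
    # Split the channels into an (r, r, C) sub-pixel block per location...
    x = x.reshape(H, W, r, r, C)
    # ...then interleave the block spatially: (H, r, W, r, C) -> (rH, rW, C).
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(H * r, W * r, C)

x = np.arange(32).reshape(2, 2, 8)  # H=W=2, C*r^2=8 with r=2, C=2
y = periodic_shuffle(x, 2)
print(y.shape)  # (4, 4, 2)
```

No value is created or destroyed; the output is a permutation of the input, which is why the convolution before the shuffle carries all the learning.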

Sub-point convolution is similar. Suppose the feature dimension is c = 4 and we have N = 2 points:

```
N x c = 2 x 4
```

With upsampling factor r, after convolution:

```
N x (c * r)
```

Then periodic shuffle:

```
(r * N) x c
```
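The point-cloud analogue can be sketched the same way (sub_point_shuffle is a hypothetical helper; it assumes features are stored as an (N, c*r) array):

```python
import numpy as np

def sub_point_shuffle(f, r):
    """Rearrange (N, c*r) point features -> (r*N, c).

    Each input point spawns r child points, each taking one
    c-dimensional slice of the expanded feature vector.
    """
    N, cr = f.shape
    c = cr // r
    return f.reshape(N, r, c).reshape(N * r, c)

f = np.arange(16).reshape(2, 8)  # N=2 points, c*r=8 with r=2, c=4
g = sub_point_shuffle(f, 2)
print(g.shape)  # (4, 4)
```

Because points have no spatial grid, the 2D interleaving step disappears; the shuffle reduces to grouping each point's channels into r children.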

Loss

The mean distance, the number of upsampled points, and the Chamfer loss at each downsampling stage are fed into the loss function, so they are directly penalized:

  • Upsampling ratio: L1 loss between the predicted upsample_num and the ground-truth downsample_num
  • Mean distance: predicted mean distance vs. true mean distance
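For reference, a minimal NumPy sketch of the symmetric Chamfer distance used above; the paper's exact formulation (e.g. squared vs. unsquared distances, sum vs. mean reduction) may differ:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3)."""
    # (N, M) matrix of pairwise Euclidean distances.
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    # Average nearest-neighbor distance in each direction, summed.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

p = np.zeros((1, 3))
q = np.ones((1, 3))
print(chamfer_distance(p, p))  # 0.0
```

Being differentiable almost everywhere and permutation-invariant, this is the standard reconstruction term for point clouds, applied here once per downsampling stage.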