ViT

Posted by Rico's Nerd Cluster on March 8, 2026

ViT in short:

Image -> 16x16 patches [16 x 16 x 3] -> flatten each patch to 16 * 16 * 3 = 768 values -> linear projection to a token embedding of dimension D (e.g. 256 or 768) TODO
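
The pipeline above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the post: a 224 x 224 RGB input, an embedding dimension of 768, random untrained weights, and the hypothetical helper names `patchify` and `embed_patches`. It covers only the patch split and the linear projection.

```python
import numpy as np


def patchify(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size
    # (H, W, C) -> (gh, p, gw, p, C): split rows and columns into grid + offset
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    # -> (gh, gw, p, p, C): bring the two grid axes to the front
    patches = patches.transpose(0, 2, 1, 3, 4)
    # -> (num_patches, p * p * C): one flat vector per patch, row-major order
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def embed_patches(patches, weight, bias):
    """Linear projection of flattened patches to token embeddings."""
    return patches @ weight + bias


rng = np.random.default_rng(0)

# Assumed input size: 224 x 224 RGB, which gives a 14 x 14 grid of patches.
image = rng.random((224, 224, 3), dtype=np.float32)
patches = patchify(image, patch_size=16)

# Assumed embedding dimension; the weights are random placeholders, not trained.
embed_dim = 768
weight = (rng.standard_normal((patches.shape[1], embed_dim)) * 0.02).astype(np.float32)
bias = np.zeros(embed_dim, dtype=np.float32)

tokens = embed_patches(patches, weight, bias)
print(patches.shape, tokens.shape)
```

With these assumed sizes there are 14 * 14 = 196 patches, so `patches` has shape (196, 768) and `tokens` has shape (196, embed_dim). Setting `embed_dim = 256` changes only the last axis of `tokens`.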