We compare different decoding methods to demonstrate the quality and efficiency of MAGVIT COMMIT decoding. For each method, we train a base transformer model with the same 3D-VQ tokenizer; a schematic contrast of the decoding step counts is sketched after the list below.
MAGVIT COMMIT decoding, 12 steps (ours)
Autoregressive decoding, 1024 steps
MaskGIT (Chang et al., 2022) MTM decoding, 12 steps
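For intuition on the step counts above, the sketch below contrasts the two decoding schedules on a 1024-token video. It is a minimal illustration, not MAGVIT's actual sampling code: `predict_tokens` is a hypothetical stand-in for the transformer that returns random tokens and confidences, and the unmasking schedule is a simple linear one (the papers use a cosine schedule). Its only purpose is to show that autoregressive decoding needs one forward pass per token, while masked decoding refines all tokens in parallel over a fixed small number of passes.

```python
import numpy as np

NUM_TOKENS = 1024   # e.g. a 4x16x16 3D-VQ token grid, flattened
MASK_ID = -1        # hypothetical mask token id
VOCAB = 1024        # hypothetical codebook size

def predict_tokens(tokens):
    """Hypothetical stand-in for the transformer: returns a token and a
    confidence score for every position (random here, for illustration)."""
    rng = np.random.default_rng(0)
    return rng.integers(0, VOCAB, size=tokens.shape), rng.random(tokens.shape)

def autoregressive_decode(num_tokens=NUM_TOKENS):
    """One model call per token: 1024 sequential steps for a 1024-token video."""
    tokens = np.full(num_tokens, MASK_ID)
    for i in range(num_tokens):          # 1024 forward passes
        pred, _ = predict_tokens(tokens)
        tokens[i] = pred[i]
    return tokens

def masked_decode(num_tokens=NUM_TOKENS, steps=12):
    """Non-autoregressive refinement: all tokens are predicted in parallel and
    only the most confident ones are kept each step (12 forward passes)."""
    tokens = np.full(num_tokens, MASK_ID)
    for step in range(steps):
        pred, conf = predict_tokens(tokens)
        still_masked = tokens == MASK_ID
        # Linear unmasking schedule for simplicity; MaskGIT/MAGVIT use cosine.
        target = int(num_tokens * (step + 1) / steps)
        keep = target - int(np.sum(~still_masked))
        conf = np.where(still_masked, conf, -np.inf)  # never re-pick fixed tokens
        top = np.argsort(-conf)[:max(keep, 0)]
        tokens[top] = pred[top]
    return tokens
```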
Comparing VQ Tokenizers on UCF-101
We compare different VQ tokenizers to demonstrate the superior reconstruction quality of MAGVIT 3D-VQ. These models are trained only on the 9.5K training videos of the small UCF-101 dataset. See Perceptual Compression for large-scale real-world examples of MAGVIT 3D-VQ.
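For a rough sense of scale, the snippet below computes the token-grid size for the 3D-VQ setting assumed in the comparison above. The downsampling factors (4x temporal, 8x spatial on 16-frame 128x128 clips) are stated here as an illustrative assumption; they show why the autoregressive baseline above requires 1024 decoding steps.

```python
def token_grid(frames=16, height=128, width=128,
               temporal_stride=4, spatial_stride=8):
    """Shape and size of the 3D-VQ token grid for one video clip.

    The strides are assumed for illustration: they turn a 16x128x128 clip
    into a 4x16x16 grid, i.e. 4 * 16 * 16 = 1024 tokens.
    """
    t = frames // temporal_stride
    h = height // spatial_stride
    w = width // spatial_stride
    return (t, h, w), t * h * w

print(token_grid())  # ((4, 16, 16), 1024)
```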