Sana's speed comes from what Nvidia calls a “deep compression autoencoder,” which shrinks each side of an image by a factor of 32 before the model works on it, while preserving most of the visual detail. The model pairs this with the Gemma 2 LLM, used as a text encoder to interpret prompts, creating a system that punches well above its weight class on modest hardware.
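Some quick arithmetic shows why the compression matters. The sketch below is illustrative only: it assumes the factor of 32 applies to each side of the image, and it compares that against a factor of 8, a common choice in earlier latent diffusion models. The function names are hypothetical and are not part of any Nvidia code.

```python
def latent_grid(height: int, width: int, factor: int) -> tuple[int, int]:
    """Return the latent grid size after shrinking each side by `factor`.

    Assumption: the compression factor applies per side, not to total size.
    """
    if height % factor or width % factor:
        raise ValueError("image sides must be divisible by the factor")
    return height // factor, width // factor


def latent_positions(height: int, width: int, factor: int) -> int:
    """Count the spatial positions the generator has to process."""
    rows, cols = latent_grid(height, width, factor)
    return rows * cols


if __name__ == "__main__":
    # Compare a factor of 8 with the factor of 32 for a 1024x1024 image.
    for factor in (8, 32):
        rows, cols = latent_grid(1024, 1024, factor)
        print(f"factor {factor}: {rows}x{cols} grid, {rows * cols} positions")
```

Under these assumptions, a 1024x1024 image becomes a 32x32 grid instead of a 128x128 one, so the generator handles 16 times fewer spatial positions per image.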
If the final release is as good as the public demo, Sana promises to be an image generator built for less demanding systems, which would be a big advantage for Nvidia as it tries to reach more users.