Better than JPEG? The researcher discovers that stable diffusion can compress images

Zoom / These jagged colored blocks are exactly what the concept of image compression sounds like.

Bing Edwards / Ars Technica

Last week, Swiss software engineer Matthias Pullman Discover That famous photomontage model stable spread It can compress existing bitmaps with fewer visual artifacts than JPEG or WebP at high compression ratios, although there are significant caveats.

Stable spread is a file Artificial intelligence photomontage model which typically generate images based on text descriptions (called “claims”). The AI ​​model learned this ability by studying millions of images taken from the Internet. During the training process, the model makes statistical associations between images and related words, making a much smaller representation of basic information about each image and storing them as “weights,” which are mathematical values ​​that represent what the AI ​​image model knows, so they occur.

When stable diffusion analyzes and “compresses” the images into a weight form, they reside in what researchers call a “latent space,” a way of saying it exists as a kind of blurry potential that can be perceived in the images once they are decoded. With Stable Diffusion 1.4, the weights file is roughly 4GB, but it’s knowledge of hundreds of millions of images.

Examples of using Stable Diffusion to compress images.
Zoom / Examples of using Stable Diffusion to compress images.

While most people use Stable Diffusion with text prompts, Bühlmann clipped the text encoder and instead forced his images through the Stable Diffusion image encoding process, which takes a low-resolution 512×512 image and converts it to a higher resolution 64×64 latent representation of the space. At this point, the image exists with a much smaller data size than the original image, but it can still be expanded (decoded) to a 512×512 image with fairly good results.

See also  The Elden Ring gameplay trailer has been confirmed for February 21

While running tests, Bühlmann found that images compressed with Stable Diffusion look subjectively better at higher compression ratios (smaller file size) than JPEG or WebP. In one example, it shows an image of a candy store compressed to 5.68 KB using JPEG, 5.71 KB using WebP, and 4.98 KB using Stable Diffusion. The stable diffusion image appears to have finer details and less clear compression results than those compressed in other formats.

Demo examples of using Stable Diffusion to compress images.  SD results at the far right.
Zoom / Demo examples of using Stable Diffusion to compress images. SD results at the far right.

Bühlmann’s method currently comes with significant limitations, however: it is not good with faces or text and, in some cases, can actually hallucinate detailed features in the decoded image that were not present in the source image. (You probably don’t want the image compressor to invent details in an image that doesn’t exist.) Also, file decoding requires 4GB of stable propagation weights and additional decoding time.

While this use of Stable Diffusion is unconventional and is more of a fun hack than a practical solution, it may indicate a new, future use of photo montage models. Could be a Pullman symbol found on Google Colab, You will find more technical details about his experience in Posted as AI.

Leave a Reply

Your email address will not be published. Required fields are marked *