How is Stable Diffusion 2.0 Better?
Now funded, "it had one of the fastest climbs to 10K GitHub stars of any software."
Hey Everyone,
Happy Thanksgiving to my American readers.
The Stable Diffusion 2.0 release delivers a number of big improvements and new features compared to the original V1 release, so let’s dive in and take a look at them.
OpenCLIP
The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels.
These models are trained on an aesthetic subset of the LAION-5B dataset created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using LAION’s NSFW filter.
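If you want to try the new text-to-image checkpoints right away, here is a minimal sketch using Hugging Face’s diffusers library with the stabilityai/stable-diffusion-2 checkpoint. The prompt, scheduler choice, and output file name are illustrative assumptions on my part, not something prescribed by the release.

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "stabilityai/stable-diffusion-2"

# One common scheduler choice for this checkpoint (an assumption here, not the only option)
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"  # placeholder prompt
image = pipe(prompt, height=768, width=768).images[0]      # native 768x768 output
image.save("astronaut_rides_horse.png")
```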
Super-Resolution Upscaler Diffusion Models (Higher Resolution)
Stable Diffusion 2.0 also includes an Upscaler Diffusion model that enhances the resolution of images by a factor of 4. For example, the model can upscale a low-resolution generated image (128x128) into a higher-resolution image (512x512). Combined with the text-to-image models, Stable Diffusion 2.0 can now generate images with resolutions of 2048x2048, or even higher.
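As a rough sketch of how you might call the 4x upscaler from Python, diffusers exposes a StableDiffusionUpscalePipeline for the stabilityai/stable-diffusion-x4-upscaler checkpoint. The input file and prompt below are made-up placeholders.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

# Load the 4x upscaler released alongside Stable Diffusion 2.0
pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

# Placeholder 128x128 input; in practice this could be a low-res image from the base model
low_res_img = Image.open("low_res_cat.png").convert("RGB").resize((128, 128))

# The upscaler is text-guided, so a prompt describing the image helps
upscaled = pipe(prompt="a photo of a white cat", image=low_res_img).images[0]
upscaled.save("upscaled_cat_512.png")  # 4x larger: 512x512
```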
Depth-to-Image Diffusion Model (Depth2img)
Another new addition is their depth-guided Stable Diffusion model, called depth2img, which extends the image-to-image feature from V1 with brand new possibilities for creative applications. Depth2img infers the depth of an input image (using an existing monocular depth estimation model) and then generates new images using both the text prompt and the depth information.
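If you want to experiment with depth2img yourself, a minimal sketch with diffusers’ StableDiffusionDepth2ImgPipeline and the stabilityai/stable-diffusion-2-depth checkpoint might look like this; the input image, prompts, and strength value are illustrative assumptions.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("living_room.png").convert("RGB")  # placeholder input image

# Depth is estimated from init_image internally, then used to condition
# generation alongside the text prompt.
image = pipe(
    prompt="a cozy cabin interior, warm lighting",
    image=init_image,
    negative_prompt="blurry, deformed",
    strength=0.7,  # how strongly to deviate from the input image
).images[0]
image.save("depth_guided_result.png")
```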
Learn More
Learn more: https://github.com/Stability-AI/stablediffusion
https://huggingface.co/stabilityai/stable-diffusion-2
Updated Inpainting Diffusion Model
They also include a new text-guided inpainting model, fine-tuned on the new Stable Diffusion 2.0 base text-to-image model, which makes it easy to intelligently and quickly swap out parts of an image.
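Here is a minimal, illustrative sketch of calling the new inpainting model via diffusers’ StableDiffusionInpaintPipeline and the stabilityai/stable-diffusion-2-inpainting checkpoint; the image, mask, and prompt are placeholder assumptions.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("dog_on_bench.png").convert("RGB")  # placeholder source image
mask = Image.open("bench_mask.png").convert("RGB")     # white = region to replace

result = pipe(
    prompt="a yellow cat sitting on a park bench",
    image=image,
    mask_image=mask,
).images[0]
result.save("cat_on_bench.png")
```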
According to Stability AI, the new release, along with its powerful new features like depth2img and higher-resolution upscaling, will serve as the foundation of countless applications and enable an explosion of new creative potential.
Broadly speaking, Stable Diffusion 2 also adds the following:
The new Stable Diffusion model generates images at a native 768×768 resolution.
The U-Net has the same number of parameters as version 1.5, but it is trained from scratch and uses OpenCLIP-ViT/H as its text encoder. SD 2.0-v is a so-called v-prediction model (see the short sketch after this list).
This model was fine-tuned from SD 2.0-base, which is also being made available and was trained as a standard noise-prediction model on 512×512 images.
A latent text-guided diffusion model with x4 scaling has been added.
A depth-guided Stable Diffusion model fine-tuned from SD 2.0-base. It is conditioned on monocular depth estimates inferred by MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis.
An improved text-guided inpainting model built on the SD 2.0 foundation.
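For the curious, here is a tiny sketch of what the v-prediction training target mentioned in the list above looks like in code, following the v-parameterization of Salimans & Ho (2022). The function name and tensor shapes are my own illustrative choices, not code from the release.

```python
import torch

def v_prediction_target(x0: torch.Tensor, noise: torch.Tensor,
                        alphas_cumprod: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """v-parameterization: v = sqrt(alpha_bar_t) * eps - sqrt(1 - alpha_bar_t) * x0.

    A standard noise-prediction model regresses on `noise` directly;
    a v-prediction model like SD 2.0-v regresses on this `v` target instead.
    """
    alpha_bar = alphas_cumprod[t].view(-1, 1, 1, 1)  # per-sample cumulative alpha
    return alpha_bar.sqrt() * noise - (1.0 - alpha_bar).sqrt() * x0
```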
Thanks for reading!