original model | Stable-diffusion-xl-base-1.0 | SDXL consists of an ensemble of experts pipeline for latent diffusion: In a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can be used as a standalone module. |
Stable-diffusion-2-base | The model is trained from scratch for 550k steps at resolution 256x256 on a subset of LAION-5B filtered to remove explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score >= 4.5. It is then further trained for 850k steps at resolution 512x512 on the same dataset, restricted to images with resolution >= 512x512. |
fine-tuned model | Stable-diffusion-xl-refiner-1.0 | You can use the refiner to improve images in this repo. |
Stable-diffusion-2-1 | This stable-diffusion-2-1 model is fine-tuned from stable-diffusion-2 (768-v-ema.ckpt) for an additional 55k steps on the same dataset (with punsafe=0.1), and then fine-tuned for another 155k steps with punsafe=0.98. |
sd-x2-latent-upscaler | This model was trained on a high-resolution subset of the LAION-2B dataset. It is a diffusion model that operates in the same latent space as the Stable Diffusion model; the upscaled latents are then decoded into a full-resolution image. |
stabilityai/stable-diffusion-2-1-unclip | This stable-diffusion-2-1-unclip is a fine-tuned version of Stable Diffusion 2.1, modified to accept a (noisy) CLIP image embedding in addition to the text prompt. It can be used to create image variations or can be chained with text-to-image CLIP priors. The amount of noise added to the image embedding is set via noise_level (0 means no noise, 1000 means full noise). |
disney-pixar-cartoon | This generative model produces Disney cartoon-like images. |
crystal-clear-xlv1 | The latest entry in the Crystal Clear suite of models. |
copax-timelessxl-sdxl10 | Fine-tuned model for realistic images, using the base model of