
Paper: https://arxiv.org/abs/2108.01073

Guided image synthesis enables everyday users to create and edit photo-realistic images with minimal effort. The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image. Existing GAN-based methods attempt to achieve such balance using either conditional GANs or GAN inversions, which are challenging and often require additional training data or loss functions for individual applications. To address these issues, we introduce a new image synthesis and editing method, Stochastic Differential Editing (SDEdit), based on a diffusion model generative prior, which synthesizes realistic images by iteratively denoising through a stochastic differential equation (SDE). Given an input image with user guidance of any type, SDEdit first adds noise to the input, then denoises the resulting image through the SDE prior to increase its realism. SDEdit does not require task-specific training or inversions and can naturally achieve the balance between realism and faithfulness. SDEdit significantly outperforms state-of-the-art GAN-based methods by up to 98.09% on realism and 91.72% on overall satisfaction scores, according to a human perception study, on multiple tasks, including stroke-based image synthesis and editing as well as image compositing.
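The noise-then-denoise procedure described above can be sketched numerically. The following is a minimal toy illustration (not the authors' code): it perturbs a guide image up to an intermediate time t0 under a VE-SDE, then runs reverse-time Euler-Maruyama steps back to t=0. The score function here is a closed-form score for a Gaussian toy prior, standing in for a learned score network; all names and parameter values are illustrative assumptions.

```python
import numpy as np

def sdedit(x_guide, score_fn, t0=0.5, n_steps=200,
           sigma_min=0.01, sigma_max=10.0, seed=0):
    """Toy SDEdit sketch under a VE-SDE: noise the guide to time t0,
    then denoise back to t=0 with reverse-time Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    sigma = lambda t: sigma_min * (sigma_max / sigma_min) ** t
    # Step 1: perturb the guide image up to noise level sigma(t0).
    x = x_guide + sigma(t0) * rng.standard_normal(x_guide.shape)
    # Step 2: integrate the reverse SDE from t0 down to 0.
    ts = np.linspace(t0, 0.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t - t_next                                          # positive step size
        g2 = sigma(t) ** 2 * 2 * np.log(sigma_max / sigma_min)   # g(t)^2 for the VE-SDE
        x = x + g2 * score_fn(x, sigma(t)) * dt                  # drift (score) term
        x = x + np.sqrt(g2 * dt) * rng.standard_normal(x.shape)  # diffusion term
    return x

# Toy "prior": data ~ N(mu, s^2), whose noise-perturbed score is known in closed form.
mu, s = 1.0, 0.1
score = lambda x, sig: -(x - mu) / (s ** 2 + sig ** 2)
out = sdedit(np.zeros(4), score, t0=0.5)
```

The choice of t0 controls the realism/faithfulness trade-off: a larger t0 destroys more of the guide and yields samples closer to the prior, while a smaller t0 stays more faithful to the input.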

Model overview

Original models:

- Stable-diffusion-xl-base-1.0: SDXL consists of an ensemble-of-experts pipeline for latent diffusion: in a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can be used as a standalone module.
- Stable-diffusion-2-base: The model is trained from scratch for 550k steps at resolution 256x256 on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score >= 4.5. It is then further trained for 850k steps at resolution 512x512 on the same dataset, restricted to images with resolution >= 512x512.

Fine-tuned models:

- Stable-diffusion-xl-refiner-1.0: You can use the refiner to improve images in this repo.
- Stable-diffusion-2-1: This stable-diffusion-2-1 model is fine-tuned from stable-diffusion-2 (768-v-ema.ckpt) with an additional 55k steps on the same dataset (with punsafe=0.1), and then fine-tuned for another 155k steps with punsafe=0.98.
- sd-x2-latent-upscaler: This model was trained on a high-resolution subset of the LAION-2B dataset. It is a diffusion model that operates in the same latent space as the Stable Diffusion model; its output is decoded into a full-resolution image.
- stabilityai/stable-diffusion-2-1-unclip: This stable-diffusion-2-1-unclip model is a fine-tuned version of Stable Diffusion 2.1, modified to accept a (noisy) CLIP image embedding in addition to the text prompt. It can be used to create image variations or be chained with text-to-image CLIP priors. The amount of noise added to the image embedding can be specified via noise_level (0 means no noise, 1000 means full noise).
- disney-pixar-cartoon: This generative model produces Disney/Pixar-style cartoon images.
- crystal-clear-xlv1: The latest entry in the Crystal Clear suite of models.
- copax-timelessxl-sdxl10: A model fine-tuned for realistic images, using the base model of
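The SDXL base/refiner handoff described above amounts to splitting the denoising schedule between two experts. A minimal sketch of that split, assuming a step-index schedule and a cut fraction analogous to the denoising_end/denoising_start parameters exposed by the diffusers SDXL pipelines:

```python
def split_timesteps(num_steps, denoising_end=0.8):
    """Sketch of the ensemble-of-experts handoff: the base model handles
    the first fraction of denoising steps, the refiner handles the rest.
    The cut fraction mirrors diffusers' denoising_end parameter
    (an assumption for illustration, not the actual pipeline code)."""
    cut = int(round(num_steps * denoising_end))
    base_steps = list(range(num_steps))[:cut]      # handled by the base model
    refiner_steps = list(range(num_steps))[cut:]   # handled by the refiner
    return base_steps, refiner_steps

base, refiner = split_timesteps(50, 0.8)
# base covers steps 0..39, refiner covers steps 40..49
```

Because the refiner is specialized for the low-noise end of the schedule, the cut is usually placed late (e.g., 0.8), leaving only the final denoising steps to it.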