
Paper: https://arxiv.org/abs/2108.01073

Guided image synthesis enables everyday users to create and edit photo-realistic images with minimal effort. The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image. Existing GAN-based methods attempt to achieve such balance using either conditional GANs or GAN inversions, which are challenging and often require additional training data or loss functions for individual applications. To address these issues, we introduce a new image synthesis and editing method, Stochastic Differential Editing (SDEdit), based on a diffusion model generative prior, which synthesizes realistic images by iteratively denoising through a stochastic differential equation (SDE). Given an input image with user guidance of any type, SDEdit first adds noise to the input, then denoises the resulting image through the SDE prior to increase its realism. SDEdit does not require task-specific training or inversions and can naturally achieve the balance between realism and faithfulness. SDEdit significantly outperforms state-of-the-art GAN-based methods by up to 98.09% on realism and 91.72% on overall satisfaction scores, according to a human perception study, on multiple tasks, including stroke-based image synthesis and editing as well as image compositing.
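The noise-then-denoise procedure described above can be sketched numerically. The following is a minimal toy illustration (not the authors' code): it perturbs a guide image up to an intermediate time t0 under a VE-SDE, then runs reverse-time Euler-Maruyama steps back to t=0. The score function here is a closed-form score for a Gaussian toy prior, standing in for a learned score network; all names and parameter values are illustrative assumptions.

```python
import numpy as np

def sdedit(x_guide, score_fn, t0=0.5, n_steps=200,
           sigma_min=0.01, sigma_max=10.0, seed=0):
    """Toy SDEdit sketch under a VE-SDE: noise the guide to time t0,
    then denoise back to t=0 with reverse-time Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    sigma = lambda t: sigma_min * (sigma_max / sigma_min) ** t
    # Step 1: perturb the guide image up to noise level sigma(t0).
    x = x_guide + sigma(t0) * rng.standard_normal(x_guide.shape)
    # Step 2: integrate the reverse SDE from t0 down to 0.
    ts = np.linspace(t0, 0.0, n_steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        dt = t - t_next                                          # positive step size
        g2 = sigma(t) ** 2 * 2 * np.log(sigma_max / sigma_min)   # g(t)^2 for the VE-SDE
        x = x + g2 * score_fn(x, sigma(t)) * dt                  # drift (score) term
        x = x + np.sqrt(g2 * dt) * rng.standard_normal(x.shape)  # diffusion term
    return x

# Toy "prior": data ~ N(mu, s^2), whose noise-perturbed score is known in closed form.
mu, s = 1.0, 0.1
score = lambda x, sig: -(x - mu) / (s ** 2 + sig ** 2)
out = sdedit(np.zeros(4), score, t0=0.5)
```

The choice of t0 controls the realism/faithfulness trade-off: a larger t0 destroys more of the guide and yields samples closer to the prior, while a smaller t0 stays more faithful to the input.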

Model overview

Original models:

- Stable-diffusion-xl-base-1.0: SDXL consists of an ensemble-of-experts pipeline for latent diffusion: in a first step, the base model is used to generate (noisy) latents, which are then further processed with a refinement model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can be used as a standalone module.
- Stable-diffusion-2-base: The model is trained from scratch for 550k steps at resolution 256x256 on a subset of LAION-5B filtered for explicit pornographic material, using the LAION-NSFW classifier with punsafe=0.1 and an aesthetic score >= 4.5. It is then further trained for 850k steps at resolution 512x512 on the same dataset, restricted to images with resolution >= 512x512.

Fine-tuned models:

- Stable-diffusion-xl-refiner-1.0: You can use the refiner to improve images in this repo.
- Stable-diffusion-2-1: This stable-diffusion-2-1 model is fine-tuned from stable-diffusion-2 (768-v-ema.ckpt) with an additional 55k steps on the same dataset (with punsafe=0.1), and then fine-tuned for another 155k steps with punsafe=0.98.
- sd-x2-latent-upscaler: This model was trained on a high-resolution subset of the LAION-2B dataset. It is a diffusion model that operates in the same latent space as the Stable Diffusion model; its output is decoded into a full-resolution image.
- stabilityai/stable-diffusion-2-1-unclip: This stable-diffusion-2-1-unclip model is a fine-tuned version of Stable Diffusion 2.1, modified to accept a (noisy) CLIP image embedding in addition to the text prompt. It can be used to create image variations or be chained with text-to-image CLIP priors. The amount of noise added to the image embedding can be specified via noise_level (0 means no noise, 1000 means full noise).
- disney-pixar-cartoon: This generative model produces Disney/Pixar-style cartoon images.
- crystal-clear-xlv1: The latest entry in the Crystal Clear suite of models.
- copax-timelessxl-sdxl10: A model fine-tuned for realistic images, using the base model of
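The SDXL base/refiner handoff described above amounts to splitting the denoising schedule between two experts. A minimal sketch of that split, assuming a step-index schedule and a cut fraction analogous to the denoising_end/denoising_start parameters exposed by the diffusers SDXL pipelines:

```python
def split_timesteps(num_steps, denoising_end=0.8):
    """Sketch of the ensemble-of-experts handoff: the base model handles
    the first fraction of denoising steps, the refiner handles the rest.
    The cut fraction mirrors diffusers' denoising_end parameter
    (an assumption for illustration, not the actual pipeline code)."""
    cut = int(round(num_steps * denoising_end))
    base_steps = list(range(num_steps))[:cut]      # handled by the base model
    refiner_steps = list(range(num_steps))[cut:]   # handled by the refiner
    return base_steps, refiner_steps

base, refiner = split_timesteps(50, 0.8)
# base covers steps 0..39, refiner covers steps 40..49
```

Because the refiner is specialized for the low-noise end of the schedule, the cut is usually placed late (e.g., 0.8), leaving only the final denoising steps to it.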