
CompVis/stable-diffusion


Stable Diffusion was made possible thanks to a collaboration with Stability AI and Runway and builds upon our previous work:

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach*, Andreas Blattmann*, Dominik Lorenz, Patrick Esser, Björn Ommer
CVPR '22 Oral | arXiv:2112.10752

Stable Diffusion is a latent text-to-image diffusion model. Thanks to a generous compute donation from Stability AI and support from LAION, we were able to train a Latent Diffusion Model on 512x512 images from a subset of the LAION-5B database. Similar to Google's Imagen, this model uses a frozen text encoder, here CLIP ViT-L/14, to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the sections below and the model card.

Requirements

A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

You can also update an existing latent diffusion environment by running:

conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .

Stable Diffusion v1

Stable Diffusion v1 refers to a specific configuration of the model architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and then finetuned on 512x512 images.
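To make the downsampling factor concrete, the sketch below computes the shape of the latent the UNet actually denoises. It assumes 4 latent channels and f=8, which we believe are the defaults of the --C and --f options of scripts/txt2img.py; `latent_shape` and `compression_ratio` are hypothetical helpers, not functions from this repository.

```python
from typing import Tuple


def latent_shape(height: int, width: int, channels: int = 4, factor: int = 8) -> Tuple[int, int, int]:
    """Return the (C, H/f, W/f) latent shape for an image of size height x width.

    Hypothetical helper; channels=4 and factor=8 are assumed v1 defaults.
    """
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (channels, height // factor, width // factor)


def compression_ratio(height: int, width: int, channels: int = 4, factor: int = 8) -> float:
    """Number of RGB pixel values divided by the number of latent values."""
    c, h, w = latent_shape(height, width, channels, factor)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(latent_shape(256, 256))       # (4, 32, 32)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions, both the 256x256 pretraining stage and the 512x512 finetuning stage operate on latents with 48 times fewer values than the corresponding RGB images.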

Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present in its training data. Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding model card.

The weights are available via the CompVis organization at Hugging Face under a license which contains specific use-based restrictions to prevent misuse and harm as informed by the model card, but otherwise remains permissive. While commercial use is permitted under the terms of the license, we do not recommend using the provided weights for services or products without additional safety mechanisms and considerations, since there are known limitations and biases of the weights, and research on safe and ethical deployment of general text-to-image models is an ongoing effort. The weights are research artifacts and should be treated as such.

The CreativeML OpenRAIL M license is an Open RAIL M license, adapted from the work that BigScience and the RAIL Initiative are jointly carrying out in the area of responsible AI licensing. See also the article about the BLOOM Open RAIL license, on which our license is based.

Weights

We currently provide the following checkpoints:

  • sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en. 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
  • sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, additionally filtered to images with an original size >= 512x512 and an estimated watermark probability < 0.5; the watermark estimate is from the LAION-5B metadata, and the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
  • sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
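The "10% dropping of the text-conditioning" used for sd-v1-3 and sd-v1-4 means that, for roughly one in ten training examples, the caption is replaced by an unconditional input, so the same network also learns the unconditional prediction that classifier-free guidance needs at sampling time. A minimal sketch, assuming a dropped caption is replaced by the empty string; `drop_text_conditioning` is a hypothetical name, not the training code of this repository.

```python
import random
from typing import List, Optional, Sequence


def drop_text_conditioning(
    prompts: Sequence[str], p_drop: float = 0.1, rng: Optional[random.Random] = None
) -> List[str]:
    """Replace each prompt with "" with probability p_drop (hypothetical helper).

    The empty prompt stands in for the unconditional input, so one model learns
    both eps(x, cond) and eps(x, empty).
    """
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must lie in [0, 1]")
    rng = rng or random.Random()
    return ["" if rng.random() < p_drop else prompt for prompt in prompts]


batch = [f"caption {i}" for i in range(10_000)]
dropped = drop_text_conditioning(batch, p_drop=0.1, rng=random.Random(0))
fraction = sum(1 for p in dropped if p == "") / len(dropped)
print(f"dropped fraction: {fraction:.3f}")  # close to 0.1
```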

Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling steps show the relative improvements of the checkpoints:

[Figure: sd evaluation results]

Text-to-Image with Stable Diffusion


Stable Diffusion is a latent diffusion model conditioned on the (non-pooled) text embeddings of a CLIP ViT-L/14 text encoder. We provide a reference script for sampling, but there also exists a diffusers integration, for which we expect to see more active community development.

Reference Sampling Script

We provide a reference sampling script, which incorporates

After obtaining the stable-diffusion-v1-*-original weights, link them:

mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt 

and sample with

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 

By default, this uses a guidance scale of --scale 7.5, Katherine Crowson's implementation of the PLMS sampler, and renders images of size 512x512 (which it was trained on) in 50 steps. All supported arguments are listed below (type python scripts/txt2img.py --help).

usage: txt2img.py [-h] [--prompt [PROMPT]] [--outdir [OUTDIR]] [--skip_grid] [--skip_save] [--ddim_steps DDIM_STEPS] [--plms] [--laion400m] [--fixed_code] [--ddim_eta DDIM_ETA]
                  [--n_iter N_ITER] [--H H] [--W W] [--C C] [--f F] [--n_samples N_SAMPLES] [--n_rows N_ROWS] [--scale SCALE] [--from-file FROM_FILE] [--config CONFIG] [--ckpt CKPT]
                  [--seed SEED] [--precision {full,autocast}]

optional arguments:
  -h, --help            show this help message and exit
  --prompt [PROMPT]     the prompt to render
  --outdir [OUTDIR]     dir to write results to
  --skip_grid           do not save a grid, only individual samples. Helpful when evaluating lots of samples
  --skip_save           do not save individual samples. For speed measurements.
  --ddim_steps DDIM_STEPS
                        number of ddim sampling steps
  --plms                use plms sampling
  --laion400m           uses the LAION400M model
  --fixed_code          if enabled, uses the same starting code across samples
  --ddim_eta DDIM_ETA   ddim eta (eta=0.0 corresponds to deterministic sampling
  --n_iter N_ITER       sample this often
  --H H                 image height, in pixel space
  --W W                 image width, in pixel space
  --C C                 latent channels
  --f F                 downsampling factor
  --n_samples N_SAMPLES
                        how many samples to produce for each given prompt. A.k.a. batch size
  --n_rows N_ROWS       rows in the grid (default: n_samples)
  --scale SCALE         unconditional guidance scale: eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty))
  --from-file FROM_FILE
                        if specified, load prompts from this file
  --config CONFIG       path to config which constructs model
  --ckpt CKPT           path to checkpoint of model
  --seed SEED           the seed (for reproducible sampling)
  --precision {full,autocast}
                        evaluate at this precision
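The --scale help text describes classifier-free guidance: the model predicts the noise once with the prompt and once with the empty prompt, and the final prediction extrapolates from the unconditional toward the conditional one. A minimal sketch of just that combination step, using plain Python lists as stand-ins for the UNet's output tensors; `guided_eps` is a hypothetical name.

```python
from typing import List


def guided_eps(eps_uncond: List[float], eps_cond: List[float], scale: float) -> List[float]:
    """eps = eps(x, empty) + scale * (eps(x, cond) - eps(x, empty)), element-wise."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_empty = [0.0, 0.5]   # toy prediction for the empty prompt
eps_prompt = [1.0, 0.5]  # toy prediction for the text prompt
print(guided_eps(eps_empty, eps_prompt, 1.0))  # [1.0, 0.5], purely conditional
print(guided_eps(eps_empty, eps_prompt, 7.5))  # [7.5, 0.5]
```

With scale=1.0 the formula reduces to the conditional prediction and with scale=0.0 to the unconditional one; larger values, such as the default 7.5, amplify only the components where the two predictions disagree.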

Note: The inference config for all v1 versions is designed to be used with EMA-only checkpoints. For this reason, use_ema=False is set in the configuration; otherwise, the code would try to switch from non-EMA to EMA weights. If you want to examine the effect of EMA vs. no EMA, we provide "full" checkpoints, which contain both types of weights. For these, use_ema=False will load and use the non-EMA weights.
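To tell an EMA-only checkpoint from a "full" one, one can look for EMA parameter copies in its state dict. A minimal sketch under several assumptions: that the weights sit under the "state_dict" key of the loaded file, that EMA copies use a "model_ema." prefix, and that "model_ema.decay" and "model_ema.num_updates" are bookkeeping entries rather than weights. `has_ema_weights` is a hypothetical helper, and the toy dictionaries below stand in for a real checkpoint.

```python
from typing import Mapping

# Assumed bookkeeping buffers of the EMA module; they are not weights themselves.
EMA_BOOKKEEPING = {"model_ema.decay", "model_ema.num_updates"}


def has_ema_weights(state_dict: Mapping[str, object], prefix: str = "model_ema.") -> bool:
    """Return True if the state dict holds EMA parameter copies (assumed key layout)."""
    return any(
        key.startswith(prefix) and key not in EMA_BOOKKEEPING for key in state_dict
    )


# Toy stand-ins; with a real file one would pass
# torch.load("model.ckpt", map_location="cpu")["state_dict"] instead.
ema_only = {"model.diffusion_model.out.2.weight": 0.0, "model_ema.decay": 0.9999}
full = {
    "model.diffusion_model.out.2.weight": 0.0,
    "model_ema.diffusion_modelout2weight": 0.0,  # assumed: EMA names drop the dots
}
print(has_ema_weights(ema_only))  # False
print(has_ema_weights(full))      # True
```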

Diffusers Integration

A simple way to download and sample Stable Diffusion is by using the diffusers library:

# make sure you're logged in with `huggingface-cli login`
import torch
from diffusers import StableDiffusionPipeline

# load the v1-4 weights in half precision and move the pipeline to the GPU
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    token=True,  # use the token saved by `huggingface-cli login`
).to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
# current diffusers releases return an output object whose `.images` holds PIL images
image = pipe(prompt).images[0]

image.save("astronaut_rides_horse.png")

Image Modification with Stable Diffusion

By using a diffusion-denoising mechanism as first proposed by SDEdit, the model can be used for different tasks such as text-guided image-to-image translation and upscaling. Similar to the txt2img sampling script, we provide a script to perform image modification with Stable Diffusion.

The following describes an example where a rough sketch made in Pinta is converted into a detailed artwork.

python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8

Here, strength is a value between 0.0 and 1.0 that controls the amount of noise added to the input image. Values approaching 1.0 allow for many variations but will also produce images that are not semantically consistent with the input. See the following example.
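One way to read strength is as the fraction of the sampling schedule that is re-run: the input is encoded, noised up to an intermediate step, and denoised from there, following the SDEdit idea. A minimal sketch, assuming 50 DDIM steps, a conversion of the form int(strength * ddim_steps), and the standard forward-noising formula x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The helper names are hypothetical, and alpha_bar_t would come from the model's noise schedule.

```python
import math
import random
from typing import List, Optional


def encode_steps(strength: float, ddim_steps: int = 50) -> int:
    """Number of sampling steps that are noised and then re-denoised (assumed rule)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0.0, 1.0]")
    return int(strength * ddim_steps)


def noise_latent(x0: List[float], alpha_bar: float,
                 rng: Optional[random.Random] = None) -> List[float]:
    """Forward diffusion q(x_t | x_0): sqrt(a) * x0 + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    rng = rng or random.Random()
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in x0]


print(encode_steps(0.8))  # 40 of 50 steps re-run: large changes allowed
print(encode_steps(0.3))  # 15 of 50 steps re-run: stays close to the input
```

The closer strength is to 1.0, the smaller alpha_bar_t becomes at the starting step, so less of the input latent survives the noising, which matches the loss of semantic consistency described above.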

[Figure: input sketch and two resulting outputs]

This procedure can, for example, also be used to upscale samples from the base model.


BibTeX

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
