
Commit a9bc7ea

[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
1 parent fe762e4 commit a9bc7ea
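
pre-commit.ci runs the repository's configured hooks against each push and commits whatever they change. Most per-line changes below look identical on both sides of the diff because the fixes are whitespace-only: trailing spaces stripped, or a missing final newline added. A minimal `.pre-commit-config.yaml` sketch that would produce this kind of auto-fix commit (the hook selection and pinned tag are assumptions; the repository's actual config may differ):

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0  # hypothetical pin; use the repo's actual tag
    hooks:
      - id: trailing-whitespace  # strips trailing spaces (most +/- pairs below)
      - id: end-of-file-fixer    # ensures each file ends with exactly one newline

Running `pre-commit run --all-files` locally applies the same fixes before pushing.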

234 files changed: +35350 −20024 lines

.dockerignore

+1-1
@@ -1,3 +1,3 @@
 outputs/
 src/
-configs/webui/userconfig_streamlit.yaml
+configs/webui/userconfig_streamlit.yaml

.gitattributes

+1-1
@@ -1,4 +1,4 @@
 * text=auto
 *.{cmd,[cC][mM][dD]} text eol=crlf
 *.{bat,[bB][aA][tT]} text eol=crlf
-*.sh text eol=lf
+*.sh text eol=lf
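
These rules normalize line endings at checkout: Windows `cmd`/`bat` scripts get CRLF, shell scripts get LF on every platform. To verify which rule applies to a given path, `git check-attr eol -- run.sh` prints the effective attribute (the filename here is only an example).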

.github/ISSUE_TEMPLATE/bug_report.yml

+3-3
@@ -40,7 +40,7 @@ body:
   - type: dropdown
     id: os
     attributes:
-      label: Where are you running the webui?
+      label: Where are you running the webui?
       multiple: true
       options:
         - Windows
@@ -52,7 +52,7 @@ body:
     attributes:
      label: Custom settings
      description: If you are running the webui with specifi settings, please paste them here for reference (like --nitro)
-     render: shell
+     render: shell
   - type: textarea
     id: logs
     attributes:
@@ -66,4 +66,4 @@ body:
      description: By submitting this issue, you agree to follow our [Code of Conduct](https://docs.github.com/en/site-policy/github-terms/github-community-code-of-conduct)
      options:
        - label: I agree to follow this project's Code of Conduct
-         required: true
+         required: true

.github/PULL_REQUEST_TEMPLATE.md

+1-1
@@ -13,4 +13,4 @@ Closes: # (issue)
 - [ ] I have changed the base branch to `dev`
 - [ ] I have performed a self-review of my own code
 - [ ] I have commented my code in hard-to-understand areas
-- [ ] I have made corresponding changes to the documentation
+- [ ] I have made corresponding changes to the documentation

.github/workflows/deploy.yml

+1-1
@@ -37,4 +37,4 @@ jobs:
          # The GH actions bot is used by default if you didn't specify the two fields.
          # You can swap them out with your own user credentials.
          user_name: github-actions[bot]
-         user_email: 41898282+github-actions[bot]@users.noreply.github.com
+         user_email: 41898282+github-actions[bot]@users.noreply.github.com

.github/workflows/test-deploy.yml

+1-1
@@ -21,4 +21,4 @@ jobs:
      - name: Install dependencies
        run: yarn install
      - name: Test build website
-       run: yarn build
+       run: yarn build

README.md

+14-14
@@ -6,7 +6,7 @@

 ## Installation instructions for:

-- **[Windows](https://sygil-dev.github.io/sygil-webui/docs/Installation/windows-installation)**
+- **[Windows](https://sygil-dev.github.io/sygil-webui/docs/Installation/windows-installation)**
 - **[Linux](https://sygil-dev.github.io/sygil-webui/docs/Installation/linux-installation)**

 ### Want to ask a question or request a feature?
@@ -34,10 +34,10 @@ Check the [Contribution Guide](CONTRIBUTING.md)

 * Run additional upscaling models on CPU to save VRAM

-* Textual inversion: [Reaserch Paper](https://textual-inversion.github.io/)
+* Textual inversion: [Reaserch Paper](https://textual-inversion.github.io/)

 * K-Diffusion Samplers: A great collection of samplers to use, including:
-
+
   - `k_euler`
   - `k_lms`
   - `k_euler_a`
@@ -95,8 +95,8 @@ An easy way to work with Stable Diffusion right from your browser.
 To give a token (tag recognized by the AI) a specific or increased weight (emphasis), add `:0.##` to the prompt, where `0.##` is a decimal that will specify the weight of all tokens before the colon.
 Ex: `cat:0.30, dog:0.70` or `guy riding a bicycle :0.7, incoming car :0.30`

-Negative prompts can be added by using `###` , after which any tokens will be seen as negative.
-Ex: `cat playing with string ### yarn` will negate `yarn` from the generated image.
+Negative prompts can be added by using `###` , after which any tokens will be seen as negative.
+Ex: `cat playing with string ### yarn` will negate `yarn` from the generated image.

 Negatives are a very powerful tool to get rid of contextually similar or related topics, but **be careful when adding them since the AI might see connections you can't**, and end up outputting gibberish

@@ -131,7 +131,7 @@ Lets you improve faces in pictures using the GFPGAN model. There is a checkbox i

 If you want to use GFPGAN to improve generated faces, you need to install it separately.
 Download [GFPGANv1.4.pth](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.4/GFPGANv1.4.pth) and put it
-into the `/sygil-webui/models/gfpgan` directory.
+into the `/sygil-webui/models/gfpgan` directory.

 ### RealESRGAN

@@ -141,7 +141,7 @@ Lets you double the resolution of generated images. There is a checkbox in every
 There is also a separate tab for using RealESRGAN on any picture.

 Download [RealESRGAN_x4plus.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.1.0/RealESRGAN_x4plus.pth) and [RealESRGAN_x4plus_anime_6B.pth](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth).
-Put them into the `sygil-webui/models/realesrgan` directory.
+Put them into the `sygil-webui/models/realesrgan` directory.

 ### LSDR

@@ -174,8 +174,8 @@ which is available on [GitHub](https://github.com/CompVis/latent-diffusion). PDF

 [Stable Diffusion](#stable-diffusion-v1) is a latent text-to-image diffusion
 model.
-Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
-Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487),
+Thanks to a generous compute donation from [Stability AI](https://stability.ai/) and support from [LAION](https://laion.ai/), we were able to train a Latent Diffusion Model on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database.
+Similar to Google's [Imagen](https://arxiv.org/abs/2205.11487),
 this model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts.
 With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM.
 See [this section](#stable-diffusion-v1) below and the [model card](https://huggingface.co/CompVis/stable-diffusion).
@@ -184,26 +184,26 @@ See [this section](#stable-diffusion-v1) below and the [model card](https://hugg

 Stable Diffusion v1 refers to a specific configuration of the model
 architecture that uses a downsampling-factor 8 autoencoder with an 860M UNet
-and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and
+and CLIP ViT-L/14 text encoder for the diffusion model. The model was pretrained on 256x256 images and
 then finetuned on 512x512 images.

 *Note: Stable Diffusion v1 is a general text-to-image diffusion model and therefore mirrors biases and (mis-)conceptions that are present
-in its training data.
+in its training data.
 Details on the training procedure and data, as well as the intended use of the model can be found in the corresponding [model card](https://huggingface.co/CompVis/stable-diffusion).

 ## Comments

 - Our code base for the diffusion models builds heavily on [OpenAI's ADM codebase](https://github.com/openai/guided-diffusion)
-and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
+and [https://github.com/lucidrains/denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch).
 Thanks for open-sourcing!

-- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).
+- The implementation of the transformer encoder is from [x-transformers](https://github.com/lucidrains/x-transformers) by [lucidrains](https://github.com/lucidrains?tab=repositories).

 ## BibTeX

 ```
 @misc{rombach2021highresolution,
-      title={High-Resolution Image Synthesis with Latent Diffusion Models},
+      title={High-Resolution Image Synthesis with Latent Diffusion Models},
       author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
       year={2021},
       eprint={2112.10752},

Stable_Diffusion_v1_Model_Card.md

+9-10
@@ -21,7 +21,7 @@ This model card focuses on the model associated with the Stable Diffusion model,

 # Uses

-## Direct Use
+## Direct Use
 The model is intended for research purposes only. Possible research areas and
 tasks include

@@ -68,11 +68,11 @@ Using the model to generate content that is cruel to individuals is a misuse of
 considerations.

 ### Bias
-While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
-Stable Diffusion v1 was trained on subsets of [LAION-2B(en)](https://laion.ai/blog/laion-5b/),
-which consists of images that are primarily limited to English descriptions.
-Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
-This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
+While the capabilities of image generation models are impressive, they can also reinforce or exacerbate social biases.
+Stable Diffusion v1 was trained on subsets of [LAION-2B(en)](https://laion.ai/blog/laion-5b/),
+which consists of images that are primarily limited to English descriptions.
+Texts and images from communities and cultures that use other languages are likely to be insufficiently accounted for.
+This affects the overall output of the model, as white and western cultures are often set as the default. Further, the
 ability of the model to generate content with non-English prompts is significantly worse than with English-language prompts.


@@ -84,7 +84,7 @@ The model developers used the following dataset for training the model:
 - LAION-2B (en) and subsets thereof (see next section)

 **Training Procedure**
-Stable Diffusion v1 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training,
+Stable Diffusion v1 is a latent diffusion model which combines an autoencoder with a diffusion model that is trained in the latent space of the autoencoder. During training,

 - Images are encoded through an encoder, which turns images into latent representations. The autoencoder uses a relative downsampling factor of 8 and maps images of shape H x W x 3 to latents of shape H/f x W/f x 4
 - Text prompts are encoded through a ViT-L/14 text-encoder.
@@ -108,12 +108,12 @@ filtered to images with an original size `>= 512x512`, estimated aesthetics scor
 - **Batch:** 32 x 8 x 2 x 4 = 2048
 - **Learning rate:** warmup to 0.0001 for 10,000 steps and then kept constant

-## Evaluation Results
+## Evaluation Results
 Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0,
 5.0, 6.0, 7.0, 8.0) and 50 PLMS sampling
 steps show the relative improvements of the checkpoints:

-![pareto](assets/v1-variants-scores.jpg)
+![pareto](assets/v1-variants-scores.jpg)

 Evaluated using 50 PLMS steps and 10000 random prompts from the COCO2017 validation set, evaluated at 512x512 resolution. Not optimized for FID scores.
 ## Environmental Impact
@@ -137,4 +137,3 @@ Based on that information, we estimate the following CO2 emissions using the [Ma
 }

 *This model card was written by: Robin Rombach and Patrick Esser and is based on the [DALL-E Mini model card](https://huggingface.co/dalle-mini/dalle-mini).*
-

Web_based_UI_for_Stable_Diffusion_colab.ipynb

+1-1
@@ -582,4 +582,4 @@
    "outputs": []
   }
  ]
-}
+}

blog/2022-10-20/1.Textual inversion usage competitio.md

+3-3
@@ -23,7 +23,7 @@ Hopefully demand will be high, we want to train **hundreds** of new concepts!

 # What does `most inventive use` mean?

-Whatever you want it to mean! be creative! experiment!
+Whatever you want it to mean! be creative! experiment!

 There are several categories we will look at:

@@ -33,7 +33,7 @@ There are several categories we will look at:

 * composition; meaning anything related to how big things are, their position, the angle, etc

-* styling;
+* styling;

 ![image](https://user-images.githubusercontent.com/106811348/197045629-029ba6f5-1f79-475c-9ce7-969aaf3d253b.png)

@@ -45,7 +45,7 @@ There are several categories we will look at:

 ## `The Sims(TM): Stable Diffusion edition` ?

-For this event the theme is “The Sims: Stable Diffusion edition”.
+For this event the theme is “The Sims: Stable Diffusion edition”.

 So we have selected a subset of [products from Amazon Berkely Objects dataset](https://github.com/sd-webui/abo).

configs/blip/bert_config.json

+1-1
@@ -17,5 +17,5 @@
   "type_vocab_size": 2,
   "vocab_size": 30522,
   "encoder_width": 768,
-  "add_cross_attention": true
+  "add_cross_attention": true
 }

configs/blip/caption_coco.yaml

+1-2
@@ -21,7 +21,7 @@ init_lr: 1e-5
 image_size: 384

 # generation configs
-max_length: 20
+max_length: 20
 min_length: 5
 num_beams: 3
 prompt: 'a picture of '
@@ -30,4 +30,3 @@ prompt: 'a picture of '
 weight_decay: 0.05
 min_lr: 0
 max_epoch: 5
-

configs/blip/med_config.json

+1-1
@@ -17,5 +17,5 @@
   "type_vocab_size": 2,
   "vocab_size": 30524,
   "encoder_width": 768,
-  "add_cross_attention": true
+  "add_cross_attention": true
 }

configs/blip/nlvr.yaml

+3-4
@@ -1,13 +1,13 @@
-image_root: '/export/share/datasets/vision/NLVR2/'
+image_root: '/export/share/datasets/vision/NLVR2/'
 ann_root: 'annotation'

 # set pretrained as a file path or an url
 pretrained: 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_nlvr.pth'

 #size of vit model; base or large
 vit: 'base'
-batch_size_train: 16
-batch_size_test: 64
+batch_size_train: 16
+batch_size_test: 64
 vit_grad_ckpt: False
 vit_ckpt_layer: 0
 max_epoch: 15
@@ -18,4 +18,3 @@ image_size: 384
 weight_decay: 0.05
 init_lr: 3e-5
 min_lr: 0
-

configs/blip/nocaps.yaml

+1-1
@@ -12,4 +12,4 @@ image_size: 384
 max_length: 20
 min_length: 5
 num_beams: 3
-prompt: 'a picture of '
+prompt: 'a picture of '

configs/blip/pretrain.yaml

+1-4
@@ -1,7 +1,7 @@
 train_file: ['/export/share/junnan-li/VL_pretrain/annotation/coco_karpathy_train.json',
              '/export/share/junnan-li/VL_pretrain/annotation/vg_caption.json',
             ]
-laion_path: ''
+laion_path: ''

 # size of vit model; base or large
 vit: 'base'
@@ -22,6 +22,3 @@ warmup_lr: 1e-6
 lr_decay_rate: 0.9
 max_epoch: 20
 warmup_steps: 3000
-
-
-
configs/blip/retrieval_coco.yaml

-1
@@ -31,4 +31,3 @@ negative_all_rank: True
 weight_decay: 0.05
 min_lr: 0
 max_epoch: 6
-
configs/blip/retrieval_flickr.yaml

-1
@@ -31,4 +31,3 @@ negative_all_rank: False
 weight_decay: 0.05
 min_lr: 0
 max_epoch: 6
-
configs/blip/retrieval_msrvtt.yaml

+1-1
@@ -9,4 +9,4 @@ vit: 'base'
 batch_size: 64
 k_test: 128
 image_size: 384
-num_frm_test: 8
+num_frm_test: 8
configs/blip/vqa.yaml

+3-3
@@ -8,8 +8,8 @@ pretrained: 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/mo

 # size of vit model; base or large
 vit: 'base'
-batch_size_train: 16
-batch_size_test: 32
+batch_size_train: 16
+batch_size_test: 32
 vit_grad_ckpt: False
 vit_ckpt_layer: 0
 init_lr: 2e-5
@@ -22,4 +22,4 @@ inference: 'rank'
 # optimizer
 weight_decay: 0.05
 min_lr: 0
-max_epoch: 10
+max_epoch: 10

configs/latent-diffusion/celebahq-ldm-vq-4.yaml

+1-1
@@ -83,4 +83,4 @@ lightning:
     increase_log_steps: False

 trainer:
-  benchmark: True
+  benchmark: True

configs/latent-diffusion/cin-ldm-vq-f8.yaml

+1-1
@@ -95,4 +95,4 @@ lightning:
     increase_log_steps: False

 trainer:
-  benchmark: True
+  benchmark: True

configs/latent-diffusion/cin256-v2.yaml

+3-3
@@ -15,7 +15,7 @@ model:
     conditioning_key: crossattn
     monitor: val/loss
     use_ema: False
-
+
     unet_config:
       target: ldm.modules.diffusionmodules.openaimodel.UNetModel
       params:
@@ -37,7 +37,7 @@ model:
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 512
-
+
     first_stage_config:
       target: ldm.models.autoencoder.VQModelInterface
       params:
@@ -59,7 +59,7 @@ model:
        dropout: 0.0
      lossconfig:
        target: torch.nn.Identity
-
+
     cond_stage_config:
       target: ldm.modules.encoders.modules.ClassEmbedder
       params:

configs/latent-diffusion/ffhq-ldm-vq-4.yaml

+1-1
@@ -82,4 +82,4 @@ lightning:
     increase_log_steps: False

 trainer:
-  benchmark: True
+  benchmark: True
