0. Create a conda environment:

   ```bash
   conda create -n kernel python=3.10
   ```

   (You only need this environment for the hardware CUDA kernel extension.)
1. Follow the instructions at https://github.com/thu-nics/MixDQ/tree/master/kernels to set up the basic environment for the module.
2. Follow the instructions at https://github.com/comfyanonymous/ComfyUI#installing to manually install ComfyUI on Linux.
3. Git clone the MixDQ repository and put it under `./ComfyUI/custom_nodes/`.
4. Put the sdxl-turbo model under `ComfyUI/models/checkpoints/sdxl-turbo`. The model can be downloaded with Diffusers as below:
   ```python
   import os
   import torch
   from diffusers import AutoPipelineForText2Image

   model_id = "stabilityai/sdxl-turbo"
   cache_dir = "/share/public/diffusion_quant/huggingface/hub/"
   # expanduser is needed: save_pretrained does not expand "~" by itself
   save_path = os.path.expanduser("~/ComfyUI/models/checkpoints/sdxl-turbo/")

   # Download the fp16 variant and save it where the ComfyUI nodes expect it
   pipe = AutoPipelineForText2Image.from_pretrained(
       model_id, cache_dir=cache_dir, torch_dtype=torch.float16, variant="fp16"
   )
   pipe.save_pretrained(save_path)
   ```
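   To sanity-check the download, you can load the pipeline back from the local folder and run a one-step generation (a minimal sketch; the prompt and output filename are arbitrary):

   ```python
   import os
   import torch
   from diffusers import AutoPipelineForText2Image

   # Load the pipeline back from the local checkpoint saved above
   save_path = os.path.expanduser("~/ComfyUI/models/checkpoints/sdxl-turbo/")
   pipe = AutoPipelineForText2Image.from_pretrained(
       save_path, torch_dtype=torch.float16
   ).to("cuda")

   # sdxl-turbo is a one-step model: 1 inference step, guidance disabled
   image = pipe("a cinematic photo of a cat",
                num_inference_steps=1, guidance_scale=0.0).images[0]
   image.save("sdxl_turbo_test.png")
   ```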
- Note that torch 2.2.1+cu118 with xformers 0.0.25 caused errors in Diffusers inference when tested on our machine. To solve this problem, we recommend uninstalling xformers or using `xformers==0.0.20` (it may report a version incompatibility, but it works). For more details, see huggingface/diffusers#9889.
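  If you would rather keep your current xformers install, one possible workaround (a sketch; we have not verified that it avoids the issue above) is to check the installed versions and switch the pipeline back to PyTorch's native attention:

  ```python
  import torch
  import xformers
  from diffusers import AutoPipelineForText2Image

  # Check whether you are on the combination reported above
  print("torch:", torch.__version__)        # e.g. 2.2.1+cu118
  print("xformers:", xformers.__version__)  # e.g. 0.0.25

  pipe = AutoPipelineForText2Image.from_pretrained(
      "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
  )
  # Fall back to PyTorch's native attention instead of xformers
  pipe.disable_xformers_memory_efficient_attention()
  ```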
- Run `python main.py --cuda-device 0` to start ComfyUI; `--cuda-device` selects which GPU to run on.
- Open a browser and go to http://127.0.0.1:8188.
- Load the workflow from `mixdq_workflow.json` or `mixdq.png` in `ComfyUI/custom_nodes/MixDQ/workflow`.
- To display the text output, another custom node is needed: `pythongosssss/ComfyUI-Custom-Scripts`; git clone that repo into `ComfyUI/custom_nodes/`. It is also recommended to git clone `ComfyUI-Manager`, with which you can install other nodes easily.
- The image will be generated with both the quantized and the non-quantized method, and the storage size and latency will be shown.
This project provides four nodes:
- Load Pipeline: loads the model and passes the pipeline to the other nodes.
- MixdqQuant: takes a prompt as input, quantizes the model to W8A8 via the pipeline, and generates images; the generated image and the memory usage are output.
- OrgGen: takes a prompt as input and generates images with the original model for comparison; the generated image and the memory usage are output.
- MixdqIntegral: takes a prompt as input and runs both the quantized and the original model, displaying the results so that inference efficiency can be compared.
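For context on how these plug in: ComfyUI discovers custom nodes by importing each package under `custom_nodes/` and reading its `NODE_CLASS_MAPPINGS`. A minimal sketch of what such a node class looks like (the class below is illustrative, not MixDQ's actual code):

```python
import torch

class ExamplePromptNode:
    """Illustrative ComfyUI node: prompt in, image out (not MixDQ's actual code)."""

    @classmethod
    def INPUT_TYPES(cls):
        # Declares the inputs shown on the node in the graph editor
        return {"required": {"prompt": ("STRING", {"multiline": True})}}

    RETURN_TYPES = ("IMAGE",)   # output socket types
    FUNCTION = "generate"       # method ComfyUI calls when the node executes
    CATEGORY = "MixDQ"          # menu category in the editor

    def generate(self, prompt):
        # A real node would run the (quantized) pipeline on the prompt here;
        # ComfyUI passes images around as float tensors of shape [B, H, W, C].
        image = torch.zeros(1, 512, 512, 3)
        return (image,)

# ComfyUI registers the nodes it finds in this mapping
NODE_CLASS_MAPPINGS = {"ExamplePromptNode": ExamplePromptNode}
```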
The memory footprint is measured as the memory actually allocated on CUDA. Note that ComfyUI's caching mechanism keeps the model on CUDA after a run completes, so that a new image can be generated quickly when the prompt is changed later; the footprint shown for the first run therefore reflects the real situation, and comparing the two first-run figures shows that quantization compresses the UNet by more than 2 GB.
The first run takes a little longer because it includes model loading and quantization; after that, changing the prompt reuses the cached quantized model to generate a new picture quickly.
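The figures below break down into static (weights), dynamic (activations), and peak usage. A rough sketch of how such numbers can be obtained with PyTorch's CUDA memory statistics (assuming the pipeline is already loaded on CUDA; not necessarily MixDQ's exact bookkeeping):

```python
import torch

# Assumes the (quantized or original) pipeline is already resident on CUDA
torch.cuda.reset_peak_memory_stats()
static = torch.cuda.memory_allocated()    # weights resident before inference

# ... run the pipeline once here ...

peak = torch.cuda.max_memory_allocated()  # high-water mark during inference
dynamic = peak - static                   # activations and temporaries

print(f"Static (weights) memory usage: {static / 2**20:.2f} MBs")
print(f"Dynamic (acts) memory usage: {dynamic / 2**20:.2f} MBs")
print(f"Peak (total) memory usage: {peak / 2**20:.2f} MBs")
```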
- Quantized:

  ```
  Static (weights) memory usage: 4 G 142 M 271 K 162 Bytes (4238.264802932739 MBs)
  Dynamic (acts) memory usage: 1 G 78 M 217 K 962 Bytes (1102.2128314971924 MBs)
  Peak (total) memory usage: 5 G 291 M 505 K 271 Bytes (5411.49342250824 MBs)
  ```

- Non-quantized:

  ```
  Static (weights) memory usage: 6 G 480 M 419 K 166 Bytes (6624.4093379974365 MBs)
  Dynamic (acts) memory usage: 1 G 78 M 217 K 962 Bytes (1102.2128314971924 MBs)
  Peak (total) memory usage: 7 G 621 M 525 K 275 Bytes (7789.512957572937 MBs)
  ```
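Comparing the two static figures: quantization shrinks the resident weights by 6624.41 - 4238.26 ≈ 2386 MB, i.e. roughly 2.3 GB, which is the 2+ GB UNet compression mentioned above.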