
Commit f0b6396 — Init
1 parent 6650321

143 files changed, +9037 −0 lines


ArtGPT-4.pdf — 662 KB (binary file, contents not shown)

LICENSE.md

BSD 3-Clause License

Copyright 2023 Deyao Zhu
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

LICENSE_Lavis.md

BSD 3-Clause License

Copyright (c) 2022 Salesforce, Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of Salesforce.com nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

PrepareVicuna.md

## How to Prepare Vicuna Weights

Vicuna is an open-source LLaMA-based LLM with performance close to ChatGPT's.
We currently use the v0 version of Vicuna-13B.

To prepare Vicuna's weights, first download Vicuna's **delta** weights from [https://huggingface.co/lmsys/vicuna-13b-delta-v0](https://huggingface.co/lmsys/vicuna-13b-delta-v0).
If you have git-lfs installed (https://git-lfs.com), this can be done with

```bash
git lfs install
git clone https://huggingface.co/lmsys/vicuna-13b-delta-v0  # more powerful, needs at least 24 GB of GPU memory
# or
git clone https://huggingface.co/lmsys/vicuna-7b-delta-v0   # smaller, needs 12 GB of GPU memory
```

Note that these are not the working weights themselves, but the difference between the working weights and the original LLaMA-13B weights. (Due to LLaMA's license, we cannot distribute LLaMA weights.)

Then, obtain the original LLaMA-7B or LLaMA-13B weights in the HuggingFace format,
either by following the instructions provided by HuggingFace
[here](https://huggingface.co/docs/transformers/main/model_doc/llama) or from the Internet.

When these two sets of weights are ready, we can use tools from Vicuna's team to create the real working weights.
First, install the version of their library that is compatible with v0 Vicuna:

```bash
pip install git+https://github.com/lm-sys/FastChat.git@v0.1.10
```

Then, run the following command to create the final working weights:

```bash
python -m fastchat.model.apply_delta --base /path/to/llama-13bOR7b-hf/ --target /path/to/save/working/vicuna/weight/ --delta /path/to/vicuna-13bOR7b-delta-v0/
```

Now you are good to go!
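Conceptually, the delta application just adds each delta tensor to the matching base tensor to reconstruct the working weights. A minimal sketch of that merge on toy data (plain Python lists stand in for tensors; this illustrative `apply_delta` is not FastChat's actual implementation):

```python
def apply_delta(base, delta):
    """Merge delta weights into base weights: working = base + delta, element-wise."""
    return {name: [b + d for b, d in zip(base[name], delta[name])]
            for name in base}

# Toy "state dicts" with one weight and one bias parameter
base = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0, -1.0]}
delta = {"layer0.weight": [0.5, -0.5], "layer0.bias": [1.0, 1.0]}

working = apply_delta(base, delta)
print(working["layer0.weight"])  # [1.5, 1.5]
```

This is also why both downloads are needed: neither the base nor the delta alone is usable.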

README.md

# ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4

[Zhengqing Yuan](https://orcid.org/0000-0002-4870-8492)*, [Huiwen Xue]()*, [Xinyi Wang]()*, [Yongming Liu](https://www.semanticscholar.org/author/Yongming-Liu/2130184867)*, [Zhuanzhe Zhao](https://www.semanticscholar.org/author/Zhuanzhe-Zhao/2727550)*, and [Kun Wang](https://www.ahpu.edu.cn/jsjyxxgc/2023/0220/c5472a187109/page.htm)*. *Equal Contribution

**Anhui Polytechnic University, Soochow University**

<a href='https://artgpt-4.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> <a href='ArtGPT_4.pdf'><img src='https://img.shields.io/badge/Paper-PDF-red'></a>
<!-- <a href='https://huggingface.co/spaces/Vision-CAIR/minigpt4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue'></a> <a href='https://huggingface.co/Vision-CAIR/MiniGPT-4'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OK4kYsZphwt5DXchKkzMBjYF6jnkqh4R?usp=sharing) [![YouTube](https://badges.aleen42.com/src/youtube.svg)](https://www.youtube.com/watch?v=__tftoxpBAw&feature=youtu.be) -->
## Online Demo

Click the image to chat with ArtGPT-4 about your images.
[![demo](figs/online_demo.png)](https://artgpt-4.github.io)

## Examples
|   |   |
:-------------------------:|:-------------------------:
![find wild](figs/examples/Art1.png) | ![write story](figs/examples/Art2.png)
![solve problem](figs/examples/Art3-G.png) |

More examples can be found on the [project page](https://artgpt-4.github.io).
## Introduction
- ArtGPT-4 is a novel model that builds upon the architecture of MiniGPT-4 by incorporating tailored linear layers and activation functions into Vicuna, specifically designed to optimize the model's performance in vision-language tasks.
- The modifications made to Vicuna enable ArtGPT-4 to better capture intricate details and understand the meaning of artistic images, resulting in improved image understanding compared to the original MiniGPT-4 model.
- To improve usability, we propose a novel way to create high-quality image-text pairs using the model itself together with ChatGPT. On this basis, we create a small (3,500 pairs in total) yet high-quality dataset.
- ArtGPT-4 was trained on about 200 GB of image-text pairs on a Tesla A100 in just 2 hours, demonstrating impressive training efficiency.
- In addition to improved image understanding, ArtGPT-4 can generate visual code, including aesthetically pleasing HTML/CSS web pages, with a more artistic flair.

![overview](figs/examples/TBLOC.png)
## Getting Started
### Installation

**1. Prepare the code and the environment**

Git clone our repository, create a python environment, and activate it via the following commands:

```bash
git clone https://github.com/DLYuanGod/ArtGPT-4.git
cd ArtGPT-4
conda env create -f environment.yml
conda activate artgpt4
```
**2. Prepare the pretrained Vicuna weights**

The current version of ArtGPT-4 is built on the v0 version of Vicuna-13B.
Please refer to our instructions [here](PrepareVicuna.md)
to prepare the Vicuna weights.
The final weights should sit in a single folder with a structure similar to the following:

```
vicuna_weights
├── config.json
├── generation_config.json
├── pytorch_model.bin.index.json
├── pytorch_model-00001-of-00003.bin
...
```

Then, set the path to the Vicuna weights in the model config file
[here](minigpt4/configs/models/minigpt4.yaml#L16) at Line 16.
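Before editing the config, it may help to sanity-check that the merged folder contains the files shown in the tree above. A small sketch (the expected file list is an assumption based on that tree; the number of `pytorch_model-*.bin` shards differs between the 7B and 13B variants, so only the fixed names are checked):

```python
import os

# Files the merged vicuna_weights folder should contain, per the tree above
EXPECTED_FILES = [
    "config.json",
    "generation_config.json",
    "pytorch_model.bin.index.json",
]

def missing_weight_files(folder):
    """Return the expected files that are absent from the weights folder."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(folder, name))]
```

If the returned list is non-empty, the delta application likely did not finish cleanly.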
**3. Prepare the pretrained MiniGPT-4 checkpoint**

[Download](https://drive.google.com/file/d/1a4zLvaiDBr-36pasffmgpvH5P7CKmpze/view?usp=share_link)

Then, set the path to the pretrained checkpoint in the evaluation config file
[eval_configs/minigpt4_eval.yaml](eval_configs/minigpt4_eval.yaml#L10) at Line 11.


### Launching Demo Locally

Try out our demo [demo.py](demo.py) on your local machine by running

```bash
python demo.py --cfg-path eval_configs/minigpt4_eval.yaml --gpu-id 0
```
### Training
The training of ArtGPT-4 consists of two alignment stages. The training process for each stage is consistent with that of [MiniGPT-4](https://minigpt-4.github.io/).

**Datasets**
We use [Laion-aesthetic](https://github.com/LAION-AI/laion-datasets/blob/main/laion-aesthetic.md) from the LAION-5B dataset, which amounts to approximately 200 GB for the first 302 tar files.
## Acknowledgement

+ [MiniGPT-4](https://minigpt-4.github.io/) — our work builds on improvements to this model.

If you're using ArtGPT-4 in your research or applications, please cite using this BibTeX:
```bibtex
@article{yuan2023artgpt4,
  title={ArtGPT-4: Artistic Vision-Language Understanding with Adapter-enhanced MiniGPT-4},
  author={Yuan, Zhengqing and Xue, Huiwen and Wang, Xinyi and Liu, Yongming and Zhao, Zhuanzhe and Wang, Kun},
  year={2023}
}
```

## License
This repository is under the [BSD 3-Clause License](LICENSE.md).
Much of the code is based on [Lavis](https://github.com/salesforce/LAVIS), which carries a
BSD 3-Clause License [here](LICENSE_Lavis.md).

dataset/README_1_STAGE.md

## Download the filtered Conceptual Captions, SBU, and LAION datasets

### Pre-training datasets download:
We use the filtered synthetic captions prepared by BLIP. For more details about the dataset, please refer to [BLIP](https://github.com/salesforce/BLIP).

It requires ~2.3 TB of storage for the LAION and CC3M+CC12M+SBU datasets.

Image source | Filtered synthetic caption by ViT-L
--- | :---:
CC3M+CC12M+SBU | <a href="https://storage.googleapis.com/sfr-vision-language-research/BLIP/datasets/ccs_synthetic_filtered_large.json">Download</a>
LAION115M | <a href="https://storage.googleapis.com/sfr-vision-language-research/BLIP/datasets/laion_synthetic_filtered_large.json">Download</a>

This will download two json files:
```
ccs_synthetic_filtered_large.json
laion_synthetic_filtered_large.json
```
## Prepare the data step by step

### Set up the dataset folder and move the annotation files to the data storage folder
```
export MINIGPT4_DATASET=/YOUR/PATH/FOR/LARGE/DATASET/
mkdir ${MINIGPT4_DATASET}/cc_sbu
mkdir ${MINIGPT4_DATASET}/laion
mv ccs_synthetic_filtered_large.json ${MINIGPT4_DATASET}/cc_sbu
mv laion_synthetic_filtered_large.json ${MINIGPT4_DATASET}/laion
```

### Copy the conversion and download scripts to the data storage folder
```
cp convert_cc_sbu.py ${MINIGPT4_DATASET}/cc_sbu
cp download_cc_sbu.sh ${MINIGPT4_DATASET}/cc_sbu
cp convert_laion.py ${MINIGPT4_DATASET}/laion
cp download_laion.sh ${MINIGPT4_DATASET}/laion
```

### Convert the laion and cc_sbu annotation files to the img2dataset format
```
cd ${MINIGPT4_DATASET}/cc_sbu
python convert_cc_sbu.py

cd ${MINIGPT4_DATASET}/laion
python convert_laion.py
```

### Download the datasets with img2dataset
```
cd ${MINIGPT4_DATASET}/cc_sbu
sh download_cc_sbu.sh
cd ${MINIGPT4_DATASET}/laion
sh download_laion.sh
```
The final dataset structure:

```
.
├── ${MINIGPT4_DATASET}
│   ├── cc_sbu
│   │   ├── convert_cc_sbu.py
│   │   ├── download_cc_sbu.sh
│   │   ├── ccs_synthetic_filtered_large.json
│   │   ├── ccs_synthetic_filtered_large.tsv
│   │   └── cc_sbu_dataset
│   │       ├── 00000.tar
│   │       ├── 00000.parquet
│   │       ...
│   ├── laion
│   │   ├── convert_laion.py
│   │   ├── download_laion.sh
│   │   ├── laion_synthetic_filtered_large.json
│   │   ├── laion_synthetic_filtered_large.tsv
│   │   └── laion_dataset
│   │       ├── 00000.tar
│   │       ├── 00000.parquet
│   │       ...
...
```
## Set up the dataset configuration files

Then, set the LAION dataset loading path in
[here](../minigpt4/configs/datasets/laion/defaults.yaml#L5) at Line 5 to

    ${MINIGPT4_DATASET}/laion/laion_dataset/{00000..10488}.tar

and the Conceptual Captions and SBU datasets loading path in
[here](../minigpt4/configs/datasets/cc_sbu/defaults.yaml#L5) at Line 5 to

    ${MINIGPT4_DATASET}/cc_sbu/cc_sbu_dataset/{00000..01255}.tar
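The `{00000..10488}.tar` patterns above rely on bash-style brace expansion to enumerate zero-padded shard names; webdataset-style loaders interpret the same notation. A quick illustration (requires bash, not plain sh):

```shell
# Brace expansion with zero-padded endpoints enumerates shard names in order
echo laion_dataset/{00000..00002}.tar
# laion_dataset/00000.tar laion_dataset/00001.tar laion_dataset/00002.tar
```

The upper bounds (10488 and 01255) should match the number of tar shards img2dataset actually produced.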

dataset/README_2_STAGE.md

## Second Stage Data Preparation

Our second-stage dataset can be downloaded from
[here](https://drive.google.com/file/d/1nJXhoEcy3KTExr17I7BXqY5Y9Lx_-n-9/view?usp=share_link).
After extraction, you will get a data folder with the following structure:

```
cc_sbu_align
├── filter_cap.json
└── image
    ├── 2.jpg
    ├── 3.jpg
    ...
```

Put the folder in any path you want.
Then, set up the dataset path in the dataset config file
[here](../minigpt4/configs/datasets/cc_sbu/align.yaml#L5) at Line 5.

dataset/convert_cc_sbu.py

import json
import csv

# specify input and output file paths
input_file = 'ccs_synthetic_filtered_large.json'
output_file = 'ccs_synthetic_filtered_large.tsv'

# load JSON data from input file
with open(input_file, 'r') as f:
    data = json.load(f)

# extract header and rows from the list of JSON records
header = data[0].keys()
rows = [x.values() for x in data]

# write data to TSV file (newline='' lets the csv module control line endings)
with open(output_file, 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header)
    writer.writerows(rows)
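To see what this conversion produces, here is the same JSON-to-TSV logic run on a two-record toy list (the `url`/`caption` field names mirror the columns the download scripts pass to img2dataset; the real annotation files may carry additional fields):

```python
import csv
import io
import json

# Toy records in the same shape as the annotation JSON: a list of flat dicts
data = json.loads(
    '[{"url": "http://a/1.jpg", "caption": "a cat"},'
    ' {"url": "http://a/2.jpg", "caption": "a dog"}]'
)

buf = io.StringIO()
writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
writer.writerow(data[0].keys())              # header row: url, caption
writer.writerows(x.values() for x in data)   # one row per record

print(buf.getvalue())
```

The header comes from the first record's keys, so every record is assumed to share the same schema.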

dataset/convert_laion.py

import json
import csv

# specify input and output file paths
input_file = 'laion_synthetic_filtered_large.json'
output_file = 'laion_synthetic_filtered_large.tsv'

# load JSON data from input file
with open(input_file, 'r') as f:
    data = json.load(f)

# extract header and rows from the list of JSON records
header = data[0].keys()
rows = [x.values() for x in data]

# write data to TSV file (newline='' lets the csv module control line endings)
with open(output_file, 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(header)
    writer.writerows(rows)

dataset/download_cc_sbu.sh

#!/bin/bash

img2dataset --url_list ccs_synthetic_filtered_large.tsv --input_format "tsv" \
    --url_col "url" --caption_col "caption" --output_format webdataset \
    --output_folder cc_sbu_dataset --processes_count 16 --thread_count 128 --image_size 256 \
    --enable_wandb True

dataset/download_laion.sh

#!/bin/bash

img2dataset --url_list laion_synthetic_filtered_large.tsv --input_format "tsv" \
    --url_col "url" --caption_col "caption" --output_format webdataset \
    --output_folder laion_dataset --processes_count 16 --thread_count 128 --image_size 256 \
    --enable_wandb True
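Before launching a multi-hour img2dataset run, it can be worth verifying that the TSV really carries the `url` and `caption` columns the flags above reference. A minimal sketch (`tsv_has_columns` is a hypothetical helper, not part of this repo):

```python
import csv

def tsv_has_columns(path, required=("url", "caption")):
    """Check that the TSV header row contains every required column name."""
    with open(path, newline='') as f:
        header = next(csv.reader(f, delimiter='\t'))
    return all(col in header for col in required)
```

A quick `tsv_has_columns("laion_synthetic_filtered_large.tsv")` before `sh download_laion.sh` catches a mis-converted file early.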
