GPT2 Architecture Integration #4073
I started, but could not get it to work. The model outputs something, but it is just gibberish. I lack documentation on llama.cpp and the C++ skills to really finish this, but maybe someone has an idea of how to get it over the line. This is how far I got in my own fork. I based my implementation mainly on the Starcoder class, because the architecture is quite similar. I took inspiration from mmnga's fork, which implemented it in an older version. From my understanding, you need to modify the following elements in the code (a sketch of the gguf-py side follows the list):

- Serializing the model using convert-hf-to-gguf.py
- Adding the mappings in gguf-py/gguf/constants.py and gguf-py/gguf/tensor_mapping.py
- Adjusting the backend file llama.cpp
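For the gguf-py step, here is a minimal sketch of what the additions to gguf-py/gguf/constants.py and gguf-py/gguf/tensor_mapping.py might look like, modeled on the existing Starcoder entries. The enum member, the tensor list, and the HF tensor names are assumptions to verify against the actual files, not tested code:

```python
# Hypothetical additions to gguf-py/gguf/constants.py
# (IntEnum/auto, MODEL_TENSOR, and the two dicts already exist in that file).
class MODEL_ARCH(IntEnum):
    # ... existing architectures ...
    GPT2 = auto()

MODEL_ARCH_NAMES[MODEL_ARCH.GPT2] = "gpt2"

MODEL_TENSORS[MODEL_ARCH.GPT2] = [
    MODEL_TENSOR.TOKEN_EMBD,   # wte
    MODEL_TENSOR.POS_EMBD,     # wpe
    MODEL_TENSOR.OUTPUT_NORM,  # ln_f
    MODEL_TENSOR.OUTPUT,       # materialized lm_head (tied to wte)
    MODEL_TENSOR.ATTN_NORM,    # h.*.ln_1
    MODEL_TENSOR.ATTN_QKV,     # h.*.attn.c_attn (fused Q/K/V)
    MODEL_TENSOR.ATTN_OUT,     # h.*.attn.c_proj
    MODEL_TENSOR.FFN_NORM,     # h.*.ln_2
    MODEL_TENSOR.FFN_UP,       # h.*.mlp.c_fc
    MODEL_TENSOR.FFN_DOWN,     # h.*.mlp.c_proj
]

# Hypothetical additions to gguf-py/gguf/tensor_mapping.py: per-block HF
# checkpoint names for each GGUF tensor kind ({bid} is the block index).
# The "h.{bid}..." names follow the original GPT2 checkpoint layout;
# GPT2_BLOCK_MAPPINGS is an illustrative name, not the file's actual dict.
GPT2_BLOCK_MAPPINGS = {
    MODEL_TENSOR.ATTN_NORM: ("h.{bid}.ln_1",),
    MODEL_TENSOR.ATTN_QKV:  ("h.{bid}.attn.c_attn",),
    MODEL_TENSOR.ATTN_OUT:  ("h.{bid}.attn.c_proj",),
    MODEL_TENSOR.FFN_NORM:  ("h.{bid}.ln_2",),
    MODEL_TENSOR.FFN_UP:    ("h.{bid}.mlp.c_fc",),
    MODEL_TENSOR.FFN_DOWN:  ("h.{bid}.mlp.c_proj",),
}
```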
|
Which model are you using? I tried https://huggingface.co/gpt2 and https://huggingface.co/gpt2-medium/tree/main, but they fail to convert. Once I added the missing properties, they still miss the output layer.
|
As a sidenote,

```python
def set_vocab(self):
    self._set_vocab_sentencepiece()
```

should most likely be

```python
def set_vocab(self):
    self._set_vocab_gpt2()
```

since GPT2 uses a byte-level BPE tokenizer rather than SentencePiece.
|
@Galunid Thanks for having a look at this 👍 I first started with one of the models from AI Sweden, which is based on GPT2. But I realised they have a few specifics, so I made a new commit with a few changes to make it compatible with the original GPT2.
The other thing, as you mentioned, is the lack of an output layer. I extracted it from the model and wrote it to the safetensors file (code below), but I wasn't sure how best to fit it into the codebase. Overall it runs through, but the output is still somewhat gibberish.
Code to add the output layer to safetensors.
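The original snippet was collapsed in the issue; what follows is a minimal sketch of the idea as described above, not the author's actual code. It assumes the Hugging Face transformers and safetensors APIs; the model and file names are illustrative:

```python
# GPT2 ties its output projection (lm_head) to the token embedding (wte),
# so the checkpoint has no separate output tensor. This materializes the
# tied weights and re-saves everything as a safetensors file.
from transformers import GPT2LMHeadModel
from safetensors.torch import save_file

model = GPT2LMHeadModel.from_pretrained("gpt2")
state = model.state_dict()  # includes "lm_head.weight", tied to wte

# clone() breaks the wte/lm_head weight sharing; safetensors refuses to
# serialize tensors that share storage.
tensors = {name: t.clone().contiguous() for name, t in state.items()}

save_file(tensors, "model.safetensors")
```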
|
Would be great to add the GPT2 arch to llama.cpp. |
I'd like to help with this |
Feature Description
The idea is to be able to convert models using the GPT2 architecture into GGUF. convert-hf-to-gguf.py should support GPT2, and llama.cpp should be able to run the resulting model.
Motivation
There are quite a few models for low-resource languages or specific use cases that are fine-tuned on the GPT2 architecture.
Possible Implementation
The structure of the models is quite similar to Starcoder. From my understanding, you can add support quite easily by modifying (see the sketch after this list):

- convert-hf-to-gguf.py
- llama.cpp
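To make the first point concrete, here is a rough sketch of what a GPT2 class in convert-hf-to-gguf.py could look like, modeled on the existing StarCoder class. The hparam keys follow GPT2's config.json; the writer methods and base class are assumptions to check against the script, not a tested implementation:

```python
# Hypothetical GPT2 entry in convert-hf-to-gguf.py, mirroring StarCoder.
class GPT2Model(Model):
    def set_gguf_parameters(self):
        self.gguf_writer.add_name("GPT2")
        self.gguf_writer.add_context_length(self.hparams["n_ctx"])
        self.gguf_writer.add_embedding_length(self.hparams["n_embd"])
        # GPT2's MLP hidden size is fixed at 4 * n_embd.
        self.gguf_writer.add_feed_forward_length(4 * self.hparams["n_embd"])
        self.gguf_writer.add_block_count(self.hparams["n_layer"])
        self.gguf_writer.add_head_count(self.hparams["n_head"])
        self.gguf_writer.add_layer_norm_eps(self.hparams["layer_norm_epsilon"])

    def set_vocab(self):
        # GPT2 ships a byte-level BPE tokenizer, not a SentencePiece model.
        self._set_vocab_gpt2()
```

One pitfall worth checking when the output is gibberish: Hugging Face's GPT2 implementation stores its Conv1D weights transposed relative to nn.Linear, so the attention and MLP weight tensors may need to be transposed during conversion.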
Status
I tried implementing this myself, but I am not deep enough into the topic and find it quite hard to understand the library's structure (is there any good documentation?). So I am probably not able to pull this off by myself, but I am happy to support!