GPT Model Training Repository

Overview

This repository contains simple code for training and evaluating a GPT-2-style model. The project is structured to handle the configuration, data preparation, model definition, and training processes, and it supports multi-GPU training via DDP (DistributedDataParallel).

File Structure

  • config.py: Contains configuration details for the model and training process, such as hyperparameters, file paths, and other settings (see the sketch after this list).
  • dataset.py: Manages dataset loading and preprocessing, preparing the data pipeline required for model training.
  • distributed.py: Provides functionality for distributed training, allowing the model to be trained across multiple GPUs or machines.
  • download_data.py: A utility script for downloading the datasets or external files needed for training.
  • model.py: Defines the architecture of the GPT model, including its layers and forward pass (a minimal sketch follows below).
  • train.py: The main script for training the model. It initializes the model, loads data, and runs the training loop.
  • utils.py: Contains utility functions used throughout the project, such as logging, checkpoint saving, and performance metrics.
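
As a rough illustration, here is the kind of configuration object config.py might define. This is a hypothetical sketch: the field names and default values are assumptions (the vocabulary size shown is the standard GPT-2 BPE size), not the repository's actual settings.

    from dataclasses import dataclass

    # Hypothetical sketch only: field names and defaults are assumptions,
    # not taken from the repository's config.py.
    @dataclass
    class GPTConfig:
        block_size: int = 1024    # maximum sequence length
        vocab_size: int = 50257   # standard GPT-2 BPE vocabulary size
        n_layer: int = 12         # number of transformer blocks
        n_head: int = 12          # attention heads per block
        n_embd: int = 768         # embedding dimension
        learning_rate: float = 3e-4
        batch_size: int = 16
        data_dir: str = "data/"   # where downloaded datasets are stored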

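In the same spirit, here is a minimal sketch of the kind of architecture model.py defines, reusing the hypothetical GPTConfig above. This is a generic decoder-only transformer, not the repository's exact implementation:

    import torch
    import torch.nn as nn

    class Block(nn.Module):
        """One transformer block: causal self-attention followed by an MLP."""
        def __init__(self, n_embd: int, n_head: int):
            super().__init__()
            self.ln1 = nn.LayerNorm(n_embd)
            self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
            self.ln2 = nn.LayerNorm(n_embd)
            self.mlp = nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd),
                nn.GELU(),
                nn.Linear(4 * n_embd, n_embd),
            )

        def forward(self, x):
            # Boolean causal mask: True marks positions a token may NOT attend to.
            T = x.size(1)
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), 1)
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
            x = x + attn_out
            x = x + self.mlp(self.ln2(x))
            return x

    class GPT(nn.Module):
        """Token + position embeddings, a stack of blocks, and an LM head."""
        def __init__(self, cfg: GPTConfig):
            super().__init__()
            self.tok_emb = nn.Embedding(cfg.vocab_size, cfg.n_embd)
            self.pos_emb = nn.Embedding(cfg.block_size, cfg.n_embd)
            self.blocks = nn.ModuleList(
                Block(cfg.n_embd, cfg.n_head) for _ in range(cfg.n_layer)
            )
            self.ln_f = nn.LayerNorm(cfg.n_embd)
            self.head = nn.Linear(cfg.n_embd, cfg.vocab_size, bias=False)

        def forward(self, idx):
            # idx: (batch, time) token ids -> (batch, time, vocab) logits
            pos = torch.arange(idx.size(1), device=idx.device)
            x = self.tok_emb(idx) + self.pos_emb(pos)
            for block in self.blocks:
                x = block(x)
            return self.head(self.ln_f(x))
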
Usage

  1. Install dependencies: Ensure you have the required libraries installed by running:

    pip install -r requirements.txt

  2. Download data: Use the download_data.py script to download the necessary datasets:

    python download_data.py

  3. Configure settings: Adjust settings in config.py to suit your specific training environment, such as hyperparameters, data paths, or training options.

  4. Train the model: Run the train.py script to begin training:

    python train.py

  5. Distributed training: If you are training across multiple GPUs or machines, ensure that distributed.py is properly configured and start training with a distributed launcher (see the sketch after this list).
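
Assuming train.py uses PyTorch DistributedDataParallel (as the overview's mention of DDP suggests), a single-machine multi-GPU run would typically be launched with PyTorch's torchrun; the GPU count below is illustrative:

    torchrun --standalone --nproc_per_node=4 train.py

And here is a rough sketch of the process-group setup distributed.py likely performs; the function name and structure are assumptions:

    import os

    import torch
    import torch.distributed as dist

    # Hypothetical sketch: the repository's distributed.py may structure
    # this differently.
    def setup_ddp() -> int:
        # torchrun exports RANK, LOCAL_RANK, and WORLD_SIZE for each process.
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)
        return local_rank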

Contributing

Feel free to contribute to this project by submitting a pull request or opening an issue to report bugs or suggest features.

License

This project is licensed under the MIT License.
