A modern template for machine learning experimentation using wandb, hydra-zen, and submitit on a Slurm cluster with Docker/Apptainer containerization.
Note: This template is optimized for the ML Group cluster setup but can be easily adapted to similar environments.
- 📦 Python environment in Docker via uv
- 📊 Logging and visualizations via Weights & Biases
- 🧩 Reproducibility and modular type-checked configs via hydra-zen
- 🖥️ Submit Slurm jobs and parameter sweeps directly from Python via submitit
- 🚀 No `.def` or `.sh` files needed for Apptainer/Slurm
- Container Setup
- Package Management
- Updating the Docker Image
- Container Registry Authentication
- Development Notes
- Running Experiments
- Contributions
- Acknowledgements
Choose one of the following methods to set up your environment:
- **Configure environment bindings**

  Add to your `.zshrc` or `.bashrc`:

  ```sh
  export APPTAINER_BIND=/opt/slurm-23.2,/opt/slurm,/etc/slurm,/etc/munge,/var/log/munge,/var/run/munge,/lib/x86_64-linux-gnu
  export APPTAINERENV_APPEND_PATH=/opt/slurm/bin:/opt/slurm/sbin
  ```
- **Install the VSCode Command Line Interface (Optional)**

  This step is only required if you plan to create a remote tunnel. First, install the Remote Tunnels extension in VSCode.
- **Connect to compute resources**

  For CPU resources:

  ```sh
  srun --partition=cpu-2h --pty bash
  ```

  For GPU resources:

  ```sh
  srun --partition=gpu-2h --gpus-per-task=1 --pty bash
  ```
- **Launch container**

  To open a tunnel connecting your local VSCode to the container on the cluster:

  ```sh
  apptainer run --nv --writable-tmpfs docker://ghcr.io/marvinsxtr/ml-project-template:main code tunnel
  ```

  In VSCode, press `Ctrl+Shift+P` (Windows/Linux) or `Shift+Cmd+P` (Mac), type "connect to tunnel", select GitHub, and select your named node on the cluster. Your IDE is now connected to the cluster.

  To open a shell in the container on the cluster:

  ```sh
  apptainer run --nv --writable-tmpfs docker://ghcr.io/marvinsxtr/ml-project-template:main /bin/bash
  ```
💡 This may take a few minutes on the first run as the container image is downloaded.
Run the container directly with:
```sh
docker run -it --rm --platform=linux/amd64 ghcr.io/marvinsxtr/ml-project-template:main /bin/bash
```

💡 You can specify a version tag (e.g., `v0.0.1`) instead of `main`. Available versions are listed at the GitHub Container Registry.
This project uses uv for Python dependency management.
Inside the container (e.g., a VSCode shell attached to the Docker container):

```sh
# Add a specific package
uv add <package-name>

# Sync the environment with the dependencies in pyproject.toml
uv sync
```
- **Update dependencies** using `uv` as described above

- **Commit changes to the repository**

  Use tags for versioning:

  ```sh
  git add pyproject.toml uv.lock
  git commit -m "Updated dependencies"
  git tag v0.0.1
  git push && git push --tags
  ```
- **Use the updated image**

  The GitHub Actions workflow automatically builds a new image when changes are pushed.

  With Apptainer:

  ```sh
  apptainer run --nv --writable-tmpfs docker://ghcr.io/marvinsxtr/ml-project-template:v0.0.1 /bin/bash
  ```

  With Docker:

  ```sh
  docker run -it --rm --platform=linux/amd64 ghcr.io/marvinsxtr/ml-project-template:v0.0.1 /bin/bash
  ```
- Create a new GitHub token at Settings → Developer settings → Personal access tokens with:
  - `read:packages` permission
  - `write:packages` permission
With Apptainer:

```sh
apptainer remote login --username <your GitHub username> docker://ghcr.io
```

When prompted, enter your token as the password.

With Docker:

```sh
echo <your GitHub token> | docker login ghcr.io -u <your GitHub username> --password-stdin
```
Test your Dockerfile locally before pushing:

```sh
docker buildx build -t ml-project-template .
```
Logging to WandB is optional for local jobs but mandatory for jobs submitted to the cluster.
Create a `.env` file in the root of the repository with:

```sh
WANDB_API_KEY=your_api_key_here
WANDB_ENTITY=your_entity
WANDB_PROJECT=your_project_name
```
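Once these variables are set, `wandb` picks them up from the environment automatically. As a minimal illustration (assuming the `.env` has been loaded into the process environment, e.g. by your shell or `python-dotenv`; this is not the template's actual logging code):

```python
# Minimal sketch: wandb resolves WANDB_API_KEY, WANDB_ENTITY, and
# WANDB_PROJECT from the environment, so init() needs no explicit arguments.
import wandb

run = wandb.init()      # entity/project come from the .env variables above
run.log({"loss": 0.5})  # log a metric to the WandB dashboard
run.finish()
```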
Run a script locally with:

```sh
python src/ml_project_template/runs/main.py
```

Hydra will automatically generate a `config.yaml` in the `outputs/<date>/<time>/.hydra` folder which you can use to reproduce the same run later.
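For orientation, here is a minimal sketch of how a hydra-zen entry point like this is typically wired; the names below are illustrative, not the template's actual code (see `src/ml_project_template/runs/main.py` for that):

```python
# Hypothetical hydra-zen entry point; illustrative names only.
from hydra_zen import store, zen

def run(seed: int = 42, lr: float = 1e-3) -> None:
    # Task function: hydra-zen generates a type-checked config from its signature.
    print(f"seed={seed}, lr={lr}")

store(run, name="base")  # register an auto-generated config named "base"

if __name__ == "__main__":
    store.add_to_hydra_store()
    # Expose the task through Hydra's CLI; overrides like `seed=7` now work,
    # and each run writes its resolved config to outputs/<date>/<time>/.hydra.
    zen(run).hydra_main(config_name="base", config_path=None, version_base="1.3")
```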
To enable WandB logging:

```sh
python src/ml_project_template/runs/main.py cfg/wandb=base
```

For WandB offline mode:

```sh
python src/ml_project_template/runs/main.py cfg/wandb=base cfg.wandb.mode=offline
```
To run a job on the cluster:

```sh
python src/ml_project_template/runs/main.py cfg/job=base
```

This will automatically enable WandB logging. See `src/ml_project_template/configs/runs/base.py` to configure the job settings.
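Under the hood, submitit lets you submit such jobs from Python without writing any `.sh` files. A minimal, hypothetical sketch of a single-job submission (illustrative names; the template drives this through its configs rather than inline code like this):

```python
# Hypothetical sketch: submit one function call as a Slurm job via submitit.
import submitit

executor = submitit.AutoExecutor(folder="outputs/submitit")  # logs/pickles go here
executor.update_parameters(
    slurm_partition="gpu-2h",  # matches the partition used with srun above
    gpus_per_node=1,
    timeout_min=120,
)

def train(seed: int) -> str:
    return f"trained with seed={seed}"  # placeholder for real training code

job = executor.submit(train, 42)  # schedules the Slurm job
print(job.result())               # blocks until the job finishes
```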
Run a parameter sweep over multiple seeds using multiple nodes:

```sh
python src/ml_project_template/runs/main.py cfg/job=sweep
```

This will automatically enable WandB logging. See `src/ml_project_template/configs/runs/base.py` to configure sweep parameters.
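A sweep maps the same idea over many values as a Slurm job array. Again a minimal sketch under the same assumptions, not the template's actual sweep implementation:

```python
# Hypothetical sketch: run a seed sweep as a Slurm job array via submitit.
import submitit

executor = submitit.AutoExecutor(folder="outputs/submitit")
executor.update_parameters(slurm_partition="gpu-2h", gpus_per_node=1, timeout_min=120)

def train(seed: int) -> float:
    return 0.0  # placeholder for a training run returning a metric

jobs = executor.map_array(train, [0, 1, 2, 3])  # one array task per seed
results = [job.result() for job in jobs]        # gather results from all tasks
```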
Contributions to this documentation and template are very welcome! Feel free to open a PR or reach out with suggestions.
This template is based on a previous example project.