Large-scale dimensionality reduction on HPC clusters

This repository contains scripts and workflows to perform large-scale dimensionality reduction tasks using HPC resources on Athena and Ares clusters. The primary goal is to enable efficient computation and parallelism for handling high-dimensional datasets. Below are the steps, commands, and setup details for this repository.


Prerequisites

Requirements

  • A valid account on the Athena or Ares cluster:
    ssh [username]@ares.cyfronet.pl
    ssh [username]@athena.cyfronet.pl
  • Access to the dataset:
    ls -l /net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings/X_concatenated_all_dims.npy
  • Familiarity with the SLURM job scheduler.

Environment Setup

  • Install Miniconda to manage Python environments.
  • Load necessary modules (specific to the cluster):
    module load miniconda3
    on Ares or
    module load Miniconda3
    on Athena.

Repository Structure

.
├── ares/                           # Directory for Ares cluster scripts and outputs
│   └── dim_red/                    # Dimensionality reduction scripts and results
│       ├── output_1node_max_run0/  # Results for 1-node maximum configuration
│       ├── output_2nodes_run0/     # Results for 2-node configuration
│       ├── dim_red_1node.sh        # SLURM script for 1 node
│       ├── dim_red_2nodes.sh       # SLURM script for 2 nodes
│       └── geom_emb_dim_red.ipynb  # Jupyter notebook for dimensionality reduction
├── athena/                         # Directory for Athena cluster scripts and outputs
│   └── dim_red/                    # Dimensionality reduction scripts and results
│       ├── output_1node_run0/      # Results for 1-node configuration
│       ├── output_2nodes_run0/     # Results for 2-node configuration
│       ├── dim_red_1node.sh        # SLURM script for 1 node
│       ├── dim_red_2nodes.sh       # SLURM script for 2 nodes
│       └── geom_emb_dim_red.py     # Python script for dimensionality reduction
└── README.md                       # This file

Workflow and Commands

1. Setting Up the Python Environment

  • (Recommended step) Configure conda to use your $SCRATCH storage space:
    conda config --add envs_dirs ${SCRATCH}/.conda/envs 
    conda config --add pkgs_dirs ${SCRATCH}/.conda/pkgs
  • Create a conda environment:
    conda create -n dim-reduction python=3.8 -y
    conda activate dim-reduction
  • Install required libraries:
    conda install -c conda-forge numpy pandas seaborn matplotlib scikit-learn umap-learn pacmap trimap
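
A quick way to confirm the environment is usable is to import the main libraries from inside the activated dim-reduction environment; any missing package fails loudly on import. A minimal check (not part of the repository, just a suggested sanity test):

# Verify that the dim-reduction environment imports cleanly.
import matplotlib
import numpy
import pacmap
import pandas
import seaborn
import sklearn
import trimap
import umap

for mod in (numpy, pandas, sklearn, umap, pacmap, trimap, matplotlib, seaborn):
    # Not every package exposes __version__, so fall back gracefully.
    print(f"{mod.__name__:12s} {getattr(mod, '__version__', 'version unknown')}")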

2. Preparing the Input Data

  • Check whether you have access to the data:
    ls -l /net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings/X_concatenated_all_dims.npy
    ls -ld /net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings
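
Because the embeddings array is large, it is convenient to inspect it without loading it fully into memory; NumPy's memory-mapped mode reads only the file header. A minimal sketch (the path is the shared dataset listed above; this snippet is not part of the repository):

import numpy as np

# Memory-map the array: the header is parsed, but the data stays on disk.
path = "/net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings/X_concatenated_all_dims.npy"
X = np.load(path, mmap_mode="r")
print("shape:", X.shape, "dtype:", X.dtype, "approx. size:", X.nbytes / 1e9, "GB")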

3. Running Dimensionality Reduction Locally

  • Test scripts locally to ensure compatibility before deploying on the cluster (a minimal smoke-test sketch appears at the end of this section):

    python ares/dim_red/geom_emb_dim_red.py
  • You can also use the Jupyter notebook to test the code, either locally or on the cluster. To do so on the cluster:

    1. Request an interactive job on the cluster, e.g., on Ares:
    srun --time=2:00:00 --mem=64G --cpus-per-task=16 --ntasks=1 --partition=plgrid --account=[grantname]-cpu --pty /bin/bash
    2. Install and run a Jupyter server:
    conda install -c conda-forge jupyter
    hostname
    jupyter notebook --no-browser --port=[port-number] --ip=[hostname]

    Note: Use a port number greater than 1024 (e.g., 8888). Jupyter will display a connection URL like:

    http://ag0009:8888/?token=your_token_here
    

    Keep this URL handy; you will need it later.

    3. On your local machine:
    • Open Visual Studio Code.
    • Install the Remote - SSH extension by Microsoft (if not already installed).
    • Press Ctrl+Shift+P (or Cmd+Shift+P on macOS) to open the Command Palette.
    • Type "Remote-SSH: Connect to Host" and select it.
    • Connect to Ares using your SSH configuration.
    • In VS Code, navigate to ares/dim_red and open the geom_emb_dim_red.ipynb file.
    • When prompted to select a kernel, choose "Existing Jupyter Server".
    • Paste the Jupyter connection URL you obtained earlier (e.g., http://ag0009:8888/?token=your_token_here).
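
Whichever route you choose, a quick smoke test on small random data confirms that the installed reducers run end to end before you spend cluster hours on the full array. A minimal sketch with random stand-in data (the repository's geom_emb_dim_red.py and the notebook remain the authoritative pipeline):

import numpy as np
import pacmap
import trimap
import umap

# Small random stand-in for the real embeddings; the full dataset is far larger.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 64)).astype(np.float32)

for name, reducer in [
    ("UMAP", umap.UMAP(n_components=2)),
    ("PaCMAP", pacmap.PaCMAP(n_components=2)),
    ("TriMAP", trimap.TRIMAP(n_dims=2)),
]:
    embedding = reducer.fit_transform(X)
    print(f"{name}: {X.shape} -> {embedding.shape}")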

4. Running on HPC Clusters

Example SLURM Script for Ares (1 Node)

Save the following script as ares/dim_red/dim_red_1node.sh:

#!/bin/bash
#SBATCH --job-name=geom_emb_dim_red_1node
#SBATCH --output=dim_red_1node_%j.out
#SBATCH --error=dim_red_1node_%j.err
#SBATCH --time=72:00:00
#SBATCH --partition=plgrid
#SBATCH --account=[grantname]-cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=48
#SBATCH --mem=184800

module load miniconda3
conda init
eval "$(conda shell.bash hook)"
conda activate dim-reduction

python geom_emb_dim_red.py
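
The script above gives a single task 48 CPU cores; rather than hard-coding thread counts, the Python side can read the allocation from SLURM's environment. A sketch of that pattern (SLURM_CPUS_PER_TASK is a standard SLURM variable, but whether geom_emb_dim_red.py uses it this way is an assumption, as is the input file name):

import os

import numpy as np
import umap

# Use the CPUs SLURM allocated to this task; default to 1 outside a SLURM job.
n_cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))

X = np.load("X_subset.npy")  # hypothetical input; the real job uses the shared embeddings

# n_jobs controls UMAP's internal parallelism (nearest-neighbour search, layout optimization).
reducer = umap.UMAP(n_components=2, n_jobs=n_cpus)
embedding = reducer.fit_transform(X)
np.save("umap_embedding_2d.npy", embedding)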

Example SLURM Script for Athena (2 Nodes)

Save the following script as athena/dim_red/dim_red_2nodes.sh:

#!/bin/bash
#SBATCH --job-name=dim_red_2nodes
#SBATCH --output=dim_red_2nodes_%j.out
#SBATCH --error=dim_red_2nodes_%j.err
#SBATCH --time=48:00:00
#SBATCH --partition=plgrid-gpu-a100
#SBATCH --account=[grantname]-gpu-a100
#SBATCH --nodes=2
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=800G
#SBATCH --gres=gpu:1

module load Miniconda3
conda init
eval "$(conda shell.bash hook)"
conda activate dim-reduction

python geom_emb_dim_red.py

Submitting the Job

Submit the job using:

sbatch dim_red/dim_red_1node.sh
sbatch dim_red/dim_red_2nodes.sh

5. Monitoring Job Progress

  • Check the status of all jobs:
    squeue
  • Check the status of your jobs:
    squeue -u $USER
  • View detailed job information:
    sacct -j <job_id>
  • Cancel the job:
    scancel <job_id>
  • Inspect logs:
    less dim_red/dim_red_1node_<job_id>.out
    less dim_red/dim_red_2nodes_<job_id>.out

Links and Resources


Notes

  • Modify memory and time requirements in the SLURM script according to the size of your dataset.
  • Use multi-node setups for larger datasets and adjust #SBATCH directives accordingly.
  • The first time you run conda, it might be necessary to initialize it with the command conda init bash (after which the shell needs to be reloaded).

Authors and Acknowledgments

  • Developed by purbancz (GitHub, X, LinkedIn, website).
  • Thanks to the Cyfronet HPC team for their documentation.

License

This project is licensed under the MIT License. See the LICENSE file for details.
