This repository contains scripts and workflows for large-scale dimensionality reduction using HPC resources on the Athena and Ares clusters. The primary goal is to enable efficient, parallel computation on high-dimensional datasets. Below are the steps, commands, and setup details for this repository.
## Prerequisites

- A valid account on the Athena or Ares cluster:

  ```bash
  ssh [username]@ares.cyfronet.pl
  ssh [username]@athena.cyfronet.pl
  ```

- Access to the dataset:

  ```bash
  ls -l /net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings/X_concatenated_all_dims.npy
  ```

- Familiarity with the SLURM job scheduler.
- Install Miniconda to manage Python environments.
- Load the necessary modules (specific to the cluster): `module load miniconda3` on Ares, or `module load Miniconda3` on Athena.
## Repository structure

```
.
├── ares/                          # Directory for Ares cluster scripts and outputs
│   ├── dim_red/                   # Dimensionality reduction scripts and results
│   │   ├── output_1node_max_run0/ # Results for the 1-node maximum configuration
│   │   ├── output_2nodes_run0/    # Results for the 2-node configuration
│   │   ├── dim_red_1node.sh       # SLURM script for 1 node
│   │   ├── dim_red_2nodes.sh      # SLURM script for 2 nodes
│   │   └── geom_emb_dim_red.ipynb # Jupyter notebook for dimensionality reduction
├── athena/                        # Directory for Athena cluster scripts and outputs
│   ├── dim_red/                   # Dimensionality reduction scripts and results
│   │   ├── output_1node_run0/     # Results for the 1-node configuration
│   │   ├── output_2nodes_run0/    # Results for the 2-node configuration
│   │   ├── dim_red_1node.sh       # SLURM script for 1 node
│   │   ├── dim_red_2nodes.sh      # SLURM script for 2 nodes
│   │   └── geom_emb_dim_red.py    # Python script for dimensionality reduction
└── README.md                      # This file
```
## Setup

- (Recommended) Configure conda to use your `$SCRATCH` storage space:

  ```bash
  conda config --add envs_dirs ${SCRATCH}/.conda/envs
  conda config --add pkgs_dirs ${SCRATCH}/.conda/pkgs
  ```

- Create a virtual environment:

  ```bash
  conda create -n dim-reduction python=3.8 -y
  conda activate dim-reduction
  ```

- Install the required libraries:

  ```bash
  conda install -c conda-forge numpy pandas seaborn matplotlib scikit-learn umap-learn pacmap trimap
  ```

- Check whether you have access to the data (a Python sanity check is sketched at the end of this section):

  ```bash
  ls -l /net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings/X_concatenated_all_dims.npy
  ls -ld /net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings
  ```
- Test scripts locally to ensure compatibility before deploying on the cluster:

  ```bash
  python ares/dim_red/geom_emb_dim_red.py
  ```
- You can also use the Jupyter notebook to test the code (locally or on the cluster). To do it on the cluster:

  - Request an interactive job on the cluster, e.g., on Ares:

    ```bash
    srun --time=2:00:00 --mem=64G --cpus-per-task=16 --ntasks=1 --partition=plgrid --account=[grantname]-cpu --pty /bin/bash
    ```

  - Install and run the Jupyter server:

    ```bash
    conda install -c conda-forge jupyter
    hostname
    jupyter notebook --no-browser --port=[port-number] --ip=[hostname]
    ```

    Note: Use a port number greater than 1024 (e.g., 8888). Jupyter will display a connection URL like:

    ```
    http://ag0009:8888/?token=your_token_here
    ```

    Keep this URL handy; you will need it later.
  - On your local machine:
    - Open Visual Studio Code.
    - Install the Remote - SSH extension by Microsoft (if not already installed).
    - Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on macOS) to open the Command Palette.
    - Type "Remote-SSH: Connect to Host" and select it.
    - Connect to Ares using your SSH configuration.
    - In VS Code, navigate to `ares/dim_red` and open the `geom_emb_dim_red.ipynb` file.
    - When prompted to select a kernel, choose "Existing Jupyter Server".
    - Paste the Jupyter connection URL you obtained earlier (e.g., `http://ag0009:8888/?token=your_token_here`).
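Before launching long jobs, it can also help to verify that the environment imports cleanly and that the dataset is readable. Below is a minimal sanity-check sketch (the filename `sanity_check.py` is illustrative, not part of the repository); it memory-maps the array so the full file is not pulled into RAM:

```python
# sanity_check.py -- quick environment and data check (illustrative sketch)
import numpy as np
import sklearn, umap, pacmap, trimap  # fails fast if the environment is incomplete

DATA_PATH = "/net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings/X_concatenated_all_dims.npy"

# mmap_mode="r" reads only the .npy header, not the whole array.
X = np.load(DATA_PATH, mmap_mode="r")
print(f"shape={X.shape}, dtype={X.dtype}")
print(f"approx. in-memory size: {X.nbytes / 1e9:.1f} GB")
```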
## Running the scripts

Save the following script as `ares/dim_red/dim_red_1node.sh`:
```bash
#!/bin/bash
#SBATCH --job-name=geom_emb_dim_red_1node
#SBATCH --output=dim_red_1node_%j.out
#SBATCH --error=dim_red_1node_%j.err
#SBATCH --time=72:00:00
#SBATCH --partition=plgrid
#SBATCH --account=[grantname]-cpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=48
#SBATCH --mem=184800

module load miniconda3
# Make `conda activate` available in this non-interactive shell.
eval "$(conda shell.bash hook)"
conda activate dim-reduction

python geom_emb_dim_red.py
```
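Both SLURM scripts run `geom_emb_dim_red.py` from the submission directory. For orientation, here is a minimal sketch of what such a script can look like; the PCA-then-UMAP pipeline, the component counts, and the output filename are illustrative assumptions rather than a description of the repository's actual script:

```python
# geom_emb_dim_red.py -- illustrative sketch, not the repository's exact script.
import numpy as np
from sklearn.decomposition import PCA
import umap

DATA_PATH = "/net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings/X_concatenated_all_dims.npy"

X = np.load(DATA_PATH)

# PCA first: cheap, and it shrinks the input for the slower neighbor-based step.
X_pca = PCA(n_components=50).fit_transform(X)

# UMAP down to 2D; n_jobs should match --cpus-per-task in the SLURM script.
X_umap = umap.UMAP(n_components=2, n_jobs=48).fit_transform(X_pca)

np.save("X_umap_2d.npy", X_umap)
```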
Save the following script as `athena/dim_red/dim_red_2nodes.sh`:
```bash
#!/bin/bash
#SBATCH --job-name=dim_red_2nodes
#SBATCH --output=dim_red_2nodes_%j.out
#SBATCH --error=dim_red_2nodes_%j.err
#SBATCH --time=48:00:00
#SBATCH --partition=plgrid-gpu-a100
#SBATCH --account=[grantname]-gpu-a100
#SBATCH --nodes=2
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=800G
#SBATCH --gres=gpu:1

module load miniconda3
# Make `conda activate` available in this non-interactive shell.
eval "$(conda shell.bash hook)"
conda activate dim-reduction

python geom_emb_dim_red.py
```
Submit the job using:

```bash
sbatch dim_red/dim_red_1node.sh
sbatch dim_red/dim_red_2nodes.sh
```
## Job monitoring

- Check the status of all jobs:

  ```bash
  squeue
  ```

- Check the status of your jobs:

  ```bash
  squeue -u $USER
  ```

- View detailed job information:

  ```bash
  sacct -j <job_id>
  ```

- Cancel a job:

  ```bash
  scancel <job_id>
  ```

- Inspect logs:

  ```bash
  less dim_red/dim_red_1node_<job_id>.out
  less dim_red/dim_red_2nodes_<job_id>.out
  ```
## Tips

- Modify the memory and time requirements in the SLURM script according to the size of your dataset (see the sketch after this list for a quick way to estimate memory).
- Use multi-node setups for larger datasets and adjust the `#SBATCH` directives accordingly.
- The first time you use conda, you may need to initialize it with `conda init bash` (after which the shell needs to be reloaded).
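To turn the first tip into numbers, one rough rule of thumb is to budget the raw array size plus headroom for the reducer's working memory. A small sketch, assuming (as a loose guess, to be adjusted empirically) a 4x multiplier for neighbor-based methods:

```python
# estimate_mem.py -- rough --mem sizing helper; the 4x multiplier is a guess.
import numpy as np

DATA_PATH = "/net/pr2/projects/plgrid/plgglscclass/geometricus_embeddings/X_concatenated_all_dims.npy"

X = np.load(DATA_PATH, mmap_mode="r")  # reads the header only, no data
raw_gb = X.nbytes / 1e9
print(f"raw array: {raw_gb:.1f} GB; suggested --mem >= {4 * raw_gb:.0f}G")
```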
## License

This project is licensed under the MIT License. See the `LICENSE` file for details.