[Bug] Training Fails on CPU-Only Machine Due to GPU Misconfiguration #13

Closed
sumana-2705 opened this issue Feb 26, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@sumana-2705

Describe the bug

When attempting to train the model on a CPU-only laptop using python sat_pred/train.py, I encountered the following error:

Image

To Reproduce

  1. Run the training script on a machine without a GPU:
     python sat_pred/train.py
  2. The script throws an error stating that no supported GPU backend is found.

Expected behavior

The model should train successfully on a CPU-only machine.

Additional context

To verify if a GPU is required, I tested the following code separately:

Image

sumana-2705 added the bug label on Feb 26, 2025
@sumana-2705
Author

Hello @dfulu, could you please help me with this?

@dfulu
Member

dfulu commented Feb 28, 2025

Hey @sumana-2705, I think your issue is with the arguments you've passed into PyTorch Lightning. The devices argument specifies which GPUs will be used (when passed as a list, e.g. devices=[0,1,2]) or how many (when passed as an int, e.g. devices=1). I think you should just not set that parameter, or set it to "auto".
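A minimal sketch of that suggestion, assuming the trainer is constructed directly in Python (the repository may configure it elsewhere, for example in a config file). `select_trainer_kwargs` and `build_trainer` are hypothetical helpers, not part of sat_pred; `accelerator` and `devices` are existing `Trainer` arguments.

```python
def select_trainer_kwargs(cuda_available: bool) -> dict:
    """Hypothetical helper: pick Trainer arguments that match the hardware."""
    if cuda_available:
        # Use every visible GPU rather than hard-coding a count or index list.
        return {"accelerator": "gpu", "devices": "auto"}
    # CPU-only machine: request the CPU backend and let Lightning pick the count.
    return {"accelerator": "cpu", "devices": "auto"}


def build_trainer():
    """Build a Lightning Trainer for the current machine (needs torch and lightning)."""
    import torch
    # Assumption: older installs expose this as `import pytorch_lightning as pl`.
    import lightning.pytorch as pl

    return pl.Trainer(**select_trainer_kwargs(torch.cuda.is_available()))
```

In recent Lightning releases, leaving both arguments at "auto" should have the same effect; the explicit branch only makes the choice visible.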

@vinay752

vinay752 commented Feb 28, 2025

Hi @sumana-2705, I hope you are doing well. I encountered the same problem, and installing torch with CUDA fixed it for me. CUDA is developed by NVIDIA, so you need a Windows or Linux machine with an NVIDIA GPU; it does not work on MacBooks with Apple Silicon chips. CUDA is the platform through which the CPU communicates with the GPU. If your computer has an NVIDIA GPU, try this command:

conda install pytorch=2.5.1 torchvision torchaudio pytorch-cuda=11.8 -c pytorch-test -c nvidia

I recommend using torch 2.5.1, since Cloudcasting requires PyTorch >=2.3.0. Before installing CUDA, I kept getting the error MisconfigurationException('No supported gpu backend found!'). After installing it, the program ran fine.
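To confirm whether the installed PyTorch build can actually see a GPU, a quick check along these lines may help. This is a sketch: `describe_torch_backend` is a hypothetical helper name, and it only reports what `torch` itself exposes.

```python
def describe_torch_backend() -> str:
    """Report which compute backend the installed PyTorch build can use."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this build was compiled against.
        return f"CUDA {torch.version.cuda} on {torch.cuda.get_device_name(0)}"
    return "CPU only (no CUDA device visible to this torch build)"


print(describe_torch_backend())
```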

Image



After that, I got another error:

raise FileNotFoundError(
FileNotFoundError: No such file or directory: 'C:\mnt\disks\sat_data\sat_data_all\2008_training_nonhrv.zarr'

I assume that either we don't have access to all the files or I have to change the path to the training data. I think the data is publicly available on Google Cloud and can be copied with:

gsutil -m cp -r "gs://public-datasets-eumetsat-solar-forecasting/satellite/EUMETSAT/SEVIRI_RSS/v4" 
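Before launching training, it may be worth checking that the configured data paths exist locally, so a wrong path shows up immediately. The sketch below uses a hypothetical helper `missing_paths`; the file name comes from the error above, and the `data/` directory is an assumption to replace with the real download location.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


# Hypothetical location; replace with wherever the zarr stores were downloaded.
zarr_paths = ["data/2008_training_nonhrv.zarr"]

missing = missing_paths(zarr_paths)
if missing:
    print("Missing training data:", ", ".join(missing))
```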

Downloading the complete dataset would take days at my internet speed because of its size. I tried running the program with only part of the data and got an error caused by having too little data:

File "C:\Users\vinay\anaconda3\envs\sat_pred\lib\site-packages\cloudcasting\utils.py", line 76, in find_contiguous_time_periods
    assert len(datetimes) > 0
AssertionError

And here are a few graphs from W&B:

Image

@sumana-2705
Author

Hello @vinay752 and @dfulu,

Thank you for your response and for your concern, @vinay752. My PC doesn’t have a GPU, so I wanted to double-check whether the project could run efficiently on a CPU. Since working with large ML models often requires GPU support, I’ve decided to explore other areas where I can contribute meaningfully.

I find Open Climate Fix’s work really exciting, and the Cloudcasting project particularly interests me. I also have a strong background in UI/UX, so I’d love to contribute to the Cloudcasting UI project, as I unfortunately cannot contribute to the Cloudcasting ML project. Looking forward to learning more and seeing how I can help!

@vinay752

vinay752 commented Mar 4, 2025

Hi @sumana-2705 ,

If you are interested in contributing, I recommend exploring Google Cloud Platform or Microsoft Azure. Both platforms provide free credits when you register as a new user, and you can put them towards compute instances and GPUs. Both also offer a free tier that provides a limited amount of free services.

I hope this helps. Thank You!
