Google Summer of Code 2025: Quartz Solar - New data source in ML model Discussion Thread #26
-
Hi @AUdaltsova @Sukh-P, Thank you for organising this year’s event and putting this document together for the GSOC program. I am interested in your project and would like to know further details.
I look forward to hearing from you. Thank you for your time.
-
Hi @AUdaltsova and @Sukh-P, Thank you for sharing this fascinating project on improving solar energy forecasting models. Building on the discussion so far, I'd like to offer some additional thoughts and questions: Regarding auxiliary data sources:
On the technical approach:
I have experience implementing similar forecasting models with PyTorch, particularly focusing on:
I'd be happy to elaborate on any of these approaches or discuss other aspects of the project that might be helpful. The scaling aspect is particularly interesting, especially the question of how well these models generalize across different climate zones. Looking forward to learning more about this project!
-
Hi @AUdaltsova, I am keen to work on this project and find the idea fascinating. I have some questions before proceeding:
How would we structure an ablation study to measure the independent contributions of dust levels and nearby-site information to solar energy forecasting performance?
What data preprocessing methods would be suitable for handling noisy or missing dust-level data before it is used in the model?
How do we adapt the current PyTorch-based PVNet model to include more features without substantially raising computational overhead?
And how much prior experience in data analytics is necessary to contribute effectively to this project?
-
Hi @AUdaltsova and @Sukh-P, Thank you for sharing this exciting project! I’ve been going through the details, and I have a few questions to better understand the challenges and potential directions:
Since PVNet relies on a substantial amount of data for training, have you considered any techniques to address potential data limitations, such as data augmentation or transfer learning? Would these approaches be relevant in this context?
Given the goal of evaluating the impact of new data sources, what criteria are used to determine whether a new feature is valuable enough to justify its inclusion in the model? Is there a structured process for feature selection, or is it more experimental and iterative?
When evaluating the impact of additional data, do you typically analyze its effects across different time scales (e.g., short-term vs. long-term forecasting), or is the focus primarily on immediate predictive accuracy?
Given that the project involves assessing new data sources, how do you typically balance adding complexity to the model versus maintaining interpretability? Are there any techniques you prioritize to ensure that new inputs provide meaningful contributions without overfitting?
Have you considered domain adaptation techniques to improve model transferability across different geographic regions or weather conditions? Given that solar energy forecasting can be highly location-dependent, what approaches have been most effective in ensuring model generalization?
I’m really interested in this project and would love to contribute meaningfully. Looking forward to your insights!
-
Hi @AUdaltsova, Thank you so much for the thorough and insightful response! I really appreciate you taking the time to explain the project’s approach and considerations in such detail. It’s great to gain a deeper understanding of the thought process behind data selection and model evaluation. If there’s anything I can do to assist or contribute, I’d be more than happy to help. Just let me know!
-
This space is for you to ask any questions you have about this project. We're here to provide clarifications and help you understand the project's goals, scope, and requirements. Feel free to ask about anything that interests you!
Please note that this discussion is for questions and clarifications, not for formal applications.
Project Description
Adding new data sources usually gives a boost to the predictive power of our models, and finding innovative ways of extracting information from them can be even more beneficial. We would like to explore ways to improve our solar energy forecast with an ablation study of how much data on features like dust or neighbouring sites can contribute to the precision of the model. The project can be scaled depending on time constraints.
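As a rough illustration of what such an ablation study could look like, here is a minimal sketch in plain Python. It is not the project's actual pipeline: the feature names (`nwp`, `satellite`, `dust`, `neighbouring_sites`), the helper functions, and the `fake_train_and_eval` stand-in are all hypothetical, and the error values it returns are made up purely to show the bookkeeping. In practice `train_and_eval` would train the forecasting model on the given inputs and return a validation error.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence


def run_ablation(
    base_features: Sequence[str],
    aux_features: Sequence[str],
    train_and_eval: Callable[[FrozenSet[str]], float],
) -> Dict[FrozenSet[str], float]:
    """Evaluate the baseline plus every subset of auxiliary features.

    Returns a mapping from the set of auxiliary features used to the
    validation error reported by `train_and_eval` (lower is better).
    """
    results = {}
    for k in range(len(aux_features) + 1):
        for subset in combinations(aux_features, k):
            features = frozenset(base_features) | frozenset(subset)
            results[frozenset(subset)] = train_and_eval(features)
    return results


def improvement_over_baseline(results):
    """Error reduction of each configuration relative to no auxiliary data."""
    baseline = results[frozenset()]
    return {aux: baseline - err for aux, err in results.items() if aux}


# Hypothetical stand-in for "train a model, return validation error".
# The numbers are invented for illustration only.
def fake_train_and_eval(features):
    error = 1.0
    if "dust" in features:
        error -= 0.1
    if "neighbouring_sites" in features:
        error -= 0.25
    return error


results = run_ablation(
    ["nwp", "satellite"],            # assumed baseline inputs
    ["dust", "neighbouring_sites"],  # auxiliary sources under study
    fake_train_and_eval,
)
gains = improvement_over_baseline(results)
for aux, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(sorted(aux), round(gain, 3))
```

Evaluating every subset, rather than only adding one source at a time, makes it possible to see whether two sources carry overlapping information. The number of runs grows as 2^n in the number of auxiliary sources, so with many candidates a leave-one-out design may be more practical.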
Expected Outcome
A comparative analysis of the effects of auxiliary data sources on the forecast quality.
Other Key Information