Update gaeac5 CI from Anil's PR #3419

Open
wants to merge 23 commits into base: develop


TerrenceMcGuinness-NOAA
Collaborator

Description

This PR changes the node names used for running global-workflow on Gaea C5.
I had to make this a branch from Anil K.'s PR because he does not have permission to update the Jenkinsfile on the controller from his PR.

@TerrenceMcGuinness-NOAA changed the title from "Update gaeac5 ci anil" to "Update gaeac5 CI from Anil's PR" on Mar 5, 2025
@TerrenceMcGuinness-NOAA added the CI/CD and CI-Gaeac5-Ready labels on Mar 5, 2025
@emcbot added the CI-Gaeac5-Building and CI-Gaeac5-Failed labels and removed the CI-Gaeac5-Ready and CI-Gaeac5-Building labels on Mar 5, 2025
@TerrenceMcGuinness-NOAA removed the CI-Gaeac5-Failed label on Mar 5, 2025
@emcbot added the CI-Gaeac5-Building label and removed the CI-Gaeac5-Ready label on Mar 6, 2025
@emcbot

emcbot commented Mar 6, 2025

Build FAILED on Gaeac5 in Build# 10 with error logs:

/gpfs/f5/epic/proj-shared/global/CI/3419/global-workflow/sorc/logs/ufs_utils.log

Follow link here to view the contents of the above file(s): (link)

@emcbot added the CI-Gaeac5-Failed label and removed the CI-Gaeac5-Building label on Mar 6, 2025
@AnilKumar-NOAA
Contributor

Build FAILED on Gaeac5 in Build# 10 with error logs:

/gpfs/f5/epic/proj-shared/global/CI/3419/global-workflow/sorc/logs/ufs_utils.log

Follow link here to view the contents of the above file(s): (link)

@DavidHuber-NOAA & @TerrenceMcGuinness-NOAA, please take a look at the hsi module. This failure looks related to the recent hsi update made in PR #3323: https://github.com/NOAA-EMC/global-workflow/pull/3323/files#diff-794f4dde1ed73e1fd0b0c7f4b54b7224da90bfa9225fccae305ea2c5b2ab4352

@TerrenceMcGuinness-NOAA
Collaborator Author

@DavidHuber-NOAA come to think of it, David, we do not have custom build options for CI. And yes, we merged from develop a couple of hours ago, so that may have changed the build.

@DavidHuber-NOAA
Contributor

DavidHuber-NOAA commented Mar 7, 2025

@AnilKumar-NOAA Yes, the HSI module is available on C5, but (I think) the DTN partition needs to be updated. The GAEA ES cluster had new nodes added earlier this week and I am unsure of the new partition setup. Are you aware of how it is set up now?

@DavidHuber-NOAA
Contributor

@AnilKumar-NOAA Apologies, I see now the issue you were referring to. I will look a little deeper.

@DavidHuber-NOAA
Contributor

The issue is in the ufs_utils modulefiles/build.gaeac5.intel.lua:

help([[
Load environment to compile UFS_UTILS on Gaea using Intel
]])

prepend_path("MODULEPATH", "/sw/rdtn/modulefiles")
load("hsi")

This path needs to be updated to /usw/hpss/modulefiles.
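
For illustration, the change described above might look like the following in modulefiles/build.gaeac5.intel.lua. This is only a minimal sketch of the affected lines, assuming that nothing other than the MODULEPATH entry needs to change; the actual upstream fix may touch additional lines.

```lua
help([[
Load environment to compile UFS_UTILS on Gaea using Intel
]])

-- HSI modulefiles location as stated in the comment above
-- (previously "/sw/rdtn/modulefiles").
prepend_path("MODULEPATH", "/usw/hpss/modulefiles")
load("hsi")
```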

@TerrenceMcGuinness-NOAA You are correct, there is no way to skip this build. Strictly speaking, I don't think this is required by the forecast-only tests, but I don't think we should try to put in a hack. Instead, I will open a UFS_Utils PR now to fix the issue upstream.

@DavidBurrows-NCO
Contributor

@DavidHuber-NOAA Thanks for looking into this. Let me know if you want EPIC to handle that ufs-utils PR or anything else.

@DavidHuber-NOAA
Contributor

@DavidBurrows-NCO No problem. It's just a two-line fix. I've already tested it on C5/C6. If George Gayno would like it tested further (CI testing), then I would gladly take assistance.

@DavidHuber-NOAA
Contributor

Opened PR ufs-community/UFS_UTILS#1031.

@DavidBurrows-NCO
Contributor

If George Gayno would like it tested further (CI testing), then I would gladly take assistance.

@DavidHuber-NOAA Sounds good. Let us know. We have a full day EPIC meeting today but will check in periodically.

@emcbot

emcbot commented Mar 7, 2025

Build FAILED on Gaeac6 in Build# 8 with error logs:

/gpfs/f6/drsa-precip3/proj-shared/global/CI/3419/global-workflow/sorc/logs/ufs_utils.log

Follow link here to view the contents of the above file(s): (link)

@emcbot added the CI-Gaeac6-Failed label and removed the CI-Gaeac6-Building label on Mar 7, 2025
@TerrenceMcGuinness-NOAA added the CI-Gaeac6-Ready label and removed the CI-Gaeac6-Failed label on Mar 7, 2025
@TerrenceMcGuinness-NOAA
Collaborator Author

Had some issues with Rocoto in the role account on Gaea C6. Restarting the build to test the fix.

@emcbot added the CI-Gaeac6-Building label and removed the CI-Gaeac6-Ready label on Mar 7, 2025
@emcbot

emcbot commented Mar 7, 2025

Build FAILED on Gaeac6 in Build# 11 with error logs:

/gpfs/f6/drsa-precip3/proj-shared/global/CI/3419/global-workflow/sorc/logs/ufs_utils.log

Follow link here to view the contents of the above file(s): (link)

@emcbot added the CI-Gaeac6-Failed label and removed the CI-Gaeac6-Building label on Mar 7, 2025
@TerrenceMcGuinness-NOAA
Collaborator Author

TerrenceMcGuinness-NOAA commented Mar 7, 2025

Just confirmed that the ufs_utils build also fails on the develop branch on Gaea C6. The functional CI test passes with the Jenkins updates in this PR, and this branch shows the same build failure (see link above).

@AnilKumar-NOAA
Contributor

Just confirmed that the ufs_utils build also fails on the develop branch on Gaea C6. The functional CI test passes with the Jenkins updates in this PR, and this branch shows the same build failure (see link above).

Yes, there are hsi issues on C6 as well. Hopefully the hsi PR will get merged soon, and then hopefully we pass the test at least on gaeac5.

@DavidHuber-NOAA
Contributor

@AnilKumar-NOAA @DavidBurrows-NCO Would you like to move forward with this PR as is or wait for a UFS_Utils update? There are a couple of science updates to UFS_Utils between the current hash and what would go in after ufs-community/UFS_UTILS#1031 is merged, which would require comprehensive CI testing on the tier 1 platforms.

@DavidBurrows-NCO
Contributor

@DavidHuber-NOAA If it's not an issue to move forward, we would prefer that. I'm afraid ufs-utils may take some time to upgrade.

Contributor

@DavidHuber-NOAA left a comment

Changes look good.

Labels
CI/CD (Issue related to CI/CD)
CI-Gaeac5-Failed (**Bot use only** CI testing on Gaea C5 for this PR has failed)
CI-Gaeac6-Failed (**Bot use only** CI testing on Gaea C6 for this PR has failed)

5 participants