Change UFS_UTILS location to point to a branch in the NCAR fork that … #26

Merged

gsketefian commented Sep 25, 2020

This PR must be merged BEFORE PR #300 into NOAA-EMC/regional_workflow can be merged.

DESCRIPTION OF CHANGES:

This PR updates the repo and branch of the ufs_utils external in Externals.cfg. The previous repo and branch were:

repo_url = https://github.com/JeffBeck-NOAA/UFS_UTILS
branch = feature/regional_release

However, this no longer works because of recent commits to that branch. This PR therefore switches to an older version of that branch, with one small extra commit, hosted in the NCAR fork:

repo_url = https://github.com/NCAR/UFS_UTILS
branch = feature/regional_release_STRING
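
One way to confirm which repo and branch a checkout will use is to read the keys back out of the ufs_utils stanza. The following is a minimal sketch under stated assumptions: the stanza name `[ufs_utils]`, the `protocol` line, and the helper `get_external_key` are hypothetical and are not copied from the repository's Externals.cfg; only the `repo_url` and `branch` values come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of manage_externals): print the value of KEY
# inside [SECTION] of an ini-style file such as Externals.cfg.
get_external_key() {
  local file="$1" section="$2" key="$3"
  awk -F '=' -v section="[$section]" -v key="$key" '
    # A line starting with "[" opens a new stanza; remember whether it is ours.
    /^\[/ { in_section = ($0 == section); next }
    in_section {
      name = $1
      gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim the key name
      if (name == key) {
        value = $0
        sub(/^[^=]*=[ \t]*/, "", value)      # drop everything up to "="
        sub(/[ \t]+$/, "", value)            # trim trailing whitespace
        print value
        exit
      }
    }
  ' "$file"
}

# Sample stanza with an assumed layout, using the values given in this PR.
sample_cfg="$(mktemp)"
trap 'rm -f "$sample_cfg"' EXIT
cat > "$sample_cfg" <<'EOF'
[ufs_utils]
protocol = git
repo_url = https://github.com/NCAR/UFS_UTILS
branch = feature/regional_release_STRING
EOF

get_external_key "$sample_cfg" ufs_utils branch
```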

TESTS CONDUCTED:

See PR #300 into NOAA-EMC/regional_workflow for a description of the tests conducted.

…contains the fix for make_solo_mosaic (needed when the full path to the grid mosaic file is too long). This fix is needed to get the WE2E tests to succeed.
gsketefian merged commit d8e9e31 into ufs-community:master Sep 25, 2020
gsketefian added a commit to ufs-community/regional_workflow that referenced this pull request Sep 25, 2020
This PR must be merged after PR #26 into ufs-community/ufs-srweather-app.

## DESCRIPTION OF CHANGES: 
### Modifications to workflow -- these are needed mostly to make it more convenient to formulate/run the WE2E tests:
  * Introduce the new workflow variable USE_USER_STAGED_EXTRN_FILES to indicate whether or not to look in user-specified directories for external files.  Previously, this was determined by checking whether the variables EXTRN_MDL_SOURCE_DIR_ICS and EXTRN_MDL_SOURCE_DIR_LBCS are empty or not.  Now, there is an explicit flag to set.
  * Make the number of attempts for the rocoto tasks (maxtries) user-specifiable workflow variables (instead of setting them to a default of 1).
  * Allow the workflow variable EXPT_BASEDIR to be set to a relative path.  In this case, the relative path gets created under the usual default location of EXPT_BASEDIR (which is ${SR_WX_APP_TOP_DIR}/../expt_dirs).
  * Bug fix - update the configuration file name for UPP from postxconfig-NT-fv3sar.txt to postxconfig-NT-fv3lam.txt to match the name in the EMC_post repository (release branch).
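
The first and third items above can be sketched roughly as follows. This is an illustration under stated assumptions, not the workflow's actual code: the function names `resolve_expt_basedir` and `extrn_files_mode`, the `TRUE` flag value, the fallback to the default location when EXPT_BASEDIR is empty, and the `/path/to/...` placeholders are all hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names are hypothetical and this is not
# the workflow's actual implementation.

# Resolve EXPT_BASEDIR: an absolute path is used as given, while a relative
# path is placed under the default location ${SR_WX_APP_TOP_DIR}/../expt_dirs.
resolve_expt_basedir() {
  local top_dir="$1" expt_basedir="$2"
  local default_basedir="${top_dir}/../expt_dirs"
  case "$expt_basedir" in
    "") printf '%s\n' "$default_basedir" ;;                   # empty: default (assumed)
    /*) printf '%s\n' "$expt_basedir" ;;                      # absolute path
    *)  printf '%s\n' "${default_basedir}/${expt_basedir}" ;; # relative path
  esac
}

# Choose the source of external model files from the explicit flag rather
# than from whether the source-directory variable happens to be empty.
# The "TRUE" value is an assumed convention.
extrn_files_mode() {
  local use_user_staged="$1" source_dir="$2"
  if [ "$use_user_staged" = "TRUE" ]; then
    if [ -z "$source_dir" ]; then
      echo "USE_USER_STAGED_EXTRN_FILES is TRUE but no source directory is set" >&2
      return 1
    fi
    printf 'staged:%s\n' "$source_dir"
  else
    printf 'fetch\n'
  fi
}

resolve_expt_basedir /path/to/ufs-srweather-app try13
extrn_files_mode TRUE /path/to/staged/ics
```

With an explicit flag, an empty source directory becomes a detectable configuration error instead of silently switching the workflow to fetching files.
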
### Modifications to script that runs the WE2E tests (run_experiments.sh):
  * Add new arguments expt_basedir, testset_name, and verbose.  expt_basedir and verbose can be used to override the default values of EXPT_BASEDIR and VERBOSE in the workflow.  testset_name can be used to specify a subdirectory under expt_basedir in which to place all the experiment directories.  This is convenient to isolate sets of tests in different directories.
  * Bug fix - Source the default workflow configuration file for each WE2E test, not just once at the beginning of the script.
  * Rewrite the way EXPT_SUBDIR is set so that it is set properly whether or not expt_basedir and testset_name are specified as input arguments.
  * Instead of using "cat" with heredocs to create the workflow configuration file for each test, store the contents of the configuration file in a variable (str) and write it out to a file after it is complete.
  * If running the WE2E tests on hera or cheyenne, increase maxtries for various tasks that will likely need more than one attempt (because non-reproducible bugs in codes cause the tasks to fail).
  * Remove unused code.
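
The switch from heredocs to a string variable might look roughly like the sketch below. It is a hedged illustration, not the code in run_experiments.sh: the function `build_test_config`, the comment line it emits, and the rule that EXPT_SUBDIR becomes `<testset_name>/<test_name>` are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual run_experiments.sh code.

# Build the contents of one test's configuration file in the variable "str"
# and print it; the caller writes the result to disk in a single step.
build_test_config() {
  local test_name="$1" expt_basedir="$2" testset_name="$3" verbose="$4"
  local str expt_subdir

  # Assumed layout: experiments go under <testset_name>/<test_name> when a
  # test set name is given, otherwise directly under <test_name>.
  if [ -n "$testset_name" ]; then
    expt_subdir="${testset_name}/${test_name}"
  else
    expt_subdir="${test_name}"
  fi

  str="# Generated for WE2E test ${test_name}"
  str="$(printf '%s\n%s' "$str" "EXPT_SUBDIR=\"${expt_subdir}\"")"
  # Only override workflow defaults when the caller supplied a value.
  if [ -n "$expt_basedir" ]; then
    str="$(printf '%s\n%s' "$str" "EXPT_BASEDIR=\"${expt_basedir}\"")"
  fi
  if [ -n "$verbose" ]; then
    str="$(printf '%s\n%s' "$str" "VERBOSE=\"${verbose}\"")"
  fi

  printf '%s\n' "$str"
}

# Usage: write the finished string to the file in one operation.
config_file="$(mktemp)"
trap 'rm -f "$config_file"' EXIT
build_test_config grid_GSD_RAP13km "" try13 TRUE > "$config_file"
cat "$config_file"
```

Because the file is written only after the string is complete, optional settings can be added conditionally without leaving a half-written configuration file behind if the script stops partway through.
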
### Modifications to WE2E test configuration files:
  * Remove lines that set workflow variables (e.g. RUN_TASK_MAKE_GRID) to their default values in config_defaults.sh because they effectively do nothing; better to keep the test configuration files shorter.
  * Add 4 new WE2E tests that test fetching of files from various external models from NOAA HPSS.  These use the new workflow variable USE_USER_STAGED_EXTRN_FILES described above.
  * Remove lines in some WE2E configuration files that now get generated in run_experiments.sh (e.g. comments included in those configuration files that run in NCO mode).
  * Rename WE2E test configuration files to clarify their purpose, i.e. what workflow feature/capability they test.
  * Update the list of WE2E test names in baselines_list.txt.

## TESTS CONDUCTED: 
The WE2E tests were run on both hera and cheyenne.  The experiment directories can be found here:

hera:  /scratch2/BMC/det/Gerard.Ketefian/UFS_CAM/PR_feature_WE2E_testing/expt_dirs/try13
cheyenne:  /glade/scratch/ketefian/PR_feature_WE2E_testing/expt_dirs/try14

Note that:
* On both machines:
  * Test suite_FV3_CPT_v0 fails due to a problem with the namelist (likely a namelist variable name is incorrect).  This is a preexisting issue, i.e. it also occurs in the develop branch.
  * Test suite_FV3_RRFS_v1beta fails in the make_orog task because orography statistics files required by the gravity wave drag (GWD) parameterization in the RRFS_v1beta scheme aren't available.  This will be temporarily remedied by using a different GWD parameterization in this suite.  That change will happen in a future PR.
* On hera:
  * Test grid_GSD_RAP13km fails because run_post_f003 fails (even after 20 tries) while run_post_f005 hangs.  Other run_post_f### tasks that succeed take many attempts.  This did not happen with the previous version of EMC_post being used in the workflow.  Thus, it is either due to this change in EMC_post version or the change in the grid generation code (from regional_grid to regional_esg_grid).  Needs further investigation.
  * All tests other than grid_GSD_RAP13km, suite_FV3_CPT_v0, and suite_FV3_RRFS_v1beta succeed.
* On cheyenne:
  * Test grid_GSD_RAP13km fails because run_post_f005 hangs.  All other run_post_f### tasks succeed, although some take many attempts.  As on hera, this did not happen with the previous version of EMC_post being used in the workflow, so it is either due to the change in EMC_post version or the change in the grid generation code (from regional_grid to regional_esg_grid).  Needs further investigation.
  * The four new tests get_extrn_files_from_hpss_... that try to fetch the external model files from NOAA HPSS fail.  This is expected because cheyenne does not provide access to NOAA HPSS.  Thus, these tests do not need to be run on cheyenne.
  * **All tests that use the HRRRX and/or RAPX as external model files for ICs/LBCs fail in both the make_ics and make_lbcs tasks.**  
    * These tests are:  grid_GSD_HRRR_AK_50km, nco_GSD_HRRR25km_HRRRX_RAPX, nco_GSD_HRRR3km_HRRRX_RAPX, nco_GSD_SUBCONUS3km_HRRRX_RAPX, suite_FV3_GSD_SAR, suite_FV3_GSD_v0, suite_FV3_RRFS_v1beta.  (Recall that suite_FV3_RRFS_v1beta fails in the make_orog task (before the make_ics and make_lbcs tasks) because the orography statistics files needed by the FV3_RRFS_v1beta suite are not available.)
    * These failures do not happen on hera and thus are likely due to a bug in chgres_cube.  This needs further investigation.
  * All tests other than the ones mentioned above succeed.

The failures listed above are due to missing input files or to possible bugs in the chgres_cube and EMC_post codes (or possibly due to the changeover to the new regional_esg_grid code, which will not be reverted), or they may indicate inconsistencies in the software versions loaded in the modulefiles on cheyenne.  They do not indicate any problems with this PR.
gsketefian deleted the feature/regional_release_STRING branch March 9, 2021 20:29
christinaholtNOAA pushed a commit to christinaholtNOAA/ufs-srweather-app that referenced this pull request May 20, 2021
mkavulich pushed a commit to mkavulich/ufs-srweather-app that referenced this pull request Aug 26, 2022