
JEDI-based ensemble recentering and analysis calculation #3312

Open · DavidNew-NOAA wants to merge 103 commits into develop from DavidNew-NOAA:feature/calcanl
Conversation

@DavidNew-NOAA (Contributor) commented Feb 10, 2025

Description

COORDINATED MERGE

This PR implements ensemble recentering and analysis calculation in the Global Workflow, using JEDI-based applications to replace certain GSI utilities when JEDI is turned on in the workflow. If GSI is used, the workflow remains unchanged. This PR also (finally) brings native-grid DA increments into the workflow.

The gdas_analcalc and enkfgdas_ecen jobs are replaced by the gdas_analcalc_fv3jedi and enkfgdas_ecen_fv3jedi jobs, respectively. The enkfgdas_echgres job is eliminated, since the resolution change of the deterministic backgrounds is handled internally by the JEDI-based recentering application.

The design for this PR is based on discussions between the DA team and the GW team a few months ago. Below is an explanation of the flow of data through the workflow:

The gdas_analcalc_fv3jedi job dependencies do not change. The native-grid backgrounds and increments are staged, and the GDASApp JEDI fv3jedi_add_increments application adds them and interpolates the result to the Gaussian grid. The Gaussian-grid backgrounds are also staged, and a simple Python function inserts the analysis variables into the history files, which become the Gaussian analyses. This approach guarantees that the resulting Gaussian analyses are in the exact format required by UPP.
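A minimal illustration of the history-insertion step, using NCO in place of the actual Python function (file and variable names are hypothetical, and this assumes ncks append mode replaces same-named variables):

# Start from the Gaussian-grid history (background) so the analysis keeps the UPP-ready structure.
cp gdas.t18z.atmf006.nc gdas.t18z.atmanl.nc
# Overwrite the analysis variables with the output of fv3jedi_add_increments (variable list hypothetical).
ncks -A -v ugrd,vgrd,tmp,spfh,o3mr,delz interpolated_analysis.nc gdas.t18z.atmanl.nc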

The enkfgdas_ecen_fv3jedi job no longer depends on the analysis-calculation job, since the ensemble-resolution variational analysis is computed/interpolated internally by the JEDI-based recentering application. All other job dependencies remain the same. We no longer need to compute the ensemble mean analysis in this job, since it can be output by the JEDI local ensemble DA application in the enkfgdas_atmensanlsol job and simply staged for recentering. The variational increment and deterministic backgrounds are also staged to compute the ensemble-resolution variational analysis. The output of this job is no longer the recentered ensemble increments, but rather a "correction increment", which, when added to the ensemble increments, yields the recentered increments. The prefix for the correction increment is catminc.
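Schematically, the algebra behind the correction increment is as follows (a sketch only; the actual computation happens inside the JEDI recentering application, and the names are illustrative):

# ensemble-resolution variational analysis, computed internally from the staged inputs:
#   atmanl_ensres = deterministic_background (at ensemble resolution) + variational_increment
# correction increment written by enkfgdas_ecen_fv3jedi:
#   catminc = atmanl_ensres - ensemble_mean_analysis   # mean analysis staged from enkfgdas_atmensanlsol
# recentered increment for ensemble member i, formed later in enkfgdas_fcst:
#   ratminc_i = atminc_i + catminc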

The enkfgdas_fcst job now stages both the ensemble increments and the correction increment. They are added together with ncbo in forecast_postdet.sh to generate the recentered increment.
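Schematically, the addition is a per-tile ncbo call like the following (file names and the tile loop are illustrative; the actual invocation lives in forecast_postdet.sh):

for tile in 1 2 3 4 5 6; do
  ncbo -O --op_typ=add \
    "enkfgdas.t18z.cubed_sphere_grid_atminc.tile${tile}.nc" \
    "enkfgdas.t18z.cubed_sphere_grid_catminc.tile${tile}.nc" \
    "enkfgdas.t18z.cubed_sphere_grid_ratminc.tile${tile}.nc"
done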

All forecast increments, both deterministic and ensemble, are now on the native cubed-sphere grid.

Resolves #3248

Type of change

  • Bug fix (fixes something broken)
  • New feature (adds functionality)
  • Maintenance (code refactor, clean-up, new CI test, etc.)

Change characteristics

  • Is this a breaking change (a change in existing functionality)? YES
  • Does this change require a documentation update? IDK
  • Does this change require an update to any of the following submodules? YES
    • EMC verif-global
    • GDAS #1488
    • GFS-utils
    • GSI
    • GSI-monitor
    • GSI-utils
    • UFS-utils
    • UFS-weather-model
    • wxflow

How has this been tested?

  • Clone and build on Hera
  • Run C96C48_ufs_hybatmDA

Checklist

  • Any dependent changes have been merged and published
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have documented my code, including function, input, and output descriptions
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • This change is covered by an existing CI test or a new one has been added
  • Any new scripts have been added to the .github/CODEOWNERS file with owners
  • I have made corresponding changes to the system documentation if necessary

@DavidHuber-NOAA removed the CI-Hera-Running label (Bot use only: CI testing on Hera for this PR is in-progress) on Mar 5, 2025
@DavidNew-NOAA
Contributor Author

@aerorahul All comments addressed. I can't test the staging of the zero-valued warm-start native-grid increments that I made until they are staged in /scratch1/NCEPDEV/global/glopara/data/ICSDIR/

@DavidNew-NOAA
Contributor Author

I don't have access to /scratch1/NCEPDEV/global/glopara/data/ICSDIR/, but I do have zeroed native-grid increments created for each resolution at /scratch1/NCEPDEV/da/David.New/zeroinc/C*

@aerorahul
Contributor

Thanks @DavidNew-NOAA
@KateFriedman-NOAA Can you place the initial zero increments in the path that @DavidNew-NOAA is suggesting?

@DavidNew-NOAA
Contributor Author

@aerorahul @KateFriedman-NOAA The path I gave you is pretty general, and I don't know how many places we want to put these zero increments yet. The only JEDI atmos CI case is C96C48_ufs_hybatmDA (I believe), so the C96 increments would be put in /scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96C48/20241120/gdas.20240223/18/analysis/atmos/ with the prefix changed to gdas.t18z. The C48 increments would need to go in /scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96C48/20241120/enkfgdas.20240223/18/mem001/analysis/atmos and /scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96C48/20241120/enkfgdas.20240223/18/mem002/analysis/atmos with the prefix changed to enkfgdas.t18z and atminc changed to ratminc (a sketch of that copy/rename is below).
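For illustration, the C48 copy/rename could look something like this (a sketch only, starting from the gdas-prefixed atminc files so nothing gets clobbered):

ICSDIR=/scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96C48/20241120
SRC=/scratch1/NCEPDEV/da/David.New/zeroinc/C48
for mem in mem001 mem002; do
  dest="${ICSDIR}/enkfgdas.20240223/18/${mem}/analysis/atmos"
  mkdir -p "${dest}"
  for f in "${SRC}"/gdas.t18z.cubed_sphere_grid_atminc.tile*.nc; do
    new=$(basename "${f}")
    new=${new/gdas.t18z/enkfgdas.t18z}   # change the prefix
    new=${new/atminc/ratminc}            # atminc -> ratminc
    cp "${f}" "${dest}/${new}"
  done
done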

@DavidNew-NOAA
Contributor Author

I've addressed some reviewer comments in the companion PRs to this one (soon to be merged), so I'll need to retest C96C48_ufs_hybatmDA.

@KateFriedman-NOAA
Member

@DavidNew-NOAA I'm working on copying the zero-increment files into the ICSDIR folders on Hera. So far I have done this:

cd /scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96C48/20241120/gdas.20240223/18/analysis/atmos/
rsync -azv /scratch1/NCEPDEV/da/David.New/zeroinc/C96/* .
mv enkfgdas.t18z.cubed_sphere_grid_ratminc.tile1.nc gdas.t18z.cubed_sphere_grid_ratminc.tile1.nc
mv enkfgdas.t18z.cubed_sphere_grid_ratminc.tile2.nc gdas.t18z.cubed_sphere_grid_ratminc.tile2.nc
mv enkfgdas.t18z.cubed_sphere_grid_ratminc.tile3.nc gdas.t18z.cubed_sphere_grid_ratminc.tile3.nc
mv enkfgdas.t18z.cubed_sphere_grid_ratminc.tile4.nc gdas.t18z.cubed_sphere_grid_ratminc.tile4.nc
mv enkfgdas.t18z.cubed_sphere_grid_ratminc.tile5.nc gdas.t18z.cubed_sphere_grid_ratminc.tile5.nc
mv enkfgdas.t18z.cubed_sphere_grid_ratminc.tile6.nc gdas.t18z.cubed_sphere_grid_ratminc.tile6.nc
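
(For reference, an equivalent loop form of the renames above:)

for tile in 1 2 3 4 5 6; do
  mv "enkfgdas.t18z.cubed_sphere_grid_ratminc.tile${tile}.nc" \
     "gdas.t18z.cubed_sphere_grid_ratminc.tile${tile}.nc"
done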

See those files here:

20:06:43 ICSDIR/$pwd
/scratch1/NCEPDEV/global/glopara/data/ICSDIR
20:06:51 ICSDIR/$ll C96C48/20241120/gdas.20240223/18/analysis/atmos/*cubed*
-rw-r--r-- 1 role.glopara global 84993680 Mar  5 16:49 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_atminc.tile1.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  5 16:49 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_atminc.tile2.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  5 16:49 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_atminc.tile3.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  5 16:49 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_atminc.tile4.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  5 16:49 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_atminc.tile5.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  5 16:49 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_atminc.tile6.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  6 19:37 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_ratminc.tile1.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  6 20:04 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_ratminc.tile2.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  6 19:37 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_ratminc.tile3.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  6 19:37 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_ratminc.tile4.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  6 19:37 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_ratminc.tile5.nc
-rw-r--r-- 1 role.glopara global 84993680 Mar  6 19:37 C96C48/20241120/gdas.20240223/18/analysis/atmos/gdas.t18z.cubed_sphere_grid_ratminc.tile6.nc

I'm a bit confused about this part of the request:

the C48 increments would need to go in 
/scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96C48/20241120/enkfgdas.20240223/18/mem001/analysis/atmos 
and 
/scratch1/NCEPDEV/global/glopara/data/ICSDIR/C96C48/20241120/enkfgdas.20240223/18/mem002/analysis/atmos 
with the prefix changed to enkfgdas.t18z and atminc changed to ratminc.

I see these C48 files:

20:10:26 ICSDIR/$ll /scratch1/NCEPDEV/da/David.New/zeroinc/C48/
total 247584
-rw-r--r-- 1 David.New da 21126296 Mar  6 18:31 enkfgdas.t18z.cubed_sphere_grid_ratminc.tile1.nc
-rw-r--r-- 1 David.New da 21126296 Mar  6 18:31 enkfgdas.t18z.cubed_sphere_grid_ratminc.tile2.nc
-rw-r--r-- 1 David.New da 21126296 Mar  6 18:31 enkfgdas.t18z.cubed_sphere_grid_ratminc.tile3.nc
-rw-r--r-- 1 David.New da 21126296 Mar  6 18:31 enkfgdas.t18z.cubed_sphere_grid_ratminc.tile4.nc
-rw-r--r-- 1 David.New da 21126296 Mar  6 18:31 enkfgdas.t18z.cubed_sphere_grid_ratminc.tile5.nc
-rw-r--r-- 1 David.New da 21126296 Mar  6 18:31 enkfgdas.t18z.cubed_sphere_grid_ratminc.tile6.nc
-rw-r--r-- 1 David.New da 21126296 Mar  5 17:08 gdas.t18z.cubed_sphere_grid_atminc.tile1.nc
-rw-r--r-- 1 David.New da 21126296 Mar  5 17:08 gdas.t18z.cubed_sphere_grid_atminc.tile2.nc
-rw-r--r-- 1 David.New da 21126296 Mar  5 17:08 gdas.t18z.cubed_sphere_grid_atminc.tile3.nc
-rw-r--r-- 1 David.New da 21126296 Mar  5 17:08 gdas.t18z.cubed_sphere_grid_atminc.tile4.nc
-rw-r--r-- 1 David.New da 21126296 Mar  5 17:08 gdas.t18z.cubed_sphere_grid_atminc.tile5.nc
-rw-r--r-- 1 David.New da 21126296 Mar  5 17:08 gdas.t18z.cubed_sphere_grid_atminc.tile6.nc

If I copy those in and change the gdas.t18z prefix to enkfgdas.t18z, then they will all be enkfgdas... but if I then change atminc to ratminc, I will overwrite the existing ratminc files. Am I missing something? Thanks!

@RussTreadon-NOAA
Contributor

WCOSS2 g-w CI

Installed DavidNew-NOAA:feature/calcanl at 16448be on Cactus and ran g-w CI. Failures occur in the following cases.

/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmaerosnowDA_pr3312
202112201800          gfs_aeroanlvar                   182638699                DEAD                   1         2          73.0
202112201800         gdas_aeroanlvar                   182638700                DEAD                   1         2          75.0
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_hybatmDA_pr3312
 
/lfs/h2/emc/ptmp/russ.treadon/EXPDIR/C96C48_ufs_hybatmDA_pr3312
202402231800          gdas_fcst_seg0                   182635578                DEAD                   1         2          46.0
202402231800    enkfgdas_fcst_mem001                   182635579                DEAD                   1         2          43.0
202402231800    enkfgdas_fcst_mem002                   182635580                DEAD                   1         2          43.0

The C96C48_ufs_hybatmDA failure is due to

+ forecast_postdet.sh[223]: echo 'FATAL ERROR: missing increment file '\''/lfs/h2/emc/ptmp/russ.treadon/COMROOT/C96C48_ufs_hybatmDA_pr3312/gdas.20240223/18//analysis/atmos/gdas.t18z.cubed_sphere_grid_atminc.tile1.nc'\'', ABORT!'
FATAL ERROR: missing increment file '/lfs/h2/emc/ptmp/russ.treadon/COMROOT/C96C48_ufs_hybatmDA_pr3312/gdas.20240223/18//analysis/atmos/gdas.t18z.cubed_sphere_grid_atminc.tile1.nc', ABORT!
+ forecast_postdet.sh[224]: exit 1

Expected initial condition files are not on WCOSS2. This may be related to @KateFriedman-NOAA's comment above

The C96C48_hybatmaerosnowDA failure occurs when gdas.x is running

nid004033.cactus.wcoss2.ncep.noaa.gov 36: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 37: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 38: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 39: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 40: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 41: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 42: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 43: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 44: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 45: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 46: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 47: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 16: fv3jedi_io_fms_mod.read_restart_fields: file ./bkg/20211220.150000.sfc_data.nc could not be opened
nid004033.cactus.wcoss2.ncep.noaa.gov 16: MPICH ERROR [Rank 16] [job id 2f320b72-060a-4127-a016-fc6d17dd20a5] [Fri Mar  7 13:06:53 2025] [nid004033] - Abort(1) (rank 16 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 16

aborting job:

While the gdas_aeroanlinit job log file contains 'filename_sfcd': '%yyyy%mm%dd.%hh%MM%ss.sfc_data.nc', no sfc_data tiles are copied to bkg.

Does fvfiles in $HOMEgfs/parm/gdas/aero_stage_variational.yaml.j2 need to be expanded to include sfc_data?

Currently, aero_stage_variational.yaml.j2 has

{% set fvfiles = ['fv_core.res.', 'fv_tracer.res.'] %}
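
A quick, illustrative check from the gdas_aeroanlvar run directory (just to confirm nothing was staged):

ls -l bkg/*sfc_data* 2>/dev/null || echo "no sfc_data files staged to bkg/"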

@DavidNew-NOAA
Contributor Author

@KateFriedman-NOAA Disregard my comment about renaming. I renamed them myself.

@DavidNew-NOAA
Contributor Author

@RussTreadon-NOAA Hold off on testing JEDI as I have some more commits to push. I will take a look at the aero case this afternoon.

@KateFriedman-NOAA
Member

@KateFriedman-NOAA Disregard my comment about renaming. I renamed them myself.

@DavidNew-NOAA Ok, which files would you like me to pull in for C48 then? Are the C96 files that I pulled ok or do they need adjusting? Thanks!

@DavidNew-NOAA
Contributor Author

@KateFriedman-NOAA What you've copied allows us to run the JEDI atmos CI case, so no need for further action. Thanks!

@KateFriedman-NOAA
Member

@KateFriedman-NOAA What you've copied allows us to run the JEDI atmos CI case, so no need for further action. Thanks!

@DavidNew-NOAA Excellent, thanks for confirming! I have now synced the new C96 ICs to WCOSS2, MSU, and Gaea C5+C6. If you have more IC files in the future please open a Static Data Update issue and I'll take care of it. Thanks!

Labels: JEDI (Feature development to support JEDI-based DA)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Create JEDI-based ensemble recentering and analysis calculation job
7 participants