
Z_R, L_R, T_RESCALE_POWER answers don't match for my Bering domain #54

Closed
kshedstrom opened this issue Dec 30, 2021 · 21 comments · Fixed by #104 or #117
Labels
bug Something isn't working

Comments

@kshedstrom

As the title says, answers don't match for my Bering domain when Z_RESCALE_POWER, L_RESCALE_POWER, or T_RESCALE_POWER is nonzero; R and Q are not a problem. Debugging tips welcome.

@Hallberg-NOAA
Member

There are well established procedures to debug dimensional consistency issues. Do a pair of short runs with DEBUG=True, one with all of the rescaling disabled by setting [ZLTRQH]_RESCALE_POWER = 0, and a second one with one or more of these rescaling factors set to another integer. These runs will write out a series of unscaled checksums with identifying messages (a few hundred lines per timestep) to stdout. The first place where the output differs is usually a pretty good indication of where in the code flow the inconsistency lies, although it is sometimes necessary to go back and add more of the unscaled checksum calls to find the source of the problem.
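For example, a minimal sketch of the MOM_input settings for the two runs (only the rescale powers need to differ; everything else stays as in your baseline):

DEBUG = True
! Run 1: all rescaling disabled
Z_RESCALE_POWER = 0
L_RESCALE_POWER = 0
T_RESCALE_POWER = 0
H_RESCALE_POWER = 0
R_RESCALE_POWER = 0
Q_RESCALE_POWER = 0

! Run 2: identical, except one or more powers set to another integer, e.g.
Z_RESCALE_POWER = 2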

Sometimes the first difference turns out to be a red herring where the problem is with the checksum call; in this case you will have the isolated line that differs followed by a bunch of lines that match. Once the lines start to differ repeatedly, you can be pretty sure that you are past the first problematic point.

After finding the source of the problem, it is up to you whether to suggest that we add any new checksum calls that you had to add to detect the problem, although of course they should be all wrapped in a logical test for whether DEBUG=True.

@kshedstrom
Author

kshedstrom commented Mar 17, 2022

I didn't realize the checksums would be unscaled. The first differing checksum of note happens after the first Coriolis computation. I tracked it down to update_OBC_segment_data in MOM_open_boundary.F90. With Z_SCALE of 1 vs. 0, the vertical remapping happens with matching values of h_stack, but not with matching values of segment%field(m)%dz_src. The wrong levels from buffer_src are being used to populate buffer_dst. Differences then show up in the Coriolis term when segment%tangential_vel is used at the open boundaries.

Should h_stack match with a nonzero Z_SCALE? Or how do I get dz_src to match? @Hallberg-NOAA @MJHarrison-GFDL

@kshedstrom
Author

As for T_SCALE of 1 vs. 0, there is trouble in the uv_sponge code. CS%var_u%p differs by a factor of two where the sponge is zero, but not by exactly two in the sponge area. The nudging timescales match and probably shouldn't. @sdbachman

@kshedstrom
Author

Wait, that was the L_SCALE case. Probably the same story for T_SCALE for similar reasons.

@kshedstrom
Author

I don't see any scale factors in set_up_ALE_sponge_vel_field_varying. It is similar to set_up_ALE_sponge_field_varying for the tracers - how and where would the tracers be scaled?

@Hallberg-NOAA
Member

It looks to me like the problem (or at least part of it) is the 3rd argument in the calls to horiz_interp_and_extrap_tracer() on lines 968 and 1017 of MOM_ALE_sponge.F90 for u and v. These calls are currently using conversion factors of 1.0, but I think that these should be changed to US%m_s_to_L_T. This particular bug would not explain why there are differences when the value of Z_RESCALE_POWER changes, but at least it is a start.
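A hedged sketch of what that change might look like; the argument lists are abbreviated, and the names fid_u, tmp_u and mask_u are placeholders rather than the actual variables in MOM_ALE_sponge.F90:

! Before: a conversion factor of 1.0 leaves the velocities in m s-1
!   call horiz_interp_and_extrap_tracer(fid_u, Time, 1.0, G, tmp_u, mask_u, ...)
! After: rescale to the model's internal [L T-1] units as the data is read
!   call horiz_interp_and_extrap_tracer(fid_u, Time, US%m_s_to_L_T, G, tmp_u, mask_u, ...)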

@kshedstrom
Author

I have a fix for the sponge trouble with L_SCALE and T_SCALE. As for Z_SCALE, should h change with non-zero Z_SCALE? It doesn't. The trouble comes about in adjustSegmentEtaToFitBathymetry in MOM_open_boundary.F90, which assumes vertical distances are scaled by Z_SCALE. The segment%Htot field comes from h and is therefore not scaled.

@Hallberg-NOAA
Member

Thanks for all of your work delving into this, @kshedstrom!

The thickness variables scale with H_SCALE, not the Z_SCALE scaling used by height variables. The reason for that is that in Boussinesq mode, the thickness variables have completely different units (m when unscaled) than in non-Boussinesq mode (kg m-2 when unscaled). To facilitate the conversions between the two vertical thickness and height units, use the factors GV%H_to_Z, GV%H_to_m, GV%H_to_kg_m2, GV%H_to_MKS or the reciprocal variables like GV%Z_to_H, etc.
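A minimal illustration of using these factors (the variable names here are placeholders, not from the MOM6 source):

thk_H = dz_Z * GV%Z_to_H      ! height [Z ~> m] -> thickness [H ~> m or kg m-2]
hgt_Z = h(i,j,k) * GV%H_to_Z  ! thickness [H] -> height [Z]
hgt_m = hgt_Z * US%Z_to_m     ! height [Z] -> unscaled [m]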

A quick examination of adjustSegmentEtaToFitBathymetry() suggests that it has the right conversion factors to first set segment%dz_src in units of Z (assuming it comes in with units of Z), and then to convert it to units of H:

! Now convert thicknesses to units of H.
do k=1,nz
  segment%field(fld)%dz_src(i,j,k) = segment%field(fld)%dz_src(i,j,k)*GV%Z_to_H
enddo

It seems like a really bad idea that segment%dz_src uses different units at different points in the code. However, I am not quite sure how to address this. It seems like the time_interp_external() capability does not offer the ability to rescale variables, but this is one part of the code that is still very poorly documented, with interfaces inherited directly from the FMS code without any documentation of their arguments. When I was going through the MOM6 code to ensure consistent dimensional rescaling, this appears to have been omitted because (1) I did not have a test actively exercising this code, and (2) the interactions between the time_interp_external() facilities and the buffers to interpolate onto the segments introduced enough complexity, array-reuse and indirection that I was not brave enough to try to guess at the units of the various re-used variables or the right solution within the code. In such cases (and there were very few), I confined my edits to comments (such as documenting the shifting dimensions of segment%dz_src), so as not to unwittingly break something that was working.

I think that the preferred outcome is that we have a single well-defined unit for segment%dz_src wherever it is used, and that the necessary rescaling occur as soon as possible when it is being read - either inside of time_interp_external() or immediately afterward. The fact that you have a case that is detecting these problems is exactly what we need to know that we have gotten them right.

@kshedstrom
Author

I added a scale factor right after the call to time_interp_external. It now differs in MOM_barotropic, BT_force_[uv]. The term forces%taux differs by the scale factor, but mass_to_Z is the same in both runs. This is with the Z_RESCALE set to non-zero, so that seems odd.
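For reference, the kind of change described here might look roughly like the following sketch (the field and buffer names are placeholders, and the US%m_to_Z factor assumes the source file holds layer thicknesses in m):

call time_interp_external(segment%field(m)%fid_dz, Time, tmp_buffer)
! Convert the just-read source thicknesses to rescaled height units [Z] immediately,
! so that segment%field(m)%dz_src has a single well-defined unit downstream.
tmp_buffer(:,:,:) = US%m_to_Z * tmp_buffer(:,:,:)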

Hallberg-NOAA added the bug label Apr 4, 2022
@Hallberg-NOAA
Member

It looks to me like the problem might be that when BT_NONLIN_STRESS = True (i.e., CS%nonlin_stress = .true.) in MOM_barotropic.F90, CS%IDatu and CS%IDatv are being set with units of [H-1], whereas they are otherwise being set with units of [Z-1]. Multiplying the expression for CS%IDatu by a factor of GV%Z_to_H on lines 1248, 1251 and 1254, and similarly for the expressions for CS%IDatv on lines 1274, 1277 and 1280, might be one way to solve this problem. This should not change answers in Boussinesq mode, but in non-Boussinesq mode this will change answers and correct a real bug.
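A schematic sketch of the suggested fix (the actual expressions on those lines are more involved, and in practice the factor would be folded into each assignment rather than applied afterward):

! Multiply the [Z-1] expressions for the inverse column depths by GV%Z_to_H so that
! CS%IDatu and CS%IDatv are consistently in [H-1], matching the BT_NONLIN_STRESS case.
CS%IDatu(I,j) = GV%Z_to_H * CS%IDatu(I,j)
CS%IDatv(i,J) = GV%Z_to_H * CS%IDatv(i,J)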

@kshedstrom
Author

Thanks, this did fix the barotropic scaling issue. The next trouble is in MOM_energetic_PBL. For the point i,j = 5,5 (the first one), the default case does not converge in 20 iterations, while the scaled case converges in 11 iterations inside ePBL_column. I need to investigate more.

@kshedstrom
Author

On the third iteration, it returns differing values of MLD_output. More digging needed...

@Hallberg-NOAA
Member

Do these values differ dramatically, or only at something resembling roundoff?

I find that using very large scaling factors (like 2^140 or 2^-140) can be helpful for making the problems obvious.

The other thing that I have found is that setting an explicit small underflow value (like VEL_UNDERFLOW) can sometimes be important for getting answers to reproduce. It is possible that we have missed one of these cases of underflow in the ePBL TKE budget, and that your Bering domain test case is tickling this issue when other cases do not.

However, the fact that the iteration does not converge at all is suggestive that this is another real bug and not a minor oversight.

@kshedstrom
Author

Sorry, the values differ the first time through. Not by much, but not at underflow levels either. Then MixLen_shape is different the second time through and things are rather more different. I'm using a scaling of 2^2.

@kshedstrom
Author

At line 1079 of MOM_energetic_PBL.F90, htot is not scaled but MLD_guess is. The result is differing values of Surface_Scale, and therefore vstar is off by a factor that isn't a power of 2.
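A hedged illustration of the kind of mismatch being described (the actual expression in MOM_energetic_PBL.F90 and its constants may differ): if htot is in thickness units [H] and MLD_guess is in height units [Z], the ratio needs a conversion factor along these lines:

! Convert htot [H] to [Z] before comparing it with MLD_guess [Z]; constants are illustrative.
Surface_Scale = max(0.05, 1.0 - (GV%H_to_Z*htot) / MLD_guess)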

@kshedstrom
Author

I fixed that one, now there's an H_RESCALE issue with pre-btstep accel. The fun never ends!

@kshedstrom
Author

It's in the Coriolis term, truly in the last few bits, such that the TotalView statistics match except for the checksum. Is this where you'd try all the Coriolis options?

@Hallberg-NOAA
Member

Do the differences remain in the last few bits if you try very large and very small rescaling factors? Trying more extreme rescaling may be a way to help smoke out the problem.

And yes, I think that trying the various Coriolis options would be a good way to figure out which ones are problematic. Which Coriolis scheme parameter settings are you using in your case?
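For reference, the relevant runtime parameters are roughly along these lines (a sketch only; check the MOM_parameter_doc files in your run directory for the exact names, options and defaults in your version):

CORIOLIS_SCHEME = "SADOURNY75_ENERGY"  ! other options include ARAKAWA_HSU90, SADOURNY75_ENSTRO, ARAKAWA_LAMB81
BOUND_CORIOLIS = False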

@kshedstrom
Author

It's that cursed OBC code again, specifically in remapping_core_h. I don't think it matters which Coriolis scheme we are using. TotalView really does not want to step into remapping_core_h, but buffer_dst comes out matching in the top four levels (all within the top ten meters of the source); the levels don't match for k between 5 and 10, and then they match again below that.

@kshedstrom
Author

Do h_neglect and h_neglect_edge need to be scaled with H_SCALE?

@kshedstrom
Author

We discussed it and agreed to pass GV%H_subroundoff for the two optional arguments to get the correct scalings. Most of the calls to remapping_core_h are straightforward to correct. However, MOM_lateral_boundary_diffusion.F90 has unit tests that call a routine that calls remapping_core_h, and those tests don't have access to GV.
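A sketch of the agreed change at a typical call site (the positional argument names are abbreviated placeholders; only the two optional arguments are the point):

call remapping_core_h(remap_CS, nz_src, h_src, u_src, nz_dst, h_dst, u_dst, &
                      h_neglect=GV%H_subroundoff, h_neglect_edge=GV%H_subroundoff)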

Hallberg-NOAA pushed a commit that referenced this issue Apr 22, 2022