Z_R, L_R, T_RESCALE_POWER answers don't match for my Bering domain #54
There are well-established procedures to debug dimensional consistency issues. Do a pair of short runs with DEBUG=True: one with all of the rescaling disabled by setting [ZLTRQH]_RESCALE_POWER = 0, and a second with one or more of these rescaling factors set to another integer. These runs write out a series of unscaled checksums with identifying messages (a few hundred lines per timestep) to stdout. The first place where the output differs is usually a good indication of where in the code flow the inconsistency lies, although it is sometimes necessary to go back and add more unscaled checksum calls to find the source of the problem. Sometimes the first difference turns out to be a red herring where the problem is with the checksum call itself; in that case you will have one isolated line that differs, followed by a bunch of lines that match. Once the lines start to differ repeatedly, you can be pretty sure that you are past the first problematic point. After finding the source of the problem, it is up to you whether to suggest that we add any new checksum calls you needed to detect it, although of course they should all be wrapped in a logical test for whether DEBUG=True.
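The two-run comparison described above can be mechanized with a small script. Here is an illustrative sketch (not part of MOM6; the checksum-line format and the helper name `first_divergence` are invented for the example) of locating the first divergent line between the stdout of the scaled and unscaled runs:

```python
# Hypothetical helper, not part of MOM6: find the first line where the
# unscaled-checksum output of two DEBUG=True runs diverges.  The log-line
# format here is invented; adapt the parsing to the actual stdout.

def first_divergence(log_a, log_b):
    """Return (line_number, line_a, line_b) for the first differing pair
    of lines, or None if the two logs match line-for-line."""
    for n, (a, b) in enumerate(zip(log_a, log_b), start=1):
        if a != b:
            return n, a, b
    return None

# Toy example: the 'u' checksum is the first to disagree.
run_unscaled = ["h chksum: 3f2a", "u chksum: 91bc", "v chksum: 07de"]
run_scaled   = ["h chksum: 3f2a", "u chksum: 91bd", "v chksum: 07de"]
print(first_divergence(run_unscaled, run_scaled))
# -> (2, 'u chksum: 91bc', 'u chksum: 91bd')
```

Per the advice above, keep reading past the first hit: if the immediately following lines match again, the first difference may be a bad checksum call rather than the real bug.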
I didn't realize the checksums would be unscaled. The first differing checksum of note happens after the first Coriolis computation. I tracked it down to update_OBC_segment_data in MOM_open_boundary.F90. With a Z_SCALE of 1 vs. 0, the vertical remapping happens with matching values of h_stack, but not with matching values of segment%field(m)%dz_src. The wrong levels from buffer_src are being used to populate buffer_dst. Differences show up in the Coriolis term when using segment%tangential_vel at the open boundaries. Should h_stack match with a nonzero Z_SCALE? Or how do I get dz_src to match? @Hallberg-NOAA @MJHarrison-GFDL
As for T_SCALE of 1 vs. 0, there is trouble in the uv_sponge code. CS%var_u%p differs by a factor of two where the sponge is zero, but not by exactly two in the sponge area. The nudging timescales match and probably shouldn't. @sdbachman
Wait, that was the L_SCALE case. Probably the same story for T_SCALE for similar reasons.
I don't see any scale factors in set_up_ALE_sponge_vel_field_varying. It is similar to set_up_ALE_sponge_field_varying for the tracers - how and where would the tracers be scaled?
It looks to me like the problem (or at least part of it) is the 3rd argument in the calls to
I have a fix for the sponge trouble with L_SCALE and T_SCALE. As for Z_SCALE, should h change with a non-zero Z_SCALE? It doesn't. The trouble comes about in adjustSegmentEtaToFitBathymetry in MOM_open_boundary.F90, which assumes vertical distances should be scaled by the Z_SCALE. The segment%Htot field comes from h and is therefore not scaled.
Thanks for all of your work delving into this, @kshedstrom! The thickness variables scale with H_SCALE, not the Z_SCALE scaling used by height variables. The reason is that in Boussinesq mode, the thickness variables have completely different units (m when unscaled) than in non-Boussinesq mode (kg m-2 when unscaled). To facilitate conversions between the thickness and height units, use the factors GV%H_to_Z, GV%H_to_m, GV%H_to_kg_m2, GV%H_to_MKS, or the reciprocal variables like GV%Z_to_H, etc. A quick examination of adjustSegmentEtaToFitBathymetry() suggests that it has the right conversion factors: it first sets segment%dz_src in units of Z (assuming it comes in with units of Z), and then converts it to units of H: MOM6/src/core/MOM_open_boundary.F90, lines 5180 to 5183 in 08582ee
It seems like a really bad idea that segment%dz_src uses different units at different points in the code. However, I am not quite sure how to address this. It seems that the time_interp_external() capability does not offer the ability to rescale variables, but this is one part of the code that is still very poorly documented, with interfaces inherited directly from the FMS code without any documentation of their arguments. When I was going through the MOM6 code to ensure consistent dimensional rescaling, this appears to have been omitted because (1) I did not have a test actively exercising this code, and (2) the interactions between the time_interp_external() facilities and the buffers interpolated onto the segments introduced enough complexity, array reuse, and indirection that I was not brave enough to guess at the units of the various reused variables or the right solution within the code. In such cases (and there were very few), I confined my edits to comments (such as documenting the shifting dimensions of segment%dz_src), so as not to unwittingly break something that was working. I think the preferred outcome is that we have a single well-defined unit for segment%dz_src wherever it is used, and that the necessary rescaling occurs as soon as possible when it is being read - either inside time_interp_external() or immediately afterward. The fact that you have a case that detects these problems is exactly what we need to know that we have gotten them right.
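The H/Z conversion-factor discipline described above can be illustrated with a toy sketch (Python rather than Fortran, and the numeric factor values are invented; only the reciprocal-pair pattern of GV%H_to_Z and GV%Z_to_H is taken from the comments above):

```python
# Illustrative only: mimics the reciprocal conversion-factor pairs
# (GV%H_to_Z, GV%Z_to_H, ...) that move quantities between thickness (H)
# and height (Z) units.  The scaling powers here are made up.

H_to_Z = 2.0 ** -3        # hypothetical H-to-Z conversion factor
Z_to_H = 1.0 / H_to_Z     # its reciprocal, as with GV%Z_to_H

dz_src_in_Z = 5.0                   # a source-layer thickness in Z units
dz_src_in_H = dz_src_in_Z * Z_to_H  # the same thickness in H units

# Powers of 2 make the round trip bit-for-bit exact, which is why
# consistent rescaling must not change answers at all.
assert dz_src_in_H * H_to_Z == dz_src_in_Z
print(dz_src_in_H)  # -> 40.0
```

The design point is that every quantity carries exactly one unit at a time, and every crossing between H and Z goes through one of these factors, so a missed conversion shows up as an exact power-of-two discrepancy in the checksums.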
I added a scale factor right after the call to time_interp_external. It now differs in MOM_barotropic, in BT_force_[uv]. The term forces%taux differs by the scale factor, but mass_to_Z is the same in both runs. This is with Z_RESCALE_POWER set to non-zero, so that seems odd.
It looks to me like the problem might be that when BT_NONLIN_STRESS = True (i.e.,
Thanks, this did fix the barotropic scaling issue. The next trouble is in MOM_energetic_PBL. For the point i,j = 5,5 (the first one) the default case does not converge in 20 iterations while the scaled case converges in 11 iterations inside ePBL_column. I need to investigate more.
On the third iteration, it returns differing values of MLD_output. More digging needed...
Do these values differ dramatically, or only at something resembling roundoff? I find that using very large scaling factors (like 2^140 or 2^-140) can be helpful for making the problems obvious. The other thing I have sometimes found is that setting an explicit small underflow value (like VEL_UNDERFLOW) can be important for getting answers to reproduce. It is possible that we have missed one of these cases of underflow in the ePBL TKE budget, and that your Bering domain test case is tickling this issue when other cases do not. However, the fact that the iteration does not converge at all suggests that this is another real bug and not a minor oversight.
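One way to tell a missed rescale from roundoff noise is to check whether the ratio of the two runs' values is an exact power of two. A hedged sketch (the helper name is invented, not a MOM6 routine):

```python
import math

def power_of_two_ratio(x, y):
    """Return the integer p such that x == y * 2**p exactly, or None if
    no such p exists.  A missed rescale leaves an exact power-of-two
    signature; roundoff-level noise does not."""
    if x == 0.0 or y == 0.0:
        return 0 if x == y else None
    m_x, e_x = math.frexp(x)   # x == m_x * 2**e_x with 0.5 <= |m_x| < 1
    m_y, e_y = math.frexp(y)
    return e_x - e_y if m_x == m_y else None

print(power_of_two_ratio(6.0, 1.5))          # -> 2 : missed-rescale signature
print(power_of_two_ratio(6.0, 1.5 + 1e-12))  # -> None : roundoff-sized difference
```

This is also why the extreme factors like 2^140 suggested above help: a missed 2^140 rescale is unmistakable in the output, while a 2^2 discrepancy can hide among ordinary magnitude differences.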
Sorry, the values differ on the first time through. Not by much, but not at underflow levels either. Then MixLen_shape is different the second time through and things are rather more different. I'm using a scaling of 2^2.
At L#1079 of MOM_energetic_PBL.F90, htot is not scaled while MLD_guess is scaled. The result is differing values of Surface_Scale, and therefore vstar is off by some scale that isn't a power of 2.
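The non-power-of-2 symptom follows directly from combining a scaled and an unscaled input, as a toy numeric sketch shows (all numbers invented; this is not the actual Surface_Scale formula):

```python
# Hypothetical numbers only -- not the real ePBL expressions.  When one
# input (MLD_guess) carries the 2**p rescaling and another (htot) does
# not, a combination of the two is off by a factor that is NOT a clean
# power of two, which is the vstar symptom described above.

p = 2                       # the 2**2 scaling used in the scaled run
htot = 100.0                # bug: htot missed the rescaling
mld_guess = 30.0            # correctly carries the 2**p factor when scaled

consistent = htot + mld_guess           # both unscaled: 130.0
buggy = htot + mld_guess * 2 ** p       # mixed units:   220.0

print(buggy / consistent)   # -> 1.6923..., not a power of 2
```

A pure power-of-two discrepancy points at one variable carrying the wrong scale end-to-end; a non-power-of-two discrepancy like this one points at the place where scaled and unscaled quantities are mixed.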
I fixed that one, now there's an H_RESCALE issue with pre-btstep accel. The fun never ends!
It's in the Coriolis term, truly in the last few bits, such that TotalView stats match except for the checksum. Is this where you'd try all the Coriolis options?
Do the differences remain in the last few bits if you try very large and very small rescaling factors? Trying more extreme rescaling may be a way to help smoke out the problem. And yes, I think that trying the various Coriolis options would be a good way to figure out which ones are problematic. Which Coriolis scheme parameter settings are you using in your case?
It's that cursed OBC code again, specifically in remapping_core_h. I don't think it matters which Coriolis scheme we are using. TotalView really does not want to step into remapping_core_h, but buffer_dst comes out matching in the top four levels (all within the top ten meters of the source); the levels don't match for k between 5 and 10, then they match below that.
Do these need to be scaled with H_SCALE:
We discussed it and agreed to pass GV%H_subroundoff for the two optional arguments to get the correct scalings. Most of the calls to remapping_core_h are straightforward to correct. However, MOM_lateral_boundary_diffusion.F90 has unit tests calling a routine calling remapping_core_h, and it doesn't have GV.
- Scaling of OBC's dz_src.
- Scaling of CS%IDatu in MOM_barotropic.
R and Q are not a problem. Debugging tips welcome.