
rt bug fixes #395

Closed
wants to merge 7 commits

Conversation

MinsukJi-NOAA
Contributor

Description

Bug fix in check_results function of rt_utils.sh

  • Exit if error occurs in compare_ncfile.py
  • Exit if error occurs in cmp
  • Is a change of answers expected from this PR? No
  • Are any library updates included in this PR (modulefiles etc.)? No

Bug fix for rt.sh -n option flag. This was broken when the MACHINES column of rt.conf was modified.
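The exit-on-error behavior described for check_results could be sketched roughly as follows (a hypothetical illustration, not the actual rt_utils.sh code; file names and paths are made up):

```shell
#!/bin/bash
# Hypothetical sketch: abort the results check when a comparison tool
# (cmp here) fails, instead of silently continuing. Paths are illustrative.
check_pair() {
  local baseline=$1 rundir=$2
  if ! cmp -s "$baseline" "$rundir"; then
    echo "cmp reported differences or an error for $rundir" >&2
    return 1
  fi
  echo "OK: $rundir matches baseline"
}

# demo with two identical files
printf 'a\n' > /tmp/base.txt
printf 'a\n' > /tmp/run.txt
check_pair /tmp/base.txt /tmp/run.txt
```

In the real script, the caller would then exit (or record a failure) whenever the comparison function returns nonzero, which is the behavior this PR adds for both cmp and compare_ncfile.py.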

Issue(s) addressed

#296

Testing

Will run regression tests on supported platforms.

Collaborator

@climbfuji left a comment


Looks ok as far as I can tell. Will check again and approve when it is time for the PR to go in.

@DeniseWorthen
Collaborator

One thing I've noticed is that if the RT fails (for example, one test times out), then the "clean after" command does not take effect. All the fv3_X.exe and matching modules.fv3_X files are left in the tests directory. Is this a design feature, or is it something we would like to fix?
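If this were something to fix, one common shell idiom is an EXIT trap, so the cleanup runs whether the tests pass or fail. A minimal sketch under that assumption (directory and file names are illustrative, not the actual rt.sh code):

```shell
#!/bin/bash
# Hypothetical sketch: run "clean after" even when a test fails,
# by registering the cleanup in an EXIT trap inside a subshell.
tests_dir=$(mktemp -d)
touch "$tests_dir/fv3_1.exe" "$tests_dir/modules.fv3_1"

run_with_cleanup() (
  # Subshell body: the EXIT trap fires on both success and failure
  trap 'rm -f "$tests_dir"/fv3_*.exe "$tests_dir"/modules.fv3_*' EXIT
  false   # stand-in for a failing regression test
)

run_with_cleanup || echo "test failed, but cleanup still ran"
```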

@climbfuji
Collaborator

climbfuji commented Jan 27, 2021 via email

@DeniseWorthen
Collaborator

I can see where you'd use this---the trouble I have is associating the compile line w/ the test number. Short of counting each COMPILE line, is there a way to know which fv3_XX.exe is used for the failed test?

@climbfuji
Collaborator

climbfuji commented Jan 27, 2021 via email

@MinsukJi-NOAA
Contributor Author

I think so. The log_hera.intel/run_001 script, for example, should have a line with "cp ... fv3_N.exe RUNDIR/fv3.exe".

That is a good suggestion. Let me look into implementing it.

@climbfuji
Collaborator

> I think so. The log_hera.intel/run_001 script, for example, should have a line with "cp ... fv3_N.exe RUNDIR/fv3.exe".
>
> That is a good suggestion. Let me look into implementing it.

That line is there already. From cat log_cheyenne.intel/run_001_fv3_ccpp_control_prod.log:

+ mkdir -p /glade/scratch/heinzell/FV3_RT/rt_13602/fv3_ccpp_control_prod
+ cd /glade/scratch/heinzell/FV3_RT/rt_13602/fv3_ccpp_control_prod
+ cp /glade/work/heinzell/fv3/ufs-weather-model/ufs-weather-model-gsl-develop-20210120-update-from-develop/intel/tests/fv3_1.exe fv3.exe
+ cp /glade/work/heinzell/fv3/ufs-weather-model/ufs-weather-model-gsl-develop-20210120-update-from-develop/intel/tests/modules.fv3_1 modules.fv3
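So the executable used by a failed test can be pulled straight from its run log with a grep. An illustrative sketch (a fake log line is created here for demonstration; in practice point at the log_<machine>.<compiler>/run_*.log file shown above):

```shell
#!/bin/bash
# Illustrative sketch: find which fv3_N.exe a run used by scanning its
# run log for the "cp ... fv3_N.exe" line.
log=$(mktemp)
echo '+ cp /path/to/tests/fv3_1.exe fv3.exe' > "$log"
exe=$(grep -o 'fv3_[0-9]*\.exe' "$log" | head -n 1)
echo "executable used: $exe"
```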

@junwang-noaa
Collaborator

junwang-noaa commented Jan 27, 2021 via email

@junwang-noaa
Collaborator

junwang-noaa commented Jan 27, 2021 via email

@MinsukJi-NOAA
Contributor Author

@junwang-noaa, let me look into both your questions.

@MinsukJi-NOAA
Contributor Author

I have another question. When I ran a test with a shorter forecast length (24 -> 3), I noticed that RT reports ALT OK for atmos_4xdaily.nc compared to the baseline, even though the run has fewer forecast times in the file. Is that expected? I suspect the same would be true if the diag_table were changed to have fewer fields: as long as the file contains a subset of the baseline data, the netCDF comparison would report OK.

If the forecast length is different, ALT CHECK should report NOT OK, because the array dimensions differ between the baseline and the run directory. Changes were just made to add a dimension check to compare_ncfile.py. Similarly, if the number of variables in the fields differs, it should lead to ALT CHECK NOT OK.
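The dimension check just described could look roughly like this (a hypothetical Python sketch of the logic only; the real compare_ncfile.py reads shapes and variable names from the netCDF files themselves):

```python
# Hypothetical sketch of the ALT CHECK dimension test: before comparing
# values, verify that baseline and run have the same variables with
# identical shapes, so a shorter forecast (fewer time records) or a
# trimmed diag_table is flagged NOT OK instead of passing as a subset.
def alt_check(baseline_vars, run_vars):
    """baseline_vars/run_vars: dicts mapping variable name -> shape tuple."""
    if set(baseline_vars) != set(run_vars):
        return "ALT CHECK NOT OK: variable sets differ"
    for name, shape in baseline_vars.items():
        if run_vars[name] != shape:
            return f"ALT CHECK NOT OK: {name} shape {run_vars[name]} != {shape}"
    return "ALT CHECK OK"

# Example: run has 3 forecast times instead of 24
base = {"temp": (24, 64, 128)}
run = {"temp": (3, 64, 128)}
print(alt_check(base, run))
```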

@MinsukJi-NOAA MinsukJi-NOAA marked this pull request as ready for review January 29, 2021 18:59
@MinsukJi-NOAA
Contributor Author

This will be merged together with @DomHeinzeller 's PR #396.

@climbfuji
Collaborator

@MinsukJi-NOAA I believe this was just merged as part of #396. Can you check and, if so, close the PR please? Thanks.

@MinsukJi-NOAA
Contributor Author

Merged via #396

@MinsukJi-NOAA MinsukJi-NOAA deleted the rt-fixes branch February 24, 2021 19:20
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023

* Fix @ issue on LOGDIR.

* Get rid of RUN_CMD_* specification in deactivate_tasks.

* Add TEST_ALT_* directories to all machines.

* Enforce config sourcing order in setup.

* Also fix DYN/PHY dir @ situation.