Identify who maintains stack where and establish a process for updating the stacks #83
Doesn't @GeorgeVandenberghe-NOAA usually do this? |
I am still without functional access due to the destruction of my GFE laptop on 11/12 by a forced upgrade. I expect a repair by COB Friday 11/20.
--
George W Vandenberghe
*IMSG* at NOAA/NWS/NCEP/EMC
5830 University Research Ct., Rm. 2141
College Park, MD 20740
George.Vandenberghe@noaa.gov
301-683-3769(work) 3017751547(cell)
|
OK, great opportunity to identify some back-ups to @GeorgeVandenberghe-NOAA! |
Can we start with an exhaustive list of machines we are responsible for installing hpc-stack on? @GeorgeVandenberghe-NOAA which machines would you install on? |
Hang and I usually install hpc-stack: Orion, Hera, Jet, WCOSS-Dell. |
Just those 4 machines then? Where do you install it? That is, under what root directory? |
Hang and I have kind of been doing it ad hoc. I think I installed it on Hera and Jet, and he did WCOSS and Orion. I think he and I should split up which machines we're responsible for and formally document that. |
Do we have a non-lmod capability so we can build it on gaea and wcossC? I would also like it to be THE stack we use on weird new machines like some azure cluster of the near future.
|
The README would be a good place to document this. We already have "Authors" and "Code Manager" sections; add an "Installers" section. |
It can be built on systems without lmod, but then you don't have modules |
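For machines without lmod (gaea and wcossC were raised above), a minimal sketch of what using an installed stack might look like with plain environment variables instead of `module load` — the install prefix below is purely illustrative, not an actual install location:

```shell
# Hypothetical install prefix; substitute the real one for your system.
PREFIX=/opt/hpc-stack/intel-2020

# Export what a stack modulefile would otherwise set up.
export PATH="$PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$PREFIX/lib:${LD_LIBRARY_PATH:-}"
export CMAKE_PREFIX_PATH="$PREFIX:${CMAKE_PREFIX_PATH:-}"

echo "stack prefix on PATH: ${PATH%%:*}"
```

This gives working compilers and libraries but, as noted, no module bookkeeping (no `module list`, no conflict detection). |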
I am doing cheyenne with both gnu and intel, and this one will likely stay with me.
I am currently doing jet - I hope to get rid of this responsibility once it is a tier-1 platform for the ufs-weather-model. Arun created an issue in the ufs-weather-model GitHub repo to elevate jet to tier-1.
I am also doing gaea - it is OK to keep it as a tier-2 platform like cheyenne, or pass it on to EMC as a tier-1 platform.
|
@climbfuji Hang or I can do Jet. We've been maintaining a build of hpc-stack on there |
@kgerheiser this would be great, and a necessary first step to make jet a tier-1 platform. @arunchawla-NOAA created an issue for this work here: ufs-community/ufs-weather-model#271 - once you install the stack on jet, can you please let the ufs-weather-model code managers (@junwang-noaa, @DusanJovic-NOAA, myself) know so that we can update the modulefile? Going forward, we should continue the discussion and work towards making jet a tier-1 platform in the ufs-weather-model issue 271. |
I agree, although Jet has the added issue that it is a heterogeneous platform with different node types and hardware. This makes resource specification in workflows, where a job can land on any jet node, a nuisance-level problem. A module change at the admin level in March 2019 broke our workflow badly enough that we never really got it working again, but it's definitely doable and tractable. A stack that looks the same across all platforms will be a big advance. HPC-Stack does that; admin modules don't. One of the big advantages of my ancient and obsolete tarball nceplibs distro was that module names were the same on all platforms.
|
Yes, hpc-stack does not use the nightmare flag `-xHOST`, which makes this possible. The fact that jet has different node types and hardware is one reason why we need to make it a tier-1 platform - we need to make sure that our codes function in such an environment. The ufs-weather-model currently works around the default AVX2 flags by compiling the model with multiple SIMD instruction sets on jet:
```cmake
elseif(SIMDMULTIARCH)
  set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -axSSE4.2,AVX,CORE-AVX2,CORE-AVX512 -qno-opt-dynamic-align")
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -axSSE4.2,AVX,CORE-AVX2,CORE-AVX512 -qno-opt-dynamic-align")
```
While this provides flexibility, it makes compiling *a lot* slower. We may consider other options such as only specifying `-axSSE4.2,CORE-AVX2` or turning off SIMD instructions entirely on jet. TBD.
The rt.sh scripts currently compile and run on xjet, but there is no reason to keep doing this. We could run some tests on xjet, some on kjet, some on whatever-jet. TBD. |
When I built my old portable tarball NCEPLIBS on jet, I did it in a batch job on tjet to use the lowest instruction set possible. There were numerous cases where mine worked and the admins' didn't. |
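To illustrate the point about targeting instruction sets on a heterogeneous machine: a batch job could probe its node's CPU flags and map them to the matching Intel `-ax` target. This helper is hypothetical (not part of hpc-stack or the ufs-weather-model build); it reads `/proc/cpuinfo` by default, or a flags string passed as `$1` for testing:

```shell
# Map a node's CPU flags to the newest matching Intel -ax target.
# Hypothetical helper for illustration; $1 overrides /proc/cpuinfo.
best_simd() {
  flags="${1:-$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)}"
  case " $flags " in
    *" avx512f "*) echo CORE-AVX512 ;;
    *" avx2 "*)    echo CORE-AVX2 ;;
    *" avx "*)     echo AVX ;;
    *)             echo SSE4.2 ;;  # fall back to the lowest set targeted on jet
  esac
}

best_simd "flags : sse4_2 avx avx2"   # prints CORE-AVX2
```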
Here's a summary from the comments above:
Is that all of them? I would suggest that we tag the release, then everyone installs and reports back either success or problems. If there are problems, we hold the release, resolve them, and move the tag to the fixed release. Once there are no problems and we are all happy with the release, we announce it and move on to planning the 1.2.0 release. |
Elsewhere @mark-a-potts mentions a system called "acorn". Is that a NOAA system? Mark, do you want to try out our 1.1.0 release before we announce it? Or do you want to try building the develop branch? |
Acorn is the name of the WCOSS2 machine. |
Let's leave wcoss2 (acorn) out of this release. |
OK I've added an issue for acorn and assigned it to the next release (1.2.0). |
Maybe we should create a milestone for 1.2.0 and identify issues to address for it? We need to add the METplus libraries before we roll it out on WCOSS2. Has the MET team created an issue for that? |
@arunchawla-NOAA to add an issue to the next release, use the "Project" pull-down on the right side of the issue screen. At each weekly meeting we will examine the issue list for the next release, and also place any new issues into a release. For release planning for the 1.2.0 release, see: https://github.com/NOAA-EMC/hpc-stack/projects/2 (New issues can also be added from this screen, or selected from the issue list and added to the release with the "Add cards" button at upper right.) There is as yet no issue for the METplus libraries; I will add that now. |
(@arunchawla-NOAA for release planning of the upcoming 1.1.0 release see https://github.com/NOAA-EMC/hpc-stack/projects/1). |
May I ask who will maintain the hpc-stack equivalent of the nceplibs module files on Cray? If the library requires model code changes, the library needs to be installed on Cray too.
|
Hang will take care of WCOSS Cray. |
Please reinstall hpc-stack on WCOSS2. It's broken after they renamed /lfs/h2 to /lfs/h1.
```
$ module show hpc/1.0.0-beta1
------------------------------------------------------------------------------------------------------
/lfs/h1/emc/nceplibs/noscrub/hpc-stack/test/noaa/modulefiles/stack/hpc/1.0.0-beta1.lua:
------------------------------------------------------------------------------------------------------
help([[]])
conflict("hpc")
setenv("HPC_OPT","/lfs/h2/emc/nceplibs/noscrub/hpc-stack/test/noaa")
prepend_path("MODULEPATH","/lfs/h2/emc/nceplibs/noscrub/hpc-stack/test/noaa/modulefiles/core")
setenv("LMOD_EXACT_MATCH","no")
setenv("LMOD_EXTENDED_DEFAULT","yes")
whatis("Name: hpc")
whatis("Version: 1.0.0-beta1")
whatis("Category: Base")
whatis("Description: Initialize HPC software stack")
```
MODULEPATH still points to /lfs/h2. |
The test version you mentioned is being renewed and will be fully ready in an hour or so. |
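As a general pattern, this kind of breakage can be caught by scanning the installed modulefile tree for the old mount point after a filesystem rename. A hedged sketch — the directory layout and file name below are illustrative, not the actual WCOSS2 install:

```shell
# Scan a modulefile tree for references to a renamed mount point.
# Directory and file names here are illustrative only.
OLD=/lfs/h2
mkdir -p demo/modulefiles/stack/hpc
cat > demo/modulefiles/stack/hpc/1.0.0-beta1.lua <<'EOF'
setenv("HPC_OPT","/lfs/h2/emc/nceplibs/noscrub/hpc-stack/test/noaa")
EOF

# List every modulefile that still hard-codes the old path.
grep -rl "$OLD" demo/modulefiles
# prints demo/modulefiles/stack/hpc/1.0.0-beta1.lua
```

Running the same `grep -rl` over the real modulefiles root after a rename would list every file that needs regenerating. |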
Clearly identify who officially maintains a stack on which machine. There can be a back-up.
Establish a process for updating a stack and the versioning that goes with it.
And more.