
boomerAMG on GPU for SolidMechanicsLagrangianSSLE #1054

Merged: 68 commits merged into develop from feature/boomerAMG-for-elasticity on Mar 9, 2021

Conversation

@castelletto1 (Contributor) commented on Jul 16, 2020:

The purpose of this PR is to enable boomerAMG preconditioning on GPU for an elasticity problem. The linear algebra interface is set to hypre by default. As a model problem, the following simple cantilevered cube problem has been added:

SSLE-QS-cantileveredCube.xml

The domain is a unit cube discretized with a regular 10x10x10 Cartesian mesh.

The related LvArray PR:
GEOS-DEV/LvArray#213

The Hypre build options are in the corresponding thirdPartyLibs PR:
GEOS-DEV/thirdPartyLibs#132

@castelletto1 castelletto1 marked this pull request as draft July 16, 2020 01:26
@castelletto1 castelletto1 changed the title Optimizing boomerAMG parameters for SolidMechanicsLagrangianSSLE boomerAMG on GPU for SolidMechanicsLagrangianSSLE Aug 18, 2020
@oseikuffuor1 (Contributor) commented:
All, here are some recommendations for building and running with hypre on the GPU.
Configure Options:

  1. --with-cuda
  2. --enable-cusparse (optional; this allows one to use cusparse for certain operations and can be ignored for now)
  3. --enable-debug (helpful for debugging issues down the road; can be omitted once the integration is stable)
  4. Set the environment variable HYPRE_CUDA_SM to match the SM (streaming multiprocessor) architecture of the hardware. The default is 60; change it to 70 if running on Lassen (V100 systems), for example.
  5. See https://hypre.readthedocs.io/en/latest/ch-misc.html for more details about building hypre with GPU support.

Enabling GPU support

  1. In user code, first call HYPRE_Init(); to initialize the GPU libraries prior to calling any other hypre functions.
  2. Set the execution policy to device:
     a. Set the HYPRE_ExecutionPolicy variable to device: HYPRE_ExecutionPolicy default_exec_policy = HYPRE_EXEC_DEVICE;
     b. Set the default policy handle: hypre_HandleDefaultExecPolicy(hypre_handle()) = default_exec_policy; (this tells hypre to do the AMG setup on device; steps 2a and 2b can be combined by setting the handle directly to HYPRE_EXEC_DEVICE)
  3. Call HYPRE_Finalize(); at the end (before MPI_Finalize is called).
  4. See ij.c or ij_assembly.c in hypre's test directory for additional insight. A minimal initialization sketch is given after this list.
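
A minimal sketch of that initialization sequence, assuming a CUDA-enabled hypre build; the hypre_handle()/hypre_HandleDefaultExecPolicy accessors are internal hypre API and come from _hypre_utilities.h:

```c++
#include <mpi.h>
#include "HYPRE_utilities.h"
#include "_hypre_utilities.h"  // internal header for hypre_handle()/hypre_HandleDefaultExecPolicy

int main( int argc, char ** argv )
{
  MPI_Init( &argc, &argv );

  // Step 1: initialize hypre (and its GPU libraries) before any other hypre call.
  HYPRE_Init();

  // Step 2: tell hypre to run the AMG setup/solve on the device.
  HYPRE_ExecutionPolicy default_exec_policy = HYPRE_EXEC_DEVICE;
  hypre_HandleDefaultExecPolicy( hypre_handle() ) = default_exec_policy;

  // ... assemble the linear system and set up/apply BoomerAMG here ...

  // Step 3: finalize hypre before MPI_Finalize.
  HYPRE_Finalize();
  MPI_Finalize();
  return 0;
}
```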

Runtime solver options:

  • Relaxation options: use option 18 or 7.
  • Coarsening options: only PMIS is supported.
  • Interpolation options: use option 3, 6, 14 or 15.

Once everything is up and running, there may be other options to tune for performance; I have omitted them for now. A sketch of setting these options is given below.
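
A rough sketch of how these recommendations map onto the BoomerAMG C API, assuming the standard HYPRE_BoomerAMGSet* setters; the particular picks (relax type 18, interpolation type 6) are just examples from the lists above, and coarsening type 8 is hypre's identifier for PMIS:

```c++
#include "HYPRE_parcsr_ls.h"

// Illustrative helper: apply the GPU-friendly BoomerAMG options listed above.
void setBoomerAMGGpuOptions( HYPRE_Solver solver )
{
  HYPRE_BoomerAMGSetRelaxType( solver, 18 );   // relaxation: 18 (or 7)
  HYPRE_BoomerAMGSetCoarsenType( solver, 8 );  // coarsening: 8 = PMIS (the only GPU-supported option)
  HYPRE_BoomerAMGSetInterpType( solver, 6 );   // interpolation: 3, 6, 14 or 15
}
```

Such a helper would be called after HYPRE_BoomerAMGCreate( &solver ) and before the setup/solve.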

There's one minor edit from our discussion yesterday. I believe there were two scenarios:

  1. Assemble linear system on host and pass to hypre
  2. Assemble linear system on device and pass to hypre

Since GEOSX does not rely on unified memory, I am leaning towards option 2, as long as the data given to hypre is consistent ParCSR matrix data. I had mentioned that passing hypre host data should work, but this is only true with unified memory. Without unified memory, option 1 could be realized by moving the assembled matrix to the device and then calling hypre. If you have questions about setting up the linear system matrix on the device, let me know and we can chat again. Of course, also let me know if you have additional questions about these notes.
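
A rough sketch of option 2 through hypre's IJ interface, assuming a CUDA-enabled build with the _v2 initialization that takes a memory location; assembleOnDevice and the nnzPerRow/rows/cols/vals device arrays are hypothetical stand-ins for whatever the GEOSX assembly produces on the device:

```c++
#include <mpi.h>
#include "HYPRE.h"
#include "HYPRE_utilities.h"
#include "HYPRE_IJ_mv.h"
#include "HYPRE_parcsr_mv.h"

// Hypothetical example: assemble an IJ matrix from device-resident arrays and
// hand the underlying ParCSR object to hypre.
HYPRE_ParCSRMatrix assembleOnDevice( MPI_Comm comm,
                                     HYPRE_BigInt ilower, HYPRE_BigInt iupper,
                                     HYPRE_Int * nnzPerRow,  // device pointer
                                     HYPRE_BigInt * rows,    // device pointer
                                     HYPRE_BigInt * cols,    // device pointer
                                     HYPRE_Real * vals )     // device pointer
{
  HYPRE_IJMatrix ij;
  HYPRE_IJMatrixCreate( comm, ilower, iupper, ilower, iupper, &ij );
  HYPRE_IJMatrixSetObjectType( ij, HYPRE_PARCSR );

  // Initialize the matrix with device memory so that hypre accepts device pointers.
  HYPRE_IJMatrixInitialize_v2( ij, HYPRE_MEMORY_DEVICE );

  // Host-side calls, but the row/column/value arrays live on the device.
  HYPRE_IJMatrixSetValues( ij, static_cast< HYPRE_Int >( iupper - ilower + 1 ),
                           nnzPerRow, rows, cols, vals );
  HYPRE_IJMatrixAssemble( ij );

  HYPRE_ParCSRMatrix parcsr;
  HYPRE_IJMatrixGetObject( ij, (void **) &parcsr );
  return parcsr;
}
```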

@andrea-franceschini (Contributor) commented:

Does hypre provide a block Jacobi preconditioner?

[attached image: blkdiag]

@oseikuffuor1 (Contributor) commented:

> Does hypre provide a block Jacobi preconditioner?
>
> [attached image: blkdiag]

@AF1990 are these blocks per processor or per unknown? We do not have a BJ preconditioner for the unknown-based version, but for the per-processor blocks we have the BJ ILU preconditioner.

@andrea-franceschini (Contributor) commented:

> @AF1990 are these blocks per processor or per unknown? We do not have a BJ preconditioner for the unknown-based version, but for the per-processor blocks we have the BJ ILU preconditioner.

I am interested in the unknown-based version of the BJ preconditioner. I know that the per-processor version is already available.

@rrsettgast rrsettgast force-pushed the feature/boomerAMG-for-elasticity branch from d194411 to 16ffd62 on October 1, 2020 06:50
@rrsettgast (Member) left a comment:

@oseikuffuor1 I think this is set up per your instructions, but execution on Lassen using nvprof doesn't show any CUDA kernels, so it must still be executing on the host. Any suggestions?

@@ -26,11 +26,13 @@ namespace geosx

// Check matching requirements on index/value types between GEOSX and SuperLU_Dist

#if !defined(GEOSX_USE_HYPRE_CUDA)
Member comment:
@corbett5 @klevzoff I won't be addressing this since hypre will be fixing their global index types soon.

@rrsettgast rrsettgast requested review from corbett5 and klevzoff March 8, 2021 07:39
CRSMatrix< real64 > tempMat;
tempMat.resize( localRows, src.numGlobalCols(), maxDstEntries );

for( globalIndex r=0; r<localRows; ++r )
Contributor comment:

Did you mean for this to be a parallel kernel launch? Or what's the purpose of using 2D arrays for srcIndices and srcValues?

Member reply:

There are a bunch of host functions in this for loop. They come from hypre, so we don't have control over them. It is pretty strange: the underlying hypre functions are all host functions, but when running on device they take device pointers.

/// Enables use of PETSc library (CMake option ENABLE_PETSC)
#define GEOSX_USE_PETSC

/// Choice of global linear algebra interface (CMake option GEOSX_LA_INTERFACE)
#define GEOSX_LA_INTERFACE Hypre
#define GEOSX_LA_INTERFACE Trilinos
Contributor comment:

Reverted back to Trilinos, intentional?

(We should look into changing the way this file is handled... it's getting out of hand, especially with more macros added.)

Member reply:

This seems to be a general issue. I must have built on Lassen and accidentally committed this file.

@rrsettgast rrsettgast added the "ci: run CUDA builds" label on Mar 8, 2021
@rrsettgast rrsettgast merged commit 3c24607 into develop Mar 9, 2021
@rrsettgast rrsettgast deleted the feature/boomerAMG-for-elasticity branch March 9, 2021 23:30