Run PR gpu utests/relvals on both CUDA and ROCm GPUs #2418

Open: wants to merge 17 commits into base: master


iarspider
Contributor

@iarspider iarspider commented Jan 22, 2025


Additional changes:

@cmsbuild
Contributor

cmsbuild commented Jan 22, 2025

cms-bot internal usage

@iarspider
Contributor Author

please test with cms-sw/cmssw#46579

to check that CPU tests are not broken

@cmsbuild
Contributor

-1

Failed Tests: ClangBuild
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b157ff/43916/summary.html
COMMIT: 118dd7e
CMSSW: CMSSW_15_0_X_2025-01-22-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cms-bot/2418/43916/install.sh to create a dev area with all the needed externals and cmssw changes.

CMS deprecated warnings: 1 CMS deprecated warnings found, see summary page for details.

Clang Build

I found compilation warnings while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' /usr/bin/time -v scram build -k -j 32 COMPILER='llvm compile'

See details on the summary page.

@iarspider
Contributor Author

please test with cms-sw/cmssw#47163

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b157ff/43917/summary.html
COMMIT: 118dd7e
CMSSW: CMSSW_15_0_X_2025-01-22-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cms-bot/2418/43917/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 1664 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3819085
  • DQMHistoTests: Total failures: 149
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3818916
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 214 log files, 184 edm output root files, 49 DQM output files
  • TriggerResults: found differences in 1 / 47 workflows

@iarspider
Contributor Author

iarspider commented Jan 23, 2025

test parameters:

  • enable = rocm
  • workflows_gpu = 141.044406,141.044408,141.044412,141.044414,141.044422,141.044424,160.03502

@iarspider
Contributor Author

please test

@cmsbuild
Contributor

Pull request #2418 was updated.

@iarspider
Contributor Author

please test

@iarspider
Contributor Author

please test

@cmsbuild
Contributor

Pull request #2418 was updated.

@cmsbuild
Contributor

-1

Failed Tests: RelVals-ROCM
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b157ff/44603/summary.html
COMMIT: f0974fa
CMSSW: CMSSW_15_1_X_2025-02-24-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cms-bot/2418/44603/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-ROCM

  • 141.044412 141.044412_Run3-2023_JetMET2023D_RecoECALOnlyGPU/step3_Run3-2023_JetMET2023D_RecoECALOnlyGPU.log
  • 141.044408 141.044408_Run3-2023_JetMET2023D_RecoPixelOnlyTripletsGPU_Profiling/step3_Run3-2023_JetMET2023D_RecoPixelOnlyTripletsGPU_Profiling.log
  • 141.044414 141.044414_Run3-2023_JetMET2023D_RecoECALOnlyGPU_Profiling/step3_Run3-2023_JetMET2023D_RecoECALOnlyGPU_Profiling.log
Expand to see more relval errors ...

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3920300
  • DQMHistoTests: Total failures: 69
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3920211
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 214 log files, 184 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

@iarspider
Contributor Author

please test

@cmsbuild
Contributor

Pull request #2418 was updated.

@cmsbuild
Contributor

Pull request #2418 was updated.

@iarspider
Contributor Author

please test

@cmsbuild
Contributor

Pull request #2418 was updated.

@@ -1326,7 +1342,9 @@ if [ "X$BUILD_OK" = Xtrue -a "$RUN_TESTS" = "true" ]; then
fi
if [ $(echo ${ENABLE_BOT_TESTS} | tr ',' ' ' | tr ' ' '\n' | grep '^GPU$' | wc -l) -gt 0 -a X"${DISABLE_GPU_TESTS}" != X"true" ] ; then
DO_GPU_TESTS=true
Contributor Author

TODO: figure out how to set DO_GPU_TESTS now that "GPU" is no longer in EXTRA_RELVAL_TESTS. Maybe keep it there and skip it when scheduling relvals? See L385 and L1482.
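
One possible direction for this TODO, as a minimal sketch rather than the actual cms-bot logic: decide DO_GPU_TESTS from ENABLE_BOT_TESTS alone, so it no longer depends on "GPU" being present in EXTRA_RELVAL_TESTS. The helper names has_token and gpu_tests_enabled are hypothetical, and treating CUDA and ROCM as tokens that also switch GPU tests on is an assumption based on the "enable = rocm" test parameter used earlier in this thread; the real token names and their casing may differ.

```shell
# Hypothetical helper: succeed if the comma- or space-separated list in $1
# contains $2 as a whole token (same intent as the grep '^GPU$' check).
has_token() {
  # Turn spaces into commas and wrap the list in commas, so that every
  # token is delimited on both sides and substrings cannot match.
  case ",$(printf '%s' "$1" | tr ' ' ',')," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: $1 is the list of enabled bot tests,
# $2 is the value of DISABLE_GPU_TESTS.
gpu_tests_enabled() {
  if [ "$2" = "true" ]; then
    return 1
  fi
  # Assumption: any of these tokens should turn GPU tests on.
  for flavor in GPU CUDA ROCM; do
    if has_token "$1" "$flavor"; then
      return 0
    fi
  done
  return 1
}

DO_GPU_TESTS=false
if gpu_tests_enabled "${ENABLE_BOT_TESTS:-}" "${DISABLE_GPU_TESTS:-}"; then
  DO_GPU_TESTS=true
fi
echo "DO_GPU_TESTS=${DO_GPU_TESTS}"
```

With ENABLE_BOT_TESTS=CLANG,ROCM and DISABLE_GPU_TESTS unset, this sets DO_GPU_TESTS=true. Matching whole tokens keeps a value such as NOGPU from enabling the tests by accident.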
