Rebuild for CUDA 11.8 support #70


regro-cf-autotick-bot
Contributor

@regro-cf-autotick-bot regro-cf-autotick-bot commented Sep 2, 2023

This PR has been triggered in an effort to update cuda118.

Notes and instructions for merging this PR:

  1. Please merge the PR only after the tests have passed.
  2. Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.


If this PR was opened in error or needs to be updated please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase @conda-forge-admin, please rerun bot in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/6057259647, please use this URL for debugging.

Closes #64

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jakirkham
Member

@conda-forge-admin , please re-render

@jakirkham
Member

@conda-forge-admin , please re-render

@jakirkham
Member

jakirkham commented Nov 28, 2023

Think we will want this logic as well ( c2edc96 )

Edit: Done in commit ( 805825f ) below

@jakirkham
Member

@conda-forge-admin , please re-render

Contributor

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you, but it looks like there was nothing to do.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/faiss-split-feedstock/actions/runs/7014307041.

Comment on lines +15 to +19
if [ $(version2int $cuda_compiler_version) -ge $(version2int "11.8") ]; then
# Hopper support for H100 (sm_90) needs cuda >= 11.8
LATEST_ARCH=90
# ARCHES does not contain LATEST_ARCH; see usage below
ARCHES=( "${ARCHES[@]}" 75 80 86)
Member

Added a CUDA 11.8 branch here that includes sm_90

Please let me know if anything else is needed here
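
For readers unfamiliar with the comparison in the diff, here is a minimal sketch of how a `version2int`-style helper can gate the arch selection. The body of `version2int` and the wrapper `latest_arch_for` are assumptions made for illustration and are not copied from the feedstock's build script; only the 11.8 cutoff and `LATEST_ARCH=90` come from the diff above, and the fallback value is a placeholder.

```shell
# Hypothetical helper (name taken from the diff; the body is an assumption):
# turn "MAJOR.MINOR" into an integer so versions compare numerically.
version2int() {
    echo "$1" | awk -F. '{ printf("%d%02d\n", $1, $2) }'
}

# Hypothetical wrapper: print the newest GPU arch to target for a CUDA version.
latest_arch_for() {
    if [ "$(version2int "$1")" -ge "$(version2int "11.8")" ]; then
        echo 90   # Hopper (sm_90) needs CUDA >= 11.8, per the diff comment
    else
        echo 86   # assumed fallback for older toolkits, illustration only
    fi
}

latest_arch_for "11.8"   # prints 90
latest_arch_for "11.2"   # prints 86
```

Converting to an integer first avoids the pitfall of a plain string comparison, which would rank a version such as 11.10 below 11.8.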

# Hopper support for H100 (sm_90) needs cuda >= 11.8
LATEST_ARCH=90
# ARCHES does not contain LATEST_ARCH; see usage below
ARCHES=( "${ARCHES[@]}" 75 80 86)
Member

@jakirkham jakirkham Nov 28, 2023

Depending on how we want to handle sm_89 (if at all)

We could potentially add sm_89 here

Suggested change
- ARCHES=( "${ARCHES[@]}" 75 80 86)
+ ARCHES=( "${ARCHES[@]}" 75 80 86 89)

Alternatively we could replace sm_86 with sm_89

Suggested change
- ARCHES=( "${ARCHES[@]}" 75 80 86)
+ ARCHES=( "${ARCHES[@]}" 75 80 89)

Some relevant background

Member

Given we are already running into CI time limits on Azure, think the answer needs to be picking 86 or 89, but not both (at least not without dropping others)
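
To make the trade-off concrete, here is a rough sketch of how each candidate list might expand into a CMake `CUDA_ARCHITECTURES`-style value. The helper `join_cuda_archs` is hypothetical, and the `-real` suffix convention is an assumption about how the build script consumes `ARCHES` and `LATEST_ARCH`; the diff only says that `ARCHES` does not contain `LATEST_ARCH`. Every extra entry is one more target for the compiler to generate code for, which is where the CI time goes.

```shell
# Hypothetical helper: join arch numbers into a CMake CUDA_ARCHITECTURES-style list.
# First argument is the newest arch; the remaining arguments are the older ones.
# Older arches get "-real" (device code only); the newest is appended without a
# suffix, which CMake treats as "generate both real and virtual (PTX) code".
join_cuda_archs() {
    latest="$1"; shift
    out=""
    for arch in "$@"; do
        out="${out}${arch}-real;"
    done
    echo "${out}${latest}"
}

join_cuda_archs 90 75 80 86      # list in the current diff
join_cuda_archs 90 75 80 86 89   # first suggestion: add sm_89
join_cuda_archs 90 75 80 89      # second suggestion: swap sm_86 for sm_89
```

Under these assumptions the first suggestion yields five targets, while the current diff and the second suggestion both yield four.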

@jakirkham
Member

It looks like TestClustering.test_ivf_train_2level failed on the CPU only build. Not sure what is going on there

Punting on the arch builds where we are running into CI limits. Likely need to trim GPU architectures there. Had tried pushing on cross-compiling as an alternative in the past without much luck ( #62 )

@h-vetinari
Member

Something we can do (finally) is split off the AVX2 builds into separate jobs. It's on my backlog.

Currently this feedstock doesn't have CUDA arch builds. This PR adds
them, as they are handled as part of enabling arch builds. Since these
currently time out and may require a bit more work to figure out (like
adding cross-compilation), disable CUDA arch builds for now. Instead,
focus on the OSes and architectures already supported here.
@jakirkham
Member

@conda-forge-admin , please re-render

conda-forge-webservices[bot] and others added 2 commits December 19, 2023 07:59
@jakirkham
Member

@conda-forge-admin , please re-render

conda-forge-webservices[bot] and others added 2 commits December 19, 2023 08:12
@jakirkham
Member

@conda-forge-admin , please re-render

@jakirkham
Member

@conda-forge-admin , please re-render

conda-forge-webservices[bot] and others added 2 commits December 19, 2023 08:24
@jakirkham jakirkham changed the title from "Rebuild for CUDA 11.8 w/arch support" to "Rebuild for CUDA 11.8 support" Dec 19, 2023
@jakirkham
Member

> It looks like TestClustering.test_ivf_train_2level failed on the CPU only build. Not sure what is going on there

Looks like upstream relaxed the threshold of this test ( facebookresearch/faiss#2927 ). Have added a patch doing the same

> Punting on the arch builds where we are running into CI limits. Likely need to trim GPU architectures there. Had tried pushing on cross-compiling as an alternative in the past without much luck ( #62 )

Have disabled these above

@jakirkham
Member

On Windows CUDA 11.8, it appears the job hung waiting for input in the midst of running the tests

tests\test_fast_scan_ivf.py ............................................ [ 34%]
Traceback (most recent call last):
  File "C:\Miniforge\Scripts\conda-mambabuild-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\Miniforge\lib\site-packages\boa\cli\mambabuild.py", line 256, in main
    call_conda_build(action, config)
  File "C:\Miniforge\lib\site-packages\boa\cli\mambabuild.py", line 228, in call_conda_build
    result = api.build(
  File "C:\Miniforge\lib\site-packages\conda_build\api.py", line 253, in build
    return build_tree(
  File "C:\Miniforge\lib\site-packages\conda_build\build.py", line 3819, in build_tree
    test(pkg, config=metadata.config.copy(), stats=stats)
  File "C:\Miniforge\lib\site-packages\conda_build\build.py", line 3625, in test
    utils.check_call_env(
  File "C:\Miniforge\lib\site-packages\conda_build\utils.py", line 445, in check_call_env
    return _func_defaulting_env_to_os_environ("call", *popenargs, **kwargs)
  File "C:\Miniforge\lib\site-packages\conda_build\utils.py", line 416, in _func_defaulting_env_to_os_environ
    proc = PopenWrapper(_args, **kwargs)
  File "C:\Miniforge\lib\site-packages\conda_build\utils.py", line 286, in __init__
    self.out, self.err = self._execute(*args, **kwargs)
  File "C:\Miniforge\lib\site-packages\conda_build\utils.py", line 360, in _execute
    self.disk = max(directory_size(disk_usage_dir), self.disk)
  File "C:\Miniforge\lib\site-packages\conda_build\utils.py", line 186, in directory_size
    out = subprocess.check_output(command.format(path), shell=True)
  File "C:\Miniforge\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Miniforge\lib\subprocess.py", line 505, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "C:\Miniforge\lib\subprocess.py", line 1141, in communicate
    stdout = self.stdout.read()
KeyboardInterrupt
.............................................................

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! KeyboardInterrupt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
%PREFIX%\lib\site-packages\faiss\swigfaiss.py:5811: KeyboardInterrupt
(to show a full traceback on KeyboardInterrupt use --full-trace)
========== 334 passed, 1 skipped, 4 deselected in 116.91s (0:01:56) ===========
Terminate batch job (Y/N)? 
Entering debug mode. Use h or ? for help. 

At D:\a\_tasks\CmdLine_d9bafed4-0b18-4f58-968d-86655b4d2ce9\2.231.0\ps_modules\VstsTaskSdk\ToolFunctions.ps1:113
char:13

+         if ($originalEncoding) {

+             ~~~~~~~~~~~~~~~~~
^C
[DBG]: PS D:\a\1\s>> 

Are we running pytest with pdb enabled? If so, that might explain this behavior (and we should disable using pdb in CI to fix this)
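
One way to check the pdb hypothesis is to scan the options pytest receives for debugger-related flags. The sketch below is only an illustration: the helper `uses_pdb` is hypothetical, and it assumes any extra options would arrive through the `PYTEST_ADDOPTS` environment variable; the recipe's actual test command would need the same inspection.

```shell
# Hypothetical check: succeed if any word in the given option string is a
# pytest flag that drops into the debugger (--pdb, --trace, --pdbcls=...).
uses_pdb() {
    for opt in $1; do   # intentionally unquoted so the string splits into words
        case "$opt" in
            --pdb|--trace|--pdbcls=*) return 0 ;;
        esac
    done
    return 1
}

# Assumption: extra pytest options are passed via PYTEST_ADDOPTS.
if uses_pdb "${PYTEST_ADDOPTS:-}"; then
    echo "pdb-related flag found in PYTEST_ADDOPTS"
else
    echo "no pdb-related flag in PYTEST_ADDOPTS"
fi
```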

@jakirkham
Member

> > It looks like TestClustering.test_ivf_train_2level failed on the CPU only build. Not sure what is going on there
>
> Looks like upstream relaxed the threshold of this test ( facebookresearch/faiss#2927 ). Have added a patch doing the same

Appears that was not enough

>       self.assertLess(ndiff, 51)
E       AssertionError: 52 not less than 51

> On Windows CUDA 11.8, it appears the job hung waiting for input in the midst of running the tests

Sorry, think I misread. It looks like the CUDA 11.2 jobs for Windows & Linux exceed the CI time limit (6 hrs)

@jakirkham
Member

@conda-forge-admin , please re-render

@jakirkham
Member

JFYI we are planning to drop CUDA 11.2 support in conda-forge soon

Please see this announcement and this issue ( conda-forge/conda-forge-pinning-feedstock#5339 ) for more details

The smoothest upgrade path would be to add CUDA 11.8. IOW completing this migration PR

Please let us know if you have any questions on next steps

@jakirkham
Member

JFYI CUDA 11.2 is now officially dropped

@jakirkham
Member

@conda-forge-admin , please re-render

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

Also fix lint
@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jakirkham
Member

@conda-forge-admin , please re-render

@h-vetinari h-vetinari mentioned this pull request Aug 8, 2024