Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

manual popcnt has unnecessary cmove since clang 16.0.0 #62450

Closed
y21 opened this issue Apr 29, 2023 · 4 comments
Closed

manual popcnt has unnecessary cmove since clang 16.0.0 #62450

y21 opened this issue Apr 29, 2023 · 4 comments

Comments

@y21
Copy link

y21 commented Apr 29, 2023

Code:

int popcnt(int x) {
    int cnt = 0;
    while (x) {
        cnt++;
        x &= x - 1;
    }
    return cnt;
}

On x86-64 clang 16.0.0 as well as trunk (17.0.0 d7fa921) with -O -march=haswell compiles to:

popcnt:                                 # @popcnt
        popcnt  eax, edi
        cmove   eax, edi
        ret

(Note the unnecessary cmove)

On clang 15.0.0 with the same options:

popcnt:                                 # @popcnt
        popcnt  eax, edi
        ret

Reproducer

@nikic
Copy link
Contributor

nikic commented Apr 29, 2023

We probably need to switch the select value equivalence fold to perform operand replacement ignoring poison flags/metadata and then drop them, rather than the current hack of trying to drop and restore flags (which doesn't cover the metadata case).

@llvmbot
Copy link
Member

llvmbot commented Apr 29, 2023

@llvm/issue-subscribers-backend-x86

@pcordes
Copy link

pcordes commented Aug 28, 2023

In case it helps, the extra cmove has come and gone in recent clang versions:

  • Clang 10: not present (good)
  • Clang 11, 12: present (bad)
  • Clang 13, 14, 15: not present (good)
  • Clang 16, 17-trunk: present (bad)

Making the variables unsigned doesn't help. The SWAR bithack idiom does avoid the cmove, giving us just popcnt. Other popcount implementations aren't recognized at all. https://godbolt.org/z/91j3feoE9 (thanks to harold at https://stackoverflow.com/questions/76992511/what-type-of-analysis-do-compilers-do-to-spot-opportunities-for-reducing-entires#comment135728081_76992511)

This happens even with -march=znver1 so it's (hopefully) unrelated to an attempt to mitigate the false output dependency in Sandybridge-family before Ice Lake. And clang doesn't do this for __builtin_popcnt (or for C++20 std::popcount), which also emit the popcnt instruction when the target supports it, so again, it doesn't seem to be a failed attempt to do something for popcnt in general.

(Related: lzcnt/tzcnt have a false output dependency that's fixed in Skylake. And bsf / bsr have a true output dependency on Intel and AMD CPUs which even the normally-cautious GCC doesn't bother to avoid. https://stackoverflow.com/questions/21390165/why-does-breaking-the-output-dependency-of-lzcnt-matter discusses popcnt as well since it's the same hardware execution unit)

@nikic
Copy link
Contributor

nikic commented Sep 19, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants