[pull] main from llvm:main #32

Merged 9 commits into MaxMood96:main from llvm:main on Sep 1, 2021
Conversation

pull[bot]

@pull pull bot commented Sep 1, 2021

See Commits and Changes for more details.


Created by pull[bot]


aganea and others added 9 commits August 31, 2021 19:05
When a nodeduplicate COMDAT group contains a weak symbol, choose
a non-weak symbol (or one of the weak ones) rather than reporting
an error. This should address issue PR51394.

With the current IR representation, generic comdat nodeduplicate
semantics are not representable for LTO. In the linker, sections and
symbols are separate concepts: a dropped weak symbol does not force the
defining input section to be dropped as well (though it can be collected
by GC). In the IR, however, when a weak linkage symbol is dropped, its
associated section content is dropped with it.

For InstrProfiling, which is where we ran into this issue in PR51394, the
deduplication semantics are a sufficient workaround.

Differential Revision: https://reviews.llvm.org/D108689
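
For context, a hedged sketch of the kind of input that hits this path: profile
instrumentation places per-function counters in nodeduplicate COMDAT groups,
and a weak function definition can leave a weak symbol in such a group. The
snippet below is a hypothetical reproducer, not the test case from PR51394,
and the build flags in the comment are assumptions.

```
/* Hypothetical reproducer sketch (not the original PR51394 test case).
 * Built with profile instrumentation (e.g. clang -fprofile-generate), the
 * weak definition of foo() is instrumented, and its profile counter can end
 * up as a weak symbol inside a nodeduplicate COMDAT group; with this change
 * the linker picks a surviving symbol instead of reporting an error. */
__attribute__((weak)) int foo(void) { return 1; }

int main(void) { return foo(); }
```
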
Change pass-by-const-ref to pass-by-value as objects are recreated
due to custom up-/down-casting anyway.
A new LLVM-specific tag, DW_TAG_LLVM_annotation, is added.
The name was suggested by Paul Robinson ([1]).
Currently, this tag is used to output __attribute__((btf_tag("string")))
annotations in DWARF. The following is an example of a global
variable with two btf_tag attributes:
  0x0000002a:   DW_TAG_variable
                  DW_AT_name      ("g1")
                  DW_AT_type      (0x00000052 "int")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/tmp/home/yhs/work/tests/llvm/btf_tag/t.c")
                  DW_AT_decl_line (8)
                  DW_AT_location  (DW_OP_addr 0x0)

  0x0000003f:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_tag")
                    DW_AT_const_value     ("tag1")

  0x00000048:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_tag")
                    DW_AT_const_value     ("tag2")

  0x00000051:     NULL

In the future, DW_TAG_LLVM_annotation may encode other types
of non-string const values.

 [1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151250.html

Differential Revision: https://reviews.llvm.org/D106621
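
For reference, a minimal C declaration that carries two btf_tag annotations on
a global, matching the "g1" variable in the dump above. This is a sketch: the
attribute spelling follows the commit message, and the target and flags in the
comment are assumptions.

```
/* Sketch: global variable with two btf_tag annotations, as in the DWARF dump
 * above.  Assumed build: a target that supports the attribute, compiled with
 * -g so the annotations are emitted as DW_TAG_LLVM_annotation children. */
int g1 __attribute__((btf_tag("tag1"))) __attribute__((btf_tag("tag2")));
```
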
Analogous to the TSFlags for machine instructions, this
patch introduces a bit vector for register classes to hold
target-specific flags that becomes a TableGen-generated value in
TargetRegisterClass.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108767
@pull pull bot added the ⤵️ pull label Sep 1, 2021
@pull pull bot merged commit 6a75041 into MaxMood96:main Sep 1, 2021
pull bot pushed a commit that referenced this pull request Oct 14, 2021
The patch attempts to optimize a sequence of SIMD loads from the same
base pointer:

    %0 = gep float*, float* base, i32 4
    %1 = bitcast float* %0 to <4 x float>*
    %2 = load <4 x float>, <4 x float>* %1
    ...
    %n1 = gep float*, float* base, i32 N
    %n2 = bitcast float* %n1 to <4 x float>*
    %n3 = load <4 x float>, <4 x float>* %n2

For AArch64, the compiler generates a sequence of LDR Qt, [Xn, #16]
instructions. However, 32-bit NEON VLD1/VST1 lack the [Rn, #imm]
addressing mode, so the address is computed before every load/store
instruction:

    add r2, r0, #32
    add r0, r0, #16
    vld1.32 {d18, d19}, [r2]
    vld1.32 {d22, d23}, [r0]

This can be improved by computing the address for the first load, and then
using a post-indexed form of VLD1/VST1 to load the rest:

    add r0, r0, #16
    vld1.32 {d18, d19}, [r0]!
    vld1.32 {d22, d23}, [r0]

In order to do that, the patch adds more patterns to DAGCombine:

  - (load (add ptr inc1)) and (add ptr inc2) are now folded if inc1
    and inc2 are constants.

  - (or ptr inc) is now recognized as a pointer increment if ptr is
    sufficiently aligned.

In addition to that, we now search for all possible base updates and
then pick the best one.

Differential Revision: https://reviews.llvm.org/D108988
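
As a rough illustration, a C source along the following lines (a hypothetical
sketch using NEON intrinsics; the function name and offsets are illustrative,
not taken from the patch's tests) produces the consecutive constant-offset
vector loads shown above:

```
#include <arm_neon.h>

/* Hypothetical sketch: two 128-bit loads at constant offsets from the same
 * base pointer.  On 32-bit ARM each vld1 previously needed its own address
 * computation; with this patch the first address is computed once and the
 * second load can use the post-indexed form. */
float32x4_t sum_two_quads(const float *base) {
    float32x4_t a = vld1q_f32(base + 4);  /* corresponds to gep ... i32 4 */
    float32x4_t b = vld1q_f32(base + 8);  /* next consecutive 16-byte block */
    return vaddq_f32(a, b);
}
```
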
pull bot pushed a commit that referenced this pull request Apr 15, 2022
…n a loop

When untagging the stack, the compiler may emit a sequence like:
```
        .LBB0_1:
          st2g sp, [sp], #32
          sub x8, x8, #32
          cbnz x8, .LBB0_1
          stg sp, [sp], #16
```
These stack adjustments cannot be described by CFI instructions.

This patch disables merging of the SP update with untagging, i.e. it makes the
compiler use an additional scratch register (there should be plenty
available at this point, as we are in the epilogue) and generate:
```
            mov     x9, sp
            mov     x8, #256
            stg     x9, [x9], #16
    .LBB0_1:
            sub     x8, x8, #32
            st2g    x9, [x9], #32
            cbnz    x8, .LBB0_1
            add     sp, sp, #272
```
Merging is disabled only when we need to generate asynchronous unwind
tables.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D114548
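
As an illustration, a function with a sizable tagged local is the kind of
input that produces the untagging loop above. The snippet below is a minimal
sketch; the sanitizer and target flags in the comment are assumptions, not
taken from the patch.

```
/* Minimal sketch.  Assumed build flags, for example:
 *   clang --target=aarch64-linux-gnu -march=armv8.5-a+memtag \
 *         -fsanitize=memtag-stack -fasynchronous-unwind-tables -O2
 * A 256-byte tagged local roughly matches the "mov x8, #256" loop count in
 * the epilogue sequence shown above. */
void consume(char *p);

void frame_with_tagged_local(void) {
    char buf[256];  /* tagged stack slot, untagged again in the epilogue */
    consume(buf);
}
```
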
pull bot pushed a commit that referenced this pull request Aug 10, 2023
The example sequence

  add z0.h, z0.h, #32
  lsr z0.h, z0.h, #6
  st1b z0.h, x1

can be replaced with

  rshrnb z0.b, z0.h, #6
  st1b z0.h, x1

since the top half of the destination elements is truncated.

In a similar fashion,

  add z0.s, z0.s, #32
  lsr z1.s, z1.s, #6
  add z1.s, z1.s, #32
  lsr z0.s, z0.s, #6
  uzp1 z0.h, z0.h, z1.h

can be replaced with

  rshrnb z1.h, z1.s, #6
  rshrnb z0.h, z0.s, #6
  uzp1 z0.h, z0.h, z1.h

Differential Revision: https://reviews.llvm.org/D155299
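
For reference, the scalar pattern below (a hypothetical C sketch; adding 32,
i.e. half of 1 << 6, before the shift is what makes it a rounding shift) is
the kind of round-then-narrow loop that, once vectorized for SVE, yields the
add + lsr + truncating-store sequence this combine rewrites into rshrnb:

```
#include <stddef.h>

/* Hypothetical sketch: rounding shift right by 6, narrowing u16 -> u8.
 * (x + 32) >> 6 with 32 == (1 << 6) / 2 rounds to nearest; the store into a
 * narrower element type keeps only the low half, matching rshrnb. */
void round_narrow(unsigned char *dst, const unsigned short *src, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = (unsigned char)((src[i] + 32) >> 6);
}
```
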