[pull] main from MaxMood96:main #752
Merged
…iaction problems The thread identification problems in coroutines are now addressed in 1cedc51. Although the original problem occurs in the LLVM optimizer, C++ users feel strongly about it, so it may be necessary to add a release note in Clang for it. Closes llvm#47177. Closes llvm#47179.
This patch moves the check for whether the std C++ modules feature is enabled into the RenderModulesOptions function. It simplifies the code a little further and also helps later patches.
Preliminary work on HLFIR. Introduce an option that will allow testing lowering via HLFIR until it is ready to replace the current expression lowering. See https://reviews.llvm.org/D134285 for more context about the plan. Differential Revision: https://reviews.llvm.org/D135959
Also remove -mattr=-flat-for-global which is not needed for generated checks.
When looking for underlying objects, if we encounter one that we have already seen, then we should skip it (as it has already been checked) rather than bail out. In particular, this adds support for the case where we have a loop use of a phi recurrence.
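A hypothetical source-level illustration of the phi-recurrence case (not taken from the patch): the pointer phi formed for `p` is advanced inside the loop, so walking its underlying objects revisits the same phi via the back edge. Previously that triggered a bail-out; skipping the already-seen phi lets the analysis still conclude that `A` is the only underlying object.

```cpp
void store_iota(int (&A)[64]) {
  int *p = A;                    // conceptually a phi of A and p + 1
  for (int i = 0; i < 64; ++i) {
    *p = i;                      // underlying object of p is A on every iteration
    ++p;                         // loop use of the phi recurrence
  }
}
```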
Differential Revision: https://reviews.llvm.org/D135348
…ocker Fixes llvm#58283. When running in a Docker container you can have fewer cores assigned to you than get_nprocs would suggest. Since the test just wants to know that interception worked, allow any result > 0 and <= the global core count. Reviewed By: MaskRay, vitalybuka. Differential Revision: https://reviews.llvm.org/D135677
This reverts commit 921a4d5. Due to buildbot failures on Arm and Arm64. https://lab.llvm.org/buildbot/#/builders/96/builds/30231
This reverts commit 3577e60. Due to buildbot failures on Arm and Arm64. https://lab.llvm.org/buildbot/#/builders/96/builds/30231
Every non-testcase use of OutputBuffer contains code to allocate an initial buffer (using either 128 or 1024 as initial guesses). There's now no need to do that, given recent changes to the buffer extension heuristics -- it allocates a 1k(ish) buffer on first need. Just pass in a buffer (if any) to the constructor. Thus the OutputBuffer's ownership of the buffer starts at its own lifetime start. We can reduce the lifetime of this object in several cases. That new constructor takes a 'size_t *' for the size argument, as all uses with a non-null buffer are passing through a malloc'd buffer from their own caller in this manner. The buffer reset member function is never used, and is deleted. Some adjustment to a couple of uses is needed, due to the lazy buffer creation of this patch. a) the Microsoft demangler can demangle empty strings to nothing, which it then memoizes. We need to avoid the UB of passing nullptr to memcpy. b) a unit test checks insertion of no characters into an empty buffer. We need to avoid UB when converting that to std::string. The original buffer initialization code would return a failure code if that first malloc failed. Existing code either ignored that, called std::terminate with a FIXME, or returned an error code. But that's not foolproof anyway, as a subsequent buffer extension failure ends up calling std::terminate. I am working on addressing that unfortunate failure mode in a manner more consistent with the C++ ABI design. Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D122604
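A minimal sketch of the caller-side pattern described above, assuming the constructor shape from the text (a possibly-null buffer plus a `size_t *` size); the surrounding function, header path, and namespacing are assumptions, not code from the patch.

```cpp
#include "llvm/Demangle/Utility.h"  // assumed location of OutputBuffer
#include <cstddef>

void demangleInto(const char *Mangled, char *Buf, std::size_t *N) {
  // Previously every caller pre-allocated a 128- or 1024-byte buffer and had
  // to handle that first malloc failing. Now the (possibly null) malloc'd
  // buffer from the caller is handed straight to the constructor, and the
  // ~1K initial allocation happens lazily on the first write.
  OutputBuffer OB(Buf, N);  // buffer ownership begins with OB's lifetime
  // ... write the demangled form of Mangled into OB ...
  (void)Mangled;
}
```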
LoopFlatten has been in the code base, off by default, for years; this enables it to run by default. Downstream it has been running for years, so it has been exposed to a fair amount of code. Around the time we switched to the NPM, several fixes went in related to updating the MemorySSA state, and we moved it into a loop pass manager, which both helped prevent rerunning certain analysis passes and thus helped a bit with compile times. About compile times: adding a pass isn't free, but this should see only very minor increases. The pass is relatively simple and there shouldn't be anything algorithmically expensive, because all it does is look at inner/outer loops and check assumptions on loop increments and indices. If we do see increases, I expect them to come mainly from invalidation of analysis info, and perhaps from subsequent passes triggering and doing more. Despite its simplicity/restrictions, it triggers in most code bases, which makes it worth enabling by default. Differential Revision: https://reviews.llvm.org/D109958
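An illustrative example (not taken from the patch) of the kind of nest LoopFlatten handles: a perfect nest with simple increments whose only use of the induction variables is the flat index `i * M + j`, and where `N * M` can be shown not to overflow.

```cpp
void scale(float *A, int N, int M, float F) {
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      A[i * M + j] *= F;         // the only use of i and j is the flat index
}

// Conceptually, the pass rewrites the nest as a single loop over the flat
// index, removing the inner-loop bookkeeping:
void scale_flat(float *A, int N, int M, float F) {
  for (int k = 0; k < N * M; ++k)
    A[k] *= F;
}
```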
… lowering The symbol may be used via use-association multiple times, such as once in the module specification part and once in a module procedure. In the module procedure, variable instantiation will then be invoked multiple times, but we only need to threadprivatize the symbol once and reuse the threadprivatized value afterwards. Fix llvm#58379. Reviewed By: kiranchandramohan. Differential Revision: https://reviews.llvm.org/D136035
…ify" This reverts commit 333246b. It looks like this patch causes a miscompile: llvm#58401. Fixes llvm#58401.
Use PrintError to extend the error message with location information in LLVMIRConversionGen.cpp. Additionally, promote potentially user facing error messages from assertions to real errors. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D136057
llvm-debuginfo-analyzer is a command line tool that processes debug info contained in a binary file and produces a debug information format agnostic “Logical View”, which is a high-level semantic representation of the debug info, independent of the low-level format.

The code has been divided into the following patches:
1) Interval tree
2) Driver and documentation
3) Logical elements
4) Locations and ranges
5) Select elements
6) Warning and internal options
7) Compare elements
8) ELF Reader
9) CodeView Reader

Full details: https://discourse.llvm.org/t/llvm-dev-rfc-llvm-dva-debug-information-visual-analyzer/62570

This patch: Driver and documentation
- Command line options.
- Full documentation.
- String Pool table.

Reviewed By: psamolysov, probinson. Differential Revision: https://reviews.llvm.org/D125777
The revision adds support for importing the masked load/store and gather/scatter intrinsics from LLVM IR. To enable the import, the revision also includes an extension of the mlirBuilder code generation to support variadic arguments. Depends on D136057 Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D136058
…d semantics `EraseIdentityGenericOp`, in the `!hasBufferSemantics()` case, assumed full tensor semantics and tried to access non-existent return values. Differential Revision: https://reviews.llvm.org/D135725
This reverts commit fe7a3ce.
Currently, Fortran attributes are mostly represented via the presence of named attributes with special names (fir.target, fir.contiguous, fir.optional...). Create an enum so that these attributes can be manipulated more easily and safely in FIR. This patch does not add usages for it yet; it is planned to be used in the future HLFIR and fir.declare operations. This is added to FIR and not HLFIR because it is intended to be used on fir.declare, which will be part of FIR, and it also seems useful for FIR operations. Differential Revision: https://reviews.llvm.org/D135961
InstSimplify currently checks whether the instruction simplifies back to itself, and returns undef in that case. Generally, this should only occur in unreachable code. However, this was also done for the simplifyInstructionWithOperands() API. In that case, the instruction only serves as a template that provides the opcode and other non-operand data. In this case, simplifying back to the same "instruction" may be expected. This caused PR58401 in conjunction with D134954. As such, move this check into simplifyInstruction() only. The only other caller of simplifyInstructionWithOperands() also handles the self-simplification case explicitly.
Relative to the previous attempt, this is rebased over the InstSimplify fix in ac74e7a, which addresses the miscompile reported in PR58401. ----- foldOpIntoPhi() currently only folds operations into the phi if all but one operands constant-fold. The two exceptions to this are freeze and select, where we allow more general simplification. This patch makes foldOpIntoPhi() generally simplification based and removes all the instruction-specific logic. We just try to simplify the instruction for each operand, and for the (potentially) one non-simplified operand, we move it into the new block with adjusted operands. This fixes llvm#57448, which was my original motivation for the change. Differential Revision: https://reviews.llvm.org/D134954
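A hypothetical source-level picture of what folding an operation into a phi means here (the actual transform operates on LLVM IR; the functions below are purely illustrative):

```cpp
// x is a phi of a constant and a variable; the add fed by it has one
// incoming value that simplifies and one that does not.
int before(bool c, int v) {
  int x = c ? 4 : v;  // conceptually: phi(4, v)
  return x + 1;
}

// After folding the add into the phi, the simplified result is used on the
// constant edge and the single non-simplified add moves into the other
// predecessor with adjusted operands.
int after(bool c, int v) {
  return c ? 5 : v + 1;
}
```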
When CurrentLoadedOffset is less than TotalSize, current code will trigger unsigned overflow and will not return an "allocation failed" indicator. Google ref: b/248613299 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D135192
Always use the non-symbolizing disassembler for instruction encoding validation, as symbols will be treated as undefined/zeros by the encoder, causing byte sequence mismatches. Reviewed By: Amir. Differential Revision: https://reviews.llvm.org/D136118
Reviewed By: lntue, michaelrj Differential Revision: https://reviews.llvm.org/D136143
…d scaffolding The command is `thread trace dump function-calls`, and at a minimum it will require printing to a file in JSON and non-JSON formats. I added a test. Differential Revision: https://reviews.llvm.org/D135521
…nstruction algorithm This diff implements the reconstruction algorithm for the call tree and adds tests. See TraceDumper.h for documentation and explanations. One important detail is that the tree objects live in TraceDumper, even though Trace.h would be a better home. I'm leaving that as future work. Another detail is that this code is as slow as dumping the entire symbolicated trace, which is not that bad; the reason is that we use symbols throughout the algorithm and we are not being careful about memory and speed. This is another area for future improvement. Lastly, I made sure that incomplete traces work, e.g. when you start tracing very deep in the stack or failures appear randomly in the trace. Differential Revision: https://reviews.llvm.org/D135917
The JSON dumper is very minimalistic. It pretty much only shows the delimiting instruction IDs of every segment, so that further queries to the SBCursor can be used to make sense of the data. Its main purpose is to be serialized somewhat cheaply. I also renamed untracedSegment to untracedPrefixSegment, in case we add an untracedSuffixSegment in the future. In any case, this new name is more explicit, which I like. Differential Revision: https://reviews.llvm.org/D136034
So that we require `opt -passes=` syntax for instrumentation passes. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D135042
- builds SSA cycle for compress insertion loop
- adds casting on index mismatch during push_back

Reviewed By: Peiming. Differential Revision: https://reviews.llvm.org/D136186
This differential splits the SparseTensorEnums library out from the SparseTensorRuntime library. The actual moving of files will be handled in the next differential. Depends On D135996 Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D136002
Symbols occur at non-zero offsets in a subsection if they are `.alt_entry` symbols, or if `.subsections_via_symbols` is omitted. It doesn't seem like ld64 supports folding those subsections either. Moreover, supporting this makes `foldIdentical` a lot more complicated to implement. The existing implementation has some questionable behavior around STABS omission -- if a section with a non-zero-offset symbol was folded into one without, we would omit the STABS entry for the non-zero-offset symbol. I will be following up with a diff that makes `foldIdentical` zero out the symbol sizes for folded symbols. Again, this is much easier to implement if we don't have to worry about non-zero offsets. Reviewed By: #lld-macho, oontvoo. Differential Revision: https://reviews.llvm.org/D136000
This matches ld64's behavior. I also extended the icf-stabs.s test to demonstrate that even though folded symbols have size zero, we cannot use the size-zero property in lieu of `wasIdenticalCodeFolded`, because size zero symbols should still get STABS entries. Reviewed By: #lld-macho, thakis Differential Revision: https://reviews.llvm.org/D136001
This patch changes the kernels generated by OpenMP to have protected visibility. This is unlikely to change anything functionally. However, protected visibility better matches the behaviour of these GPU kernels. We do not expect any pending shared library load to preempt these kernels so we can specify a more restrictive visibility. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D136198
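A small illustration, not taken from the patch, of what protected visibility means for a kernel definition (the function name and body are hypothetical):

```cpp
// The symbol is still exported, but references from within the defining
// module bind to this definition and cannot be preempted by a later-loaded
// shared library that defines the same name.
extern "C" __attribute__((visibility("protected")))
void example_offload_kernel(int *Data, int N) {
  for (int I = 0; I < N; ++I)
    Data[I] *= 2;
}
```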
Currently, we parse lines inside a compiler `#pragma` the same way we parse any other line. This is fine for some cases, like separating expressions and adding proper spacing, but in others it causes poor results from miscategorizing some tokens. For example, OpenMP offloading uses clauses that contain special characters, like `map(tofrom : A[0:N])`. This will be formatted poorly, as it will be split between lines on the first colon; additionally, the subscript notation will lead to poor spacing. This can be seen in the OpenMP tests, where automatic clang-format runs will inevitably ruin the formatting. For example, the following contrived example will be formatted poorly.

```
#pragma omp target teams distribute collapse(2) map(to: A[0 : M * K]) \
    map(to: B[0:K * N]) map(tofrom:C[0:M*N]) firstprivate(Alpha) \
    firstprivate(Beta) firstprivate(X) firstprivate(D) firstprivate(Y) \
    firstprivate(E) firstprivate(Z) firstprivate(F)
```

This results in the following when formatted, which is far from ideal.

```
#pragma omp target teams distribute collapse(2) map(to \
    : A [0:M * K]) \
    map(to \
    : B [0:K * N]) map(tofrom \
    : C [0:M * N]) firstprivate(Alpha) \
    firstprivate(Beta) firstprivate(X) firstprivate(D) firstprivate(Y) \
    firstprivate(E) firstprivate(Z) firstprivate(F)
```

This patch seeks to improve this by adding extra logic where the parsing goes awry. The problems are primarily caused by the colon being parsed as an inline-asm directive and the brackets as Objective-C expressions; also, the line gets indented every single time the line is dropped. This doesn't implement true parsing handling for OpenMP statements.

Reviewed By: HazardyKnusperkeks

Differential Revision: https://reviews.llvm.org/D136100
D110005 renamed LIBUNWIND_SUPPORTS_* to CXX_SUPPORTS_*. Reviewed By: MaskRay, #libunwind, mstorsjo Differential Revision: https://reviews.llvm.org/D136131
Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D135949
Move the SparseTensorEnums library out of the ExecutionEngine directory and into Dialect/SparseTensor/IR. Depends On D136002 Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D136005
…data` The `SimplifyExtractStridedMetadata` pass features a pattern that forwards statically known information (offset, sizes, strides) to its users. This patch moves this pattern from that pass to the `extract_strided_metadata` folding patterns. Differential Revision: https://reviews.llvm.org/D135797
This removes another massive source of redundancy, and instead has the Merger.{h,cpp} reuse the SparseTensorEnums library. Depends On D136005 Reviewed By: Peiming Differential Revision: https://reviews.llvm.org/D136123
Support SV_DispatchThreadID attribute. Translate it into dx.thread.id in clang codeGen. Reviewed By: beanz, aaron.ballman Differential Revision: https://reviews.llvm.org/D133983
This saves clients some boilerplate compared to setting up the readers and writers manually. To obtain a BinaryStreamWriter / BinaryStreamReader for a given block, B, clients can now write: auto Reader = G.getBlockContentReader(B); and auto Writer = G.getBlockContentWriter(B); The latter will trigger a copy to mutable memory allocated on the graph's allocator if the block is currently marked as backed by read-only memory. This commit also introduces a new createMutableContentBlock overload that creates a block with a given size and zero-filled content (by default -- passing false for the ZeroInitialize bypasses initialization entirely). This overload is intended to be used with getBlockContentWriter above when creating new content for the graph.
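A minimal sketch of the pattern described above; `getBlockContentReader`, `getBlockContentWriter`, and the LinkGraph/Block types are named in the text, while the surrounding function and header path are assumptions.

```cpp
#include "llvm/ExecutionEngine/JITLink/JITLink.h"

void rewriteBlock(llvm::jitlink::LinkGraph &G, llvm::jitlink::Block &B) {
  // Read the block's existing bytes.
  auto Reader = G.getBlockContentReader(B);
  // Requesting a writer copies read-only content into mutable memory
  // allocated on the graph's allocator before returning.
  auto Writer = G.getBlockContentWriter(B);
  // ... consume bytes via Reader, emit replacement bytes via Writer ...
  (void)Reader;
  (void)Writer;
}
```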
This is a fix related to D135414. The original intention was to keep `BaseFS` as a member of the worker and conditionally overlay it with local in-memory FS. The `move` of ref-counted `BaseFS` was not intended, and it's a bug. Disabling parallelism in the "by-module-name" test reliably reproduces this, and the test itself doesn't *need* parallelism. (I think `-j 4` was cargo culted from another test.) Reusing that test to check for correct behavior... Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D136124
This reverts commit 9572406. The author name is wrong.
…are/-pie or --relocatable modes Add some checks around this combination of flags. Also, honor `--global-base` when specified in `--stack-first` mode rather than ignoring it, but error out if the specified base precedes the end of the stack. Differential Revision: https://reviews.llvm.org/D136117
D128750 incorrectly skips constraints partial ordering for deduction guide. This patch reverts that part. Fixes llvm#58456.
See Commits and Changes for more details. Created by pull[bot].