-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++20][Modules] Quote header unit name in preprocessor output (-E) #112883
[C++20][Modules] Quote header unit name in preprocessor output (-E) #112883
Conversation
Summary: Before this change clang produced output with header unit names that may conaint path separators, dots and other non-identifier characters. This diff prints header unit name in quotes and -E output can be compiled again. Also remove unnecessary space between header unit name and semi. Test Plan: check-clang
@llvm/pr-subscribers-clang-modules @llvm/pr-subscribers-clang Author: Dmitry Polukhin (dmpolukhin) ChangesSummary: Test Plan: check-clang Full diff: https://github.com/llvm/llvm-project/pull/112883.diff 4 Files Affected:
diff --git a/clang/lib/Frontend/PrintPreprocessedOutput.cpp b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
index 383d4356084916..1005825441b3e6 100644
--- a/clang/lib/Frontend/PrintPreprocessedOutput.cpp
+++ b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
@@ -952,13 +952,15 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
continue;
} else if (Tok.is(tok::annot_header_unit)) {
// This is a header-name that has been (effectively) converted into a
- // module-name.
+ // module-name, print them inside quote.
// FIXME: The module name could contain non-identifier module name
- // components. We don't have a good way to round-trip those.
+ // components and OS specific file paths components. We don't have a good
+ // way to round-trip those.
Module *M = reinterpret_cast<Module *>(Tok.getAnnotationValue());
std::string Name = M->getFullModuleName();
- Callbacks->OS->write(Name.data(), Name.size());
- Callbacks->HandleNewlinesInToken(Name.data(), Name.size());
+ *Callbacks->OS << '"';
+ Callbacks->OS->write_escaped(Name);
+ *Callbacks->OS << '"';
} else if (Tok.is(tok::annot_embed)) {
// Manually explode the binary data out to a stream of comma-delimited
// integer values. If the user passed -dE, that is handled by the
diff --git a/clang/lib/Lex/TokenConcatenation.cpp b/clang/lib/Lex/TokenConcatenation.cpp
index 865879d1805336..05f4203bd722bb 100644
--- a/clang/lib/Lex/TokenConcatenation.cpp
+++ b/clang/lib/Lex/TokenConcatenation.cpp
@@ -160,6 +160,10 @@ static char GetFirstChar(const Preprocessor &PP, const Token &Tok) {
bool TokenConcatenation::AvoidConcat(const Token &PrevPrevTok,
const Token &PrevTok,
const Token &Tok) const {
+ // No space is required between header unit name in quote and semi.
+ if (PrevTok.is(tok::annot_header_unit) && Tok.is(tok::semi))
+ return false;
+
// Conservatively assume that every annotation token that has a printable
// form requires whitespace.
if (PrevTok.isAnnotation())
diff --git a/clang/test/Headers/header_unit_preprocessed_output.cpp b/clang/test/Headers/header_unit_preprocessed_output.cpp
new file mode 100644
index 00000000000000..12b4ab6159f346
--- /dev/null
+++ b/clang/test/Headers/header_unit_preprocessed_output.cpp
@@ -0,0 +1,20 @@
+// RUN: rm -fR %t
+// RUN: split-file %s %t
+// RUN: cd %t
+// RUN: %clang_cc1 -verify -std=c++20 -emit-header-unit -xc++-user-header bz0.h
+// RUN: %clang_cc1 -verify -std=c++20 -fmodule-file=bz0.pcm -xc++-user-header bz1.h -E -o bz1.output.h
+// RUN: FileCheck %s < bz1.output.h
+// RUN: %clang_cc1 -std=c++20 -fmodule-file=bz0.pcm -emit-header-unit -xc++-user-header bz1.output.h
+
+//--- bz0.h
+// expected-no-diagnostics
+#pragma once
+
+void foo();
+
+//--- bz1.h
+// expected-no-diagnostics
+import "bz0.h";
+
+// CHECK: # 1 ".{{/|\\\\?}}bz1.h"
+// CHECK: import ".{{/|\\\\?}}bz0.h";
diff --git a/clang/test/Modules/cxx20-include-translation.cpp b/clang/test/Modules/cxx20-include-translation.cpp
index 7bf3184325829e..a607e7c0373f81 100644
--- a/clang/test/Modules/cxx20-include-translation.cpp
+++ b/clang/test/Modules/cxx20-include-translation.cpp
@@ -105,6 +105,6 @@ export void charlie() {
}
// CHECK: #pragma clang module import ".{{/|\\\\?}}h1.h"
-// CHECK: import .{{/|\\\\?}}h2.h
-// CHECK: import .{{/|\\\\?}}h3.h
+// CHECK: import ".{{/|\\\\?}}h2.h"
+// CHECK: import ".{{/|\\\\?}}h3.h"
// CHECK-NOT: #pragma clang module import ".{{/|\\\\?}}h4.h"
|
Module *M = reinterpret_cast<Module *>(Tok.getAnnotationValue()); | ||
std::string Name = M->getFullModuleName(); | ||
Callbacks->OS->write(Name.data(), Name.size()); | ||
Callbacks->HandleNewlinesInToken(Name.data(), Name.size()); | ||
*Callbacks->OS << '"'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the header was imported by <>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good question but I don't see easy way to propagate isAngled
from Preprocessor::HandleHeaderIncludeOrImport
to tokens that will be printed here. This information cannot be putted to Module
because it is not a characteristic of the module but information how it was spelled (even pointer to Module*
is passed as void*
token annotation value). Spelling file path in quotes at least won't break C++ parsing of the output as it happens now. FIXME here already covers that information about spelled header unit should be preserved. But if you think it is required fix now, I'll create a new structure that will contain Module*, isAngled and spelled header-unit name to resolve FIXME completely - it will require more changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't want to block such issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Module *M = reinterpret_cast<Module *>(Tok.getAnnotationValue()); | ||
std::string Name = M->getFullModuleName(); | ||
Callbacks->OS->write(Name.data(), Name.size()); | ||
Callbacks->HandleNewlinesInToken(Name.data(), Name.size()); | ||
*Callbacks->OS << '"'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't want to block such issues.
…lvm#112883) Summary: Before this change clang produced output with header unit names that may conaint path separators, dots and other non-identifier characters. This diff prints header unit name in quotes and -E output can be compiled again. Also remove unnecessary space between header unit name and semi. Test Plan: check-clang
Summary:
Before this change clang produced output with header unit names that may conaint path separators, dots and other non-identifier characters. This diff prints header unit name in quotes and -E output can be compiled again. Also remove unnecessary space between header unit name and semi.
Test Plan: check-clang