Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++20][Modules] Quote header unit name in preprocessor output (-E) #112883

Merged
merged 1 commit into from
Oct 24, 2024

Conversation

dmpolukhin
Copy link
Contributor

Summary:
Before this change clang produced output with header unit names that may conaint path separators, dots and other non-identifier characters. This diff prints header unit name in quotes and -E output can be compiled again. Also remove unnecessary space between header unit name and semi.

Test Plan: check-clang

Summary:
Before this change clang produced output with header unit names that may
conaint path separators, dots and other non-identifier characters. This
diff prints header unit name in quotes and -E output can be compiled
again. Also remove unnecessary space between header unit name and semi.

Test Plan: check-clang
@dmpolukhin dmpolukhin marked this pull request as ready for review October 18, 2024 11:41
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules labels Oct 18, 2024
@llvmbot
Copy link
Member

llvmbot commented Oct 18, 2024

@llvm/pr-subscribers-clang-modules

@llvm/pr-subscribers-clang

Author: Dmitry Polukhin (dmpolukhin)

Changes

Summary:
Before this change clang produced output with header unit names that may conaint path separators, dots and other non-identifier characters. This diff prints header unit name in quotes and -E output can be compiled again. Also remove unnecessary space between header unit name and semi.

Test Plan: check-clang


Full diff: https://github.com/llvm/llvm-project/pull/112883.diff

4 Files Affected:

  • (modified) clang/lib/Frontend/PrintPreprocessedOutput.cpp (+6-4)
  • (modified) clang/lib/Lex/TokenConcatenation.cpp (+4)
  • (added) clang/test/Headers/header_unit_preprocessed_output.cpp (+20)
  • (modified) clang/test/Modules/cxx20-include-translation.cpp (+2-2)
diff --git a/clang/lib/Frontend/PrintPreprocessedOutput.cpp b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
index 383d4356084916..1005825441b3e6 100644
--- a/clang/lib/Frontend/PrintPreprocessedOutput.cpp
+++ b/clang/lib/Frontend/PrintPreprocessedOutput.cpp
@@ -952,13 +952,15 @@ static void PrintPreprocessedTokens(Preprocessor &PP, Token &Tok,
       continue;
     } else if (Tok.is(tok::annot_header_unit)) {
       // This is a header-name that has been (effectively) converted into a
-      // module-name.
+      // module-name, print them inside quote.
       // FIXME: The module name could contain non-identifier module name
-      // components. We don't have a good way to round-trip those.
+      // components and OS specific file paths components. We don't have a good
+      // way to round-trip those.
       Module *M = reinterpret_cast<Module *>(Tok.getAnnotationValue());
       std::string Name = M->getFullModuleName();
-      Callbacks->OS->write(Name.data(), Name.size());
-      Callbacks->HandleNewlinesInToken(Name.data(), Name.size());
+      *Callbacks->OS << '"';
+      Callbacks->OS->write_escaped(Name);
+      *Callbacks->OS << '"';
     } else if (Tok.is(tok::annot_embed)) {
       // Manually explode the binary data out to a stream of comma-delimited
       // integer values. If the user passed -dE, that is handled by the
diff --git a/clang/lib/Lex/TokenConcatenation.cpp b/clang/lib/Lex/TokenConcatenation.cpp
index 865879d1805336..05f4203bd722bb 100644
--- a/clang/lib/Lex/TokenConcatenation.cpp
+++ b/clang/lib/Lex/TokenConcatenation.cpp
@@ -160,6 +160,10 @@ static char GetFirstChar(const Preprocessor &PP, const Token &Tok) {
 bool TokenConcatenation::AvoidConcat(const Token &PrevPrevTok,
                                      const Token &PrevTok,
                                      const Token &Tok) const {
+  // No space is required between header unit name in quote and semi.
+  if (PrevTok.is(tok::annot_header_unit) && Tok.is(tok::semi))
+    return false;
+
   // Conservatively assume that every annotation token that has a printable
   // form requires whitespace.
   if (PrevTok.isAnnotation())
diff --git a/clang/test/Headers/header_unit_preprocessed_output.cpp b/clang/test/Headers/header_unit_preprocessed_output.cpp
new file mode 100644
index 00000000000000..12b4ab6159f346
--- /dev/null
+++ b/clang/test/Headers/header_unit_preprocessed_output.cpp
@@ -0,0 +1,20 @@
+// RUN: rm -fR %t
+// RUN: split-file %s %t
+// RUN: cd %t
+// RUN: %clang_cc1 -verify -std=c++20 -emit-header-unit -xc++-user-header bz0.h
+// RUN: %clang_cc1 -verify -std=c++20 -fmodule-file=bz0.pcm -xc++-user-header bz1.h -E -o bz1.output.h
+// RUN: FileCheck %s < bz1.output.h
+// RUN: %clang_cc1 -std=c++20 -fmodule-file=bz0.pcm -emit-header-unit -xc++-user-header bz1.output.h
+
+//--- bz0.h
+// expected-no-diagnostics
+#pragma once
+
+void foo();
+
+//--- bz1.h
+// expected-no-diagnostics
+import "bz0.h";
+
+// CHECK: # 1 ".{{/|\\\\?}}bz1.h"
+// CHECK: import ".{{/|\\\\?}}bz0.h";
diff --git a/clang/test/Modules/cxx20-include-translation.cpp b/clang/test/Modules/cxx20-include-translation.cpp
index 7bf3184325829e..a607e7c0373f81 100644
--- a/clang/test/Modules/cxx20-include-translation.cpp
+++ b/clang/test/Modules/cxx20-include-translation.cpp
@@ -105,6 +105,6 @@ export void charlie() {
 }
 
 // CHECK: #pragma clang module import ".{{/|\\\\?}}h1.h"
-// CHECK: import .{{/|\\\\?}}h2.h
-// CHECK: import .{{/|\\\\?}}h3.h
+// CHECK: import ".{{/|\\\\?}}h2.h"
+// CHECK: import ".{{/|\\\\?}}h3.h"
 // CHECK-NOT: #pragma clang module import ".{{/|\\\\?}}h4.h"

Module *M = reinterpret_cast<Module *>(Tok.getAnnotationValue());
std::string Name = M->getFullModuleName();
Callbacks->OS->write(Name.data(), Name.size());
Callbacks->HandleNewlinesInToken(Name.data(), Name.size());
*Callbacks->OS << '"';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the header was imported by <>?

Copy link
Contributor Author

@dmpolukhin dmpolukhin Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good question but I don't see easy way to propagate isAngled from Preprocessor::HandleHeaderIncludeOrImport to tokens that will be printed here. This information cannot be putted to Module because it is not a characteristic of the module but information how it was spelled (even pointer to Module* is passed as void* token annotation value). Spelling file path in quotes at least won't break C++ parsing of the output as it happens now. FIXME here already covers that information about spelled header unit should be preserved. But if you think it is required fix now, I'll create a new structure that will contain Module*, isAngled and spelled header-unit name to resolve FIXME completely - it will require more changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't want to block such issues.

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Module *M = reinterpret_cast<Module *>(Tok.getAnnotationValue());
std::string Name = M->getFullModuleName();
Callbacks->OS->write(Name.data(), Name.size());
Callbacks->HandleNewlinesInToken(Name.data(), Name.size());
*Callbacks->OS << '"';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't want to block such issues.

@dmpolukhin dmpolukhin merged commit 0b7e8c2 into llvm:main Oct 24, 2024
12 of 14 checks passed
@frobtech frobtech mentioned this pull request Oct 25, 2024
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
…lvm#112883)

Summary:
Before this change clang produced output with header unit names that may
conaint path separators, dots and other non-identifier characters. This
diff prints header unit name in quotes and -E output can be compiled
again. Also remove unnecessary space between header unit name and semi.

Test Plan: check-clang
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants