-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clangd] [C++20] [Modules] Introduce initial support for C++20 Modules #66462
Conversation
@llvm/pr-subscribers-clang-tools-extra @llvm/pr-subscribers-clang-modules Changes(to be edited)(not ready for review yet)Patch is 66.97 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/66462.diff 28 Files Affected:
diff --git a/clang-tools-extra/clangd/CMakeLists.txt b/clang-tools-extra/clangd/CMakeLists.txt index 3911fb6c6c746a8..bcfb49551a02591 100644 --- a/clang-tools-extra/clangd/CMakeLists.txt +++ b/clang-tools-extra/clangd/CMakeLists.txt @@ -97,6 +97,8 @@ add_clang_library(clangDaemon IncludeFixer.cpp InlayHints.cpp JSONTransport.cpp + ModuleDependencyScanner.cpp + ModuleFilesInfo.cpp PathMapping.cpp Protocol.cpp Quality.cpp @@ -161,6 +163,7 @@ clang_target_link_libraries(clangDaemon clangAST clangASTMatchers clangBasic + clangDependencyScanning clangDriver clangFormat clangFrontend diff --git a/clang-tools-extra/clangd/ClangdServer.cpp b/clang-tools-extra/clangd/ClangdServer.cpp index 13d788162817fb4..e4c85858b6882ae 100644 --- a/clang-tools-extra/clangd/ClangdServer.cpp +++ b/clang-tools-extra/clangd/ClangdServer.cpp @@ -202,6 +202,7 @@ ClangdServer::Options::operator TUScheduler::Options() const { Opts.UpdateDebounce = UpdateDebounce; Opts.ContextProvider = ContextProvider; Opts.PreambleThrottler = PreambleThrottler; + Opts.ExperimentalModulesSupport = ExperimentalModulesSupport; return Opts; } diff --git a/clang-tools-extra/clangd/ClangdServer.h b/clang-tools-extra/clangd/ClangdServer.h index a416602251428b0..dc546b118cb8f5e 100644 --- a/clang-tools-extra/clangd/ClangdServer.h +++ b/clang-tools-extra/clangd/ClangdServer.h @@ -112,6 +112,9 @@ class ClangdServer { /// This throttler controls which preambles may be built at a given time. clangd::PreambleThrottler *PreambleThrottler = nullptr; + /// Enable experimental support for modules. + bool ExperimentalModulesSupport = false; + /// If true, ClangdServer builds a dynamic in-memory index for symbols in /// opened files and uses the index to augment code completion results. bool BuildDynamicSymbolIndex = false; diff --git a/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp b/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp index d1833759917a30f..bcc8f4f0dd9e5ac 100644 --- a/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp +++ b/clang-tools-extra/clangd/GlobalCompilationDatabase.cpp @@ -729,6 +729,20 @@ DirectoryBasedGlobalCompilationDatabase::getProjectInfo(PathRef File) const { return Res->PI; } +std::vector<std::string> +DirectoryBasedGlobalCompilationDatabase::getAllFilesInProjectOf( + PathRef File) const { + CDBLookupRequest Req; + Req.FileName = File; + Req.ShouldBroadcast = false; + Req.FreshTime = Req.FreshTimeMissing = + std::chrono::steady_clock::time_point::min(); + auto Res = lookupCDB(Req); + if (!Res) + return {}; + return Res->CDB->getAllFiles(); +} + OverlayCDB::OverlayCDB(const GlobalCompilationDatabase *Base, std::vector<std::string> FallbackFlags, CommandMangler Mangler) @@ -805,6 +819,13 @@ std::optional<ProjectInfo> DelegatingCDB::getProjectInfo(PathRef File) const { return Base->getProjectInfo(File); } +std::vector<std::string> +DelegatingCDB::getAllFilesInProjectOf(PathRef File) const { + if (!Base) + return {}; + return Base->getAllFilesInProjectOf(File); +} + tooling::CompileCommand DelegatingCDB::getFallbackCommand(PathRef File) const { if (!Base) return GlobalCompilationDatabase::getFallbackCommand(File); diff --git a/clang-tools-extra/clangd/GlobalCompilationDatabase.h b/clang-tools-extra/clangd/GlobalCompilationDatabase.h index 2bf8c973c534c6f..eaeff8d627a0960 100644 --- a/clang-tools-extra/clangd/GlobalCompilationDatabase.h +++ b/clang-tools-extra/clangd/GlobalCompilationDatabase.h @@ -45,6 +45,10 @@ class GlobalCompilationDatabase { return std::nullopt; } + virtual std::vector<std::string> getAllFilesInProjectOf(PathRef File) const { + return {}; + } + /// Makes a guess at how to build a file. /// The default implementation just runs clang on the file. /// Clangd should treat the results as unreliable. @@ -75,6 +79,7 @@ class DelegatingCDB : public GlobalCompilationDatabase { getCompileCommand(PathRef File) const override; std::optional<ProjectInfo> getProjectInfo(PathRef File) const override; + std::vector<std::string> getAllFilesInProjectOf(PathRef File) const override; tooling::CompileCommand getFallbackCommand(PathRef File) const override; @@ -121,6 +126,7 @@ class DirectoryBasedGlobalCompilationDatabase /// Returns the path to first directory containing a compilation database in /// \p File's parents. std::optional<ProjectInfo> getProjectInfo(PathRef File) const override; + std::vector<std::string> getAllFilesInProjectOf(PathRef File) const override; bool blockUntilIdle(Deadline Timeout) const override; diff --git a/clang-tools-extra/clangd/ModuleDependencyScanner.cpp b/clang-tools-extra/clangd/ModuleDependencyScanner.cpp new file mode 100644 index 000000000000000..d706d2eb2fc8d6e --- /dev/null +++ b/clang-tools-extra/clangd/ModuleDependencyScanner.cpp @@ -0,0 +1,86 @@ +//===---------------- ModuleDependencyScanner.cpp ----------------*- C++-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "ModuleDependencyScanner.h" + +namespace clang { +namespace clangd { +using P1689Rule = tooling::dependencies::P1689Rule; + +std::optional<P1689Rule> ModuleDependencyScanner::scan(PathRef FilePath) { + if (ScanningCache.count(FilePath)) + return ScanningCache[FilePath]; + + std::optional<tooling::CompileCommand> Cmd = CDB.getCompileCommand(FilePath); + + if (!Cmd) + return std::nullopt; + + using namespace clang::tooling::dependencies; + + llvm::SmallString<128> FilePathDir(FilePath); + llvm::sys::path::remove_filename(FilePathDir); + DependencyScanningTool ScanningTool( + Service, + TFS ? TFS->view(FilePathDir) : llvm::vfs::createPhysicalFileSystem()); + + llvm::Expected<P1689Rule> Result = + ScanningTool.getP1689ModuleDependencyFile(*Cmd, Cmd->Directory); + + if (auto E = Result.takeError()) { + // Ignore any error. + llvm::consumeError(std::move(E)); + return std::nullopt; + } + + if (Result->Provides) + ModuleNameToSourceMapper[Result->Provides->ModuleName] = FilePath; + + ScanningCache[FilePath] = *Result; + return *Result; +} + +void ModuleDependencyScanner::globalScan(PathRef File) { + std::vector<std::string> AllFiles = CDB.getAllFilesInProjectOf(File); + + for (auto &File : AllFiles) + scan(File); +} + +PathRef ModuleDependencyScanner::getSourceForModuleName(StringRef ModuleName) const { + if (!ModuleNameToSourceMapper.count(ModuleName)) + return {}; + + return ModuleNameToSourceMapper[ModuleName]; +} + +std::vector<std::string> +ModuleDependencyScanner::getRequiredModules(PathRef File) const { + if (!ScanningCache.count(File)) + return {}; + + const P1689Rule &CachedResult = ScanningCache[File]; + std::vector<std::string> Result; + + for (const auto &Info : CachedResult.Requires) + Result.push_back(Info.ModuleName); + + return Result; +} + +StringRef ModuleDependencyScanner::getModuleName(PathRef File) const { + if (!ScanningCache.count(File)) + return {}; + + if (!ScanningCache[File].Provides) + return {}; + + return ScanningCache[File].Provides->ModuleName; +} +} // namespace clangd +} // namespace clang diff --git a/clang-tools-extra/clangd/ModuleDependencyScanner.h b/clang-tools-extra/clangd/ModuleDependencyScanner.h new file mode 100644 index 000000000000000..1d6eb58fda59e20 --- /dev/null +++ b/clang-tools-extra/clangd/ModuleDependencyScanner.h @@ -0,0 +1,78 @@ +//===-------------- ModuleDependencyScanner.h --------------------*- C++-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_MODULEDEPENDENCYSCANNER_H +#define LLVM_CLANG_TOOLS_EXTRA_CLANGD_MODULEDEPENDENCYSCANNER_H + +#include "GlobalCompilationDatabase.h" +#include "support/Path.h" +#include "support/ThreadsafeFS.h" + +#include "clang/Tooling/DependencyScanning/DependencyScanningService.h" +#include "clang/Tooling/DependencyScanning/DependencyScanningTool.h" + +#include "llvm/ADT/SmallString.h" +#include "llvm/ADT/StringMap.h" + +namespace clang { +namespace clangd { + +/// A scanner to produce P1689 format for C++20 Modules. +/// +/// The scanner can scan a single file with `scan(PathRef)` member function +/// or scan the whole project with `globalScan(PathRef)` member function. See +/// the comments of `globalScan` to see the details. +class ModuleDependencyScanner { +public: + ModuleDependencyScanner(const GlobalCompilationDatabase &CDB, + const ThreadsafeFS *TFS) + : CDB(CDB), TFS(TFS), + Service(tooling::dependencies::ScanningMode::CanonicalPreprocessing, + tooling::dependencies::ScanningOutputFormat::P1689) {} + + /// Scanning the single file specified by \param FilePath. + std::optional<clang::tooling::dependencies::P1689Rule> scan(PathRef FilePath); + + /// Scanning every source file in the current project to get the + /// <module-name> to <module-unit-source> map. + /// It looks unefficiency to scan the whole project especially for + /// every version of every file! + /// TODO: We should find a efficient method to get the <module-name> + /// to <module-unit-source> map. We can make it either by providing + /// a global module dependency scanner to monitor every file. Or we + /// can simply require the build systems (or even if the end users) + /// to provide the map. + void globalScan(PathRef File); + + PathRef getSourceForModuleName(StringRef ModuleName) const; + + /// Return the direct required modules. Indirect required modules are not + /// included. + std::vector<std::string> getRequiredModules(PathRef File) const; + StringRef getModuleName(PathRef File) const; + + const ThreadsafeFS *getThreadsafeFS() const { return TFS; } + + const GlobalCompilationDatabase &getCompilationDatabase() const { return CDB; } + +private: + const GlobalCompilationDatabase &CDB; + const ThreadsafeFS *TFS; + + clang::tooling::dependencies::DependencyScanningService Service; + + // Map source file to P1689 Result. + llvm::StringMap<clang::tooling::dependencies::P1689Rule> ScanningCache; + // Map module name to source file path. + llvm::StringMap<std::string> ModuleNameToSourceMapper; +}; + +} // namespace clangd +} // namespace clang + +#endif diff --git a/clang-tools-extra/clangd/ModuleFilesInfo.cpp b/clang-tools-extra/clangd/ModuleFilesInfo.cpp new file mode 100644 index 000000000000000..845ff01ca09dff3 --- /dev/null +++ b/clang-tools-extra/clangd/ModuleFilesInfo.cpp @@ -0,0 +1,282 @@ +//===----------------- ModuleFilesInfo.cpp -----------------------*- C++-*-===// +// +// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. +// See https://llvm.org/LICENSE.txt for license information. +// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception +// +//===----------------------------------------------------------------------===// + +#include "ModuleFilesInfo.h" +#include "support/Logger.h" + +#include "clang/Frontend/FrontendAction.h" +#include "clang/Frontend/FrontendActions.h" +#include "clang/Serialization/ASTReader.h" + +namespace clang { +namespace clangd { + +namespace { +llvm::SmallString<128> getAbsolutePath(const tooling::CompileCommand &Cmd) { + llvm::SmallString<128> AbsolutePath; + if (llvm::sys::path::is_absolute(Cmd.Filename)) { + AbsolutePath = Cmd.Filename; + } else { + AbsolutePath = Cmd.Directory; + llvm::sys::path::append(AbsolutePath, Cmd.Filename); + llvm::sys::path::remove_dots(AbsolutePath, true); + } + return AbsolutePath; +} +} // namespace + +ModuleFilesInfo::ModuleFilesInfo(PathRef MainFile, + const GlobalCompilationDatabase &CDB) { + std::optional<ProjectInfo> PI = CDB.getProjectInfo(MainFile); + if (!PI) + return; + + llvm::SmallString<128> Result(PI->SourceRoot); + llvm::sys::path::append(Result, ".cache"); + llvm::sys::path::append(Result, "clangd"); + llvm::sys::path::append(Result, "module_files"); + llvm::sys::fs::create_directories(Result, /*IgnoreExisting=*/true); + + llvm::sys::path::append(Result, llvm::sys::path::filename(MainFile)); + llvm::sys::fs::createUniqueDirectory(Result, UniqueModuleFilesPathPrefix); + + log("Initialized module files to {0}", UniqueModuleFilesPathPrefix.str()); +} + +ModuleFilesInfo::~ModuleFilesInfo() { + DependentModuleNames.clear(); + Successed = false; + + if (UniqueModuleFilesPathPrefix.empty()) + return; + + llvm::sys::fs::remove_directories(UniqueModuleFilesPathPrefix); + UniqueModuleFilesPathPrefix.clear(); +} + +llvm::SmallString<256> +ModuleFilesInfo::getModuleFilePath(StringRef ModuleName) const { + llvm::SmallString<256> ModuleFilePath; + + ModuleFilePath = UniqueModuleFilesPathPrefix; + auto [PrimaryModuleName, PartitionName] = ModuleName.split(':'); + llvm::sys::path::append(ModuleFilePath, PrimaryModuleName); + if (!PartitionName.empty()) { + ModuleFilePath.append("-"); + ModuleFilePath.append(PartitionName); + } + ModuleFilePath.append(".pcm"); + + return ModuleFilePath; +} + +bool ModuleFilesInfo::IsModuleUnitBuilt(StringRef ModuleName) const { + if (!DependentModuleNames.count(ModuleName)) + return false; + + auto BMIPath = getModuleFilePath(ModuleName); + if (llvm::sys::fs::exists(BMIPath)) + return true; + + Successed = false; + + DependentModuleNames.erase(ModuleName); + return false; +} + +void ModuleFilesInfo::ReplaceHeaderSearchOptions( + HeaderSearchOptions &Options) const { + if (!IsInited()) + return; + + Options.PrebuiltModulePaths.insert(Options.PrebuiltModulePaths.begin(), + UniqueModuleFilesPathPrefix.str().str()); + + for (auto Iter = Options.PrebuiltModuleFiles.begin(); + Iter != Options.PrebuiltModuleFiles.end();) { + if (IsModuleUnitBuilt(Iter->first)) { + Iter = Options.PrebuiltModuleFiles.erase(Iter); + continue; + } + + Iter++; + } +} + +void ModuleFilesInfo::ReplaceCompileCommands( + tooling::CompileCommand &Cmd) const { + if (!IsInited()) + return; + + std::vector<std::string> CommandLine(std::move(Cmd.CommandLine)); + + Cmd.CommandLine.emplace_back(CommandLine[0]); + Cmd.CommandLine.emplace_back( + llvm::Twine("-fprebuilt-module-path=" + UniqueModuleFilesPathPrefix) + .str()); + + for (std::size_t I = 1; I < CommandLine.size(); I++) { + const std::string &Arg = CommandLine[I]; + const auto &[LHS, RHS] = StringRef(Arg).split("="); + + // Remove original `-fmodule-file=<module-name>=<module-path>` form if it + // already built. + if (LHS == "-fmodule-file" && RHS.contains("=")) { + const auto &[ModuleName, _] = RHS.split("="); + if (IsModuleUnitBuilt(ModuleName)) + continue; + } + + Cmd.CommandLine.emplace_back(Arg); + } +} + +void ModuleFilesInfo::ReplaceCompileCommands(tooling::CompileCommand &Cmd, + StringRef OutputModuleName) const { + if (!IsInited()) + return; + + ReplaceCompileCommands(Cmd); + + Cmd.Output = getModuleFilePath(OutputModuleName).str().str(); +} + +bool ModuleFilesInfo::buildModuleFile(PathRef ModuleUnitFileName, + ModuleDependencyScanner &Scanner) { + if (ModuleUnitFileName.empty()) + return false; + + for (auto &ModuleName : Scanner.getRequiredModules(ModuleUnitFileName)) { + // Return early if there are errors building the module file. + if (!IsModuleUnitBuilt(ModuleName) && + !buildModuleFile(Scanner.getSourceForModuleName(ModuleName), Scanner)) { + log("Failed to build module {0}", ModuleName); + return false; + } + } + + auto Cmd = + Scanner.getCompilationDatabase().getCompileCommand(ModuleUnitFileName); + if (!Cmd) + return false; + + ReplaceCompileCommands(*Cmd, Scanner.getModuleName(ModuleUnitFileName)); + + ParseInputs Inputs; + Inputs.TFS = Scanner.getThreadsafeFS(); + Inputs.CompileCommand = std::move(*Cmd); + + IgnoreDiagnostics IgnoreDiags; + auto CI = buildCompilerInvocation(Inputs, IgnoreDiags); + if (!CI) + return false; + + auto FS = Inputs.TFS->view(Inputs.CompileCommand.Directory); + auto AbsolutePath = getAbsolutePath(Inputs.CompileCommand); + auto Buf = FS->getBufferForFile(AbsolutePath); + if (!Buf) + return false; + + // Hash the contents of input files and store the hash value to the BMI files. + // So that we can check if the files are still valid when we want to reuse the + // BMI files. + CI->getHeaderSearchOpts().ValidateASTInputFilesContent = true; + + CI->getFrontendOpts().OutputFile = Inputs.CompileCommand.Output; + auto Clang = + prepareCompilerInstance(std::move(CI), /*Preamble=*/nullptr, + std::move(*Buf), std::move(FS), IgnoreDiags); + if (!Clang) + retu... |
ec7d378
to
36f7d8d
Compare
36f7d8d
to
a4e9cae
Compare
The bot shows some escaping code issue on windows:
And I'm like to skip the tests on windows temporarily since I don't have a windows development environment. I hope someone else can fix it later and I feel it may not be hard. |
a4e9cae
to
df7c71d
Compare
Oh nice going to try this :) |
@sam-mccall gentle ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay - this looks like a much more manageable scope!
Comments mostly on structure still...
@@ -45,6 +45,10 @@ class GlobalCompilationDatabase { | |||
return std::nullopt; | |||
} | |||
|
|||
virtual std::vector<std::string> getAllFilesInProjectOf(PathRef File) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The layering here looks close, but not quite right to me.
Rather than having GCDB expose "get all files" and building module-scanning on top of that, I think GCDB should expose module data: e.g. through a subobject interface like shared_ptr<ProjectModules> GlobalCDB::getModulesDatabase(PathRef File)
, returning something like the current ModulesDependencyScanner behind an interface.
(If the interface to ModuleDependencyScanner was narrower, it could also expose those queries directly)
Reasons to prefer this:
- suppose we augment
compile_commands.json
with some module manifest, now we want to use a different strategy for the projects that have one - which is something we only know inside DirectoryBasedCDB - GlobalCDB is our general "build system integration" point - if we need to swap out the high level "what are the projects and how do we talk to their build systems" strategy, we likely need to reconsider module discovery too
- getAllFilesInProjectOf exposes more internal structure than I'd like for general GlobalCDB: we use this abstraction to integrate in cases where there's no "project" structure and source files are not enumerable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compile_commands.json
probably isn't the best place for it as it cannot be filled out at the same time (e.g., CMake knows the compile commands once it generates; it does not know the relevant module information until the build happens and updating the database would be a huge bottleneck).
I have a prototype (that needs some more work) that outlines such a module_commands.json
-like file in this CMake branch. Of note that doesn't work:
- flags is not filled out; and
- the commands that merge the per-target databases into a top-level one does not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the example for module_commands.json
? So that we can get a rough image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done by adding ProjectModules class in ProjectModules.cpp file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably easiest to see it by the writer code here. Basically, it is:
{
"version": 1,
"revision": 0,
"module_sets": [
{
"name": "unique name of module set",
"visible_module_sets": [
"module sets",
"which can satisfy",
"required modules within this module set",
],
"modules": [
{
"work-directory": "/path/to/working/directory",
"source": "path/to/source/file",
"name": "name.of.provided.module",
"private": false,
"requires": [
"list",
"of",
"required",
"modules"
],
"flag_sets": [
"-fstandalone-flag",
["-fflag", "with-arg"],
]
}
]
}
]
}
The semantics are:
- module sets may refer to other module sets named in order to satisfy their required modules;
- however, private modules in other module sets are not usable;
work-directory
is optional;private
defaults tofalse
;- flag sets need to be used to compile the BMI in question (Daniel Ruoso's "local preprocessor arguments");
- flags may be deduplicated/merged by top-level identities.
tooling::dependencies::ScanningOutputFormat::P1689) {} | ||
|
||
/// Scanning the single file specified by \param FilePath. | ||
std::optional<clang::tooling::dependencies::P1689Rule> scan(PathRef FilePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a little detail, but P1689Rule is a completely opaque name.
It's also undocumented (what's "PrimaryOutput", especially in a clangd context?), and doesn't include the source file, which would help make the interface here much more regular:
optional<ModuleNode> getModuleFromSource(PathRef);
optional<ModuleNode> getModuleFromName(StringRef);
WDYT about defining our own struct here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I added ModuleDependencyInfo
struct.
struct ModuleDependencyInfo {
// The name of the module if the file is a module unit.
std::optional<std::string> ModuleName;
// A list of names for the modules that the file directly depends.
std::vector<std::string> RequiredModules;
};
The complete P1689 information may be redundant for us.
I didn't add the suggested interfaces. Since I found the following two interfaces are sufficient and more straight forward in practice (from my mind. I feel like we don't need to look at the definition of ModuleNode
now when we use the scanner):
PathRef getSourceForModuleName(StringRef ModuleName);
std::vector<std::string> getRequiredModules(PathRef File);
class ModuleDependencyScanner { | ||
public: | ||
ModuleDependencyScanner(const GlobalCompilationDatabase &CDB, | ||
const ThreadsafeFS *TFS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFS is not optional and should also be a reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
namespace clang { | ||
namespace clangd { | ||
|
||
/// A scanner to produce P1689 format for C++20 Modules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P1689 describes a JSON format, we're not producing that here.
This is used to query the module graph, and the implementation strategy is scanning all source files in the CDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/// a global module dependency scanner to monitor every file. Or we | ||
/// can simply require the build systems (or even if the end users) | ||
/// to provide the map. | ||
void globalScan(PathRef File); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is used for its side-effect, but we don't describe what that is (i.e. that getSourceForModuleName only works after calling this function).
We can either:
- make that explicit, and have getSourceForModuleName assert that globalScan has been called
- hide that detail, and make getSourceForModuleName call globalScan if that hasn't happened already
even though it hides important performance characteristics, I think I prefer the second as it provides an interface that makes sense for other (non-scanning) strategies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the scanner are only used by ScanningAllProjectModules
class. In the scanner itself, I add the assertion in getSourceForModuleName
. And in the ScanningAllProjectModules
class, I make it to call globalScan if that hasn't happened already.
/// The users should call `ReplaceHeaderSearchOptions(...)` or `ReplaceCompileCommands(CompileCommand&)` | ||
/// member function to update the compilation commands to select the built module files first. | ||
struct ModuleFilesInfo { | ||
ModuleFilesInfo() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why expose the default constructor if it's not meant to be used? similarly, why is this class movable?
edit: I see you use an empty instance when the feature is off. This contradicts the "should only be initialized" comment, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Now the default constructor is private. But it is still movable since we need to assign it to Preamble.
/// Return true if the modile file specified by ModuleName is built. | ||
/// Note that this interface will only check the existence of the module | ||
/// file instead of checking the validness of the module file. | ||
bool IsModuleUnitBuilt(StringRef ModuleName) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function isn't used externally, remove?
(If it was used externally, I'm not sure how callers are meant to interpret the note)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used in unittests. I've added a comment.
if (!IsInited()) | ||
return true; | ||
|
||
if (!Successed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the behavior of this function depend on whether IsModuleUnitBuilt() was called is surprising and seems undesirable.
When the module files are not created, isn't there some other way to detect that error? Is it really too expensive to eagerly check whether they were created, given that we were going to write them?
Do we really want CanReuse to return false if the compilation failed, causing us to repeat it over and over again even though inputs haven't changed?
My suspicion is that Successed
should just be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done by removing Successed
.
StringRef ModuleName = Iter.first(); | ||
auto BMIPath = getModuleFilePath(ModuleName); | ||
auto ReadResult = | ||
Clang.getASTReader()->ReadAST(BMIPath, serialization::MK_MainFile, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too expensive to be a good end state for "can reuse preamble" - if we can't work out yet how to do it cheaply, maybe it's better just not to validate whether modules are out-of-date yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it sounds a little bit crazy to not validate whether modules are out-of-date to me. But maybe we can add an option to control this later. Also the compiler tried a lot to make it pretty fast to load the module file. In my experience, it won't take 1s to load a 200+MB modules. So I feel it may not be so hurting in practice. And after all, it should not be bad as the initial version of the patch.
if (!ModulesInfo.buildModuleFile( | ||
Scanner.getSourceForModuleName(Info.ModuleName), Scanner)) { | ||
log("Failed to build module {0}", Info.ModuleName); | ||
return ModulesInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like we should either carry on and build all the modules we can, or return an empty/failed set - what's the value of returning half the modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@sam-mccall thanks for your high qualified comments! I think I've addressed most of them. I am not so familiar with the Github review system. Tell me if you're more happy with other practices. |
@sam-mccall gentle ping~ |
I highly recommend talking to @mathstuf of the CMake project about a way for the build system can emit a next generation set of build graph metadata (think modules appropriate compile commands, including dependencies). I expect it will be more efficient for the build system, compiler, and LSP to all work together compared to implementing large subsets of a build system in the clangd LSP. |
Yeah, yeah, of course. This is listed as the second step of |
@sam-mccall gentle ping~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks really close to me!
Tried to focus on how the components fit together, the details of module building and reuse can easily be tweaked later if needed.
virtual PathRef getSourceForModuleName(StringRef ModuleName) = 0; | ||
}; | ||
|
||
class ScanningAllProjectModules : public ProjectModules { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this class only implements an interface, we could just expose a factory function from the header instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
GlobalScanned && | ||
"We should only call getSourceForModuleName after calling globalScan()"); | ||
|
||
if (!ModuleNameToSource.count(ModuleName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rather if (auto It = find(); It != end())
to avoid the double-lookup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
llvm::sys::path::remove_filename(FilePathDir); | ||
DependencyScanningTool ScanningTool(Service, TFS.view(FilePathDir)); | ||
|
||
llvm::Expected<P1689Rule> P1689Result = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: again, drop P1689 from the var name?
We can't do anything about the function being misnamed, but...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (!PI) | ||
return; | ||
|
||
llvm::SmallString<128> Result(PI->SourceRoot); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PI->SourceRoot can be empty, depending on the CDB strategy. For now it's OK to bail out in this case, but we need a FIXME
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done by adding a FIXME.
llvm::SmallString<128> Result(PI->SourceRoot); | ||
llvm::sys::path::append(Result, ".cache"); | ||
llvm::sys::path::append(Result, "clangd"); | ||
llvm::sys::path::append(Result, "module_files"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: just modules
I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I prefer using the term module files
to describe on disk BMI files. Since the term "modules" may have too many meanings.
} | ||
|
||
llvm::SmallString<256> | ||
PrerequisiteModules::getModuleFilePath(StringRef ModuleName) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
giving the dir a random name and the modules predictable names, rather than just random names for the modules is an interesting choice
It makes things more complicated, but I can see maybe it's easier to debug.
Leave a comment somewhere motivating the layout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done by adding comments to the definition of PrerequisiteModules
.
return false; | ||
|
||
auto BMIPath = getModuleFilePath(ModuleName); | ||
if (llvm::sys::fs::exists(BMIPath)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a FIXME somewhere in this file to make this logic VFS-safe
(You're mixing physical writes and reads with VFS-based reads. It's messy to resolve because the VFS infra is read-only and because DirBasedCDB is real-FS based IIRC. So let's not try to totally sort it out now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done by adding the FIXME to the constructor of PrerequisiteModules
std::vector<std::string> CommandLine(std::move(Cmd.CommandLine)); | ||
|
||
Cmd.CommandLine.emplace_back(CommandLine[0]); | ||
Cmd.CommandLine.emplace_back( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this line? we should be passing all paths explicitly with -fmodule-file=, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Originally I just felt it was simpler. And now I feel passing -fmodule-file=<module-name>=<module-file>
looks more neat indeed.
c367f6b
to
190e160
Compare
@sam-mccall Thanks for you high quality comments! I think all the comments (except few ones I explained in inline comments) should be addressed in the latest force-push commit. I didn't click the The "significant" changes in the latest commit are mainly interface changes. They include:
I can image, in the end of the day, we need to have a "public" ModuleFile class which are owned by |
Thank you for your work, @ChuanqiXu9, have the best of luck! |
hi, clangd got error |
i tried your branch (and modified my xmake a little to gen the module_map.txt, https://github.com/Arthapz/xmake/tree/clangd-module-map), i did get some of LSP support (neovim) but the logs are full of errors like
and after some time some errors show up i used these parameters for clangd
|
@Arthapz @guijiyang thanks for reporting. But could you try to give feedbacks to that repo instead of this page. I feel this page it too long now. (My browser needs seconds to load it). And this page is indeed not a good place to track that. @guijiyang it looks like the reason why it can't work is that the clang tools can't handle with MSVC's option '/ifcOutput' and '/interface' . It may be better if you can compile your project with clang.exe on Windows and use a compile commands from that. And also, as far as I know, clang can't compile std module from MSSTL. Then clangd can't handle that too. Maybe we have to wait for clang to compile std module from MS STL. @Arthapz For the first error, it says the module unit doesn't declare that module. Maybe it will helpful to check that. I noticed that there is a For the second error, it should be a inconsistent configuration in the clangd side. https://clang.llvm.org/docs/StandardCPlusPlusModules.html#object-definition-consistency. We skipped ODR check by default in clang but it looks like we forgot it in clangd. I've updated it in https://github.com/ChuanqiXu9/clangd-for-modules. You can give it another try. |
Ok, i got it, and which repo can give feedback to you? |
Please use this one: https://github.com/ChuanqiXu9/clangd-for-modules |
@kadircet Comments addressed. Maybe due to some github's issue, my replies are separated and didn't form a standalone list. Hope this won't make it harder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot, LGTM!
i think the only big item remaining is injecting resource-dir correctly when running clang-scan-deps.
the most important follow up (for functionality, rather than optimizations and usability) is indexing the PCMs as we build them (similar to preamble-indexing we run today). For any future travelers, please do start a discussion in advance for that one.
|
||
#include "ModulesBuilder.h" | ||
#include "ScanningProjectModules.h" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: drop spaces between includes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Currently we simply bail out. | ||
if (ModuleUnitFileName.empty()) | ||
return llvm::createStringError( | ||
llvm::formatv("Failed to get the primary source")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for formatv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (llvm::Error Err = buildModuleFile(RequiredModuleName, CDB, TFS, MDB, | ||
ModuleFilesPrefix, BuiltModuleFiles)) | ||
return llvm::createStringError( | ||
llvm::formatv("Failed to build dependency {0}", RequiredModuleName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add the underlying error? e.g.: "Failed to build dependency {0}: {1}", RequiredModuleName, llvm::toString(std::move(Err))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
auto CI = buildCompilerInvocation(Inputs, IgnoreDiags); | ||
if (!CI) | ||
return llvm::createStringError( | ||
llvm::formatv("Failed to build compiler invocation")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for formatv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
auto FS = Inputs.TFS->view(Inputs.CompileCommand.Directory); | ||
auto Buf = FS->getBufferForFile(Inputs.CompileCommand.Filename); | ||
if (!Buf) | ||
return llvm::createStringError(llvm::formatv("Failed to create buffer")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for formatv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
std::move(*Buf), std::move(FS), IgnoreDiags); | ||
if (!Clang) | ||
return llvm::createStringError( | ||
llvm::formatv("Failed to prepare compiler instance")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for formatv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Clang->ExecuteAction(Action); | ||
|
||
if (Clang->getDiagnostics().hasErrorOccurred()) | ||
return llvm::createStringError(llvm::formatv("Compilation failed")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for formatv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Thanks for reviewing! For future plans, I still like to chase for optimizations and usability first. For example, we need to reuse built module files across opened files. Otherwise every time we open a new file, we need to build all the modules dependent. This can't be tolerant in real workloads. For example, some modules and their dependent modules need 10+ seconds to build. Then it is a nightmare. For index the PCMs, what's your suggestion and the path to do that? I don't have an idea for that yet. |
Given this is approved, I'd like to land this to give it more baking times. I think the current one is usable as long as the users can be tolerant about the long latency for openning each file. If users want it to be more faster by reusing module files, users can try https://github.com/ChuanqiXu9/clangd-for-modules. The changes there will be contributed to the upstream repo step by step. But the early days testing is pretty meaningful. There will be many dark corners can only be found in practice. |
@@ -161,6 +163,7 @@ clang_target_link_libraries(clangDaemon | |||
clangAST | |||
clangASTMatchers | |||
clangBasic | |||
clangDependencyScanning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency makes clangd depend (transitively) on clangCodeGen (and its dependencies, e.g. LLVM_TARGETS_TO_BUILD
), which it didn't depend on previously.
This makes clangd take ~30% longer to build, increases its binary size, and it's conceptually strange that a language server contains code generation code.
Any chance we could cut the dependency on clangCodeGen again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it comes from
llvm-project/clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
Lines 489 to 490 in b634e05
PCHContainerOps->registerReader( | |
std::make_unique<ObjectFilePCHContainerReader>()); |
but clangd didn't touch this function. So it might be possible to split that. I file an issue to track this: #99479
#66462) Summary: Alternatives to https://reviews.llvm.org/D153114. Try to address clangd/clangd#1293. See the links for design ideas and the consensus so far. We want to have some initial support in clang18. This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand so that every one are in the same page: > Don't attempt any cross-file or cross-version coordination: i.e. don't > try to reuse BMIs between different files, don't try to reuse BMIs > between (preamble) reparses of the same file, don't try to persist the > module graph. Instead, when building a preamble, synchronously scan > for the module graph, build the required PCMs on the single preamble > thread with filenames private to that preamble, and then proceed to > build the preamble. This patch reflects the above opinions. # Testing in real-world project I tested this with a modularized library: https://github.com/alibaba/async_simple/tree/CXX20Modules. This library has 3 modules (async_simple, std and asio) and 65 module units. (Note that a module consists of multiple module units). Both `std` module and `asio` module have 100k+ lines of code (maybe more, I didn't count). And async_simple itself has 8k lines of code. This is the scale of the project. The result shows that it works pretty well, ..., well, except I need to wait roughly 10s after opening/editing any file. And this falls in our expectations. We know it is hard to make it perfect in the first move. # What this patch does in detail - Introduced an option `--experimental-modules-support` for the support for C++20 Modules. So that no matter how bad this is, it wouldn't affect current users. Following off the page, we'll assume the option is enabled. - Introduced two classes `ModuleFilesInfo` and `ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by `ModuleFilesInfo`. - The class `ModuleFilesInfo` records the built module files for specific single source file. The module files can only be built by the static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`. - The class `PreambleData` adds a new member variable with type `ModuleFilesInfo`. This refers to the needed module files for the current file. It means the module files info is part of the preamble, which is suggested in the first patch too. - In `isPreambleCompatible()`, we add a call to `ModuleFilesInfo::CanReuse()` to check if the built module files are still up to date. - When we build the AST for a source file, we will load the built module files from ModuleFilesInfo. # What we need to do next Let's split the TODOs into clang part and clangd part to make things more clear. The TODOs in the clangd part include: 1. Enable reusing module files across source files. The may require us to bring a ModulesManager like thing which need to handle `scheduling`, `the possibility of BMI version conflicts` and `various events that can invalidate the module graph`. 2. Get a more efficient method to get the `<module-name> -> <module-unit-source>` map. Currently we always scan the whole project during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`. This is clearly inefficient even if the scanning process is pretty fast. I think the potential solutions include: - Make a global scanner to monitor the state of every source file like I did in the first patch. The pain point is that we need to take care of the data races. - Ask the build systems to provide the map just like we ask them to provide the compilation database. 3. Persist the module files. So that we can reuse module files across clangd invocations or even across clangd instances. TODOs in the clang part include: 1. Clang should offer an option/mode to skip writing/reading the bodies of the functions. Or even if we can requrie the parser to skip parsing the function bodies. And it looks like we can say the support for C++20 Modules is initially workable after we made (1) and (2) (or even without (2)). Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250983
This is the following patch of #66462 to optimize its performance. # Motivation To avoid data races, we choose "per file owns its dependent modules" model. That said, every TU will own all the required module files so that we don't need to worry about thread safety. And it looks like we succeeded that we focus on the interfaces and structure of modules support in clangd. But after all, this model is not good for performance. Image we have 10000 TUs import std, we will have 10000 std.pcm in the memory. That is terrible both in time and space. Given the current modules support in clangd works pretty well (almost every issue report I received is more or less a clang's issue), I'd like to improve the performance. # High Level Changes After this patch, the built module files will be owned by the module builder and each TU will only have a reference to the built module files. The module builder have a map from module names to built module files. When a new TU ask for a module file, the module builder will check if the module file lives in the map and if the module file are up to date. If yes, the module file will be returned. If no, the module file entry would be erased in the module builder. We use `shared_ptr<>` to track module file here so that the other TU owning the out dated module file won't be affected. The out dated module file will be removed automatically if other TU gets update or closed. (I know the out dated module file may not exist due to the `CanReuse` mechanism. But the design here is natural and can be seen as a redundant design to make it more robust.) When we a build a module, we will use the mutex and the condition variable in the working thread to build it exclusively. All other threads that also want the module file would have to wait for that working thread. It might not sounds great but I think if we want to make it asynchronous, we have to refactor TUScheduler as far as I know. # Code Structure Changes Thanks for the previous hard working reviewing, the interfaces almost don't change in this patch. Almost all the work are isolated in ModulesBuilder.cpp. A outliner is that we convert `ModulesBuilder` to an abstract class since the implementation class needs to own the module files. And the core function to review is `ReusableModulesBuilder::getOrBuildModuleFile`. It implements the core logic to fetch the module file from the cache or build it if the module file is not in the cache or out of date. And other important entities are `BuildingModuleMutexes`, `BuildingModuleCVs`, `BuildingModules` and `ModulesBuildingMutex`. These are mutexes and condition variables to make sure the thread safety. # User experience I've implemented this in our downstream and ask our users to use it. I also sent it https://github.com/ChuanqiXu9/clangd-for-modules here as pre-version. The feedbacks are pretty good. And I didn't receive any bug reports (about the reusable modules builder) yet. # Other potential improvement The are other two potential improvements can be done: 1. Scanning cache and a mechanism to get the required module information more quickly. (Like the module maps in https://github.com/ChuanqiXu9/clangd-for-modules) 2. Persist the module files. So that after we close the vscode and reopen it, we can reuse the built module files since the last invocation.
This is the following patch of llvm#66462 to optimize its performance. # Motivation To avoid data races, we choose "per file owns its dependent modules" model. That said, every TU will own all the required module files so that we don't need to worry about thread safety. And it looks like we succeeded that we focus on the interfaces and structure of modules support in clangd. But after all, this model is not good for performance. Image we have 10000 TUs import std, we will have 10000 std.pcm in the memory. That is terrible both in time and space. Given the current modules support in clangd works pretty well (almost every issue report I received is more or less a clang's issue), I'd like to improve the performance. # High Level Changes After this patch, the built module files will be owned by the module builder and each TU will only have a reference to the built module files. The module builder have a map from module names to built module files. When a new TU ask for a module file, the module builder will check if the module file lives in the map and if the module file are up to date. If yes, the module file will be returned. If no, the module file entry would be erased in the module builder. We use `shared_ptr<>` to track module file here so that the other TU owning the out dated module file won't be affected. The out dated module file will be removed automatically if other TU gets update or closed. (I know the out dated module file may not exist due to the `CanReuse` mechanism. But the design here is natural and can be seen as a redundant design to make it more robust.) When we a build a module, we will use the mutex and the condition variable in the working thread to build it exclusively. All other threads that also want the module file would have to wait for that working thread. It might not sounds great but I think if we want to make it asynchronous, we have to refactor TUScheduler as far as I know. # Code Structure Changes Thanks for the previous hard working reviewing, the interfaces almost don't change in this patch. Almost all the work are isolated in ModulesBuilder.cpp. A outliner is that we convert `ModulesBuilder` to an abstract class since the implementation class needs to own the module files. And the core function to review is `ReusableModulesBuilder::getOrBuildModuleFile`. It implements the core logic to fetch the module file from the cache or build it if the module file is not in the cache or out of date. And other important entities are `BuildingModuleMutexes`, `BuildingModuleCVs`, `BuildingModules` and `ModulesBuildingMutex`. These are mutexes and condition variables to make sure the thread safety. # User experience I've implemented this in our downstream and ask our users to use it. I also sent it https://github.com/ChuanqiXu9/clangd-for-modules here as pre-version. The feedbacks are pretty good. And I didn't receive any bug reports (about the reusable modules builder) yet. # Other potential improvement The are other two potential improvements can be done: 1. Scanning cache and a mechanism to get the required module information more quickly. (Like the module maps in https://github.com/ChuanqiXu9/clangd-for-modules) 2. Persist the module files. So that after we close the vscode and reopen it, we can reuse the built module files since the last invocation.
The
The following tests seem to have the invalid argument error and the argument unused warning:
And a few other tests suffer only the warning. The following diff seem to fix the problem:
The errors and warnings do not seem to impact the behavior of the test. Are the errors expected? |
The scanning failure is not expected and the ignored unused argument for And we'd better fix them in clangd instead of changing the test since the test mimics actual compilation databases. |
Thanks! I will open a Github issue. |
Alternatives to https://reviews.llvm.org/D153114.
Try to address clangd/clangd#1293.
See the links for design ideas and the consensus so far. We want to have some initial support in clang18.
This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
to review and understand so that every one are in the same page:
This patch reflects the above opinions.
Testing in real-world project
I tested this with a modularized library: https://github.com/alibaba/async_simple/tree/CXX20Modules. This library has 3 modules (async_simple, std and asio) and 65 module units. (Note that a module consists of multiple module units). Both
std
module andasio
module have 100k+ lines of code (maybe more, I didn't count). And async_simple itself has 8k lines of code. This is the scale of the project.The result shows that it works pretty well, ..., well, except I need to wait roughly 10s after opening/editing any file. And this falls in our expectations. We know it is hard to make it perfect in the first move.
What this patch does in detail
--experimental-modules-support
for the support for C++20 Modules. So that no matter how bad this is, it wouldn't affect current users. Following off the page, we'll assume the option is enabled.ModuleFilesInfo
andModuleDependencyScanner
. NowModuleDependencyScanner
is only used byModuleFilesInfo
.ModuleFilesInfo
records the built module files for specific single source file. The module files can only be built by the static member functionModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)
.PreambleData
adds a new member variable with typeModuleFilesInfo
. This refers to the needed module files for the current file. It means the module files info is part of the preamble, which is suggested in the first patch too.isPreambleCompatible()
, we add a call toModuleFilesInfo::CanReuse()
to check if the built module files are still up to date.What we need to do next
Let's split the TODOs into clang part and clangd part to make things more clear.
The TODOs in the clangd part include:
scheduling
,the possibility of BMI version conflicts
andvarious events that can invalidate the module graph
.<module-name> -> <module-unit-source>
map. Currently we always scan the whole project duringModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)
. This is clearly inefficient even if the scanning process is pretty fast. I think the potential solutions include:TODOs in the clang part include:
And it looks like we can say the support for C++20 Modules is initially workable after we made (1) and (2) (or even without (2)).
CC: @HighCommander4 @ilya-biryukov