Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clangd] [Modules] Support Reusable Modules Builder #106683

Merged
merged 7 commits into from
Nov 12, 2024

Conversation

ChuanqiXu9
Copy link
Member

This is the following patch of #66462 to optimize its performance.

Motivation

To avoid data races, we choose "per file owns its dependent modules" model. That said, every TU will own all the required module files so that we don't need to worry about thread safety. And it looks like we succeeded that we focus on the interfaces and structure of modules support in clangd. But after all, this model is not good for performance. Image we have 10000 TUs import std, we will have 10000 std.pcm in the memory. That is terrible both in time and space.

Given the current modules support in clangd works pretty well (almost every issue report I received is more or less a clang's issue), I'd like to improve the performance.

High Level Changes

After this patch, the built module files will be owned by the module builder and each TU will only have a reference to the built module files.

The module builder have a map from module names to built module files. When a new TU ask for a module file, the module builder will check if the module file lives in the map and if the module file are up to date. If yes, the module file will be returned. If no, the module file entry would be erased in the module builder. We use shared_ptr<> to track module file here so that the other TU owning the out dated module file won't be affected. The out dated module file will be removed automatically if other TU gets update or closed.

(I know the out dated module file may not exist due to the CanReuse mechanism. But the design here is natural and can be seen as a redundant design to make it more robust.)

When we a build a module, we will use the mutex and the condition variable in the working thread to build it exclusively. All other threads that also want the module file would have to wait for that working thread. It might not sounds great but I think if we want to make it asynchronous, we have to refactor TUScheduler as far as I know.

Code Structure Changes

Thanks for the previous hard working reviewing, the interfaces almost don't change in this patch. Almost all the work are isolated in ModulesBuilder.cpp. A outliner is that we convert ModulesBuilder to an abstract class since the implementation class needs to own the module files.

And the core function to review is ReusableModulesBuilder::getOrBuildModuleFile. It implements the core logic to fetch the module file from the cache or build it if the module file is not in the cache or out of date. And other important entities are BuildingModuleMutexes, BuildingModuleCVs, BuildingModules and ModulesBuildingMutex. These are mutexes and condition variables to make sure the thread safety.

User experience

I've implemented this in our downstream and ask our users to use it. I also sent it https://github.com/ChuanqiXu9/clangd-for-modules here as pre-version. The feedbacks are pretty good. And I didn't receive any bug reports (about the reusable modules builder) yet.

Other potential improvement

The are other two potential improvements can be done:

  1. Scanning cache and a mechanism to get the required module information more quickly. (Like the module maps in https://github.com/ChuanqiXu9/clangd-for-modules)
  2. Persist the module files. So that after we close the vscode and reopen it, we can reuse the built module files since the last invocation.

@ChuanqiXu9 ChuanqiXu9 added clangd clang:modules C++20 modules and Clang Header Modules labels Aug 30, 2024
@ChuanqiXu9 ChuanqiXu9 self-assigned this Aug 30, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 30, 2024

@llvm/pr-subscribers-clang-tools-extra
@llvm/pr-subscribers-clangd

@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)

Changes

This is the following patch of #66462 to optimize its performance.

Motivation

To avoid data races, we choose "per file owns its dependent modules" model. That said, every TU will own all the required module files so that we don't need to worry about thread safety. And it looks like we succeeded that we focus on the interfaces and structure of modules support in clangd. But after all, this model is not good for performance. Image we have 10000 TUs import std, we will have 10000 std.pcm in the memory. That is terrible both in time and space.

Given the current modules support in clangd works pretty well (almost every issue report I received is more or less a clang's issue), I'd like to improve the performance.

High Level Changes

After this patch, the built module files will be owned by the module builder and each TU will only have a reference to the built module files.

The module builder have a map from module names to built module files. When a new TU ask for a module file, the module builder will check if the module file lives in the map and if the module file are up to date. If yes, the module file will be returned. If no, the module file entry would be erased in the module builder. We use shared_ptr&lt;&gt; to track module file here so that the other TU owning the out dated module file won't be affected. The out dated module file will be removed automatically if other TU gets update or closed.

(I know the out dated module file may not exist due to the CanReuse mechanism. But the design here is natural and can be seen as a redundant design to make it more robust.)

When we a build a module, we will use the mutex and the condition variable in the working thread to build it exclusively. All other threads that also want the module file would have to wait for that working thread. It might not sounds great but I think if we want to make it asynchronous, we have to refactor TUScheduler as far as I know.

Code Structure Changes

Thanks for the previous hard working reviewing, the interfaces almost don't change in this patch. Almost all the work are isolated in ModulesBuilder.cpp. A outliner is that we convert ModulesBuilder to an abstract class since the implementation class needs to own the module files.

And the core function to review is ReusableModulesBuilder::getOrBuildModuleFile. It implements the core logic to fetch the module file from the cache or build it if the module file is not in the cache or out of date. And other important entities are BuildingModuleMutexes, BuildingModuleCVs, BuildingModules and ModulesBuildingMutex. These are mutexes and condition variables to make sure the thread safety.

User experience

I've implemented this in our downstream and ask our users to use it. I also sent it https://github.com/ChuanqiXu9/clangd-for-modules here as pre-version. The feedbacks are pretty good. And I didn't receive any bug reports (about the reusable modules builder) yet.

Other potential improvement

The are other two potential improvements can be done:

  1. Scanning cache and a mechanism to get the required module information more quickly. (Like the module maps in https://github.com/ChuanqiXu9/clangd-for-modules)
  2. Persist the module files. So that after we close the vscode and reopen it, we can reuse the built module files since the last invocation.

Patch is 27.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/106683.diff

6 Files Affected:

  • (modified) clang-tools-extra/clangd/ClangdLSPServer.cpp (+2-2)
  • (modified) clang-tools-extra/clangd/ClangdLSPServer.h (+1-1)
  • (modified) clang-tools-extra/clangd/ModulesBuilder.cpp (+294-63)
  • (modified) clang-tools-extra/clangd/ModulesBuilder.h (+6-5)
  • (modified) clang-tools-extra/clangd/tool/Check.cpp (+3-3)
  • (modified) clang-tools-extra/clangd/unittests/PrerequisiteModulesTest.cpp (+56-14)
diff --git a/clang-tools-extra/clangd/ClangdLSPServer.cpp b/clang-tools-extra/clangd/ClangdLSPServer.cpp
index 06573a57554245..53a18bd4ecd1a7 100644
--- a/clang-tools-extra/clangd/ClangdLSPServer.cpp
+++ b/clang-tools-extra/clangd/ClangdLSPServer.cpp
@@ -567,8 +567,8 @@ void ClangdLSPServer::onInitialize(const InitializeParams &Params,
               std::move(Mangler));
 
   if (Opts.EnableExperimentalModulesSupport) {
-    ModulesManager.emplace(*CDB);
-    Opts.ModulesManager = &*ModulesManager;
+    ModulesManager = ModulesBuilder::getModulesBuilder(*CDB);
+    Opts.ModulesManager = ModulesManager.get();
   }
 
   {
diff --git a/clang-tools-extra/clangd/ClangdLSPServer.h b/clang-tools-extra/clangd/ClangdLSPServer.h
index 0b8e4720f53236..4b8f048b430fe9 100644
--- a/clang-tools-extra/clangd/ClangdLSPServer.h
+++ b/clang-tools-extra/clangd/ClangdLSPServer.h
@@ -327,7 +327,7 @@ class ClangdLSPServer : private ClangdServer::Callbacks,
   // The ClangdServer is created by the "initialize" LSP method.
   std::optional<ClangdServer> Server;
   // Manages to build module files.
-  std::optional<ModulesBuilder> ModulesManager;
+  std::unique_ptr<ModulesBuilder> ModulesManager;
 };
 } // namespace clangd
 } // namespace clang
diff --git a/clang-tools-extra/clangd/ModulesBuilder.cpp b/clang-tools-extra/clangd/ModulesBuilder.cpp
index 1eeff468ef1236..82aa7d181f3f5e 100644
--- a/clang-tools-extra/clangd/ModulesBuilder.cpp
+++ b/clang-tools-extra/clangd/ModulesBuilder.cpp
@@ -12,6 +12,7 @@
 #include "clang/Frontend/FrontendAction.h"
 #include "clang/Frontend/FrontendActions.h"
 #include "clang/Serialization/ASTReader.h"
+#include "llvm/ADT/ScopeExit.h"
 
 namespace clang {
 namespace clangd {
@@ -173,33 +174,27 @@ bool IsModuleFilesUpToDate(
   });
 }
 
-// StandalonePrerequisiteModules - stands for PrerequisiteModules for which all
+// ReusablePrerequisiteModules - stands for PrerequisiteModules for which all
 // the required modules are built successfully. All the module files
-// are owned by the StandalonePrerequisiteModules class.
-//
-// Any of the built module files won't be shared with other instances of the
-// class. So that we can avoid worrying thread safety.
-//
-// We don't need to worry about duplicated module names here since the standard
-// guarantees the module names should be unique to a program.
-class StandalonePrerequisiteModules : public PrerequisiteModules {
+// are owned by the modules builder.
+class ReusablePrerequisiteModules : public PrerequisiteModules {
 public:
-  StandalonePrerequisiteModules() = default;
+  ReusablePrerequisiteModules() = default;
 
-  StandalonePrerequisiteModules(const StandalonePrerequisiteModules &) = delete;
-  StandalonePrerequisiteModules
-  operator=(const StandalonePrerequisiteModules &) = delete;
-  StandalonePrerequisiteModules(StandalonePrerequisiteModules &&) = delete;
-  StandalonePrerequisiteModules
-  operator=(StandalonePrerequisiteModules &&) = delete;
+  ReusablePrerequisiteModules(const ReusablePrerequisiteModules &) = delete;
+  ReusablePrerequisiteModules
+  operator=(const ReusablePrerequisiteModules &) = delete;
+  ReusablePrerequisiteModules(ReusablePrerequisiteModules &&) = delete;
+  ReusablePrerequisiteModules
+  operator=(ReusablePrerequisiteModules &&) = delete;
 
-  ~StandalonePrerequisiteModules() override = default;
+  ~ReusablePrerequisiteModules() override = default;
 
   void adjustHeaderSearchOptions(HeaderSearchOptions &Options) const override {
     // Appending all built module files.
     for (auto &RequiredModule : RequiredModules)
       Options.PrebuiltModuleFiles.insert_or_assign(
-          RequiredModule.ModuleName, RequiredModule.ModuleFilePath);
+          RequiredModule->ModuleName, RequiredModule->ModuleFilePath);
   }
 
   bool canReuse(const CompilerInvocation &CI,
@@ -209,54 +204,31 @@ class StandalonePrerequisiteModules : public PrerequisiteModules {
     return BuiltModuleNames.contains(ModuleName);
   }
 
-  void addModuleFile(llvm::StringRef ModuleName,
-                     llvm::StringRef ModuleFilePath) {
-    RequiredModules.emplace_back(ModuleName, ModuleFilePath);
-    BuiltModuleNames.insert(ModuleName);
+  void addModuleFile(std::shared_ptr<ModuleFile> BMI) {
+    BuiltModuleNames.insert(BMI->ModuleName);
+    RequiredModules.emplace_back(std::move(BMI));
   }
 
 private:
-  llvm::SmallVector<ModuleFile, 8> RequiredModules;
+  mutable llvm::SmallVector<std::shared_ptr<ModuleFile>, 8> RequiredModules;
   // A helper class to speedup the query if a module is built.
   llvm::StringSet<> BuiltModuleNames;
 };
 
-// Build a module file for module with `ModuleName`. The information of built
-// module file are stored in \param BuiltModuleFiles.
-llvm::Error buildModuleFile(llvm::StringRef ModuleName,
+/// Build a module file for module with `ModuleName`. The information of built
+/// module file are stored in \param BuiltModuleFiles.
+llvm::Expected<ModuleFile> buildModuleFile(llvm::StringRef ModuleName,
+                            PathRef ModuleUnitFileName,
                             const GlobalCompilationDatabase &CDB,
-                            const ThreadsafeFS &TFS, ProjectModules &MDB,
+                            const ThreadsafeFS &TFS,
                             PathRef ModuleFilesPrefix,
-                            StandalonePrerequisiteModules &BuiltModuleFiles) {
-  if (BuiltModuleFiles.isModuleUnitBuilt(ModuleName))
-    return llvm::Error::success();
-
-  PathRef ModuleUnitFileName = MDB.getSourceForModuleName(ModuleName);
-  // It is possible that we're meeting third party modules (modules whose
-  // source are not in the project. e.g, the std module may be a third-party
-  // module for most projects) or something wrong with the implementation of
-  // ProjectModules.
-  // FIXME: How should we treat third party modules here? If we want to ignore
-  // third party modules, we should return true instead of false here.
-  // Currently we simply bail out.
-  if (ModuleUnitFileName.empty())
-    return llvm::createStringError("Failed to get the primary source");
-
+                            const ReusablePrerequisiteModules &BuiltModuleFiles) {
   // Try cheap operation earlier to boil-out cheaply if there are problems.
   auto Cmd = CDB.getCompileCommand(ModuleUnitFileName);
   if (!Cmd)
     return llvm::createStringError(
         llvm::formatv("No compile command for {0}", ModuleUnitFileName));
 
-  for (auto &RequiredModuleName : MDB.getRequiredModules(ModuleUnitFileName)) {
-    // Return early if there are errors building the module file.
-    if (llvm::Error Err = buildModuleFile(RequiredModuleName, CDB, TFS, MDB,
-                                          ModuleFilesPrefix, BuiltModuleFiles))
-      return llvm::createStringError(
-          llvm::formatv("Failed to build dependency {0}: {1}",
-                        RequiredModuleName, llvm::toString(std::move(Err))));
-  }
-
   Cmd->Output = getModuleFilePath(ModuleName, ModuleFilesPrefix);
 
   ParseInputs Inputs;
@@ -297,14 +269,164 @@ llvm::Error buildModuleFile(llvm::StringRef ModuleName,
   if (Clang->getDiagnostics().hasErrorOccurred())
     return llvm::createStringError("Compilation failed");
 
-  BuiltModuleFiles.addModuleFile(ModuleName, Inputs.CompileCommand.Output);
-  return llvm::Error::success();
+  return ModuleFile{ModuleName, Inputs.CompileCommand.Output};
+}
+
+class ReusableModulesBuilder : public ModulesBuilder {
+public:
+  ReusableModulesBuilder(const GlobalCompilationDatabase &CDB) : CDB(CDB) {}
+
+  ReusableModulesBuilder(const ReusableModulesBuilder &) = delete;
+  ReusableModulesBuilder(ReusableModulesBuilder &&) = delete;
+
+  ReusableModulesBuilder &operator=(const ReusableModulesBuilder &) = delete;
+  ReusableModulesBuilder &operator=(ReusableModulesBuilder &&) = delete;
+
+  std::unique_ptr<PrerequisiteModules>
+  buildPrerequisiteModulesFor(PathRef File, const ThreadsafeFS &TFS) override;
+
+private:
+  llvm::Error getOrBuildModuleFile(StringRef ModuleName, const ThreadsafeFS &TFS,
+                            ProjectModules &MDB,
+                            ReusablePrerequisiteModules &RequiredModules);
+
+  std::shared_ptr<ModuleFile>
+  getValidModuleFile(StringRef ModuleName, ProjectModules &MDB,
+                     const ThreadsafeFS &TFS,
+                     PrerequisiteModules &BuiltModuleFiles);
+  /// This should only be called by getValidModuleFile. This is unlocked version
+  /// of getValidModuleFile. The function is extracted to avoid dead locks when
+  /// recursing.
+  std::shared_ptr<ModuleFile>
+  isValidModuleFileUnlocked(StringRef ModuleName, ProjectModules &MDB,
+                            const ThreadsafeFS &TFS,
+                            PrerequisiteModules &BuiltModuleFiles);
+
+  llvm::StringMap<std::shared_ptr<ModuleFile>> ModuleFiles;
+  // Mutex to guard accesses to ModuleFiles.
+  std::mutex ModuleFilesMutex;
+
+  // We should only build a unique module at most at the same time.
+  // When we want to build a module, use the mutex to lock it and use the
+  // condition variable to notify other threads the status of the build results.
+  //
+  // Store the mutex and the condition_variable in shared_ptr since they may be
+  // accessed by many threads.
+  llvm::StringMap<std::shared_ptr<std::mutex>> BuildingModuleMutexes;
+  llvm::StringMap<std::shared_ptr<std::condition_variable>> BuildingModuleCVs;
+  // The building modules set. A successed built module or a failed module or
+  // an unbuilt module shouldn't be in this set.
+  // This set is helpful to control the behavior of the condition variables.
+  llvm::StringSet<> BuildingModules;
+  // Lock when we access BuildingModules, BuildingModuleMutexes and BuildingModuleCVs.
+  std::mutex ModulesBuildingMutex;
+
+  void startBuildingModule(StringRef ModuleName) {
+    std::lock_guard<std::mutex> _(ModulesBuildingMutex);
+    BuildingModules.insert(ModuleName);
+  }
+  void endBuildingModule(StringRef ModuleName) {
+    std::lock_guard<std::mutex> _(ModulesBuildingMutex);
+    BuildingModules.erase(ModuleName);
+  }
+  bool isBuildingModule(StringRef ModuleName) {
+    std::lock_guard<std::mutex> _(ModulesBuildingMutex);
+    return BuildingModules.contains(ModuleName);
+  }
+
+  /// An RAII object to guard the process to build a specific module.
+  struct ModuleBuildingSharedOwner {
+  public:
+    ModuleBuildingSharedOwner(StringRef ModuleName,
+                              std::shared_ptr<std::mutex> &Mutex,
+                              std::shared_ptr<std::condition_variable> &CV,
+                              ReusableModulesBuilder &Builder)
+        : ModuleName(ModuleName), Mutex(Mutex), CV(CV), Builder(Builder) {
+      IsFirstTask = (Mutex.use_count() == 2);
+    }
+
+    ~ModuleBuildingSharedOwner();
+
+    bool isUniqueBuildingOwner() { return IsFirstTask; }
+
+    std::mutex &getMutex() { return *Mutex; }
+
+    std::condition_variable &getCV() { return *CV; }
+
+  private:
+    StringRef ModuleName;
+    std::shared_ptr<std::mutex> Mutex;
+    std::shared_ptr<std::condition_variable> CV;
+    ReusableModulesBuilder &Builder;
+    bool IsFirstTask;
+  };
+
+  ModuleBuildingSharedOwner
+  getOrCreateModuleBuildingOwner(StringRef ModuleName);
+
+  const GlobalCompilationDatabase &CDB;
+};
+
+ReusableModulesBuilder::ModuleBuildingSharedOwner::
+    ~ModuleBuildingSharedOwner() {
+  std::lock_guard<std::mutex> _(Builder.ModulesBuildingMutex);
+
+  Mutex.reset();
+  CV.reset();
+
+  // Try to release the memory in builder if possible.
+  if (auto Iter = Builder.BuildingModuleCVs.find(ModuleName);
+      Iter != Builder.BuildingModuleCVs.end() &&
+      Iter->getValue().use_count() == 1)
+    Builder.BuildingModuleCVs.erase(Iter);
+
+  if (auto Iter = Builder.BuildingModuleMutexes.find(ModuleName);
+      Iter != Builder.BuildingModuleMutexes.end() &&
+      Iter->getValue().use_count() == 1)
+    Builder.BuildingModuleMutexes.erase(Iter);
+}
+
+std::shared_ptr<ModuleFile> ReusableModulesBuilder::isValidModuleFileUnlocked(
+    StringRef ModuleName, ProjectModules &MDB, const ThreadsafeFS &TFS,
+    PrerequisiteModules &BuiltModuleFiles) {
+  auto Iter = ModuleFiles.find(ModuleName);
+  if (Iter != ModuleFiles.end()) {
+    if (!IsModuleFileUpToDate(Iter->second->ModuleFilePath, BuiltModuleFiles)) {
+      log("Found not-up-date module file {0} for module {1} in cache",
+          Iter->second->ModuleFilePath, ModuleName);
+      ModuleFiles.erase(Iter);
+      return nullptr;
+    }
+
+    if (llvm::any_of(
+            MDB.getRequiredModules(MDB.getSourceForModuleName(ModuleName)),
+            [&MDB, &TFS, &BuiltModuleFiles, this](auto &&RequiredModuleName) {
+              return !isValidModuleFileUnlocked(RequiredModuleName, MDB, TFS,
+                                                BuiltModuleFiles);
+            })) {
+      ModuleFiles.erase(Iter);
+      return nullptr;
+    }
+
+    return Iter->second;
+  }
+
+  log("Don't find {0} in cache", ModuleName);
+
+  return nullptr;
+}
+
+std::shared_ptr<ModuleFile> ReusableModulesBuilder::getValidModuleFile(
+    StringRef ModuleName, ProjectModules &MDB, const ThreadsafeFS &TFS,
+    PrerequisiteModules &BuiltModuleFiles) {
+  std::lock_guard<std::mutex> _(ModuleFilesMutex);
+
+  return isValidModuleFileUnlocked(ModuleName, MDB, TFS, BuiltModuleFiles);
 }
-} // namespace
 
 std::unique_ptr<PrerequisiteModules>
-ModulesBuilder::buildPrerequisiteModulesFor(PathRef File,
-                                            const ThreadsafeFS &TFS) const {
+ReusableModulesBuilder::buildPrerequisiteModulesFor(PathRef File,
+                                            const ThreadsafeFS &TFS) {
   std::unique_ptr<ProjectModules> MDB = CDB.getProjectModules(File);
   if (!MDB) {
     elog("Failed to get Project Modules information for {0}", File);
@@ -313,20 +435,19 @@ ModulesBuilder::buildPrerequisiteModulesFor(PathRef File,
 
   std::vector<std::string> RequiredModuleNames = MDB->getRequiredModules(File);
   if (RequiredModuleNames.empty())
-    return std::make_unique<StandalonePrerequisiteModules>();
+    return std::make_unique<ReusablePrerequisiteModules>();
 
   llvm::SmallString<256> ModuleFilesPrefix = getUniqueModuleFilesPath(File);
 
   log("Trying to build required modules for {0} in {1}", File,
       ModuleFilesPrefix);
 
-  auto RequiredModules = std::make_unique<StandalonePrerequisiteModules>();
+  auto RequiredModules = std::make_unique<ReusablePrerequisiteModules>();
 
   for (llvm::StringRef RequiredModuleName : RequiredModuleNames) {
     // Return early if there is any error.
-    if (llvm::Error Err =
-            buildModuleFile(RequiredModuleName, CDB, TFS, *MDB.get(),
-                            ModuleFilesPrefix, *RequiredModules.get())) {
+    if (llvm::Error Err = 
+        getOrBuildModuleFile(RequiredModuleName, TFS, *MDB.get(), *RequiredModules.get())) {
       elog("Failed to build module {0}; due to {1}", RequiredModuleName,
            toString(std::move(Err)));
       return std::make_unique<FailedPrerequisiteModules>();
@@ -338,7 +459,111 @@ ModulesBuilder::buildPrerequisiteModulesFor(PathRef File,
   return std::move(RequiredModules);
 }
 
-bool StandalonePrerequisiteModules::canReuse(
+ReusableModulesBuilder::ModuleBuildingSharedOwner
+ReusableModulesBuilder::getOrCreateModuleBuildingOwner(
+    StringRef ModuleName) {
+  std::lock_guard<std::mutex> _(ModulesBuildingMutex);
+
+  auto MutexIter = BuildingModuleMutexes.find(ModuleName);
+  if (MutexIter == BuildingModuleMutexes.end())
+    MutexIter = BuildingModuleMutexes
+                    .try_emplace(ModuleName, std::make_shared<std::mutex>())
+                    .first;
+
+  auto CVIter = BuildingModuleCVs.find(ModuleName);
+  if (CVIter == BuildingModuleCVs.end())
+    CVIter = BuildingModuleCVs
+                 .try_emplace(ModuleName,
+                              std::make_shared<std::condition_variable>())
+                 .first;
+
+  return ModuleBuildingSharedOwner(ModuleName, MutexIter->getValue(),
+                                   CVIter->getValue(), *this);
+}
+
+llvm::Error ReusableModulesBuilder::getOrBuildModuleFile(
+    StringRef ModuleName, const ThreadsafeFS &TFS, ProjectModules &MDB,
+    ReusablePrerequisiteModules &BuiltModuleFiles) {
+  if (BuiltModuleFiles.isModuleUnitBuilt(ModuleName))
+    return llvm::Error::success();
+
+  PathRef ModuleUnitFileName = MDB.getSourceForModuleName(ModuleName);
+  /// It is possible that we're meeting third party modules (modules whose
+  /// source are not in the project. e.g, the std module may be a third-party
+  /// module for most project) or something wrong with the implementation of
+  /// ProjectModules.
+  /// FIXME: How should we treat third party modules here? If we want to ignore
+  /// third party modules, we should return true instead of false here.
+  /// Currently we simply bail out.
+  if (ModuleUnitFileName.empty())
+    return llvm::createStringError(llvm::formatv("Don't get the module unit for module {0}", ModuleName));
+
+  for (auto &RequiredModuleName : MDB.getRequiredModules(ModuleUnitFileName))
+    // Return early if there are errors building the module file.
+    if (!getOrBuildModuleFile(RequiredModuleName, TFS, MDB, BuiltModuleFiles))
+      return llvm::createStringError(llvm::formatv("Failed to build module {0}",
+                                                   RequiredModuleName));
+
+  if (std::shared_ptr<ModuleFile> Cached =
+          getValidModuleFile(ModuleName, MDB, TFS, BuiltModuleFiles)) {
+    log("Reusing module {0} from {1}", ModuleName, Cached->ModuleFilePath);
+    BuiltModuleFiles.addModuleFile(Cached);
+    return llvm::Error::success();
+  }
+
+  ModuleBuildingSharedOwner ModuleBuildingOwner =
+      getOrCreateModuleBuildingOwner(ModuleName);
+
+  std::condition_variable &CV = ModuleBuildingOwner.getCV();
+  std::unique_lock<std::mutex> lk(ModuleBuildingOwner.getMutex());
+  if (!ModuleBuildingOwner.isUniqueBuildingOwner()) {
+    log("Waiting other task for module {0}", ModuleName);
+    CV.wait(lk, [this, ModuleName] { return !isBuildingModule(ModuleName); });
+
+    // Try to access the built module files from other threads manually.
+    // We don't call getValidModuleFile here since it may be too heavy.
+    std::lock_guard<std::mutex> _(ModuleFilesMutex);
+    auto Iter = ModuleFiles.find(ModuleName);
+    if (Iter != ModuleFiles.end()) {
+      log("Got module file from other task building {0}", ModuleName);
+      BuiltModuleFiles.addModuleFile(Iter->second);
+      return llvm::Error::success();
+    }
+
+    // If the module file is not in the cache, it indicates that the building
+    // from other thread failed, so we give up earlier in this case to avoid
+    // wasting time.
+    return llvm::createStringError(llvm::formatv("The module file {0} may be failed to build in other thread.",
+                                   ModuleName));
+  }
+
+  log("Building module {0}", ModuleName);
+  startBuildingModule(ModuleName);
+
+  auto _ = llvm::make_scope_exit([&]() {
+    endBuildingModule(ModuleName);
+    CV.notify_all();
+  });
+
+  llvm::SmallString<256> ModuleFilesPrefix =
+      getUniqueModuleFilesPath(ModuleUnitFileName);
+
+  llvm::Expected<ModuleFile> MF = buildModuleFile(ModuleName, ModuleUnitFileName, CDB, TFS,
+                      ModuleFilesPrefix, BuiltModuleFiles);
+  if (llvm::Error Err = MF.takeError())
+    return Err;
+
+  log("Built module {0} to {1}", ModuleName, MF->ModuleFilePath);
+
+  std::lock_guard<std::mutex> __(ModuleFilesMutex);
+  auto BuiltModuleFile = std::make_shared<ModuleFile>(std::move(*MF));
+  ModuleFiles.insert_or_assign(ModuleName, BuiltModuleFile);
+  BuiltModuleFiles.addModuleFile(std::move(BuiltModuleFile));
+
+  return llvm::Error::success();
+}
+
+bool ReusablePrerequisiteModules::canReuse(
     const CompilerInvocation &CI,
     llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> VFS) const {
   if (RequiredModules.empty())
@@ -346,9 +571,15 @@ bool StandalonePrerequisiteModules::canReuse(
 
   SmallVector<StringRef> BMIPaths;
   for (auto &MF : RequiredModules)
-    BMIPaths.push_back(MF.ModuleFilePath);
+    BMIPaths.push_back(MF->ModuleFilePath);
   return IsModuleFilesUpToDate(BMIPaths, *this);
 }
+} // namespace
+
+std::unique_ptr<ModulesBuilder>
+ModulesBuilder::getModulesBuilder(const GlobalCompilationDatabase &CDB) {
+  return std::make_unique<ReusableModulesBu...
[truncated]

Copy link

github-actions bot commented Aug 30, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@ChuanqiXu9 ChuanqiXu9 force-pushed the ReusableModulesBuilder branch from 1cd6f55 to facbe53 Compare August 30, 2024 08:18
@ChuanqiXu9
Copy link
Member Author

@sam-mccall @kadircet @HighCommander4 ping~

@ChuanqiXu9
Copy link
Member Author

@ChuanqiXu9
Copy link
Member Author

@kadircet ping~ It will be pretty good for the users if we can have this for clang20.

@ChuanqiXu9
Copy link
Member Author

I'd like to land this in 2 weeks if no more comments came in. Given:

  • In our downstream, we've landed this patch for more than a year and it seems running well. And also in the open source world, I tried to send it to https://github.com/ChuanqiXu9/clangd-for-modules . Everyone experienced it didn't complain about it for bugs. So I believe this is well tested.
  • There are other following patches and I really wish we can have something workable in the next release.
  • This patch doesn't change the shape of modules support in clangd. The major change lives in ModulesBuilder.cpp, which we can think it as an implementation detail instead of some global things.

For policy related comments, please sent it to https://discourse.llvm.org/t/rfc-directions-for-modules-support-in-clangd/78072/6

@ChuanqiXu9 ChuanqiXu9 force-pushed the ReusableModulesBuilder branch from facbe53 to 15aa6af Compare October 31, 2024 03:43
Copy link
Member

@kadircet kadircet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks I think this LG in terms of module-builder interfaces, but I think we can make some more changes to implementation to ensure it's easier to maintain going forward.

speaking of maintenance, @HighCommander4 is definitely doing more of that than me recently. so i'd like to make sure he's also ok with this new complexity. i also would like to highlight @ChuanqiXu9 is definitely a responsible member of community, hence i believe he'll also be around if we have bug reports about this bits.

@ChuanqiXu9
Copy link
Member Author

thanks I think this LG in terms of module-builder interfaces, but I think we can make some more changes to implementation to ensure it's easier to maintain going forward.

speaking of maintenance, @HighCommander4 is definitely doing more of that than me recently. so i'd like to make sure he's also ok with this new complexity. i also would like to highlight @ChuanqiXu9 is definitely a responsible member of community, hence i believe he'll also be around if we have bug reports about this bits.

Thanks : )

@ChuanqiXu9 ChuanqiXu9 force-pushed the ReusableModulesBuilder branch from 7421a61 to 4e0f1e3 Compare November 12, 2024 06:02
Copy link
Member

@kadircet kadircet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, i think this looks great!

some little comments/nits.

Comment on lines 355 to 359
auto Iter = ModuleFiles.find(ModuleName);
if (Iter == ModuleFiles.end())
return;

ModuleFiles.erase(Iter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just Modulefiles.erase(ModuleName) ?

Comment on lines 468 to 471
llvm::SmallString<256> ModuleFilesPrefix = getUniqueModuleFilesPath(File);

log("Trying to build required modules for {0} in {1}", File,
ModuleFilesPrefix);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is no longer the case, you can drop this (and the following log after the build)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member Author

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing!

Comment on lines 468 to 471
llvm::SmallString<256> ModuleFilesPrefix = getUniqueModuleFilesPath(File);

log("Trying to build required modules for {0} in {1}", File,
ModuleFilesPrefix);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

@kadircet kadircet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a lot for bearing with me! LGTM, let's ship it!

@ChuanqiXu9
Copy link
Member Author

Thank you very much !

@ChuanqiXu9 ChuanqiXu9 merged commit e385e0d into llvm:main Nov 12, 2024
5 of 7 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 12, 2024

LLVM Buildbot has detected a new failure on builder libc-aarch64-ubuntu-fullbuild-dbg running on libc-aarch64-ubuntu while building clang-tools-extra at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/71/builds/10350

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
[       OK ] LlvmLibcStatTest.NonExistentFile (3 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[757/881] Running unit test libc.test.src.sys.stat.fstat_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcFStatTest.CreatAndReadMode
[       OK ] LlvmLibcFStatTest.CreatAndReadMode (69 us)
[ RUN      ] LlvmLibcFStatTest.NonExistentFile
[       OK ] LlvmLibcFStatTest.NonExistentFile (2 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[758/881] Running unit test libc.test.src.sys.statvfs.linux.statvfs_test
FAILED: projects/libc/test/src/sys/statvfs/linux/CMakeFiles/libc.test.src.sys.statvfs.linux.statvfs_test 
cd /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build/projects/libc/test/src/sys/statvfs/linux && /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build/projects/libc/test/src/sys/statvfs/linux/libc.test.src.sys.statvfs.linux.statvfs_test.__build__
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysStatvfsTest.StatvfsBasic
[       OK ] LlvmLibcSysStatvfsTest.StatvfsBasic (10 us)
[ RUN      ] LlvmLibcSysStatvfsTest.StatvfsInvalidPath
/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/llvm-project/libc/test/src/sys/statvfs/linux/statvfs_test.cpp:37: FAILURE
Failed to match LIBC_NAMESPACE::mkdirat(AT_FDCWD, TEST_DIR, S_IRWXU) against Succeeds(0).
Expected return value to be equal to 0 but got -1.
Expected errno to be equal to "Success" but got "File exists".
[  FAILED  ] LlvmLibcSysStatvfsTest.StatvfsInvalidPath
Ran 2 tests.  PASS: 1  FAIL: 1
[759/881] Running unit test libc.test.src.sys.stat.lstat_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcLStatTest.CreatAndReadMode
[       OK ] LlvmLibcLStatTest.CreatAndReadMode (57 us)
[ RUN      ] LlvmLibcLStatTest.NonExistentFile
[       OK ] LlvmLibcLStatTest.NonExistentFile (3 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[760/881] Running unit test libc.test.src.sys.statvfs.linux.fstatvfs_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysFStatvfsTest.FStatvfsBasic
[       OK ] LlvmLibcSysFStatvfsTest.FStatvfsBasic (13 us)
[ RUN      ] LlvmLibcSysFStatvfsTest.FStatvfsInvalidPath
[       OK ] LlvmLibcSysFStatvfsTest.FStatvfsInvalidPath (78 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[761/881] Running unit test libc.test.src.string.bcopy_test.__unit__
[==========] Running 7 tests from 1 test suite.
[ RUN      ] LlvmLibcBcopyTest.MoveZeroByte
[       OK ] LlvmLibcBcopyTest.MoveZeroByte (1 us)
[ RUN      ] LlvmLibcBcopyTest.DstAndSrcPointToSameAddress
[       OK ] LlvmLibcBcopyTest.DstAndSrcPointToSameAddress (1 us)
[ RUN      ] LlvmLibcBcopyTest.DstStartsBeforeSrc
[       OK ] LlvmLibcBcopyTest.DstStartsBeforeSrc (0 ns)
[ RUN      ] LlvmLibcBcopyTest.DstStartsAfterSrc
[       OK ] LlvmLibcBcopyTest.DstStartsAfterSrc (1 us)
[ RUN      ] LlvmLibcBcopyTest.SrcFollowDst
[       OK ] LlvmLibcBcopyTest.SrcFollowDst (1 us)
[ RUN      ] LlvmLibcBcopyTest.DstFollowSrc
Step 8 (libc-unit-tests) failure: libc-unit-tests (failure)
...
[       OK ] LlvmLibcStatTest.NonExistentFile (3 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[757/881] Running unit test libc.test.src.sys.stat.fstat_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcFStatTest.CreatAndReadMode
[       OK ] LlvmLibcFStatTest.CreatAndReadMode (69 us)
[ RUN      ] LlvmLibcFStatTest.NonExistentFile
[       OK ] LlvmLibcFStatTest.NonExistentFile (2 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[758/881] Running unit test libc.test.src.sys.statvfs.linux.statvfs_test
FAILED: projects/libc/test/src/sys/statvfs/linux/CMakeFiles/libc.test.src.sys.statvfs.linux.statvfs_test 
cd /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build/projects/libc/test/src/sys/statvfs/linux && /home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/build/projects/libc/test/src/sys/statvfs/linux/libc.test.src.sys.statvfs.linux.statvfs_test.__build__
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysStatvfsTest.StatvfsBasic
[       OK ] LlvmLibcSysStatvfsTest.StatvfsBasic (10 us)
[ RUN      ] LlvmLibcSysStatvfsTest.StatvfsInvalidPath
/home/libc-buildbot/libc-aarch64-ubuntu/libc-aarch64-ubuntu-fullbuild-dbg/llvm-project/libc/test/src/sys/statvfs/linux/statvfs_test.cpp:37: FAILURE
Failed to match LIBC_NAMESPACE::mkdirat(AT_FDCWD, TEST_DIR, S_IRWXU) against Succeeds(0).
Expected return value to be equal to 0 but got -1.
Expected errno to be equal to "Success" but got "File exists".
[  FAILED  ] LlvmLibcSysStatvfsTest.StatvfsInvalidPath
Ran 2 tests.  PASS: 1  FAIL: 1
[759/881] Running unit test libc.test.src.sys.stat.lstat_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcLStatTest.CreatAndReadMode
[       OK ] LlvmLibcLStatTest.CreatAndReadMode (57 us)
[ RUN      ] LlvmLibcLStatTest.NonExistentFile
[       OK ] LlvmLibcLStatTest.NonExistentFile (3 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[760/881] Running unit test libc.test.src.sys.statvfs.linux.fstatvfs_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysFStatvfsTest.FStatvfsBasic
[       OK ] LlvmLibcSysFStatvfsTest.FStatvfsBasic (13 us)
[ RUN      ] LlvmLibcSysFStatvfsTest.FStatvfsInvalidPath
[       OK ] LlvmLibcSysFStatvfsTest.FStatvfsInvalidPath (78 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[761/881] Running unit test libc.test.src.string.bcopy_test.__unit__
[==========] Running 7 tests from 1 test suite.
[ RUN      ] LlvmLibcBcopyTest.MoveZeroByte
[       OK ] LlvmLibcBcopyTest.MoveZeroByte (1 us)
[ RUN      ] LlvmLibcBcopyTest.DstAndSrcPointToSameAddress
[       OK ] LlvmLibcBcopyTest.DstAndSrcPointToSameAddress (1 us)
[ RUN      ] LlvmLibcBcopyTest.DstStartsBeforeSrc
[       OK ] LlvmLibcBcopyTest.DstStartsBeforeSrc (0 ns)
[ RUN      ] LlvmLibcBcopyTest.DstStartsAfterSrc
[       OK ] LlvmLibcBcopyTest.DstStartsAfterSrc (1 us)
[ RUN      ] LlvmLibcBcopyTest.SrcFollowDst
[       OK ] LlvmLibcBcopyTest.SrcFollowDst (1 us)
[ RUN      ] LlvmLibcBcopyTest.DstFollowSrc

@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 12, 2024

LLVM Buildbot has detected a new failure on builder libc-x86_64-debian-fullbuild-dbg-asan running on libc-x86_64-debian-fullbuild while building clang-tools-extra at step 4 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/171/builds/10157

Here is the relevant piece of the build log for the reference
Step 4 (annotate) failure: 'python ../llvm-zorg/zorg/buildbot/builders/annotated/libc-linux.py ...' (failure)
...
[       OK ] LlvmLibcRoundToIntegerTest.SubnormalRange (944 ms)
Ran 3 tests.  PASS: 3  FAIL: 0
[962/1095] Running unit test libc.test.src.sys.prctl.linux.prctl_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysPrctlTest.GetSetName
[       OK ] LlvmLibcSysPrctlTest.GetSetName (34 us)
[ RUN      ] LlvmLibcSysPrctlTest.GetTHPDisable
[       OK ] LlvmLibcSysPrctlTest.GetTHPDisable (8 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[963/1095] Running unit test libc.test.src.sys.statvfs.linux.statvfs_test
FAILED: projects/libc/test/src/sys/statvfs/linux/CMakeFiles/libc.test.src.sys.statvfs.linux.statvfs_test /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-fullbuild-dbg-asan/build/projects/libc/test/src/sys/statvfs/linux/CMakeFiles/libc.test.src.sys.statvfs.linux.statvfs_test 
cd /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-fullbuild-dbg-asan/build/projects/libc/test/src/sys/statvfs/linux && /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-fullbuild-dbg-asan/build/projects/libc/test/src/sys/statvfs/linux/libc.test.src.sys.statvfs.linux.statvfs_test.__build__
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysStatvfsTest.StatvfsBasic
[       OK ] LlvmLibcSysStatvfsTest.StatvfsBasic (14 us)
[ RUN      ] LlvmLibcSysStatvfsTest.StatvfsInvalidPath
/home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-fullbuild-dbg-asan/llvm-project/libc/test/src/sys/statvfs/linux/statvfs_test.cpp:37: FAILURE
Failed to match LIBC_NAMESPACE::mkdirat(AT_FDCWD, TEST_DIR, S_IRWXU) against Succeeds(0).
Expected return value to be equal to 0 but got -1.
Expected errno to be equal to "Success" but got "File exists".
[  FAILED  ] LlvmLibcSysStatvfsTest.StatvfsInvalidPath
Ran 2 tests.  PASS: 1  FAIL: 1
[964/1095] Running unit test libc.test.src.sys.auxv.linux.getauxval_test
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcGetauxvalTest.Basic
[       OK ] LlvmLibcGetauxvalTest.Basic (117 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[965/1095] Running unit test libc.test.src.sys.epoll.linux.epoll_create1_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcEpollCreate1Test.Basic
[       OK ] LlvmLibcEpollCreate1Test.Basic (53 us)
[ RUN      ] LlvmLibcEpollCreate1Test.CloseOnExecute
[       OK ] LlvmLibcEpollCreate1Test.CloseOnExecute (11 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[966/1095] Running unit test libc.test.src.unistd.access_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcAccessTest.CreateAndTest
[       OK ] LlvmLibcAccessTest.CreateAndTest (194 us)
[ RUN      ] LlvmLibcAccessTest.AccessNonExistentFile
[       OK ] LlvmLibcAccessTest.AccessNonExistentFile (8 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[967/1095] Running unit test libc.test.src.sys.statvfs.linux.fstatvfs_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysFStatvfsTest.FStatvfsBasic
[       OK ] LlvmLibcSysFStatvfsTest.FStatvfsBasic (39 us)
[ RUN      ] LlvmLibcSysFStatvfsTest.FStatvfsInvalidPath
[       OK ] LlvmLibcSysFStatvfsTest.FStatvfsInvalidPath (259 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[968/1095] Running unit test libc.test.src.sys.wait.wait4_test
Step 8 (libc-unit-tests) failure: libc-unit-tests (failure)
...
[       OK ] LlvmLibcRoundToIntegerTest.SubnormalRange (944 ms)
Ran 3 tests.  PASS: 3  FAIL: 0
[962/1095] Running unit test libc.test.src.sys.prctl.linux.prctl_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysPrctlTest.GetSetName
[       OK ] LlvmLibcSysPrctlTest.GetSetName (34 us)
[ RUN      ] LlvmLibcSysPrctlTest.GetTHPDisable
[       OK ] LlvmLibcSysPrctlTest.GetTHPDisable (8 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[963/1095] Running unit test libc.test.src.sys.statvfs.linux.statvfs_test
FAILED: projects/libc/test/src/sys/statvfs/linux/CMakeFiles/libc.test.src.sys.statvfs.linux.statvfs_test /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-fullbuild-dbg-asan/build/projects/libc/test/src/sys/statvfs/linux/CMakeFiles/libc.test.src.sys.statvfs.linux.statvfs_test 
cd /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-fullbuild-dbg-asan/build/projects/libc/test/src/sys/statvfs/linux && /home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-fullbuild-dbg-asan/build/projects/libc/test/src/sys/statvfs/linux/libc.test.src.sys.statvfs.linux.statvfs_test.__build__
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysStatvfsTest.StatvfsBasic
[       OK ] LlvmLibcSysStatvfsTest.StatvfsBasic (14 us)
[ RUN      ] LlvmLibcSysStatvfsTest.StatvfsInvalidPath
/home/llvm-libc-buildbot/buildbot-worker/libc-x86_64-debian-fullbuild/libc-x86_64-debian-fullbuild-dbg-asan/llvm-project/libc/test/src/sys/statvfs/linux/statvfs_test.cpp:37: FAILURE
Failed to match LIBC_NAMESPACE::mkdirat(AT_FDCWD, TEST_DIR, S_IRWXU) against Succeeds(0).
Expected return value to be equal to 0 but got -1.
Expected errno to be equal to "Success" but got "File exists".
[  FAILED  ] LlvmLibcSysStatvfsTest.StatvfsInvalidPath
Ran 2 tests.  PASS: 1  FAIL: 1
[964/1095] Running unit test libc.test.src.sys.auxv.linux.getauxval_test
[==========] Running 1 test from 1 test suite.
[ RUN      ] LlvmLibcGetauxvalTest.Basic
[       OK ] LlvmLibcGetauxvalTest.Basic (117 us)
Ran 1 tests.  PASS: 1  FAIL: 0
[965/1095] Running unit test libc.test.src.sys.epoll.linux.epoll_create1_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcEpollCreate1Test.Basic
[       OK ] LlvmLibcEpollCreate1Test.Basic (53 us)
[ RUN      ] LlvmLibcEpollCreate1Test.CloseOnExecute
[       OK ] LlvmLibcEpollCreate1Test.CloseOnExecute (11 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[966/1095] Running unit test libc.test.src.unistd.access_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcAccessTest.CreateAndTest
[       OK ] LlvmLibcAccessTest.CreateAndTest (194 us)
[ RUN      ] LlvmLibcAccessTest.AccessNonExistentFile
[       OK ] LlvmLibcAccessTest.AccessNonExistentFile (8 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[967/1095] Running unit test libc.test.src.sys.statvfs.linux.fstatvfs_test
[==========] Running 2 tests from 1 test suite.
[ RUN      ] LlvmLibcSysFStatvfsTest.FStatvfsBasic
[       OK ] LlvmLibcSysFStatvfsTest.FStatvfsBasic (39 us)
[ RUN      ] LlvmLibcSysFStatvfsTest.FStatvfsInvalidPath
[       OK ] LlvmLibcSysFStatvfsTest.FStatvfsInvalidPath (259 us)
Ran 2 tests.  PASS: 2  FAIL: 0
[968/1095] Running unit test libc.test.src.sys.wait.wait4_test

Groverkss pushed a commit to iree-org/llvm-project that referenced this pull request Nov 15, 2024
This is the following patch of
llvm#66462 to optimize its
performance.

# Motivation

To avoid data races, we choose "per file owns its dependent modules"
model. That said, every TU will own all the required module files so
that we don't need to worry about thread safety. And it looks like we
succeeded that we focus on the interfaces and structure of modules
support in clangd. But after all, this model is not good for
performance. Image we have 10000 TUs import std, we will have 10000
std.pcm in the memory. That is terrible both in time and space.

Given the current modules support in clangd works pretty well (almost
every issue report I received is more or less a clang's issue), I'd like
to improve the performance.

# High Level Changes

After this patch, the built module files will be owned by the module
builder and each TU will only have a reference to the built module
files.

The module builder have a map from module names to built module files.
When a new TU ask for a module file, the module builder will check if
the module file lives in the map and if the module file are up to date.
If yes, the module file will be returned. If no, the module file entry
would be erased in the module builder. We use `shared_ptr<>` to track
module file here so that the other TU owning the out dated module file
won't be affected. The out dated module file will be removed
automatically if other TU gets update or closed.

(I know the out dated module file may not exist due to the `CanReuse`
mechanism. But the design here is natural and can be seen as a redundant
design to make it more robust.)

When we a build a module, we will use the mutex and the condition
variable in the working thread to build it exclusively. All other
threads that also want the module file would have to wait for that
working thread. It might not sounds great but I think if we want to make
it asynchronous, we have to refactor TUScheduler as far as I know.

# Code Structure Changes

Thanks for the previous hard working reviewing, the interfaces almost
don't change in this patch. Almost all the work are isolated in
ModulesBuilder.cpp. A outliner is that we convert `ModulesBuilder` to an
abstract class since the implementation class needs to own the module
files.

And the core function to review is
`ReusableModulesBuilder::getOrBuildModuleFile`. It implements the core
logic to fetch the module file from the cache or build it if the module
file is not in the cache or out of date. And other important entities
are `BuildingModuleMutexes`, `BuildingModuleCVs`, `BuildingModules` and
`ModulesBuildingMutex`. These are mutexes and condition variables to
make sure the thread safety.

# User experience

I've implemented this in our downstream and ask our users to use it. I
also sent it https://github.com/ChuanqiXu9/clangd-for-modules here as
pre-version. The feedbacks are pretty good. And I didn't receive any bug
reports (about the reusable modules builder) yet.

# Other potential improvement

The are other two potential improvements can be done:
1. Scanning cache and a mechanism to get the required module information
more quickly. (Like the module maps in
https://github.com/ChuanqiXu9/clangd-for-modules)
2. Persist the module files. So that after we close the vscode and
reopen it, we can reuse the built module files since the last
invocation.
@Arthapz
Copy link

Arthapz commented Nov 22, 2024

i tried it for a while, it work very well ! thx u

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang-tools-extra clangd
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants