add option to use precompiled headers for faster builds #100
Conversation
Hi, thanks for the contribution.
The changes look OK. Is this really necessary (namely, do we have a user who needs two different executables/libraries for their application)? Will this be more advantageous than just running …
This seems to be okay at the moment, but are we promoting a good practice or a bad practice by doing this? From the users' perspective, … If you look at the documentation of … These are just based on my experience. I am also interested in your experience and the motivation behind these changes.
🫡
It's not really necessary, but would improve the build efficiency by reusing object files. If you build the Python module, you can also build the executable and shared library almost for free. I've experimented with this on this branch. Such a feature would be mainly useful for developers and downstream packagers, etc., who might want to build an executable and a shared library at the same time. If you don't see too many concrete advantages coming from this, I can take it out.
Then I have some encouraging numbers for you!! The following benchmarks were collected on a node with 48 cores (Cascade Lake); GCC 12.3.0. Perhaps GCC has made some improvements recently.

Building the python extension with default settings, -j 48, no precompiled headers: …

Building the python extension with default settings, -j 48, WITH precompiled headers: …

Pre-compiling the headers takes under 10 seconds at the beginning of the build, but the total build time is reduced by 1 min 30 s (about 18%).

Building the executable with default settings, -j 48, no precompiled headers: …

Building the executable with default settings, -j 48, WITH precompiled headers: …

There was also a noticeable reduction in peak memory usage, roughly 10~20%. These improvements are not huge, but they could still be considered significant.
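For readers unfamiliar with the mechanism: CMake has had native precompiled-header support since 3.16. A minimal sketch of how such an option could be wired up follows; the target name `block2`, the variable `SRCS`, and the exact header list are assumptions for illustration, not the actual code in this PR.

```cmake
# Sketch only: gate precompiled headers behind an option (OFF by default).
# "block2" and SRCS are hypothetical names, not the project's real targets.
option(USE_PCH "Precompile the instantiation headers" OFF)

add_library(block2 SHARED ${SRCS})

if (USE_PCH)
    # CMake >= 3.16 provides target_precompile_headers natively.
    file(GLOB PCH_HEADERS ${CMAKE_SOURCE_DIR}/src/instantiation/*.hpp)
    target_precompile_headers(block2 PRIVATE ${PCH_HEADERS})
endif()
```

With a setup like this, every translation unit that includes the instantiation headers reuses one parsed copy, which is where the build-time and memory savings above come from.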
To be a bit clearer, here's the situation I am actually facing: let's say I want to build a wheel against the school cluster's MPI installation so that some colleagues can install it in their Python environments. If I make a wheel or a conda package, I can use auditwheel or conda-build to make sure it's relocatable. It's not as easy to distribute the custom block2 if I build it manually with CMake: I have to run patchelf so that the python extension can link to whatever MKL is in my colleagues' environment, give instructions about PYTHONPATH, etc. In a case like this, it would be more convenient to edit setup.py and then build the wheel.

The benefit of the proposed change is that it'd save a tiny amount of typing if the … A more straightforward example: if somebody wants to build block2 as a conda package with a specific number of processes other than 2 (perhaps the number is set by some CI provider), they would be able to do so without patching setup.py.

I guess it is worth mentioning that adding scikit-build-core as an alternate build backend could help with this. It supports passing additional CMake options on the command line.
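To illustrate the scikit-build-core mechanism mentioned above: it forwards PEP 517 config settings to CMake, so a define could be passed at install time without touching setup.py. The `USE_PCH` define here is just this PR's option used as an example; the exact invocation is a sketch, assuming a scikit-build-core-based project.

```shell
# Hypothetical invocation, assuming block2 used scikit-build-core:
# the cmake.define.* config setting is forwarded to CMake as -DUSE_PCH=ON.
pip wheel . --config-settings=cmake.define.USE_PCH=ON
```

This is the kind of workflow where no patching of the build script would be needed.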
Thanks for sharing your ideas and motivations. These are all very helpful.
Most of the time, as a developer, I will only debug with one kind of interface (because it does not make sense to write the test file in two languages for debugging, compile two libraries, change two lines in two test files, and then recompile the two libraries). Since in your new branch you have to introduce a new …
These numbers are indeed impressive. As this change does not have the backward compatibility problem, is very decoupled from other things, and alleviates a real problem, I am happy to accept it (as long as it does not create problems for non-GCC compilers).
Yes, what you described is a valid use case, and it is the same case as what I do in github actions: https://github.com/block-hczhai/block2-preview/blob/master/.github/workflows/build.yml#L270-L288 And what I did there is exactly that: directly editing …
For this one I do not get your point. With the current …
So far, in my opinion, the "build backend" is a "not-a-problem" problem. It will not be very worthwhile if you end up with a complicated backend co-existing solution with increased code length and dependencies. I think you are already familiar with the current build script, and it has worked well for all your purposes, which is already a miracle in a modern software environment full of artificial complexities and pitfalls.

Just have a look at the PR list of … The first one, PR 793, is to deprecate things, for fixing issue 747. But issue 747 is not an issue. (And "fixing" this issue will create real issues for applications depending on it.) In the end, lots of effort was wasted in (a) jumping between alternative things; (b) removing useful things; and (c) adding imaginary things. Let's hope we can do better.
I've gotten rid of everything in the PR except the option to use precompiled headers. It's off by default and doesn't seem to break backward compatibility, so hopefully it is compatible with the current project goals. Honestly, I am relieved that I didn't 100% waste your time! Also, while testing with clang on linux64, I found that clang was not able to utilize libgomp. Changing line 445 so that clang could use libiomp5 or libomp seemed to fix this in a natural way. If you know a better way to deal with this issue, perhaps without changing CMakeLists.txt, I can use that instead.
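To make the clang/OpenMP point above concrete: clang generally cannot use GCC's libgomp for its own `-fopenmp` code generation, but it links libomp (or Intel's libiomp5, which is ABI-compatible) cleanly. A sketch of the kind of compiler-dependent selection described, where the variable `OMP_LIBRARY` and the target name are illustrative assumptions rather than the actual CMakeLists.txt contents:

```cmake
# Illustrative sketch: choose an OpenMP runtime the compiler can link.
# OMP_LIBRARY and "block2" are hypothetical names for this example.
if (CMAKE_CXX_COMPILER_ID MATCHES "Clang")
    # clang works naturally with LLVM's libomp or Intel's libiomp5
    find_library(OMP_LIBRARY NAMES omp iomp5)
else()
    # GCC's own OpenMP runtime
    find_library(OMP_LIBRARY NAMES gomp)
endif()
target_link_libraries(block2 PRIVATE ${OMP_LIBRARY})
```

Since libiomp5 implements the libgomp interface, this keeps one runtime per build while letting each compiler use what it supports.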
I apologize---the part about saving typing was supposed to be a dumb joke. Indeed, no effort is really saved. I just need to get better at sed 😆
It does look like a mess. I can see why you feel that depending on scikit-build-core (and keeping up to date with all its changes) is unwise for block2 at the moment. It is useful for small projects that want to get started quickly; time will tell if it can become a stable and reliable tool. Thanks for all the time and effort you spent responding to my proposals so thoroughly. I really appreciate it!
Thanks for your understanding.
It is not a waste of time. For me it is good to know other people's experience and requests. Every ugly line in …
If you used …

If you used …

Also note that if you use …
So you probably need three wheels. The …
Ok, I see what you mean!! The change was meant to help clang link libiomp5 by default, but I will use …
Thank you for the warning. I have not noticed any problems from this yet, but I'll keep it in mind. It seems like PySCF chooses its OpenMP library this way: probably it uses the OpenMP runtime that comes with the compiler, and I don't think there is a way to change this aside from editing the CMakeLists.txt. The wheels available on PyPI ship libgomp-a34b3233.so.1.0.0. Since libiomp5 is binary compatible with libgomp, I guess you could replace PySCF's libgomp.so with a symlink to block2's libgomp or libiomp5. I haven't tested this, but it's worth a shot.
Thanks a lot for these notes and instructions. They're very helpful! Most of my colleagues are using conda environments with the conda-forge channel. Conda-forge has a special mechanism to ensure that all packages use the same OpenMP. So if I build conda packages of pyscf and block2 with the conda-forge toolchain (or use prebuilt pyscf and only build block2), that should take care of OpenMP consistency. But the MPI / mpi4py and Python versions will definitely be tedious. (Speaking of which, do you think adding block2 to conda-forge is a good idea?)
Thanks for letting me know all these details.
So in order to let the user be aware of this pitfall, I think it makes sense to explicitly require the user to set …
So you see why …
If you are able to handle and maintain that, it would be great. If there have to be some updates in github actions or …
Yes!
Sure, I don't mind doing this. Although conda has issues of its own, I think it will be convenient for many users, and it should be possible to avoid the worst hazards (like conflicting openmp libraries).
Three things are done in this PR.
The executable, shared library, and Python module are now different CMake targets. Name conflicts are avoided by setting the OUTPUT_NAME property to the correct value. Currently, this should not change anything for users (if it does, that's a mistake). However, with a few more changes to CMakeLists.txt, it should be possible to build multiple targets at once (e.g. the python module and executable together).
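The OUTPUT_NAME mechanism in item 1 can be sketched as follows; the target names here are assumptions for illustration, not necessarily those used in the PR. Because the executable has no prefix while the library gets `lib…so`, both targets can share one output name without a file collision.

```cmake
# Sketch: distinct CMake targets that keep the existing artifact names.
# Target names (block2_exe, block2_shared) are hypothetical.
add_executable(block2_exe main.cpp)
set_target_properties(block2_exe PROPERTIES OUTPUT_NAME "block2")

add_library(block2_shared SHARED ${SRCS})
set_target_properties(block2_shared PROPERTIES OUTPUT_NAME "block2")
# Produces "block2" and "libblock2.so" side by side in one build tree.
```

Since the targets are independent, a later change could let both be enabled in a single configure, which is the "build multiple targets at once" possibility mentioned above.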
Add an option (-DUSE_PCH=ON) to precompile the headers in src/instantiation/*.hpp. Precompiling these headers can make builds faster if the user has a compiler that supports it. It's disabled by default.
In setup.py, the default number of build processes is still 2, but the user can override it by setting the environment variable CMAKE_BUILD_PARALLEL_LEVEL.
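The setup.py behaviour in item 3 could look roughly like this; it is a sketch of the idea, not the actual code, and the helper name `build_parallel_args` is made up for the example.

```python
import os

def build_parallel_args(env=os.environ):
    """Sketch: default to 2 build processes, as before, unless the standard
    CMAKE_BUILD_PARALLEL_LEVEL environment variable overrides it
    (e.g. when set by a CI provider)."""
    n_procs = int(env.get("CMAKE_BUILD_PARALLEL_LEVEL", "2"))
    # Arguments passed through to the underlying build tool after "--".
    return ["--", f"-j{n_procs}"]
```

CMAKE_BUILD_PARALLEL_LEVEL is the variable CMake itself consults for `cmake --build`, so reusing it avoids inventing a project-specific knob.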
Let me know if there are points I overlooked which need to be addressed, or if you have other suggestions / concerns!