Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to make exename portable. Refactor searchForIncludePath. #5185

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

fruffy
Copy link
Collaborator

@fruffy fruffy commented Mar 19, 2025

Fixes #3812.

Ran into this while working on a PR. Turns out exename doesn't seem to properly work. Maybe this approach works better? Still trying to figure out the right approach for getExecutablePath.

Also use std::filesystem::path to set the include path instead of char manipulation.

This change preserves all APIs, still marking as breaking in case there are some differences in behavior.

@fruffy fruffy added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Mar 19, 2025
@fruffy fruffy force-pushed the fruffy/exename branch 3 times, most recently from 5d6bb23 to 8f2c3b6 Compare March 19, 2025 13:36
@fruffy fruffy added the breaking-change This change may break assumptions of compiler back ends. label Mar 19, 2025
@fruffy fruffy marked this pull request as ready for review March 19, 2025 17:01
@fruffy fruffy requested review from ChrisDodd, asl and vlstill March 19, 2025 17:01
@fruffy fruffy added run-ubuntu18 Use this tag to trigger a Ubuntu-18 CI run. run-validation Use this tag to trigger a Validation CI run. run-sanitizer Use this tag to run a Clang+Sanitzers CI run. run-static Use this tag to trigger static build CI run. labels Mar 19, 2025
@fruffy fruffy force-pushed the fruffy/exename branch 3 times, most recently from 8106355 to 6ceaed7 Compare March 20, 2025 12:43
Copy link
Contributor

@asl asl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

bool ParserOptions::searchForIncludePath(const char *&includePathOut,
std::vector<cstring> userSpecifiedPaths,
const std::vector<cstring> &userSpecifiedPaths,
const char *exename) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can exename be converted to std::filesystem::path while here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would break downstream use of that function, figured I should do that in a separate PR. Ditto for all the cstring typing of variables that should be a path.

bool ParserOptions::searchForIncludePath(const char *&includePathOut,
std::vector<cstring> userSpecifiedPaths,
const std::vector<cstring> &userSpecifiedPaths,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to vector of std::filesystem::path?

const char *separator = cwd[cwdLen - 1] == '/' ? "" : "/";
std::filesystem::path getExecutablePath(const std::filesystem::path &suggestedPath) {
#if defined(__linux__) || defined(__FreeBSD__) || defined(__NetBSD__)
// Find the path of the executable. We use a number of techniques that may fail or work on
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Figured we shouldn't depend on external dependencies for a utility like this. Particularly boost which we want to get rid of. Let me take a look at whereami

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Though there is bunch of domain knowledge here :(

@fruffy fruffy force-pushed the fruffy/exename branch 3 times, most recently from 3d7dbe3 to 6b18259 Compare March 21, 2025 21:56
@fruffy fruffy requested a review from asl March 22, 2025 05:42
@fruffy
Copy link
Collaborator Author

fruffy commented Mar 22, 2025

The Ubuntu failure is unrelated, should be fixed with #5177.

@asl
Copy link
Contributor

asl commented Mar 24, 2025

I just realized that exename is horrible broken. In few places exename is called without any arguments. Then it could result in some bogus exename determination. What is terrible is that it caches the result. So any subsequence calls of exename(argv[0]) produces same bogus result (!).

Now let's take a look into crash.cpp. This unwanted side effect is always there as exename() is called in the early beginning when we initialize crash logger.

How it could be wrong without argv[0]? Well, one of the case is:

    } else if (getenv("_")) {
        strncpy(buffer, getenv("_"), sizeof(buffer));
        buffer[sizeof(buffer) - 1] = 0;
    } else {

This means if we'd execute p4c via some wrapper script, then it will place the full path of that wrapper script there and not p4c itself(!)

This is exactly what was in described in #3812

And the problem is that include paths could be silently dependent on the enclosing shell, etc.

@fruffy
Copy link
Collaborator Author

fruffy commented Mar 25, 2025

I just realized that exename is horrible broken. In few places exename is called without any arguments. Then it could result in some bogus exename determination. What is terrible is that it caches the result. So any subsequence calls of exename(argv[0]) produces same bogus result (!).

Now let's take a look into crash.cpp. This unwanted side effect is always there as exename() is called in the early beginning when we initialize crash logger.

How it could be wrong without argv[0]? Well, one of the case is:

    } else if (getenv("_")) {
        strncpy(buffer, getenv("_"), sizeof(buffer));
        buffer[sizeof(buffer) - 1] = 0;
    } else {

This means if we'd execute p4c via some wrapper script, then it will place the full path of that wrapper script there and not p4c itself(!)

This is exactly what was in described in #3812

And the problem is that include paths could be silently dependent on the enclosing shell, etc.

The new version should address most of these. I believe the getenv("_") use case is intentional but should only be triggered in special cases. It's tricky to write tests for this...

@asl
Copy link
Contributor

asl commented Mar 25, 2025

I believe the getenv("_") use case is intentional but should only be triggered in special cases.

I'd remove it. Together with caching it could produce wrong results. Caching should be removed as well, this way we always stuck to incorrect answer if it happen to appear first.

@fruffy
Copy link
Collaborator Author

fruffy commented Mar 25, 2025

I believe the getenv("_") use case is intentional but should only be triggered in special cases.

I'd remove it. Together with caching it could produce wrong results. Caching should be removed as well, this way we always stuck to incorrect answer if it happen to appear first.

Some context: #5185 (comment)

At least cashing was removed in the new implementation? Unless you see caching somewhere else.

@asl
Copy link
Contributor

asl commented Mar 25, 2025

I used to see problems with "old" version. Yes, indeed caching was removed which is good.

}

} // namespace

bool ParserOptions::searchForIncludePath(const char *&includePathOut,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functions should really get a better interface overall, the output const char * argument that is heap-allocated is quite ugly. But I agree with doing it in a separate PR.

lib/exename.h Outdated
* argv0 if provided and nothing better can be found */
/// Return the full path to the binary being executed.
/// @returns std::nullopt if unable to determine the path.
std::filesystem::path getExecutablePath(const std::filesystem::path &suggestedPath = "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I would also explicitly separate the no-argument overload (let it call the normal one with empty path in the .cpp, but let this ugly detail be hidden).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you want two functions or require always a suggested path?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to have two overloads.

std::filesystem::path getExecutablePath(const std::filesystem::path &suggestedPath);
std::filesystem::path getExecutablePath();

The reason is that it is not obvious what the empty value for suggested path means (I'd say it is not natural "none" value of paths as it is not even a valid path).

@ChrisDodd
Copy link
Contributor

I believe the getenv("_") use case is intentional but should only be triggered in special cases.

I'd remove it. Together with caching it could produce wrong results. Caching should be removed as well, this way we always stuck to incorrect answer if it happen to appear first.

If nothing else works, it is better than nothing, but I agree it should not be cached -- only a result that is definitively correct (eg, comes from /proc filesystem or syscall) should be cached.

fruffy added 7 commits April 7, 2025 21:25
Signed-off-by: fruffy <fruffy@nyu.edu>
Signed-off-by: fruffy <fruffy@nyu.edu>
Signed-off-by: fruffy <fruffy@nyu.edu>
Signed-off-by: fruffy <fruffy@nyu.edu>
Signed-off-by: fruffy <fruffy@nyu.edu>
Signed-off-by: fruffy <fruffy@nyu.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change This change may break assumptions of compiler back ends. core Topics concerning the core segments of the compiler (frontend, midend, parser) run-sanitizer Use this tag to run a Clang+Sanitzers CI run. run-static Use this tag to trigger static build CI run. run-ubuntu18 Use this tag to trigger a Ubuntu-18 CI run. run-validation Use this tag to trigger a Validation CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

exename does not work on Mac OS X
4 participants