-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use sys.path
for package discovery
#9849
base: main
Are you sure you want to change the base?
Conversation
sigh Locally, I'm hitting rust-lang/rust-analyzer#16959 trying to diagnose what's happening in the two spots that the most recent commit has added more granular All the tests run via the VS Code UI for debug end up panicking before running with:
Looks like I'll be falling back to using regular |
Doing a brain dump since I need a break now -- it took me a bit of time to separate a bunch of tooling failures/noise, and actually understand what's broken here... uv/crates/uv-virtualenv/src/lib.rs Line 71 in 5903ce5
The virtual environment's An easy reproducer for this, with this PR, is using This used to be fine earlier since the logic didn't care at all about sys.path and the interpreter's prefix was being updated which was effectively "updating" the site-packages path that'd be used for the lookups. Ignoring to update sys.path is not viable, at least, not with the goal of this change of using appropriately using sys.path for package discovery. Still thinking through what to do here... The "slow" option here would be to recreate the |
Ok, an alternative that I haven't fully thought through the implications of... I think this is more of a reasonable guess of what sys.path will be, but it might just work. Here's the idea: Instead of querying the interpreter again, use the ❯ /opt/pradyunsg/bin/python3.12 -S -c "import sys; print(sys.path)"
['', '/opt/pradyunsg/lib/python312.zip', '/opt/pradyunsg/lib/python3.12', '/opt/pradyunsg/lib/python3.12/lib-dynload']
❯ /opt/pradyunsg/bin/python3.12 -c "import sys; print(sys.path)"
['', '/opt/pradyunsg/lib/python312.zip', '/opt/pradyunsg/lib/python3.12', '/opt/pradyunsg/lib/python3.12/lib-dynload', '/opt/pradyunsg/lib/python3.12/site-packages'] Let me know if this seems reasonable, and also... I'd be very happy to hear if someone can think of a scenario where this might be broken! |
Broken might be a bit harsh, but 'incomplete'?
I do note that the former usage is much more common than the latter. If the 'correct' thing to do is unacceptably slow -- and the guess is good enough for most use cases -- maybe an acceptable compromise is to do the 'good enough' thing by default and provide an option for those who really need that extra correctness? |
You won't have these in an environment that was just-created in-process by uv, as I understand it. You could have these in the base environment though, so the "guess" model doesn't work in general for environments with I think that's fine. We could stick with this computational approach for environments with |
I think it's ok to re-query information if we have to. We cache it anyway, and ideally we only do this when |
Do you need any guidance on the code changes, beyond that? |
A-ha! Maybe an elegant solution would be to tune the cache key for That way the 'expensive'/side-effectful recomputation will be done at most once, and ideally not at all (ideally cache entry populated at end of venv creation) The cache key could be some sort of serialised struct that encapsulates the myriad things that could influence
|
Not at the moment. I'll only be able to pick this back up early next year. |
cdb36f2
to
b8d0038
Compare
🤦🏽 Figured it out. |
It looks like the |
Looks like it's wrong at the point of querying the interpreter.
The most recent commit "fixes" that to:
|
This better reflects the intended behavior of this object, which is to serve as a database of InstalledPackages rather than a representation of the relevant environment/interpreter path.
When a virtual environment is created, `sys.path` is modified. Instead of reusing the base interpreter's `sys.path`, query the interpreter again to get the new `sys.path` values.
bd4a58a
to
0294ee9
Compare
Import site and query `sys.path` after all the other values. This does not work since `site` is the piece handling `pyvenv.cfg` and, therefore, this breaks the detection of virtual environments.
0294ee9
to
0af7b7e
Compare
…ualenv" This reverts commit 0af7b7e.
By the way, I think this might fix $ uv venv --seed
Using PyPy 3.10.16 interpreter at: /usr/local/bin/python
Creating virtual environment with seed packages at: .venv
+ pip==25.0.1
+ setuptools==75.8.0
+ wheel==0.45.1
$ uv pip list
Package Version
---------- -------
pip 25.0.1
setuptools 75.8.0
wheel 0.45.1
$ .venv/bin/pip list
Package Version
---------- -----------
cffi 1.18.0.dev0
greenlet 0.4.13
hpy 0.9.0
pip 25.0.1
readline 6.2.4.1
setuptools 75.8.0
wheel 0.45.1 And, by the same token, might break |
We want to use `sys.path` for package discovery (#2500, #9849). For that, we need to know the correct value of `sys.path`. `sys.path` is a runtime-changeable value, which gets influenced from a lot of different sources: Environment variables, CLI arguments, `.pth` files with scripting, `sys.path.append()` at runtime, etc. We cannot capture them all accurately, especially since it's possible to change `sys.path` mid-execution. Instead, we do a best effort attempt at matching the user's expectation. A common way to influence `sys.path` that is not using venvs is setting `PYTHONPATH`. To support this we're capturing `PYTHONPATH` as part of the cache invalidation, i.e. we refresh the interpreter metadata if it changed. For completeness, we're also capturing other environment variables documented as influencing `sys.path` or other fields in the interpreter info. This PR does not include reading registry values for `sys.path` additions on Windows as documented in https://docs.python.org/3.11/using/windows.html#finding-modules. It notably also does not include parsing of python CLI arguments, we only consider their environment variable versions for package installation and listing. We could try parsing CLI flags in `uv run python`, but we'd still miss them when Python is launched indirectly through a script, and it's more consistent to only consider uv's own arguments and environment variables, similar to uv's behavior in other places.
We want to use `sys.path` for package discovery (#2500, #9849). For that, we need to know the correct value of `sys.path`. `sys.path` is a runtime-changeable value, which gets influenced from a lot of different sources: Environment variables, CLI arguments, `.pth` files with scripting, `sys.path.append()` at runtime, etc. We cannot capture them all accurately, especially since it's possible to change `sys.path` mid-execution. Instead, we do a best effort attempt at matching the user's expectation. A common way to influence `sys.path` that is not using venvs is setting `PYTHONPATH`. To support this we're capturing `PYTHONPATH` as part of the cache invalidation, i.e. we refresh the interpreter metadata if it changed. For completeness, we're also capturing other environment variables documented as influencing `sys.path` or other fields in the interpreter info. This PR does not include reading registry values for `sys.path` additions on Windows as documented in https://docs.python.org/3.11/using/windows.html#finding-modules. It notably also does not include parsing of python CLI arguments, we only consider their environment variable versions for package installation and listing. We could try parsing CLI flags in `uv run python`, but we'd still miss them when Python is launched indirectly through a script, and it's more consistent to only consider uv's own arguments and environment variables, similar to uv's behavior in other places.
We want to use `sys.path` for package discovery (#2500, #9849). For that, we need to know the correct value of `sys.path`. `sys.path` is a runtime-changeable value, which gets influenced from a lot of different sources: Environment variables, CLI arguments, `.pth` files with scripting, `sys.path.append()` at runtime, a distributor patching Python, etc. We cannot capture them all accurately, especially since it's possible to change `sys.path` mid-execution. Instead, we do a best effort attempt at matching the user's expectation. The assumption is that package installation generally happens in venv site-packages, system/user site-packages (including pypy shipping packages with std), and `PYTHONPATH`. Specifically, we reuse `PYTHONPATH` as dedicated way for users to tell uv to include specific directories in package discovery. A common way to influence `sys.path` that is not using venvs is setting `PYTHONPATH`. To support this we're capturing `PYTHONPATH` as part of the cache invalidation, i.e. we refresh the interpreter metadata if it changed. For completeness, we're also capturing other environment variables documented as influencing `sys.path` or other fields in the interpreter info. This PR does not include reading registry values for `sys.path` additions on Windows as documented in https://docs.python.org/3.11/using/windows.html#finding-modules. It notably also does not include parsing of python CLI arguments, we only consider their environment variable versions for package installation and listing. We could try parsing CLI flags in `uv run python`, but we'd still miss them when Python is launched indirectly through a script, and it's more consistent to only consider uv's own arguments and environment variables, similar to uv's behavior in other places.
Resolves #2500
Intend to resolve #4466 before coming out of draft mode here.
x-ref #2413, since this PR removes an assumption introduced in that PR that a missing site-packages directory means that there are no packages to be found.
Summary
sys.path
rather than inferred paths based onsysconfig
'spurelib
andplatlib
directories.site_packages
we were computing for this is not used anywhere else.Test Plan
Only manually tested as of now.
The plan for automated tests is: Use a
.pth
file as an extra path to be injected to the virtual environment being modified, with the extra path pointing to site-packages from another virtual environment.This should be less complicated than the proposed approaches of using
sitecustomize.py
as well as relying on system-site-packages with a mock system interpreter -- all these approaches modify thesys.path
in a manner visible to the interpreter introspection script and using a.pth
file is likely to be the easiest to reason about.