Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow accessing requirements directly by labels #414

Closed
person142 opened this issue Feb 7, 2021 · 10 comments
Closed

Allow accessing requirements directly by labels #414

person142 opened this issue Feb 7, 2021 · 10 comments

Comments

@person142
Copy link
Contributor

person142 commented Feb 7, 2021

🚀 feature request

Relevant Rules

py_binary, py_library, and py_test

Description

Per the README accessing requirements by labels is not supported:

To avoid breakage, please use requirement() instead of depending directly on wheel repo labels.

The requirements also have somewhat unintuitive labels since they prepend pypi__ onto the (sanitized) package name. Not working with labels directly has some consequences:

  • It doesn't work with buildifier (Buildifier doesn't sort requirements from rules_python buildtools#955)
  • It doesn't work with buildozer (Adding a non-string to a list buildtools#565 (comment))
  • In the next release it will be hard to work with all_whl_requirements without digging into the requirements workspace, so the abstraction is leaky. (Though uncommon, I find I occasionally need to dig through the requirements workspace in some other settings too.)
  • It's fairly obfuscated; i.e. it adds an extra layer on top of the normal mechanisms for referring to targets from external workspaces
  • It doesn't scale well-there used to be requirement, now there's whl_requirement, if you started wrapping console scripts then there would need to be a new function for that, etc. etc.

Describe the solution you'd like

Is it possible to:

  • Shorten the labels so that it's @<workspace>//<sanitized-package-name> (For the short form of the label.)
  • Officially support working with the labels directly

Describe alternatives you've considered

We don't enforce that Python deps be sorted. When I need to buildozer a requirement I do some hacky find-replace to stringify it and then unstringify it.

@person142
Copy link
Contributor Author

Ok, so the reason for the status quo is mentioned in the wheel extracting code:

Further, rules-python automatically adds the repository root to the PYTHONPATH, meaning a package that has the same name as a module is picked up. We workaround this by prefixing with pypi__. Alternatively we could require --noexperimental_python_import_all_repositories be set, however this breaks rules_docker. See: bazelbuild/bazel#2636

This happens inside the shim for launching a py_binary:

# Returns repository roots to add to the import path.
def GetRepositoriesImports(module_space, import_all):
  if import_all:
    repo_dirs = [os.path.join(module_space, d) for d in os.listdir(module_space)]
    repo_dirs.sort()
    return [d for d in repo_dirs if os.path.isdir(d)]
  return [os.path.join(module_space, '__main__')]

the thing I'm not sure of is why the external repository root is added to the Python path-that seems undesirable? (And here forces us to add prefixes to things to work around it.)

@thundergolfer
Copy link

Thanks for the issue @person142, this is an important misalignment with the rest of the rules ecosystem.

Checking this locally for myself, I can indeed run a py_binary and see an entry in the sys.path like:

'/private/var/tmp/_bazel_jonathon/06afea7962d8d117f8909d0040da389c/execroot/rp_foo/bazel-out/darwin-fastbuild/bin/tools/bar/main.runfiles/pypi',

where @pypi is the external repository created by pip_install.

It does seem undesirable that this ends up on the PYTHONPATH. We need to track down why it ended up like this.

@person142
Copy link
Contributor Author

One interesting thing is that pip_parse sidesteps the issue since everything is unpacked into the top level of the generated repo. Currently if you do something like

pip_parse(
   name = <name>,
   requirements_lock = "//:requirements.txt",
)

then the generated repos will have names like <name>_pypi__<package>, but that could be shortened to <name>_<package> or even just <package> (so long as the user takes care to avoid name collisions).

I'm working on a PR to make the prefix configurable so that the above shortened forms could be tried out.

@person142
Copy link
Contributor Author

Aforementioned PR: #459.

@person142
Copy link
Contributor Author

And working at it from the other end, bazelbuild/bazel#2636 (comment) on the Bazel side.

So if we can get the --incompatible_repo_roots_are_not_on_pythonpath flag added and move ahead with #459, then users can flip the flag and adjust the prefixes to be nicer, which would get us started on the process of improving the situation.

@thundergolfer
Copy link

We probably want something pretty close to what rules_jvm_external provides which is an external repo per external dep and then each external dep is exposed to users via a single main external repository. So theres:

  • "@maven//:org_mockito_mockito_core" (user facing)
  • @org_mockito_mockito_core_3_10_0//file:file (hidden from user)

If an alias rule is used, does this ensure pip_parse still sidesteps the the issue? I haven't checked.

@alexeagle
Copy link
Contributor

alexeagle commented Sep 8, 2021

The requirements.bzl helper produced by pip_install has another critical flaw not mentioned earlier: it eager-fetches all the wheels from pypi when the load() statement is evaluated. So if the user has a BUILD file with a py_library and a go_library, and only asks bazel to compile the Go code, they'll still have to wait for pip to install the complete set of requirements.

At a minimum I think we should stop saying "always use the requirement() helper" and instead say "you can use the __pypi names but note that a future release could change these, in that case we'll note it as a breaking change and give you a buildozer command to easily update your code"

The requirements.bzl helper produced by pip_parse on the other hand is fine from this eager-fetching perspective.

@hrfuller
Copy link
Contributor

hrfuller commented Jan 6, 2022

@person142 should this be closed now that we have landed the change to pick your prefix for pip_parse?

@person142
Copy link
Contributor Author

Yes, the situation for pip_parse is quite nice now; I've switched to using labels directly.

@aschleck
Copy link

aschleck commented Apr 19, 2023

Update: I'm guessing this is what incompatible_generate_aliases = True is for, though not sure what it's incompatible with. Unfortunately it still isn't working for me but I think I must be doing something wrong.

Update 2: the new label format under incompatible_generate_aliases is @pip//pytest/@pip//pytest:pkg, etc

Is just writing the label still supposed to be supported under bzlmod? If not, should the "Using third_party packages as dependencies" section of the README be updated to recommend against doing this?

I see requirement() under bzlmod is now generating labels like @@rules_python~0.20.0~pip~pip//:pytest_pkg, and all of my rules depending on things like @pip_pytest//:pkg are totally broken. Hope I am just doing something wrong...

With

pip = use_extension("@rules_python//python:extensions.bzl", "pip")

pip.parse(
    name = "pip",
    requirements_lock = "//:requirements_lock.txt",
)

use_repo(pip, "pip")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants