[Feature Request]: Address corner cases of when venv cannot be created #26792
Comments
I don't see why it shouldn't be a dependency. I would assume that in most cases where it's not installed it's simply because it's not needed otherwise. An explicit config flag would be fine, too. Such a flag should cause Beam to install venv if it's not there. |
As I understand it, an explicit config flag would mean that Flink users who need separate environments would have to enable it explicitly. Note that this would be a change in behavior. @phoerious Do you have an idea how frequently this is necessary? If it is not very common, then it's ok to make it opt-in. |
I think when it's not installed, it's because some distros try to save on the size of Python installation and try to reduce which modules are installed. It doesn't make sense to me. venv seems like a very basic and necessary functionality nowadays. |
Specifically, in your use case, you'd have to set some environment variable in the Flink environment, such as:
|
Just trying to think through what reasonable defaults should be. But people who use a custom image with boot.go added on top of it wouldn't inherit this setting. |
@phoerious in your case, does the SDK harness run in the Beam container or as a standalone process? In the latter case, setting the environment variable in the Beam container wouldn't suffice. |
cc: @robertwb |
It's a separate Beam container. If you make it an option, I think it should be opt-out and on by default. Also, I don't think adding venv by default adds much to the overall size. |
I spot-checked several container images. Whenever Python was provided, venv was also available. I did find that Canonical's Ubuntu OS images (https://ubuntu.com/server/docs/cloud-images/google-cloud-engine) included Python but not venv, but I didn't find any container image that had a similar issue. That said, and given the feedback so far, I'll keep venv creation enabled by default, with an opt-out capability, and preserve the prior behavior where pipelines fail when venv is not available, but with additional logging that shows how to disable separate venvs. |
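To make the behavior described in the comment above concrete, here is a minimal sketch of the decision only: create a venv by default, skip it when the user opts out, and fail with a hint when venv is missing. It is not the actual boot.go logic. The variable name `EXAMPLE_USE_DEFAULT_ENVIRONMENT` and the function `plan_environment` are hypothetical and chosen only for illustration.

```python
import os

# Hypothetical opt-out variable; the real name is not assumed here.
OPT_OUT_VAR = "EXAMPLE_USE_DEFAULT_ENVIRONMENT"


def plan_environment(venv_available: bool, environ=os.environ) -> str:
    """Return "venv" or "global", or raise when a venv is needed but unavailable."""
    # An explicit opt-out wins, whether or not venv is installed.
    if environ.get(OPT_OUT_VAR) == "1":
        return "global"
    # Default path: a venv is expected, so a missing venv is an error,
    # and the message tells the user how to disable separate venvs.
    if not venv_available:
        raise RuntimeError(
            "Cannot create a virtual environment because venv is not available. "
            f"Install venv, or set {OPT_OUT_VAR}=1 to run in the default environment."
        )
    return "venv"
```

With this shape, users who rely on separate environments keep them without any configuration, and users on images without venv get an actionable error instead of a silent change in behavior.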
Moving this to 2.49 in case there is any more work we want to do for this feature. |
All work was completed in 2.48.0. |
What would you like to happen?
#16658 changed the Python SDK harness container boot sequence to launch SDK processes in separately created virtual environments.
It appears that the venv dependency is sometimes not available on non-Beam Python container images. Users who supply custom containers may run into errors when python3-venv is not installed and then need to install it separately, which is inconvenient.
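One way to tell in advance whether an image is affected is to check, from the image's own interpreter, whether the standard-library modules that `python -m venv` relies on can be found. The sketch below assumes that checking for `venv` and `ensurepip` is a reasonable proxy (on Debian and Ubuntu, `ensurepip` is typically what is missing when python3-venv is not installed); the helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names=("venv", "ensurepip")):
    """Return the subset of `names` that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("venv is likely unusable; missing modules:", ", ".join(missing))
    else:
        print("venv and ensurepip are available")
```

Running this script with the interpreter of a custom container image gives a quick indication of whether python3-venv (or the distro's equivalent) still needs to be installed.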
Creating a venv is not strictly required on some runners; therefore #26753 changed the behavior to use the global environment where venv was not available.
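The fallback described above can be sketched as follows. This is an illustration in Python rather than the Go code from boot.go, and the helper names `create_venv` and `pick_interpreter` are hypothetical: the code tries to create a venv and, if that fails, returns the global interpreter instead.

```python
import subprocess
import sys
from pathlib import Path


def create_venv(env_dir: str) -> None:
    """Run `python -m venv`; raises CalledProcessError if creation fails."""
    subprocess.run([sys.executable, "-m", "venv", env_dir], check=True)


def pick_interpreter(env_dir: str, create=create_venv) -> str:
    """Return the venv's interpreter path, or the global one if creation fails."""
    try:
        create(env_dir)
    except (subprocess.CalledProcessError, OSError) as exc:
        # Fallback: keep going in the global environment, but say so.
        print(f"venv creation failed ({exc}); using the global environment",
              file=sys.stderr)
        return sys.executable
    # Standard venv layout: bin/python on POSIX, Scripts\python.exe on Windows.
    if sys.platform == "win32":
        return str(Path(env_dir) / "Scripts" / "python.exe")
    return str(Path(env_dir) / "bin" / "python")
```

The `create` parameter exists only so the fallback path can be exercised without depending on whether venv is installed on the machine running the example.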
There is a concern that falling back to the global environment may have adverse effects on the group of users who benefited from the separate venv; see #26778 (comment).
Possible failure modes:
Possible avenues to address:
ENV RUN_ ...=1
cc: @phoerious (who was working on #21123).
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components