[Feature request] Consider installing the same artifact in Beam Python RC Docker containers as is being published to PyPi #28084
Comments
cc: @AnandInguva |
Avenues to address:
|
To clarify, what do you mean by this? I was able to run a Python x-lang pipeline on the RC by installing the SDK from the zip file and passing the same file to the "--sdk_location" flag. https://dist.apache.org/repos/dist/dev/beam/2.50.0/python/apache-beam-2.50.0.zip |
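A minimal sketch of the workaround described in the comment above, assuming the RC zip has already been downloaded and installed locally. The helper name `build_pipeline_args` and the placeholder project, region, and bucket values are hypothetical; only the flag names are standard Beam pipeline options.

```python
# Sketch only: assembles the command-line options for a Dataflow run that
# points --sdk_location at a locally downloaded RC archive.
# build_pipeline_args is a hypothetical helper, not part of Beam.

def build_pipeline_args(sdk_path, project, region, temp_location):
    """Return pipeline options that use the SDK archive at sdk_path."""
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # The same archive that was installed into the submission environment.
        f"--sdk_location={sdk_path}",
    ]


args = build_pipeline_args(
    sdk_path="./apache-beam-2.50.0.zip",    # downloaded from dist.apache.org
    project="my-project",                   # placeholder value
    region="us-central1",                   # placeholder value
    temp_location="gs://my-bucket/tmp",     # placeholder value
)
```

The resulting list can be passed to `PipelineOptions(args)` or given on the command line of the pipeline's main module.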
Can we skip this condition check when the sdk is an RC? |
Beam Python downloads the SDK sdist or wheel from PyPI and stages it in the staging environment. boot.go then looks for this staged file and installs it on Dataflow. We did this for Runner v1 because the default Runner v1 container doesn't contain Beam. As of 2.50.0, Runner v1 is deprecated, so we stopped staging the SDK: Runner v2 containers already have Beam installed, so there is no need to stage it. If a tarball is passed to the |
I'm basically always +1 on reverting in situations like this; I think it is almost always the fastest and safest thing to do, especially when the feature is helping us but not providing customer value. Reverting a revert is easy, and then we can pin an extra commit on that PR to get the fix forward in the next release. I put up #28094 to do this
I'm +1 on this as the long-term fix. It's simple and still allows us to exercise all the functionality we need for validation. |
Discussed offline, we're going to try rolling forward and only revert if we run into issues. @tvalentyn is going to take this forward |
Fix merged to master and CP created for the release branch. Repurposing the issue for potential follow-up work:
|
CP merged in. I'm going to start the RC2 process soon. Please validate the fix once RC2 is available. |
What happened?
Pipeline I ran failed with an error:
Pipeline construction environment and pipeline runtime environment are not compatible. If you use a custom container image, check that the Python interpreter minor version and the Apache Beam version in your image match the versions used at pipeline construction time. Submission environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0rc1. Runtime environment: beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0. Worker ID: beamapp-valentyn-08220117-08211817-m76c-harness-v38w
The root cause is: starting from 2.50.0, we no longer stage the Beam SDK. For several releases we have also checked that the submission and runtime versions match. However, the Python Docker containers we build for RCs don't install the RC version of the Beam SDK tarball, so the runtime environment reports 2.50.0 while the submission environment reports 2.50.0rc1.
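The mismatch can be illustrated with a small sketch that compares the two environment strings from the error message. This is not Beam's actual check; the function names and the `ignore_rc` option are hypothetical and only show why `2.50.0rc1` and `2.50.0` fail a strict comparison, and what relaxing the check for RCs (as suggested in the comments) might look like.

```python
import re

# Prefix of the environment strings as they appear in the error message.
SDK_BASE_PREFIX = "beam:version:sdk_base:"


def parse_sdk_base(capability):
    """Split an sdk_base string into (image, version)."""
    if not capability.startswith(SDK_BASE_PREFIX):
        raise ValueError(f"not an sdk_base capability: {capability!r}")
    image, _, version = capability[len(SDK_BASE_PREFIX):].rpartition(":")
    return image, version


def strip_rc(version):
    """Drop a trailing release-candidate suffix such as 'rc1'."""
    return re.sub(r"rc\d+$", "", version)


def environments_match(submission, runtime, ignore_rc=False):
    """Compare submission and runtime environments (hypothetical helper)."""
    sub_image, sub_version = parse_sdk_base(submission)
    run_image, run_version = parse_sdk_base(runtime)
    if ignore_rc:
        # Treat 2.50.0rc1 and 2.50.0 as the same version.
        sub_version = strip_rc(sub_version)
        run_version = strip_rc(run_version)
    return (sub_image, sub_version) == (run_image, run_version)


submission = "beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0rc1"
runtime = "beam:version:sdk_base:apache/beam_python3.11_sdk:2.50.0"

strict = environments_match(submission, runtime)
relaxed = environments_match(submission, runtime, ignore_rc=True)
```

With the strings from the error above, the strict comparison fails and the relaxed one passes. Installing the RC artifact in the RC containers, as the issue title proposes, would make the strict comparison pass without relaxing the check.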
This issue blocks further validation of RC1 for Python Dataflow pipelines.
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components