Deterministic builds #239
Comments
Hi @indygreg ! Thanks for your proposal. I definitely agree with the spirit of the suggestion - deterministic software is a good thing™. This was something that I always thought "Oh, we'll get to v1.0 and then think about it", but perhaps because we haven't had many reports of dependencies breaking things it's fallen by the wayside. This is where we are now, as I see it
This is where we could be by specifying versions of as much as possible...
One thing I'm less sure of now is the merits of pinning the versions of things like
It relies on APIs from PyPI, which, since the rewrite, should be keeping things backward-compatible(?), but the big breaking changes to that ecosystem seem to happen when security fixes require updates - I think SSL updates were a big one. So I'd be curious if anyone has any experience/insight into pinning pip and how that worked in a build system. On an implementation note, this could present a maintenance burden, unless we can figure out a clever way to bump the versions to their latest before release? Could pip 'constraint' files help us here?
Just my 2 cents: currently, the nice thing is that we don't have to release a new version each time some dependencies get updated, and |
There are compelling arguments that can be made that One potential strategy here is to check in 2 versions of a pip requirements file: one listing just the packages we require and another expanded to contain versions and hashes. For all of my Python projects these days, I use That would leave I'll throw out Mercurial's Linux (https://www.mercurial-scm.org/repo/hg/file/5685ce2ea3bf/contrib/automation/hgautomation/linux.py#l40) and Windows (https://www.mercurial-scm.org/repo/hg/file/5685ce2ea3bf/contrib/install-windows-dependencies.ps1) CI bootstrap scripts for examples of how we (hopefully deterministically) bootstrap the CI environment. tl;dr I think pinning |
@indygreg, could you quickly explain how this is related to security? I'm not a big expert on version and dependency management, but I thought getting the latest version results in plugged security risks? I quite like @joerick's idea of constraint files: https://pip.pypa.io/en/stable/user_guide/#constraints-files. In this way, we could have a new option ( That doesn't solve I'll ping @mayeut as well, who always seems to be the first one to notice Python versions need to be updated, and might have something to add here? :-)
Security concerns:
The fact that software like You are also correct that pulling the latest version of say |
Thanks both.
I like this design too. I was also dinking around with get-pip, and it seems that it supports constraint files too!
(env) joerick@joerick2 /tmp> cat constraints.txt
pip==19.2.0
(env) joerick@joerick2 /tmp> curl https://bootstrap.pypa.io/get-pip.py | python3 - -c constraints.txt
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1734k 100 1734k 0 0 1227k 0 0:00:01 0:00:01 --:--:-- 1227k
Collecting pip==19.2.0
Using cached https://files.pythonhosted.org/packages/3a/6f/35de4f49ae5c7fdb2b64097ab195020fb48faa8ad3a85386ece6953c11b1/pip-19.2-py2.py3-none-any.whl
Collecting wheel
Using cached https://files.pythonhosted.org/packages/00/83/b4a77d044e78ad1a45610eb88f745be2fd2c6d658f9798a15e384b7d57c9/wheel-0.33.6-py2.py3-none-any.whl
Installing collected packages: pip, wheel
Found existing installation: pip 19.0.3
Uninstalling pip-19.0.3:
Successfully uninstalled pip-19.0.3
Successfully installed pip-19.2 wheel-0.33.6
The relevant bit is At that point, the only thing that isn't pinnable is But I think that gets us to a point where all the software running in cibuildwheel's process is pinned. The question I'm wondering is - is it materially different security-wise if get-pip is installed from bootstrap.pypa.io or from a github raw URL? Personally, I'd rather stick with pypa's URL since it's designed to do exactly this, but there might be another consideration I'm missing?
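As a rough sketch of how a build tool could drive the same thing from Python: the helpers below write a constraints file from a dict of pins and build the argument list for running a local copy of get-pip.py with `-c`, mirroring the shell session above. The function names, the dict of pins, and the file paths are hypothetical and are not taken from cibuildwheel's code.

```python
import sys
import tempfile
from pathlib import Path


def write_constraints(pins, path):
    """Write pins such as {"pip": "19.2.0"} as sorted 'name==version' lines."""
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    Path(path).write_text("\n".join(lines) + "\n")


def get_pip_command(python, get_pip_path, constraints_path):
    """Build the argv for running a local get-pip.py with a constraints file.

    get-pip.py forwards extra arguments to `pip install`, so `-c` behaves
    the same way as in the shell session above.
    """
    return [str(python), str(get_pip_path), "-c", str(constraints_path)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        constraints = Path(tmp) / "constraints.txt"
        write_constraints({"pip": "19.2.0"}, constraints)
        print(constraints.read_text(), end="")
        # Hand this list to subprocess.run(..., check=True) once get-pip.py
        # has been saved to a local path (hypothetical path shown here).
        print(get_pip_command(sys.executable, "get-pip.py", constraints))
```

Keeping the pins in one place means a version bump is a one-line change, which may help with the maintenance burden raised earlier in the thread.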
If you do go with a constraints file, I highly recommend populating it with SHA-256 hashes so content is verified. Again, I recommend As for pinning |
Another idea, given that we'll probably be dropping Python 2.7 soon - to use ensurepip instead of get-pip - I think you've looked at this previously @YannickJadoul? According to the docs there's a bundled version of pip in every distribution of CPython, which can be installed with this command. |
And yes, Example:
(env) joerick@joerick2 /tmp> cat constraints.in
pip
setuptools
(env) joerick@joerick2 /tmp> pip-compile --no-header --allow-unsafe --generate-hashes --upgrade constraints.in
# The following packages are considered to be unsafe in a requirements file:
pip==19.3.1 \
--hash=sha256:21207d76c1031e517668898a6b46a9fb1501c7a4710ef5dfd6a40ad9e6757ea7 \
--hash=sha256:6917c65fc3769ecdc61405d3dfd97afdedd75808d200b2838d7d961cebc0c2c7
setuptools==44.0.0 \
--hash=sha256:180081a244d0888b0065e18206950d603f6550721bd6f8c0a10221ed467dd78e \
--hash=sha256:e5baf7723e5bb8382fc146e33032b241efc63314211a3a120aaa55d62d2bb008
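To make the structure of that output concrete, here is a minimal sketch that parses hashed requirements of this shape into a dict. The parser and its name are illustrative only; pip itself understands this format natively, so a hand-rolled parser like this would only matter for tooling that wants to inspect or report the pins.

```python
SAMPLE = r"""# The following packages are considered to be unsafe in a requirements file:
pip==19.3.1 \
    --hash=sha256:21207d76c1031e517668898a6b46a9fb1501c7a4710ef5dfd6a40ad9e6757ea7 \
    --hash=sha256:6917c65fc3769ecdc61405d3dfd97afdedd75808d200b2838d7d961cebc0c2c7
setuptools==44.0.0 \
    --hash=sha256:180081a244d0888b0065e18206950d603f6550721bd6f8c0a10221ed467dd78e \
    --hash=sha256:e5baf7723e5bb8382fc146e33032b241efc63314211a3a120aaa55d62d2bb008
"""


def parse_hashed_requirements(text):
    """Return {name: (version, [sha256 hex digests])} for 'name==version' pins."""
    # Join backslash-continued lines so each requirement is one logical line.
    logical = text.replace("\\\n", " ")
    pins = {}
    for line in logical.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        tokens = line.split()
        name, _, version = tokens[0].partition("==")
        prefix = "--hash=sha256:"
        hashes = [t[len(prefix):] for t in tokens[1:] if t.startswith(prefix)]
        pins[name] = (version, hashes)
    return pins


if __name__ == "__main__":
    for name, (version, hashes) in parse_hashed_requirements(SAMPLE).items():
        print(name, version, f"{len(hashes)} hashes")
```

Each package lists two hashes here, presumably one per distribution file published for that release (for example a wheel and an sdist).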
Yeah, I did try to use On the security issue: I do appreciate the concerns here; security is way too often overlooked. But to some degree, downloading over HTTPS should already give us sóme authentication, no? Let's definitely add hashes if it's that easy, but I'm not sure I'm very worried about |
I think I agree - and even without https://bootstrap.pypa.io/get-pip.py being pinned we can still claim to have deterministic builds I believe. What do you think @indygreg ? |
In order to claim determinism, you must verify hashes of every asset downloaded from the Internet. That includes

As for the security of HTTPS, the encryption protects against man-in-the-middle tampering. But the verification of the endpoint (CA validation) only ensures that the certificate presented by the remote server was signed by a trusted root certificate authority. Nearly everybody in open source from Linux distributions to Python uses a set of trusted root certificates maintained by Mozilla (https://www.mozilla.org/en-US/about/governance/policies/security-group/certs/). This set of certificates is optimized for the convenience of people browsing the web, i.e. people want their browser to recognize root CAs being used to sign popular sites. In this root certificate list are some CAs maintained by or under the influence of some governments with questionable track records. If these governments/CAs wanted to, they could issue a new certificate for pretty much any hostname and clients would validate the certificate without issue since it chains up to a trusted root CA. There are some controls/mitigations in place to prevent this. But the trusted root CA system is intrinsically based on (delegated) trust of CAs and if a CA does bad things, security is compromised. A chain is only as strong as its weakest link, etc.

While root CA verification might be good enough for most, if you really want to be secure, you need to do something more, like verify content integrity by checking against hashes (and hey - determinism is also a useful property to have) or pin the certificate hash of the server you are connecting to. This is what robust, must-be-secure software does. e.g. Firefox's update mechanism pins the certificate hash of the Mozilla operated update server and verifies the SHA-256 of content downloaded from a CDN because neither the trusted root CA store nor a CDN can be ultimately trusted.

As for other options, using
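The content-integrity check described here can be sketched in a few lines with the standard library. This is an illustration under assumptions, not cibuildwheel's implementation: the function names are made up, and the bytes would in practice come from whatever download step the tool already has.

```python
import hashlib


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of `data` equals the pinned hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    return actual == expected_hex.strip().lower()


def require_sha256(data: bytes, expected_hex: str, label: str = "asset") -> bytes:
    """Return `data` unchanged, or raise if it does not match the pinned digest."""
    if not sha256_matches(data, expected_hex):
        raise ValueError(f"SHA-256 mismatch for {label}; refusing to use it")
    return data


if __name__ == "__main__":
    # Standard SHA-256 test vector for the three bytes b"abc".
    pinned = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
    print(sha256_matches(b"abc", pinned))
    print(sha256_matches(b"abd", pinned))
```

With a check like this in place, the transport (HTTPS, a CDN, a mirror) no longer has to be trusted for integrity: any change to the downloaded bytes fails the comparison before the content is executed.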
@indygreg The problem with pinning the pip version is that it forces users to wait for a new release of cibuildwheel even when a bug in pip has already been fixed. You can still pin the version of pip using If we think about pinning versions, there are many more things to be pinned:
But pinning all of these versions forces frequent new releases and splits development across two branches. EDIT: on quay.io there is no option to get previous manylinux docker images. They only use the tag latest.
Yes, pinning requires new
There is an option to pin requirements. There is For example there is no option to pin manylinux docker images: https://quay.io/repository/pypa/manylinux2010_x86_64?tab=tags. They do not provide tags for previous versions. And they modify it often: https://quay.io/repository/pypa/manylinux2010_x86_64?tab=history. So it is not easy to pin everything for all systems. Or maybe you have some suggestion for what could be done better other than
@Czaki BEFORE_BUILD isn't enough because it is already using versions of pip/setuptools that were installed from latest. Plus it doesn't affect the test environment. Vendoring a get-pip.py would be a neat solution, I think. The manylinux images are a concern, though. If we can't pin to a specific version of those, Linux will never have determinism. I guess we'll need to ping the pypa folks and ask if they can start tagging their images on each release. |
So if there is a decision to pin versions, will there also be a decision to use a master/develop model? EDIT: The problem I can see with pinned manylinux images is the CentOS repositories. If you need to install anything from them, you should not use a pinned version of manylinux. Or you can create your own image with a Dockerfile and then have everything pinned.
Yeah, I'm assuming
Good thing is: we already have support for this anyway :-)
If I understood correctly, @joerick means that by that time, you already have a new version of
That sounds a bit tricky to me. It would be good to finally get #156 merged, but eh, I don't know.
Good point; I like this idea! And again, this would already be supported using the options for custom manylinux images. So we'd just need to document this! :-)
Deterministic builds was released in v1.4.0!
I think cibuildwheel should strive to be deterministic in its behavior. i.e. if you run cibuildwheel tomorrow, it will have identical behavior to today. Put another way, the user expectation of cibuildwheel is that it only changes in significant ways when its version is updated. Having deterministic behavior makes CI and release pipelines more predictable and reproducible. This reduces frustration and is better from a security perspective.

Fully deterministic output is hard to achieve. Especially when you don't control the base VM image being executed on. But this doesn't mean cibuildwheel shouldn't strive to be deterministic wherever possible.

One of the areas where cibuildwheel isn't deterministic today is downloading 3rd party dependencies.

For example, installing pip on macOS always retrieves the latest stable version of pip (https://github.com/joerick/cibuildwheel/blob/651f6a9172020aa9a2b0c9eb50dfca06d865ace4/cibuildwheel/macos.py#L35). A better solution here is to fetch an explicit version of get-pip.py from e.g. https://github.com/pypa/get-pip/raw/309a56c5fd94bd1134053a541cb4657a4e47e09d/get-pip.py (corresponds to pip 19.2.3).

Another example of non-deterministic behavior is with pip install. I think cibuildwheel should be pinning versions universally (ideally with hashes for additional security protections). Otherwise, the exact installed package version could vary over time. An example where versions aren't being pinned is https://github.com/joerick/cibuildwheel/blob/651f6a9172020aa9a2b0c9eb50dfca06d865ace4/cibuildwheel/linux.py#L89 and https://github.com/joerick/cibuildwheel/blob/651f6a9172020aa9a2b0c9eb50dfca06d865ace4/cibuildwheel/windows.py#L144.

Is the cibuildwheel project receptive to making behavior more deterministic (and secure) by making downloads (and possibly other behavior) more deterministic?
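For illustration, the commit-pinned get-pip.py URL from this proposal could be built as follows. This is only a sketch: the helper name and the URL template constant are hypothetical, and the template simply generalizes the example URL given above. Rejecting anything that is not a full commit hash keeps a moving reference such as a branch name from slipping in.

```python
import re

# Template generalized from the example URL in the proposal (an assumption).
GET_PIP_URL_TEMPLATE = "https://github.com/pypa/get-pip/raw/{commit}/get-pip.py"


def pinned_get_pip_url(commit: str) -> str:
    """Return the get-pip.py URL for one exact commit of pypa/get-pip."""
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        raise ValueError("expected a full 40-character lowercase commit hash")
    return GET_PIP_URL_TEMPLATE.format(commit=commit)


if __name__ == "__main__":
    # Commit cited in the proposal as corresponding to pip 19.2.3.
    print(pinned_get_pip_url("309a56c5fd94bd1134053a541cb4657a4e47e09d"))
```

Pinning the commit fixes which revision is requested, but it does not by itself verify the bytes received, so a separate hash check on the download would still be needed.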