azure-cli package could lose some weight #7387

Open
akx opened this issue Sep 21, 2018 · 56 comments

Comments

@akx

akx commented Sep 21, 2018

The azure-cli package (Ubuntu Xenial) could stand to lose some weight.

😲 and 😳 don't begin to describe my reaction at the size difference between the AWS CLI and the Azure CLI packages.

Package: awscli
Version: 1.11.13-1ubuntu1~16.04.0
Depends: python3, python3-botocore (>= 1.4.70), python3-colorama, python3-docutils, python3-rsa, python3-s3transfer, python3:any (>= 3.3.2-2~)
Installed-Size: 2.9 MB

Package: azure-cli
Version: 2.0.45-1~xenial
Installed-Size: 347 MB
Depends: libc6 (>= 2.17), libssl1.0.0 (>= 1.0.2~beta3)

  • Why must the package bundle its own version of Python 3.6? Couldn't it just depend on a system python3 (on platforms where it is known there is a recent enough Python 3)?
    • Why does that bundled version also include the Python 3.6 test suite (55.8 MB)?
    • Why are all .py files precompiled to .pyc (twice in the case of the stdlib)?
  • Why are the py2 versions of all files with _py3 variants even included if az is always run with Python 3?

It's also not just about the size; install speed is also a thing. It takes 50 seconds to install the single azure-cli package, with its 39 000 files (dpkg-query -L azure-cli | wc -l: 39094), on an Azure machine. That is just a little less than installing the twentyish packages that comprise all of awscli's dependencies (which can also be used by other software on the system).
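
A rough way to reproduce this kind of breakdown is to walk the installed tree and bucket files into bytecode, test suites, and everything else. The sketch below is only an illustration: the bucket rules (a path component named `test` or `tests`, a `.pyc`/`.pyo` suffix) and the function names are assumptions made for this example, not something the package or the thread defines.

```python
import os
from collections import Counter


def categorize(rel_path: str) -> str:
    """Bucket a file by the categories questioned in the issue (assumed rules)."""
    parts = rel_path.replace(os.sep, "/").split("/")
    if rel_path.endswith((".pyc", ".pyo")):
        return "bytecode"
    if "test" in parts or "tests" in parts:
        return "tests"
    return "other"


def tally(root: str) -> tuple[Counter, Counter]:
    """Return (file counts, byte totals) per category for everything under root."""
    counts, sizes = Counter(), Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks so their targets are not counted twice
            bucket = categorize(os.path.relpath(full, root))
            counts[bucket] += 1
            sizes[bucket] += os.path.getsize(full)
    return counts, sizes
```

Something like `counts, sizes = tally("/opt/az")` would then give per-bucket totals, where the path is an assumption about where the package unpacks on a given system.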

@yugangw-msft
Contributor

CC @mayurid @troydai @johanste @lmazuel, this is a known issue. We made some improvements to the Windows installer, and the same kind of improvement can be done on Linux as well.

@troydai troydai self-assigned this Sep 21, 2018
@troydai
Contributor

troydai commented Sep 21, 2018

Thank you for the feedback. You made a great point. We will definitely look into this. I will keep this issue open and assign it to myself.

@yugangw-msft
Contributor

@marstr, could you please prioritize this work and get it done sometime in April?

@marstr marstr self-assigned this Mar 4, 2019
@marstr
Member

marstr commented Mar 4, 2019

I'm excited to take care of this :)

@simonbrady

@marstr Thanks for picking this up, I've just been through another painfully slow upgrade (to 2.0.60 under WSL/Ubuntu) so I can assure you this fix will be welcomed!

@benc-uk

benc-uk commented Mar 11, 2019

Great that this is finally getting looked at!

Every time I update the CLI, I kiss goodbye to my machine for ~20 minutes. I dread updating it.
Bundling Python with the CLI seems like madness

Looking forward to these changes

@marstr
Member

marstr commented Mar 11, 2019

20 minutes? Yikes. Which platform are you using @benc-uk? Maybe Ubuntu via WSL?

@benc-uk

benc-uk commented Mar 11, 2019

Yes.
A very common configuration I find at customers and partners. First it's the actual update and then Windows Defender going absolutely crazy

I've had some snide comments from customers about the CLI in this regard

@marstr
Member

marstr commented Mar 11, 2019

All the more reason to get this taken care of. Thanks for the honest feedback.

@troydai
Contributor

troydai commented Mar 11, 2019

The WSL file system being slow when transferring a large number of files was the problem when I looked at it months ago. Windows Defender is more about the cold start time of the command.

@benc-uk

benc-uk commented Mar 12, 2019

It's a pipe dream at this point as I can see the amount of work that's gone into this Python version of the CLI, but...

A single executable (i.e. based on golang) would be ideal. I've been working with Kubernetes, Helm and Terraform. Their tooling is written in Golang and it's really nice just having this single static binary you can put anywhere

EDIT: the new AzCopy V10 has gone down this route

@marstr
Member

marstr commented Mar 12, 2019

If you look into my GitHub profile, you'll see that I'm a huge fan of Go and have spent a non-trivial part of my career on/with it. I love Terraform and k8s, and am envious of their ease of distribution. TBH, there are a lot of benefits to Go, but with a project like ours where we need to support many, many contributors across the company, asking people to learn Go is a much heavier lift than asking them to use/learn Python. Not to mention, I don't think this product is at burn-it-down-and-rewrite quality just yet ;)

@benc-uk

benc-uk commented Mar 12, 2019

I totally understand, that's why I prefixed my comment with "this is a pipe dream" 😄
Having done some Go myself it's quite the context switch from other languages

Hopefully some tidy up of the packaging is all that is required to get things speedy again

@sptramer
Contributor

@marstr Will slimming the .deb package finally remove the bundled Python interpreter and rely on the system Python? This will affect install instructions.

@marstr
Member

marstr commented Mar 19, 2019

Yes! That's what I'm working on right now.

@benc-uk

benc-uk commented Sep 2, 2022

I don't think there's any way out of this now, beyond a full rewrite in Go, Rust or Dotnet etc, something that can compile to a single binary

Each week that passes the CLI gets larger and larger, and the chances of a rewrite smaller and smaller

Where does it stop? When the CLI takes up 2GB? 5GB? 50GB?

@sodul

sodul commented Sep 2, 2022

@benc-uk Have you tried https://github.com/clumio-code/azure-sdk-trim ?

I wrote this a while back to delete older API directories that are obviously superfluous in 99% of cases and the site-packages/azure directory goes from 846.6 MB to 348.1 MB for us. That's not perfect, it is still ridiculously large, but until the Azure developers actually do something about this problem it does help quite a bit.

If you call it from a Dockerfile make sure to have it in the same RUN call as when the Azure cli/sdk is installed or you will still get the overhead in the layers.

@jiasli
Member

jiasli commented Sep 5, 2022

@benc-uk, If you do an ncdu, you will find Azure CLI itself (/cli, 28.6 MB) is pretty small:

[image: ncdu screenshot]

It is actually the Azure Python SDK (/mgmt, 676.7 MB) that makes Azure CLI huge. See #7387 (comment) for the explanation.

Rewriting in Go, Rust, .NET won't give too much benefit as those Azure SDKs are equally huge.

@benc-uk

benc-uk commented Sep 5, 2022

Agreed, but with a compiled language you ship a binary, you don't ship the SDK with it!

If I wrote a Go app that used the Go SDK for Azure, used a few functions from the SDK, and compiled it, it would be a completely standalone executable and only a few megabytes

@usrme

usrme commented Feb 10, 2023

I still want to see a version of Azure CLI that can be composed of different components more easily, but I've written a blog post that can hopefully help some people out: https://usrme.xyz/posts/how-to-trim-a-container-image-that-includes-azure-cli/. Using the methods described there I was able to slim an image from 1.17GB to 307MB!

@psadi

psadi commented Feb 12, 2023

I was trying to find a cli mainly for azure devops. Installing the cli literally (via pip) shocked me. My usage is just to manage repositories & pull requests.

I understand the installer itself is pretty small, but having 1G of dependencies is not ideal, at least for my use case

[image]

Isn't there a way to decouple modules and use them independently? The current az cli is kind of overkill for my workflow.

@ivanechegaray

Some people are confused about, or don't understand, the problem.
The CLI alone may weigh only a few megabytes, but the bad design of the CLI as a whole, not only the core part, causes it to pull in more than 1.5GB of dependencies, and that keeps growing, as noted above. For automated processes that use disposable computing where you can't cache images, downloading the 1.5GB is not optimal.

What's more, I'm very surprised to see some say that switching to Go would gain practically nothing in terms of CLI size; surely they are the ones who love and support the current development, which is a disaster.

@mpender

mpender commented Mar 27, 2023

Quite true on the actual issue of dependency hell; just looking at the pip install execution logs makes one's eyes water in disbelief. Looking at other major cloud providers, I can see their equivalent CLI tools weigh in differently:
AWS ~ 210 MB
GCP ~ 822 MB

Does it add more confusion to request a 'core' CLI that is smaller but handles common activities (VMs, Blobs, AD, etc), or is that just masking the fundamental issue? I think I would nearly prefer to dynamically add extensions for whatever I need rather than pulling down everything.

@bebound
Contributor

bebound commented Mar 28, 2023

I've created this PR to fix the problem: #25801
Result:
Ubuntu 22.04 installed size 1,196 MB -> 322 MB
Docker image size 1,300 MB -> 706 MB
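
Expressed as relative savings, this is plain arithmetic on the two before/after figures quoted above (nothing here is measured independently, and the helper name is made up for the example):

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage by which the size shrank, relative to the original size."""
    return 100.0 * (before_mb - after_mb) / before_mb


# Figures copied from the comment above, in MB.
for label, before, after in [
    ("Ubuntu 22.04 installed size", 1196, 322),
    ("Docker image size", 1300, 706),
]:
    print(f"{label}: {reduction_percent(before, after):.1f}% smaller")
# Ubuntu 22.04 installed size: 73.1% smaller
# Docker image size: 45.7% smaller
```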

@akx
Author

akx commented Feb 8, 2025

I'm glad to report (/s, if that wasn't obvious) that things have gotten worse in the last 6 years. The current version of the azure-cli Debian package is 693 megs on disk. Maybe I'm old, but it's a bit weird that azure-cli is larger than a CD-ROM.

# curl -sL https://aka.ms/InstallAzureCLIDeb | bash
[...]
# apt show azure-cli
Package: azure-cli
Version: 2.68.0-1~noble
Priority: extra
Section: python
Maintainer: Azure Python CLI Team <azpycli@microsoft.com>
Installed-Size: 693 MB
Depends: libc6 (>= 2.38), libcrypt1 (>= 1:4.1.0), libffi8 (>= 3.4), libgcc-s1 (>= 4.2), libssl3t64 (>= 3.0.0), libuuid1 (>= 2.20.1), zlib1g (>= 1:1.2.0)
Homepage: https://github.com/azure/azure-cli
Download-Size: 55.0 MB

@benc-uk

benc-uk commented Feb 12, 2025

It's never going to be fixed as long as they continue to use Python IMO, and they'll never move off Python because a rewrite is too big a job

The only answer would be to modularize things into multiple optional packages/sub-packages, but that'll add complexity for users, so I also don't see that happening

@akx
Author

akx commented Feb 12, 2025

> It's never going to be fixed as long as they continue to use Python IMO, and they'll never move off Python because a rewrite is too big a job

That's not exactly the issue.

  • uv pip install azure-cli installs 190 megs of packages, but the Debian package is much larger.
  • There are files shipped for old versions of APIs (is it really worth it to ship v2015_06_15 network APIs – is there a region that uses them?), and there are "_py3" and non-"_py3" variants of some files, even though azure-cli only works with Python 3.9+.
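
To check the second point on an actual install, one could look for files that exist in both variants. This is a minimal sketch under the assumption that the pairs follow a `name.py` / `name_py3.py` naming pattern in the same directory; the function names are hypothetical.

```python
import os


def find_py3_pairs(root: str) -> list[tuple[str, str]]:
    """Return (legacy, py3) path pairs where both name.py and name_py3.py exist."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in sorted(names):
            if name.endswith("_py3.py"):
                # Assumed convention: the legacy twin drops the "_py3" suffix.
                legacy = name[: -len("_py3.py")] + ".py"
                if legacy in names:
                    pairs.append(
                        (os.path.join(dirpath, legacy), os.path.join(dirpath, name))
                    )
    return pairs


def legacy_bytes(pairs: list[tuple[str, str]]) -> int:
    """Total size of the non-_py3 halves, i.e. the candidates for removal."""
    return sum(os.path.getsize(legacy) for legacy, _py3 in pairs)
```

Pointing `find_py3_pairs` at a site-packages `azure` directory would list the duplicated modules. Whether the legacy halves are really unused would still need checking before deleting anything.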

@sodul

sodul commented Feb 14, 2025

@benc-uk the size of Azure CLI is not because of the language but, as @akx mentioned, because the Azure SDK for Python is designed in a manner that embeds every single prior release of the Azure APIs. For example if I go to my current python packages directory:

❯ du -shc azure/mgmt/monitor/v* 
796K	azure/mgmt/monitor/v2015_04_01
576K	azure/mgmt/monitor/v2015_07_01
608K	azure/mgmt/monitor/v2016_03_01
332K	azure/mgmt/monitor/v2016_09_01
200K	azure/mgmt/monitor/v2017_12_01_preview
280K	azure/mgmt/monitor/v2018_01_01
652K	azure/mgmt/monitor/v2018_03_01
356K	azure/mgmt/monitor/v2018_04_16
492K	azure/mgmt/monitor/v2018_06_01_preview
204K	azure/mgmt/monitor/v2018_11_27_preview
468K	azure/mgmt/monitor/v2019_03_01
392K	azure/mgmt/monitor/v2019_06_01
672K	azure/mgmt/monitor/v2019_10_17
328K	azure/mgmt/monitor/v2020_10_01
724K	azure/mgmt/monitor/v2021_04_01
860K	azure/mgmt/monitor/v2021_05_01_preview
616K	azure/mgmt/monitor/v2022_06_01
480K	azure/mgmt/monitor/v2022_10_01
8.8M	total

Why do we need versions from 2015, or 5 versions from 2018 including 2 preview versions? The latest valid version is less than 500K, but all these versions add up to almost 9MB. Worse, newer versions can depend on and import older versions, so you can't just delete the older versions.

I wrote a tool several years ago to carefully delete unreferenced older versions so it is possible that there are even more versions of the monitor api:
https://github.com/clumio-code/azure-sdk-trim

Nowadays a fresh install of the azure folder is 1.3GB, which the tool trims to 300MB. Last year something happened and the azure folder shrank to 600MB (trimmed to 300MB), but new versions of APIs got added and we are back to over 1GB.

Again this has nothing to do with Python but everything to do with an extremely poor design choice to force everyone using Azure to embed all the API versions. This problem has existed for years and there are several issues, including this one, documenting the lack of significant progress to provide something remotely reasonable.

The Azure CLI was not always in Python; it used to be in NodeJS and it was much worse. One of the worst parts was that to get an API token you had to run a special CLI command which would launch Internet Explorer, and only Internet Explorer. That did not work from a Linux or macOS machine, and we had to launch Windows VMs and install NodeJS and the Azure CLI in order to create the token and copy it to non-Windows machines.

@pachisb

pachisb commented Feb 14, 2025 via email

@sodul

sodul commented Feb 19, 2025

@pachisb in our case we do not follow the recommended way to install the azure cli, we install it as part of our requirements.txt file so we can re-use our version of python. Furthermore we leave the system version of python alone and use pyenv so we can install a fresh version of python with recent vulnerability and bug fixes. Note that we can't install python 3.13 yet because the Azure SDK is currently incompatible with 3.13 due to the microsoft authentication library, so regardless of the CLI we are stuck with 3.12.

We install the Azure CLI through pip because:

  • we have our own python automation that needs the azure CLI and SDK for our CI/CD pipelines, and we do not want to have 2 giant copies of SDKs in our container.
  • we want to use a fresh version of python to ensure security issues are patched (installed through pyenv).
  • the official cli installer installs older versions of python with known security vulnerabilities.
  • Azure SDK and CLI are strictly opinionated on their dependencies and will install third-party packages with known vulnerabilities, which take months to be updated in their own dependencies.

We have a special script to install all this, where we create a custom azure-requirements.txt, install that first, then install our updated requirements. This 'tricks' pip into accepting conflicting requirements but things do work anyways and we are able to pass our security reports.

If you have a simple requirements.txt you should be able to just add azure-cli==2.69.0 to it. If you do not need to manage your python packages directly you should be able to just do this:

pip install azure-cli azure-sdk-trim
azure-sdk-trim

Note that the azure-sdk team has created a similar tool to my azure-sdk-trim afterwards and it is called by their installer but it is not as aggressive.

@pachisb

pachisb commented Feb 24, 2025

@sodul Thanks a lot for your detailed explanation! I will definitely try different installation methods, including installing via requirements.txt
