
Directly supply parameters to S3ArtifactStorage boto3 client #3257

Closed
sbatchelder opened this issue Dec 2, 2024 · 0 comments
Labels
type / enhancement Issue type: new feature or request


🚀 Feature

Expose a mechanism to set boto3 client arguments in S3ArtifactStorage to allow full S3 client customization. This would allow logging/storing artifacts to new or particular S3 endpoints, applying settings such as credentials directly without requiring environment variables, and generally unlocking the full capabilities of the boto3 client (custom endpoints, proxy settings, SSL, etc.).

Motivation

My organization has an on-prem bucket-store solution (VAST S3) that I wanted to use to store artifacts. I needed a way to modify S3ArtifactStorage so as to specify the input arguments to the S3 client, e.g. boto3.client('s3', endpoint_url=...).

It's possible to configure some of the arguments used by the client with environment variables (like AWS_ACCESS_KEY_ID), but not endpoint_url. It's possible to set up the default boto3 Session and provide credentials, or refer to a config-file profile with boto3.setup_default_session(profile_name=...), but (a) I didn't want to define my connection parameters that way, and (b) I'm not even sure I can specify an endpoint_url or proxy servers that way.

Pitch

Create a subclass of S3ArtifactStorage in which the _get_s3_client() method is overridden. The overridden method is initialized with all the desired client arguments. The custom class can be produced via an S3ArtifactStorage_factory method that returns a ready-to-go S3ArtifactStorageCustom class with the boto3 client arguments baked in. All that then remains is to register the new S3ArtifactStorageCustom class with the artifact registry. To make the whole experience as seamless as possible, an S3ArtifactStorage_clientconfig(**boto3_client_kwargs) method can be exposed. Calling it with boto3 client arguments performs all of the above steps, after which a Run can be initialized and run.set_artifacts_uri('s3://...') and run.log_artifact(...) can be used to save artifacts to diverse S3 endpoints (and without mucking with environment variables).
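A minimal sketch of the factory step described above. The stand-in S3ArtifactStorage base class and the dict returned in place of a real client are hypothetical, just to keep the example self-contained; the real implementation would subclass aim's S3ArtifactStorage and return boto3.client('s3', **client_kwargs):

```python
class S3ArtifactStorage:
    """Stand-in for aim.storage.artifacts.s3_storage.S3ArtifactStorage."""
    def _get_s3_client(self):
        return {'endpoint_url': None}  # default client settings


def S3ArtifactStorage_factory(**boto3_client_kwargs):
    """Return a subclass whose _get_s3_client() bakes in the given kwargs."""
    class S3ArtifactStorageCustom(S3ArtifactStorage):
        def _get_s3_client(self):
            # The real override would be:
            #   return boto3.client('s3', **boto3_client_kwargs)
            return dict(boto3_client_kwargs)
    return S3ArtifactStorageCustom


CustomStorage = S3ArtifactStorage_factory(endpoint_url='http://vast.myorg.net')
print(CustomStorage()._get_s3_client()['endpoint_url'])  # http://vast.myorg.net
```

S3ArtifactStorage_clientconfig would then call this factory and overwrite the 's3' entry in the artifact registry with the returned class.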

Usage

from aim import Run
from aim.storage.artifacts.s3_storage import S3ArtifactStorage_clientconfig

S3ArtifactStorage_clientconfig(
    endpoint_url='http://vast.myorg.net',
    aws_access_key_id='xxxxxxxxx',
    aws_secret_access_key='yyyyyyyyyyyyyyyyyy',
    config={...}, ...)
run = Run(...)
run.set_artifacts_uri('s3://MYBUCKET/aim_artifacts')
run.log_artifact(...)

Alternatives

I considered modifying S3ArtifactStorage directly so that the client config could be included as part of the class's init, but the artifact registry mechanism doesn't really allow for that.
I opted to patch the registry entry for s3, as opposed to creating a new registry entry such as s3+, because I just want my s3://BUCKET/whatever/... paths to work.
Otherwise, I guess I could save the artifacts locally and have a separate mechanism to upload them to S3. Or, at that point, just handle all the artifact storage myself without using Aim at all... but that'd be a shame.
