WIP support ACL's x-amz-acl via resource_kwargs (apply on s3's BufferedOutputBase close) #402
Conversation
Thank you for your contribution and your interest in the smart_open library.
Personally, I don't think this is a good idea, because you're dealing with irrelevant boto3 details inside of smart_open. I think it's better to deal with ACLs directly using boto3.
So, I think your use case is better realized using:
```python
with smart_open.open('s3://bucket/key', ...) as fout:
    # do stuff here
    ...

s3_object = fout.get_s3_object()  # this method is currently not implemented
s3_object.Acl().put(your_acl_here)
s3_object.OtherBoto3VoodooHappensHere()
```
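Until such a method exists, roughly the same result is achievable today with plain boto3 once the write has completed. A minimal sketch (the bucket, key, and ACL value below are placeholders):

```python
import boto3
import smart_open

with smart_open.open('s3://bucket/key', 'w') as fout:
    fout.write('data')

# The upload is finalized when the context manager closes, so the ACL
# can be set separately, entirely outside smart_open.
s3 = boto3.resource('s3')
s3.Object('bucket', 'key').Acl().put(ACL='bucket-owner-full-control')
```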
The reason I prefer this way is:

- There are many other options in `boto3`. It'd be a pain to add and maintain support for all of them, or even part of them.
- These options don't even influence the operation of `smart_open`. They're merely hand-me-downs to `boto3`. We can set them at any time, e.g. after the S3 object has been written.

We make an exception for things like `resource_kwargs` because we cannot set them at any time; we have to set them while `smart_open` is doing its thing.
@piskvorky @albertoa What do you think?
Thank you for your comments. However, that kind of defeats the purpose of using smart_open. Currently we use boto3 and detect the use of S3 within user-supplied URIs. In the intended CLI applications, URIs refer either to the local file system or to S3. The goal of moving to smart_open was to handle both without maintaining different code paths in the applications, which is why `smart_open.open(URI, ...)` was great for us.

However, it is the transport that makes the behavior differ. Since the behavior is transport-specific, I figured the s3.py file was the best place for this. On top of that, this is an option that doesn't affect everybody (that's why I used the `.pop`, as is done in other places in the code). I liked how smart_open can take transport-specific options that are ignored when they don't apply to the transport of the current URI. If I have to detect the transport and handle the differences within the application, then what would be the purpose of using smart_open?

@mpenkov Am I missing something? Maybe I'm using it wrong ;-)
I think you're conflating the goals of smart_open with their consequences. The mission of smart_open is simple (from README.md): efficient streaming of very large files.

The emphasis is on "efficient streaming". To achieve that, we use an abstraction familiar to most Python users: the file object, which gets opened by the `smart_open.open` function. Consider the built-in `io.open` function. Does it allow you to specify a mask so it can automatically set the file's permissions once it has finished writing? No, it doesn't. This isn't what a file object is for.
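For comparison, the plain-file version of that division of labor is a two-step affair (an illustrative sketch; the path and mode are arbitrary):

```python
import os

# The file object only writes the bytes; permissions are the
# application's job, applied after the write has finished.
with open('/tmp/report.txt', 'w') as fout:
    fout.write('data')

os.chmod('/tmp/report.txt', 0o640)
```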
Why do you open an S3 object for writing in the first place? Presumably, so you can open the remote location and write to it as a stream using the familiar construct:

```python
with open(some_url, 'w') as fout:
    fout.write(data)
```

That's what `smart_open` is for. If you want to do other things to the object after you've written it, e.g. set its ACL, then that is application logic, and it is the responsibility of your application. Application logic does not belong in a library.
If it is out of scope for smart_open, let's go ahead and close the PR. However, I will say that we should not generalize about what is the responsibility of the application versus the library based on this PR. When a library abstracts away the transport and provides a mechanism for transport-specific options, but we then state that transport behavior is 100% the responsibility of the application, that creates a conflict. I understand that this specific use case doesn't belong in this library, but it should not be taken as a generalization. Thanks for looking into it.
support ACL's x-amz-acl via resource_kwargs (apply on s3's BufferedOutputBase close)
Motivation
We often need to support uploading files to S3 buckets from different accounts.
In order for the S3 object to be visible from the account that owns the bucket, the ACL needs to be set. A quick way is to use the built-in x-amz-acl header.
In this PR I overload the resource_kwargs used within s3.py so that we can pass x-amz-acl, which is then applied during close for BufferedOutputBase objects.
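For illustration, usage under this proposal might look like the following sketch (hypothetical: the transport_params/resource_kwargs shape is this WIP's interface, not released behavior):

```python
import smart_open

# 'x-amz-acl' is popped out of resource_kwargs and applied when the
# S3 BufferedOutputBase closes (per this PR; the key name is the proposal's).
with smart_open.open(
    's3://bucket/key',
    'w',
    transport_params={'resource_kwargs': {'x-amz-acl': 'bucket-owner-full-control'}},
) as fout:
    fout.write('data')
```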
Tests
Let me know what tests you would like me to write for this. Keep in mind that the upload doesn't fail unless the owner of the bucket refuses the object creation (which may point to whether or not close is the right place for this). A real test would therefore require two accounts: an upload from one, followed by a download from the other.
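For reference, the manual two-account check described above looks roughly like this in plain boto3 (the profile names and bucket are placeholders, and both profiles must already be configured):

```python
import boto3

# Account A writes the object with the ACL; account B (the bucket owner)
# verifies that it can actually read the object back.
writer = boto3.session.Session(profile_name='writer').client('s3')
owner = boto3.session.Session(profile_name='bucket-owner').client('s3')

writer.put_object(Bucket='shared-bucket', Key='key', Body=b'data',
                  ACL='bucket-owner-full-control')
owner.get_object(Bucket='shared-bucket', Key='key')  # fails without the ACL
```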
Work in progress
Flagged as WIP pending comments on testing.
Checklist
Before you create the PR, please make sure you have: