Skip to content

Commit

Permalink
Fix to_boto3 method (#619)
Browse files Browse the repository at this point in the history
* Fix to_boto3 method

The method had a bug where it was always creating its own boto3
resource. The intention was that it re-used the resource held by the
class containing the method.

However, we've since moved away from the resource API, because it is not
multiprocessing-friendly. So, the contained class no longer holds a
boto3 resource. We now require to_boto3 to receive the resource as a
parameter. This achieves a trade-off between convenience and thread
safety:

- The user can use the more convenient Resource API
- The smart_open library does not have to create unsafe objects by
  itself (the user does that)

Fix #615

* update tests

* Update CHANGELOG.md
  • Loading branch information
mpenkov authored May 25, 2021
1 parent f92d1ed commit 56cee00
Show file tree
Hide file tree
Showing 5 changed files with 18 additions and 23 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
- Add warning for recently deprecated s3 parameters (PR [#618](https://github.com/RaRe-Technologies/smart_open/pull/618), [@mpenkov](https://github.com/mpenkov))
- Add new top-level compression parameter (PR [#609](https://github.com/RaRe-Technologies/smart_open/pull/609), [@dmcguire81](https://github.com/dmcguire81))
- Drop mock dependency; standardize on unittest.mock (PR [#621](https://github.com/RaRe-Technologies/smart_open/pull/621), [@musicinmybrain](https://github.com/musicinmybrain))
- Fix to_boto3 method (PR [#619](https://github.com/RaRe-Technologies/smart_open/pull/619), [@mpenkov](https://github.com/mpenkov))

# 5.0.0, 30 Mar 2021

Expand Down
8 changes: 6 additions & 2 deletions howto.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,10 +103,12 @@ This returns a `boto3.s3.Object` that you can work with directly.
For example, let's get the content type of a publicly available file:

```python
>>> import boto3
>>> from smart_open import open
>>> resource = boto3.resource('s3') # Pass additional resource parameters here
>>> with open('s3://commoncrawl/robots.txt') as fin:
... print(fin.readline().rstrip())
... boto3_s3_object = fin.to_boto3()
... boto3_s3_object = fin.to_boto3(resource)
... print(repr(boto3_s3_object))
... print(boto3_s3_object.content_type) # Using the boto3 API here
User-Agent: *
Expand Down Expand Up @@ -153,8 +155,10 @@ You can then interact with the object using the `boto3` API:


```python
>>> import boto3
>>> resource = boto3.resource('s3') # Pass additional resource parameters here
>>> with open('s3://commoncrawl/robots.txt') as fin:
... boto3_object = fin.to_boto3()
... boto3_object = fin.to_boto3(resource)
... print(boto3_object)
... print(boto3_object.get()['LastModified'])
s3.Object(bucket_name='commoncrawl', key='robots.txt')
Expand Down
26 changes: 8 additions & 18 deletions smart_open/s3.py
Original file line number Diff line number Diff line change
Expand Up @@ -685,17 +685,12 @@ def terminate(self):
"""Do nothing."""
pass

def to_boto3(self, resource=None):
def to_boto3(self, resource):
"""Create an **independent** `boto3.s3.Object` instance that points to
the same resource as this instance.
The created instance will re-use the session and resource parameters of
the current instance, but it will be independent: changes to the
`boto3.s3.Object` may not necessarily affect the current instance.
the same S3 object as this instance.
Changes to the returned object will not affect the current instance.
"""
if resource is None:
resource = boto3.resource('s3')
assert resource, 'resource must be a boto3.resource instance'
obj = resource.Object(self._bucket, self._key)
if self._version_id is not None:
return obj.Version(self._version_id)
Expand Down Expand Up @@ -890,17 +885,12 @@ def terminate(self):
)
self._upload_id = None

def to_boto3(self, resource=None):
def to_boto3(self, resource):
"""Create an **independent** `boto3.s3.Object` instance that points to
the same resource as this instance.
The created instance will re-use the session and resource parameters of
the current instance, but it will be independent: changes to the
`boto3.s3.Object` may not necessary affect the current instance.
the same S3 object as this instance.
Changes to the returned object will not affect the current instance.
"""
if not resource:
resource = boto3.resource('s3')
assert resource, 'resource must be a boto3.resource instance'
return resource.Object(self._bucket, self._key)

#
Expand Down
4 changes: 2 additions & 2 deletions smart_open/tests/test_s3.py
Original file line number Diff line number Diff line change
Expand Up @@ -419,7 +419,7 @@ def test_to_boto3(self):
with self.assertApiCalls():
# set defer_seek to verify that to_boto3() doesn't trigger an unnecessary API call
with smart_open.s3.Reader(BUCKET_NAME, KEY_NAME, defer_seek=True) as fin:
returned_obj = fin.to_boto3()
returned_obj = fin.to_boto3(boto3.resource('s3'))

boto3_body = returned_obj.get()['Body'].read()
self.assertEqual(contents, boto3_body)
Expand Down Expand Up @@ -594,7 +594,7 @@ def test_to_boto3(self):

with smart_open.s3.open(BUCKET_NAME, KEY_NAME, 'wb') as fout:
fout.write(contents)
returned_obj = fout.to_boto3()
returned_obj = fout.to_boto3(boto3.resource('s3'))

boto3_body = returned_obj.get()['Body'].read()
self.assertEqual(contents, boto3_body)
Expand Down
2 changes: 1 addition & 1 deletion smart_open/tests/test_s3_version.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ def test_version_to_boto3(self):
self.versions = get_versions(BUCKET_NAME, self.key)
params = {'version_id': self.versions[0]}
with open(self.url, mode='rb', transport_params=params) as fin:
returned_obj = fin.to_boto3()
returned_obj = fin.to_boto3(boto3.resource('s3'))

boto3_body = boto3_body = returned_obj.get()['Body'].read()
self.assertEqual(boto3_body, self.test_ver1)
Expand Down

0 comments on commit 56cee00

Please sign in to comment.