Reading a 0 B file from s3 raises KeyError #667

Closed
3 tasks done
Domcikas opened this issue Nov 9, 2021 · 4 comments · Fixed by #771

Domcikas commented Nov 9, 2021

Problem description

The problem is somewhat similar to the one described in #548, though the error is not the same.

Be sure your description clearly answers the following questions:

  • What are you trying to achieve?
    I'm trying to read a file in S3 that might be empty.
  • What is the expected result?
    The file is read without exceptions.
  • What are you seeing instead?
    A KeyError exception is raised.

Steps/code to reproduce the problem

  • Have an empty file in S3
  • Run the following code
from smart_open import open
with open('S3_uri', 'rb') as file:
    file.read()

Traceback:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 330, in _get
    return client.get_object(Bucket=bucket, Key=key, Range=range_string)
  File "/home/user/.local/lib/python3.8/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidRange) when calling the GetObject operation: The requested range is not satisfiable

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 438, in _open_body
    response = _get(
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 338, in _get
    raise wrapped_error from error
OSError: unable to access bucket: 'mybucket' key: 'existing_file' version: None error: An error occurred (InvalidRange) when calling the GetObject operation: The requested range is not satisfiable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/smart_open_lib.py", line 235, in open
    binary = _open_binary_stream(uri, binary_mode, transport_params)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/smart_open_lib.py", line 398, in _open_binary_stream
    fobj = submodule.open_uri(uri, mode, transport_params)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 224, in open_uri
    return open(parsed_uri['bucket_id'], parsed_uri['key_id'], mode, **kwargs)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 291, in open
    fileobj = Reader(
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 574, in __init__
    self.seek(0)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 666, in seek
    self._current_pos = self._raw_reader.seek(offset, whence)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 417, in seek
    self._open_body(start, stop)
  File "/home/user/.local/lib/python3.8/site-packages/smart_open/s3.py", line 450, in _open_body
    self._position = self._content_length = int(error_response['ActualObjectSize'])
KeyError: 'ActualObjectSize'

Versions

Please provide the output of:

smart_open 5.2.1

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software

sungwy-backup commented Jan 25, 2022

I am seeing the same issue when attempting to open a 0 B file.

This PR was supposedly merged to fix this issue, but it actually introduces the KeyError reported above.

The expected key 'ActualObjectSize' cannot be found on the botocore.exceptions.ClientError, which is the wrapped error raised by the boto3 client's get_object call.

I propose that instead of trying to get 'ActualObjectSize' from the wrapped error object, we instead get the content length by making a get_object call without the range_string if there is an InvalidRange error:

self._position = self._content_length = self._client.get_object(Bucket=self._bucket, Key=self._key)["ContentLength"]
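For illustration only, here is a minimal sketch of that fallback, assuming a stubbed client rather than boto3. `FakeClientError`, `FakeS3Client`, and `open_body` are hypothetical names for this example and are not part of smart_open or botocore; the stub only imitates the InvalidRange behaviour described in this issue.

```python
class FakeClientError(Exception):
    """Hypothetical stand-in for botocore.exceptions.ClientError."""

    def __init__(self, code):
        super().__init__(code)
        # Mirrors the shape of ClientError.response["Error"]["Code"].
        self.response = {"Error": {"Code": code}}


class FakeS3Client:
    """Hypothetical stub holding a single object in memory."""

    def __init__(self, body=b""):
        self._body = body

    def get_object(self, Bucket, Key, Range=None):
        start = 0
        if Range is not None:
            # Only the "bytes=N-" form is handled in this sketch.
            start = int(Range[len("bytes="):].rstrip("-"))
            if start >= len(self._body):
                # A range starting at or past the end cannot be satisfied,
                # which is always the case for a 0 B object.
                raise FakeClientError("InvalidRange")
        chunk = self._body[start:]
        # The real client returns a streaming body; bytes keep the sketch short.
        return {"ContentLength": len(chunk), "Body": chunk}


def open_body(client, bucket, key, start=0):
    """Return (body, position) for a read that begins at `start`."""
    try:
        response = client.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-")
    except FakeClientError as error:
        if error.response["Error"]["Code"] != "InvalidRange":
            raise
        # Proposed fallback: ask again without Range to learn the object size.
        size = client.get_object(Bucket=bucket, Key=key)["ContentLength"]
        return b"", size  # nothing left to read; position sits at the end
    return response["Body"], start
```

The cost of this approach is one extra request whenever InvalidRange occurs, in exchange for not depending on any particular field being present in the error response.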

Collaborator

mpenkov commented Jan 26, 2022

Do we need to make an additional call? If yes, then I'd rather avoid doing so unless it's absolutely necessary.

Are you interested in making a PR?

mpenkov added the bug label Jan 26, 2022
@gmichaeljaison

I am still facing the "ClientError: An error occurred (416) when calling the GetObject operation: Requested Range Not Satisfiable" error with the latest version, 6.2.0, for 0-byte files, even though it is supposed to be fixed in #548.

@Darkheir
Contributor

I created a PR that calls get_object only when accessing ActualObjectSize raises a KeyError.

This should limit unnecessary HTTP calls.
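As a rough sketch of that idea (not the code from the PR itself), the size can be taken from the error response when it is there and fetched from S3 only when the key is missing. `CountingClient` and `resolve_content_length` are hypothetical names used for illustration, and the client is a stub that counts calls instead of talking to S3.

```python
class CountingClient:
    """Hypothetical stub that records how many get_object calls were made."""

    def __init__(self, size):
        self.size = size
        self.calls = 0

    def get_object(self, Bucket, Key):
        self.calls += 1
        return {"ContentLength": self.size}


def resolve_content_length(error_response, client, bucket, key):
    """Prefer the size reported in the error; call S3 only if it is absent."""
    try:
        return int(error_response["ActualObjectSize"])
    except KeyError:
        # Fallback path: one extra request, taken only when the key is missing.
        return client.get_object(Bucket=bucket, Key=key)["ContentLength"]


client = CountingClient(size=0)

# Key present: no request is made.
resolve_content_length({"Code": "InvalidRange", "ActualObjectSize": "0"}, client, "b", "k")
print(client.calls)

# Key missing: exactly one fallback request is made.
resolve_content_length({"Code": "InvalidRange"}, client, "b", "k")
print(client.calls)
```

Responses that already carry ActualObjectSize behave as before, so the extra round trip is confined to the case that currently fails.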
