Reading a 0 B file from s3 raises OSError #548

Closed
3 tasks done
rchui opened this issue Oct 10, 2020 · 3 comments · Fixed by #549
rchui commented Oct 10, 2020

Problem description

smart_open should be able to read a 0 byte file from s3 without raising an Exception.

Steps/code to reproduce the problem

  1. Upload a 0 byte file to s3
  2. Use smart_open.open to try to read it:
from smart_open import open

with open('s3://some/random/uri', 'rb') as file:
    file.read()
Traceback
s3.py in read_file(self)
     55         """
     56 
---> 57         with open(self.uri, "rb") as file:
     58             contents: bytes = file.read()
     59         return contents

venv/lib/python3.6/site-packages/smart_open/smart_open_lib.py in open(uri, mode, buffering, encoding, errors, newline, closefd, opener, ignore_ext, transport_params)
    221     #
    222     binary_mode = _TO_BINARY_LUT.get(mode, mode)
--> 223     binary = _open_binary_stream(uri, binary_mode, transport_params)
    224     if ignore_ext:
    225         decompressed = binary

.venv/lib/python3.6/site-packages/smart_open/smart_open_lib.py in _open_binary_stream(uri, mode, transport_params)
    399     scheme = _sniff_scheme(uri)
    400     submodule = transport.get_transport(scheme)
--> 401     fobj = submodule.open_uri(uri, mode, transport_params)
    402     if not hasattr(fobj, 'name'):
    403         logger.critical('TODO')

.venv/lib/python3.6/site-packages/smart_open/s3.py in open_uri(uri, mode, transport_params)
    168     parsed_uri, transport_params = _consolidate_params(parsed_uri, transport_params)
    169     kwargs = smart_open.utils.check_kwargs(open, transport_params)
--> 170     return open(parsed_uri['bucket_id'], parsed_uri['key_id'], mode, **kwargs)
    171 
    172 

.venv/lib/python3.6/site-packages/smart_open/s3.py in open(bucket_id, key_id, mode, version_id, buffer_size, min_part_size, session, resource_kwargs, multipart_upload_kwargs, multipart_upload, singlepart_upload_kwargs, object_kwargs, defer_seek)
    245             resource_kwargs=resource_kwargs,
    246             object_kwargs=object_kwargs,
--> 247             defer_seek=defer_seek,
    248         )
    249     elif mode == constants.WRITE_BINARY:

.venv/lib/python3.6/site-packages/smart_open/s3.py in __init__(self, bucket, key, version_id, buffer_size, line_terminator, session, resource_kwargs, object_kwargs, defer_seek)
    472 
    473         if not defer_seek:
--> 474             self.seek(0)
    475 
    476     #

.venv/lib/python3.6/site-packages/smart_open/s3.py in seek(self, offset, whence)
    567             offset += self._current_pos
    568 
--> 569         self._current_pos = self._raw_reader.seek(offset, whence)
    570         logger.debug('new_position: %r', self._current_pos)
    571 

.venv/lib/python3.6/site-packages/smart_open/s3.py in seek(self, offset, whence)
    355             self._position = self._content_length
    356         else:
--> 357             self._open_body(start, stop)
    358 
    359         return self._position

.venv/lib/python3.6/site-packages/smart_open/s3.py in _open_body(self, start, stop)
    381                 version=self._version_id,
    382                 Range=range_string,
--> 383                 **self._object_kwargs
    384             )
    385         except IOError as ioe:

.venv/lib/python3.6/site-packages/smart_open/s3.py in _get(s3_object, version, **kwargs)
    284         )
    285         wrapped_error.backend_error = error
--> 286         raise wrapped_error from error
    287 
    288 

OSError: unable to access bucket: '<bucket>' key: '<key>' version: None error: An error occurred (InvalidRange) when calling the GetObject operation: The requested range is not satisfiable
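
Reading the traceback, the reader's constructor calls self.seek(0), which issues a ranged GetObject request, and S3 answers InvalidRange because a 0 byte object has no satisfiable byte range. Until a fixed release is available, a caller-side stopgap could look like the sketch below. This is not the fix from #549; the helper name read_all_tolerating_empty and the two fake openers are hypothetical and exist only for illustration. It assumes the error message keeps the "InvalidRange" code shown above. Note that S3 can also return InvalidRange for a range past the end of a non-empty object, so this check is deliberately narrow to whole-file reads.

```python
import io


def read_all_tolerating_empty(uri, opener):
    """Read the whole object at `uri`, mapping InvalidRange to empty bytes.

    `opener` is any callable shaped like smart_open.open(uri, mode); it is
    injected so the helper can be exercised without touching S3.
    """
    try:
        with opener(uri, "rb") as fin:
            return fin.read()
    except OSError as err:
        # Assumption: the wrapped message embeds the S3 error code, as in
        # the traceback above ("An error occurred (InvalidRange) ...").
        if "InvalidRange" in str(err):
            return b""
        raise


def _fake_open_empty(uri, mode):
    # Hypothetical stand-in that fails the same way as the report.
    raise OSError(
        "unable to access bucket: '<bucket>' key: '<key>' version: None "
        "error: An error occurred (InvalidRange) when calling the GetObject "
        "operation: The requested range is not satisfiable"
    )


def _fake_open_data(uri, mode):
    # Hypothetical stand-in for a non-empty object.
    return io.BytesIO(b"hello")


empty = read_all_tolerating_empty("s3://some/random/uri", _fake_open_empty)
data = read_all_tolerating_empty("s3://some/random/uri", _fake_open_data)
```

With the real library, the call would be read_all_tolerating_empty(uri, smart_open.open). Any other OSError is re-raised unchanged, so missing buckets or permission problems still surface.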

Versions

Please provide the output of:

import platform, sys, smart_open
print(platform.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)
Darwin-19.6.0-x86_64-i386-64bit
Python 3.6.10 (default, Jun 29 2020, 13:50:05) 
[GCC 4.2.1 Compatible Apple LLVM 11.0.3 (clang-1103.0.32.29)]
smart_open 2.2.1

Checklist

Before you create the issue, please make sure you have:

  • Described the problem clearly
  • Provided a minimal reproducible example, including any required data
  • Provided the version numbers of the relevant software
@mpenkov mpenkov added the bug label Oct 10, 2020
@Andrew-Sheridan

Can confirm this is a regression: the error does not occur with version 2.1.0.

@Andrew-Sheridan

Same issue present in version 3.0.0 :/

@piskvorky
Owner

@jcushman could this be related to your #495?
