Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: the find() method is now also returning the specified directory to search #393

Closed
jorisvandenbossche opened this issue Nov 9, 2020 · 6 comments · Fixed by #413
Closed

Comments

@jorisvandenbossche
Copy link

Compared to s3fs 0.4, the new 0.5 release changed the behaviour of S3FileSystem.find(). This has been partly fixed already on master I think (eg #378, to respect again the maxdepth keyword (to allow non-recursive finds), and the fix for this also seems to have fixed other things for pyarrow, i.e. to include sub-directories in the output, and not only the files).
But from testing it with pyarrow, I noticed another difference that is still present on master: it now also returns the "current" directory that is passed to find(..).

So when doing fs.find(base_dir), this gives a list of files/directories in "base_dir". But with s3fs 0.5/master, this now also includes "base_dir" itself.
That behaviour is in contrast with s3fs0.4, but also with eg fsspec's LocalFileSystem.

I have a reproducible test script included below, but this is using a slimmed down and adapted version of a pyarrow test, so using pyarrows test machinery (with a fixture to set up minio to test), so this will need some edits to be includable in the s3fs tests.

import pathlib
import sys
import os
import subprocess
from tempfile import TemporaryDirectory

import pytest

from pyarrow.util import find_free_port


@pytest.fixture
def tempdir(tmpdir):
    # convert pytest's LocalPath to pathlib.Path
    return pathlib.Path(tmpdir.strpath)


@pytest.fixture(scope='session')
def s3_connection():
    host, port = 'localhost', find_free_port()
    access_key, secret_key = 'arrow', 'apachearrow'
    return host, port, access_key, secret_key


@pytest.fixture(scope='session')
def s3_server(s3_connection):
    host, port, access_key, secret_key = s3_connection

    address = '{}:{}'.format(host, port)
    env = os.environ.copy()
    env.update({
        'MINIO_ACCESS_KEY': access_key,
        'MINIO_SECRET_KEY': secret_key
    })

    with TemporaryDirectory() as tempdir:
        args = ['minio', '--compat', 'server', '--quiet', '--address',
                address, tempdir]
        proc = None
        try:
            proc = subprocess.Popen(args, env=env)
        except OSError:
            pytest.skip('`minio` command cannot be located')
        else:
            yield proc
        finally:
            if proc is not None:
                proc.kill()


@pytest.fixture
def fsspec_localfs(request, tempdir):
    fsspec = pytest.importorskip("fsspec")
    fs = fsspec.filesystem('file')
    return fs, lambda p: (tempdir / p).as_posix()


@pytest.fixture
def fsspec_s3fs(request, s3_connection, s3_server):
    s3fs = pytest.importorskip("s3fs")

    host, port, access_key, secret_key = s3_connection
    bucket = 'pyarrow-filesystem/'

    fs = s3fs.S3FileSystem(
        key=access_key,
        secret=secret_key,
        client_kwargs=dict(endpoint_url='http://{}:{}'.format(host, port))
    )
    try:
        fs.mkdir(bucket)
    except IOError:
        # BucketAlreadyOwnedByYou on second test
        pass

    return fs, bucket.__add__


@pytest.fixture(params=[
    pytest.param(
        pytest.lazy_fixture('fsspec_localfs'),
        id='localfs'
    ),
    pytest.param(
        pytest.lazy_fixture('fsspec_s3fs'),
        id='s3fs)'
    ),
])
def filesystem_config(request):
    return request.param


def test_get_file_info_with_selector(filesystem_config):
    fs, pathfn = filesystem_config

    base_dir = pathfn('selector-dir/')
    file_a = pathfn('selector-dir/test_file_a')
    file_b = pathfn('selector-dir/test_file_b')
    dir_a = pathfn('selector-dir/test_dir_a')
    file_c = pathfn('selector-dir/test_dir_a/test_file_c')

    try:
        fs.mkdir(base_dir)
        with fs.open(file_a, mode="wb"):
            pass
        with fs.open(file_b, mode="wb"):
            pass
        fs.mkdir(dir_a)
        with fs.open(file_c, mode="wb"):
            pass

        infos = fs.find(base_dir, maxdepth=None, withdirs=True, detail=True)
        assert len(infos) == 4

        for info in infos.values():
            if info["name"].endswith(file_a):
                assert info["type"] == "file"
            elif info["name"].endswith(file_b):
                assert info["type"] == "file"
            elif info["name"].endswith(file_c):
                assert info["type"] == "file"
            elif info["name"].rstrip("/").endswith(dir_a):
                assert info["type"] == "directory"
            else:
                raise ValueError('unexpected path {}'.format(info["name"]))
    finally:
        fs.rm(base_dir, recursive=True)
@martindurant
Copy link
Member

Can you form this into a specific test for s3fs, so we can include it in a PR with a fix?

@jorisvandenbossche
Copy link
Author

Note the test itself is already s3fs specific, it's only the parametrization/fixture that uses MinIO for interacting with S3. I suppose that could be quite easily changed to use the s3 fixture using moto from the tests here.

@jorisvandenbossche
Copy link
Author

@martindurant are there any instructions about running the s3fs tests?
I get "Unable to locate credentials" errors in the fixture, but no idea what credentions it is looking for.

(arrow-dev) joris@joris-XPS-13-9380:~/scipy/repos/s3fs$ pytest s3fs/tests/test_s3fs.py::test_shallow_find -v
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.7.3, pytest-6.1.1, py-1.9.0, pluggy-0.12.0 -- /home/joris/miniconda3/envs/arrow-dev/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/joris/scipy/repos/s3fs/.hypothesis/examples')
rootdir: /home/joris/scipy/repos/s3fs, configfile: pytest.ini
plugins: hypothesis-4.47.5, lazy-fixture-0.6.3, mock-3.1.1
collected 1 item                                                                                                                                                                                                  

s3fs/tests/test_s3fs.py::test_shallow_find ERROR                                                                                                                                                            [100%]

===================================================================================================== ERRORS ======================================================================================================
_______________________________________________________________________________________ ERROR at setup of test_shallow_find _______________________________________________________________________________________

s3_base = None

    @pytest.fixture()
    def s3(s3_base):
        from botocore.session import Session
        # NB: we use the sync botocore client for setup
        session = Session()
        client = session.create_client('s3', endpoint_url=endpoint_uri)
>       client.create_bucket(Bucket=test_bucket_name, ACL='public-read')

s3fs/tests/test_s3fs.py:85: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/client.py:316: in _api_call
    return self._make_api_call(operation_name, kwargs)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/client.py:622: in _make_api_call
    operation_model, request_dict, request_context)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/client.py:641: in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/endpoint.py:102: in make_request
    return self._send_request(request_dict, operation_model)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/endpoint.py:132: in _send_request
    request = self.create_request(request_dict, operation_model)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/endpoint.py:116: in create_request
    operation_name=operation_model.name)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/hooks.py:356: in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/hooks.py:228: in emit
    return self._emit(event_name, kwargs)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/hooks.py:211: in _emit
    response = handler(**kwargs)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/signers.py:90: in handler
    return self.sign(operation_name, request)
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/signers.py:160: in sign
    auth.add_auth(request)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <botocore.auth.S3SigV4Auth object at 0x7f47705a2358>, request = <botocore.awsrequest.AWSRequest object at 0x7f47705a2208>

    def add_auth(self, request):
        if self.credentials is None:
>           raise NoCredentialsError
E           botocore.exceptions.NoCredentialsError: Unable to locate credentials

../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/botocore/auth.py:357: NoCredentialsError
---------------------------------------------------------------------------------------------- Captured stderr setup ----------------------------------------------------------------------------------------------
 * Running on http://127.0.0.1:5555/ (Press CTRL+C to quit)
127.0.0.1 - - [10/Nov/2020 17:29:26] "GET / HTTP/1.1" 200 -
================================================================================================ warnings summary =================================================================================================
../../../miniconda3/envs/arrow-dev/lib/python3.7/site-packages/_pytest/config/__init__.py:1230
  /home/joris/miniconda3/envs/arrow-dev/lib/python3.7/site-packages/_pytest/config/__init__.py:1230: PytestConfigWarning: Unknown config option: env
  
    self._warn_or_fail_if_strict("Unknown config option: {}\n".format(key))

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================================================= short test summary info =============================================================================================
ERROR s3fs/tests/test_s3fs.py::test_shallow_find - botocore.exceptions.NoCredentialsError: Unable to locate credentials
=========================================================================================== 1 warning, 1 error in 3.93s ===========================================================================================

@martindurant
Copy link
Member

You need to set the key and secret variables to anything at all (see the CI script)

@martindurant
Copy link
Member

Can you check if this is still failing with master versions, please?

@jorisvandenbossche
Copy link
Author

jorisvandenbossche commented Jan 13, 2021

Yes, for me it's still failing with s3fs master (edit: and with fsspec master as well)

martindurant pushed a commit to martindurant/s3fs that referenced this issue Jan 15, 2021
martindurant added a commit that referenced this issue Jan 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants