Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix appending of random directory within find() method #507

Merged
merged 2 commits into from
Jan 2, 2021

Conversation

gutzbenj
Copy link
Contributor

@gutzbenj gutzbenj commented Dec 21, 2020

At #506 we've seen that when using the HTTPFileSystem's find method, it appears that the method would randomly add one of the found directories to the returned files listing. This is due to conflicting variable naming here:
https://github.com/intake/filesystem_spec/blob/a5bb95155b450ec2443872a45a4a58ce90748297/fsspec/spec.py#L430-L437

To fix this I renamed resulting path from self.walk() to path_. See here:
https://github.com/earthobservations/filesystem_spec/blob/e3956827181393b30a57ab6d11ede52931a4dec8/fsspec/spec.py#L430-L437

@martindurant
Copy link
Member

Thanks for finding this!

I expect this didn't show up with existing tests of find because, unlike the HTTP case, we can filter out directories easily, right? It would be nice to have a test for HTTP that would have shown the failure, but is now fixed after this change.

@martindurant
Copy link
Member

Fixes #506

fsspec/spec.py Outdated
@@ -427,7 +427,7 @@ def find(self, path, maxdepth=None, withdirs=False, **kwargs):
path = self._strip_protocol(path)
out = dict()
detail = kwargs.pop("detail", False)
for path, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs):
for path_, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs):
Copy link
Contributor

@amotl amotl Dec 21, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just phrase it like this?

Suggested change
for path_, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs):
for _, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs):

…ations, change unused variable to underscore
@gutzbenj gutzbenj force-pushed the fix-random-directory branch from 8bd4275 to d26e6c3 Compare December 26, 2020 12:45
@gutzbenj
Copy link
Contributor Author

Dear Martin,

I added two tests for the FTP and HTTP filesystem querying the Debian repo at [1]. As you can see from the test

  • the url which was queried is included in HTTPs result while for FTP this one is missing
  • for the case of HTTP this one has found the additional filter hrefs at this directory
  • the FTP method returns the relative paths

I propose that the FTP method is aligned with the one from HTTP by adding the base url to the found files thus returning the full filepath. Would you agree on this point?

[1] https://ftp.fau.de/debian-cd/current/amd64/log/success/

@martindurant
Copy link
Member

I propose that the FTP method is aligned with the one from HTTP by adding the base url to the found files thus returning the full filepath. Would you agree on this point?

I don't think I do. In the case of FTP, the host/port are arguments to the class, so each set of arguments is an independent file-system with a set of paths within it (which may depend on the specific user info given) and a single persistent connection. Conversely, HTTP sees the internet as a single filesystem, and makes new connections for every access. Although it's true that you can phrase a unique FTP path like "ftp://user:pass@host:port/path", this is for convenience only. Obviously for a case like this, where you have FTP-on-HTTP, the distinction is blurred.

for the case of HTTP this one has found the additional filter hrefs at this directory

I suppose this is OK, though one wonders what these things are, and whether they are likely to stay the same in the future.

@martindurant
Copy link
Member

Ah, those hrefs are new sorted views of the same directory. There is no way to know that they are not new content.

Ready to merge now?

@martindurant
Copy link
Member

Fixes fsspec/s3fs#393

@gutzbenj
Copy link
Contributor Author

gutzbenj commented Jan 1, 2021

Yes, ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants