-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix appending of random directory within find() method #507
Fix appending of random directory within find() method #507
Conversation
Thanks for finding this! I expect this didn't show up with existing tests of |
Fixes #506 |
fsspec/spec.py
Outdated
@@ -427,7 +427,7 @@ def find(self, path, maxdepth=None, withdirs=False, **kwargs): | |||
path = self._strip_protocol(path) | |||
out = dict() | |||
detail = kwargs.pop("detail", False) | |||
for path, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs): | |||
for path_, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just phrase it like this?
for path_, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs): | |
for _, dirs, files in self.walk(path, maxdepth, detail=True, **kwargs): |
dc4404a
to
8bd4275
Compare
…ations, change unused variable to underscore
8bd4275
to
d26e6c3
Compare
Dear Martin, I added two tests for the FTP and HTTP filesystem querying the Debian repo at [1]. As you can see from the test
I propose that the FTP method is aligned with the one from HTTP by adding the base url to the found files thus returning the full filepath. Would you agree on this point? |
I don't think I do. In the case of FTP, the host/port are arguments to the class, so each set of arguments is an independent file-system with a set of paths within it (which may depend on the specific user info given) and a single persistent connection. Conversely, HTTP sees the internet as a single filesystem, and makes new connections for every access. Although it's true that you can phrase a unique FTP path like "ftp://user:pass@host:port/path", this is for convenience only. Obviously for a case like this, where you have FTP-on-HTTP, the distinction is blurred.
I suppose this is OK, though one wonders what these things are, and whether they are likely to stay the same in the future. |
Ah, those hrefs are new sorted views of the same directory. There is no way to know that they are not new content. Ready to merge now? |
Fixes fsspec/s3fs#393 |
Yes, ready. |
At #506 we've seen that when using the HTTPFileSystem's
find
method, it appears that the method would randomly add one of the found directories to the returned files listing. This is due to conflicting variable naming here:https://github.com/intake/filesystem_spec/blob/a5bb95155b450ec2443872a45a4a58ce90748297/fsspec/spec.py#L430-L437
To fix this I renamed resulting path from
self.walk()
to path_. See here:https://github.com/earthobservations/filesystem_spec/blob/e3956827181393b30a57ab6d11ede52931a4dec8/fsspec/spec.py#L430-L437