Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple find() without delimiters for recursive rm #280

Merged
merged 7 commits into from
Sep 2, 2020

Conversation

martindurant
Copy link
Member

Fixes #273
Fixes #279

(needs rerecording)

This method is efficient and works in general. In the case that you recursive-delete a directory (i.e., within a bucket) with content, but where a key of the same name as the directory does not exist, you will get an exception for that path (but process succeeds)

@martindurant
Copy link
Member Author

I would appreciate trials here, before bothering rerecording the tests

@pvanderlinden
Copy link

Quick test regarding #279 : It manages to work way faster then it used to be (large recursive listing was very slow before). However it still fails to delete:

  • it first deletes everything
  • then it either raises an exception OSError: {"Invalid bucket name: 'gs:'"} if you passed in something prepended with 'gs://' or OSError: {'No such object: <fullname you passed in>'}

@martindurant
Copy link
Member Author

The "no such object" is mentioned in the description above. I wonder if we should silently ignore these, since the "root" of the delete tree may or may not exist as a key.

fsspec/filesystem_spec#391 will solve the "gs://" thing

@pvanderlinden
Copy link

Not sure how it supposed to work. Remove usually works unless the specified inputs don't exists. So I think as long as the input paths exists every other not found should be catched?

@martindurant
Copy link
Member Author

Right, but "exists" is a little subtle a meaning here. If we find any paths, then the root path exists, but doesn't necessarily need to be deleted (because if it only an implied prefix, it disappears when all the contents are gone).

@pvanderlinden
Copy link

What I mean: as long as the paths exists before deleting anything it's fine. Any "not found" during deleting is a success as the goal is successfull

@martindurant
Copy link
Member Author

OK, that makes sense! So raise NotFound in the case that no paths were retrieved, ignore any no-such errors. I'll make that happen.

@pvanderlinden
Copy link

That makes sense to me at least, in the case the app doesn't care if the path existed before they can always catch that initial not found.

@pvanderlinden
Copy link

pvanderlinden commented Aug 31, 2020

Will this also work for other recursive listings?

I have some code which lists everything, currently with a direct library usage, but using fsspec would be great (but my test with 0.6.2 made it run very slow).

@martindurant
Copy link
Member Author

It will affect things that call find. Note that the code here does not populate the directory cache, so if you then go to open/list/etc on the files, you will not have gained anything.

@pvanderlinden
Copy link

In my case that's actually good. I use it currently in combination with pyspark, so the cache won't be up-to-date anyway

@martindurant
Copy link
Member Author

@pvanderlinden , would you mind testing again?

@pvanderlinden
Copy link

@martindurant this works good when passing in without protocol, expand_path will add the original path with protocol to the list though, which causes the error (line 737 in core.py).
Also, it doesn't give an error when the path doesn't exist before delete, although that is fine by me.

@martindurant
Copy link
Member Author

The expand_path issue is fixed in fsspec master

@martindurant
Copy link
Member Author

@mariusvniekerk , do you think this should be ported to s3fs too?

@pvanderlinden
Copy link

The expand_path issue is fixed in fsspec master

Sorry, I missed that. It works fine with master indeed.

@mariusvniekerk
Copy link
Collaborator

Yeah it would be handy in s3fs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recursive remove fails Optimise find()
3 participants