
413 Payload Too Large when recursive remove a large directory #281

Closed
pvanderlinden opened this issue Aug 30, 2020 · 4 comments
@pvanderlinden

When calling rm(path, recursive=True), the request fails without an actual error response:

_call non-retriable exception: 
Traceback (most recent call last):
  File "/home/paul/anaconda3/envs/antenna/lib/python3.7/site-packages/gcsfs/core.py", line 487, in _call
    validate_response(r, path)
  File "/home/paul/anaconda3/envs/antenna/lib/python3.7/site-packages/gcsfs/core.py", line 132, in validate_response
    raise HttpError({"code": r.status_code})
gcsfs.utils.HttpError

When digging deeper:
<Response [413]>

This mainly happens with large directories.

@martindurant
Member

rm includes a batchsize= parameter, default 20. Can you find a smaller value that works?
Perhaps the batching logic is wrong; you could find out by debugging how many files to delete were actually included in the call that failed.

I would appreciate it if, while digging, you also helped improve the code to surface the actual error instead of the generic HttpError. Note that I have deleted large numbers of files in a single call before.
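Not gcsfs code, just to illustrate what `batchsize=` controls: deletes in a recursive rm can be grouped into fixed-size batches, and the batch size bounds the payload of each multipart request. A minimal sketch of that grouping, assuming a plain list of paths (the helper name is hypothetical, not a gcsfs internal):

```python
def chunks(paths, batchsize=20):
    """Yield successive fixed-size groups of paths, mirroring how a
    recursive rm could split deletes across multipart requests."""
    for i in range(0, len(paths), batchsize):
        yield paths[i:i + batchsize]

# Each group becomes one request body, so a smaller batchsize
# keeps every individual request under the server's payload limit.
groups = list(chunks([f"bucket/file-{n}" for n in range(45)], batchsize=20))
# → 3 groups: 20 + 20 + 5 paths
```

Lowering `batchsize` trades more round trips for smaller request bodies, which is the relevant knob when the server answers 413.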

@pvanderlinden
Author

I just checked, and the batchsize parameter is only available in the latest version. Unfortunately the latest version is unusable due to #279.
Closing this issue, though, as it has been resolved in the latest version. (I think however that 20 is quite low; this code has run before with a large number of files, but I assume it has now crossed the threshold.)

@martindurant
Member

Let's get that issue fixed, then. Did you try #280?

> I think however that 20 is quite low

Since we can now send many requests concurrently, there is less of a need to pack them into large batches. Indeed, this very issue shows the problem. I don't know what an optimal batch-size would be...
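To illustrate the trade-off (hypothetical helper names, not the gcsfs implementation): with asyncio, many small batch requests can be in flight at once, so throughput no longer depends on packing everything into one large payload:

```python
import asyncio

async def delete_batch(batch):
    """Stand-in for one batched delete request; a real client would
    POST this batch to the storage API."""
    await asyncio.sleep(0)  # simulate network I/O
    return len(batch)

async def delete_all(paths, batchsize=20):
    # Split into small batches, then issue all requests concurrently
    # rather than cramming every delete into a single request body.
    batches = [paths[i:i + batchsize] for i in range(0, len(paths), batchsize)]
    results = await asyncio.gather(*(delete_batch(b) for b in batches))
    return sum(results)

deleted = asyncio.run(delete_all([f"f{n}" for n in range(45)], batchsize=20))
# deleted == 45, spread across 3 concurrent requests
```

Under this model a small batch size mostly costs extra request overhead, which concurrency hides, so staying well below the payload limit is cheap.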

@pvanderlinden
Author

I didn't know the new version automatically issues many calls at once; that's a great improvement (as is the asyncio support!). I will try out the PR; I didn't see it last week.
