-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cloud Storage recently stopped handling the RFC 3986 URI encoding by it self #57
Comments
I do confirm: it works with the libraries-bom 3.2.0, but no longer works with 3.3.0. |
This was broken by the following commit: The problem disappeared when I changed GenericUrl.java as it was before:
|
Workaround:
|
Thanks for the detailed report. I think you've nailed the root cause. We're going to have to dig into this one and figure out exactly how to fix it. The old http java client behavior was a common but incorrect handling of plus signs that caused other problems. Something in the GCS stack--I'm not sure what yet--was depending on the buggy behavior. If we're lucky, we can fix it here. If not, we might have to fix this somewhere on the server. |
Encoding spaces as "%20" not as "+" if met in the path component of URL should probably help. |
Yes, that's exactly what we should be doing. The question is what piece of code in which project is doing this encoding. It probably isn't in this repo. |
The easiest way to fix: com.google.api.client.util.escape.PercentEscaper
It fixes problem with the storage. But it may have some side effects. |
Where do you propose making that change? |
I guess the source of the com.google.api.client.util.escape.PercentEscaper class exists only here: |
I don't think we're going to change http-java-client. 1.33.0 had a bug here and 1.34.0 fixed it. The problem is some other code somewhere has a complementary bug. |
We should dig into the actual HTTP calls that are happening here to understand what the server is sending us. That might narrow this down. See https://cloud.google.com/storage/docs/xml-api/overview and https://cloud.google.com/storage/docs/json_api/ I'm not sure which of these two the client library is wrap[ping. |
Per the GCS API docs %20 is correct and + is wrong. This is consistent with the HTTP spec as well. Something is not following the docs, and I now think it might actually be in this repo. let me look |
As I can see, the storage passes the names of bucket and object to the http client as is. The client is responsible for URL encoding/decoding. It encodes spaces as '+', but since 1.33.0 '+' is no longer decoded as space. Encoding as '%20' allows to decode it properly. |
The client should not encode spaces as +. Just need to figure out exactly where that happens so I can fix it. |
This works only to obtain a signed URL |
This is stack trace from the point where the storage parameters (bucket name, object name) are starting to be encoded:
|
This is quite a huge bug, preventing anyone from reading files containing spaces in their name, which is quite frequent. Apparently, it exists in the latest version 1.103.0, but not in version 1.102.0. Wouldn't it be possible to release a new version identical to 1.102.0 for those upgrading to the latest version? I had the luck to have caught this issue by a test before going to production, but I guess this could break a lot of applications. |
* feat: decode uri path components correctly The old implementation was incorrecly treating '+' as a space. Now the only things that get decoded in the path are uri escaped sequences. Fixes #398 * tweak javadoc * remove hardcoded string
This only applies to signing objects. The code path is not used when reading objects. |
I can confirm this finding. |
@elharo Have you considered update
|
closing this issue as its duplicate of #121. |
@athakor I don't know why you choose to close this old bug and keep the new one which it seems has less priority as well, but any way we are still waiting for a fix, we can just use the work around and make sure google-http-client stays with none problematic version, but we use many Google client java libraries, not sure we want to take the risk of updating them but keeping one of their transitive dependencies. |
Environment details
Java version: 8
google-cloud-java version(s): libraries-bom:3.3.0
Steps to reproduce
1- Upload a file with spaces in its name
2- Read that file
Code example
Stack trace
External references such as API reference guides used
https://cloud.google.com/storage/docs/request-endpoints#encoding
According to this the encoding is typically handled for you by client libraries, such as the Cloud Storage Client Libraries, so you can pass the raw object name to them.
Any additional information below
This stopped working recently After Upgrading the libraries-bom from 3.1.0 to 3.3.0
The object exists in the bucket with spaces.
The text was updated successfully, but these errors were encountered: