-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor CesiumUtility::Uri
to use ada instead of uriparser
#1072
Conversation
"%7Bz%7D/%7Bx%7D/" | ||
"%7By%7D.terrain?v=%7Bversion%7D&extensions=octvertexnormals-metadata"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me a little nervous. I guess it's needed because {
and }
aren't valid characters in a URL path, right? But it gets tricky fast. What if the URL already contains %7B
and %7D
characters? How do we distinguish the ones that are really the placeholder delimiters from the others that might be in the URL?
The best solution, I think, is for the URL library to understand placeholders. That sounds like a tall order, but it looks like ada added support two weeks ago!
ada-url/ada#785
This isn't in a released version. And it might not fit perfectly; we might need to replace {x}
with {:x}
, if I'm reading it correctly. Still, this should be a lot more reliable than trying to encode/decode URLs with placeholders.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would probably be good to switch to the URLPattern
implementation. It looks like it would speed up placeholder substitution (not that it's a big performance cost for us right now, but a win is a win), and having our placeholders in line with the spec can't hurt. My question is, is there anywhere in the codebase currently where users can input their own templated URLs? Besides UrlTemplateRasterOverlay, of course, which hasn't yet been shipped so it's OK if we change the behavior. I'd hate for us to update and break everyone's projects because they don't have colons in their URL templates.
As for it not being in a released version, that might be a concern. It doesn't look like there's any defined release cadence for ada, since the last release was in September. Using this might mean we have to add it as a submodule - which wouldn't be the end of the world, but I would prefer not to if we can avoid it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder how many URLs contain encoded {
and }
characters that aren't URL templates that we should be worried about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder how many URLs contain encoded { and } characters that aren't URL templates that we should be worried about.
Well, it's not just the {
and }
we have to worry about, though. It looks like URLs are now percent-decoded before doing the placeholder substitution. So if you had, say, a percent encoded slash, that would get decoded, too, and could break the URL. How common is this? I don't know, but it definitely came up in CesiumJS at one point. CesiumJS replaces just the {
and }
characters:
https://github.com/CesiumGS/cesium/blob/main/packages/engine/Source/Core/Resource.js#L550
Using this might mean we have to add it as a submodule - which wouldn't be the end of the world, but I would prefer not to if we can avoid it.
No need for a submodule, it can be done with a vcpkg overlay port. We haven't used those in cesium-native itself yet, though (only Unreal and Unity).
is there anywhere in the codebase currently where users can input their own templated URLs
Yeah, the WMTS tile provider uses placeholders.
I suspect we could simply replace {whatever}
with {:whatever}
on-the-fly, though, and that would be reliable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kring After looking into ada's URL pattern support, I think it would be better if we avoided it. Besides the URL pattern support being fairly sparse on documentation, not yet in a stable release, and not currently building on GCC versions lower than 13.1, it's actually meant to do the opposite of what we're looking to do. It's meant to match a pattern against a URL, such as for the purposes of routing (like how Express.js uses them), rather than using a pattern to generate a URL from a set of parameters. We could probably finagle the library to work this way, but it would be a bit of a headache and doesn't seem too worth it all things considered. Probably would not be particularly more performant than our current implementation.
Instead, I've rewritten substituteTemplateParameters
so that encoded {
and }
characters are treated as if they are their unencoded equivalents. This allows the templates to be substituted without decoding the entire string, saving us a string copy as well as preventing any issues from URL decoding things we shouldn't be. Technically, this does still have the issue where an encoded {
}
that isn't meant to be treated as a placeholder will still be treated as one, but considering CesiumJS would have the same issue I think it's a fine solution.
@kring I've done my best to make sure that the If we did change the API to allow us to know when a template should be ignored (if the callback returns Anyways, probably the most impactful change here is that an unclosed template parameter no longer throws a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good! I have a few small comments. See also #1084. And please update CHANGES.md because this does have at least one important user-facing improvement (unicode in URLs!).
ada-url: Substitute layer.json placeholders before treating as URLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes so far! I'm not sure if you were done after the last round of review, but here are a few more comments for you.
@kring Split off |
This looks great and is ready to merge other than one small reservation I have left, and I feel bad about this one: we really shouldn't use abbreviations like "Params", especially in a relatively prominent part of our public API. How about renaming |
@kring Renamed! |
Thanks for all the work (and patience!) on this @azrogers! |
In the discussion around #973, we decided we wanted to see if we could replace the aging
uriparser
library with an alternative that is compatible with the URIs modern browsers use, which has advanced past the RFC 3986 specification in many ways. To this end, I've refactored the implementation ofCesiumUtility::Uri
to work with ada, a modern C++ URI parser - notably, the one used by NodeJS. This will help us avoid unicode issues like the aforementioned #973, but also hopefully give us more tools to deal with URLs and avoid other problems like these in the future.The biggest part of this refactor is that
CesiumUtility::Uri
instances are now constructed and interacted with via member functions as opposed to the previous approach of static methods accepting and returning strings (though the static methods remain for API compatibility). The advantage of this is it means we only have to parse a URL once - if we're performing multiple operations, such as using it as the base URL for many different relative URLs, or fetching multiple query values from it, or setting multiple query values on it, we don't need to be constantly allocating and copying around strings to accomplish this.I've leaned on the side of permissiveness over adherence to the spec. For example, opaque paths are allowed as URLs despite being specifically disallowed by the WhatWG spec. This is because in at least one point in Cesium Native, we're adding query parameters to an opaque path (what will be a relative path, but hasn't yet been resolved). To accomplish this, we just pretend that it's actually a file URI, and remove the dummy scheme when the user extracts the URI into a string. This also helps some of the tests work out because, for whatever reason, we wrote them counting on the fallback behavior of the old implementation returning the
relative
parameter when a URL failed to parse.I wasn't able to work out all of the issues from CI before the end of the week, but I'll pick it back up on Monday. In the mean time, it's definitely worth taking a look at my approach and giving feedback on.