-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[html] Allow ampersands in attribute values. #2036
[html] Allow ampersands in attribute values. #2036
Conversation
Package publishing
Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation. |
PR HealthBreaking changes ✔️
Changelog Entry ✔️
Changes to files need to be accounted for in their respective changelogs. Coverage ✔️
This check for test coverage is informational (issues shown here will not fail the PR).
API leaks
|
Package | Leaked API symbols |
---|---|
html | HtmlTokenizer Token HtmlInputStream TagToken DoctypeToken StringToken TreeBuilder ActiveFormattingElements StartTagToken TagAttribute CommentToken CharactersToken SpaceCharactersToken EndTagToken |
This check can be disabled by tagging the PR with skip-leaking-check
.
License Headers ⚠️
// Copyright (c) 2025, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files |
---|
pkgs/html/lib/src/tokenizer.dart |
pkgs/html/test/parser_feature_test.dart |
All source files should start with a license header.
Unrelated files missing license headers
Files |
---|
pkgs/bazel_worker/benchmark/benchmark.dart |
pkgs/bazel_worker/example/client.dart |
pkgs/bazel_worker/example/worker.dart |
pkgs/benchmark_harness/integration_test/perf_benchmark_test.dart |
pkgs/boolean_selector/example/example.dart |
pkgs/clock/lib/clock.dart |
pkgs/clock/lib/src/clock.dart |
pkgs/clock/lib/src/default.dart |
pkgs/clock/lib/src/stopwatch.dart |
pkgs/clock/lib/src/utils.dart |
pkgs/clock/test/clock_test.dart |
pkgs/clock/test/default_test.dart |
pkgs/clock/test/stopwatch_test.dart |
pkgs/clock/test/utils.dart |
pkgs/coverage/lib/src/coverage_options.dart |
pkgs/coverage/test/collect_coverage_config_test.dart |
pkgs/coverage/test/config_file_locator_test.dart |
pkgs/html/example/main.dart |
pkgs/html/lib/dom.dart |
pkgs/html/lib/dom_parsing.dart |
pkgs/html/lib/html_escape.dart |
pkgs/html/lib/parser.dart |
pkgs/html/lib/src/constants.dart |
pkgs/html/lib/src/encoding_parser.dart |
pkgs/html/lib/src/html_input_stream.dart |
pkgs/html/lib/src/list_proxy.dart |
pkgs/html/lib/src/query_selector.dart |
pkgs/html/lib/src/token.dart |
pkgs/html/lib/src/treebuilder.dart |
pkgs/html/lib/src/utils.dart |
pkgs/html/test/dom_test.dart |
pkgs/html/test/parser_test.dart |
pkgs/html/test/query_selector_test.dart |
pkgs/html/test/selectors/level1_baseline_test.dart |
pkgs/html/test/selectors/level1_lib.dart |
pkgs/html/test/selectors/selectors.dart |
pkgs/html/test/support.dart |
pkgs/html/test/tokenizer_test.dart |
pkgs/pubspec_parse/test/git_uri_test.dart |
pkgs/stack_trace/example/example.dart |
pkgs/watcher/test/custom_watcher_factory_test.dart |
pkgs/yaml_edit/example/example.dart |
This check can be disabled by tagging the PR with skip-license-check
.
@sigmundch - can you take a look? Do you think it's safe to silently ignore parse errors here? I think it would be better if we can track down a way to avoid parsing URL attribute values as HTML entities but I don't see an obvious way to track that in the current implementation. |
FWIW the diff I assumed we'd land in this code location is something like generateSpans: true, attributeSpans: true);
tokenizer.moveNext();
- final token = tokenizer.current as StartTagToken;
+ final token = tokenizer.current;
+ if (token is ParseErrorToken) {
+ throw ParseError(token.data, token.span, token.messageParams);
+ }
+ token as StartTagToken;
if (token.attributeSpans == null) return; // no attributes But that would require us to track down and fix the spurious parse error. |
@natebosch I tracked the tokenizer error to here (I added a couple of links to the related issue):
Fixing it gracefully there... well, I didn't find a good solution. I tried the navie thing of not emitting those non-fatal tokenizer errors, but that seemed to mess up the computation of the spans. There's also a test validating that the
So I guessed that token is a feature, and attempted to fix it one level up. (Throwing if |
Yeah a better error message is my intention with that diff.
Is it a hard requirement to parse this invalid HTML? Or is it possible for you to fix the input to I don't fully understand how or whether we differentiate fatal and non-fatal errors. I expect browser probably allow this syntax to work but if we can be more strict to the standards in the parser that's definitely easier for us. |
Yes, using
I disagree with this being invalid HTML. In practice, it is rare that anybody is going to use the As for the standard; the URI rfc specifically lists The HTML Standard says that Character references like I think it's a little unfortunate that the HTML parser behaves this way, but I can't suggest a better solution than this PR. |
Interesting, reading this part of the spec, it does appear like the parse error was not intended to happen within an attribute value. In particular, see the text around being "consumed as part of an attrinbute". The location you found @ditman in tools/pkgs/html/lib/src/tokenizer.dart Lines 342 to 346 in 9c53358
does appear to be the place to address this issue. Note that there is a parameter if (!fromAttribute) {
_addToken(ParseErrorToken(...))
}
stream.unget(...)
output = ...; Just trying that out locally seems to still keep all tests passing and make your new test pass as well. |
@sigmundch what a cool find Siggi, I totally missed that param! (That method is long and full of terrors :P) Let me add that change and another test case to the tokenizer. |
lol, totally :) |
…side an attribute value. Co-Authored-By: sigmundch <sigmund@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David, looks great!
This seems to be all green, but I can't click the submit button :) /cc @sigmundch |
Skips unexpected tokens when attempting to find a
StartTagToken
indom.dart
, so attribute values can contain the&
sign without it being part of an actual HTML Entity.Contribution guidelines:
dart format
.Note that many Dart repos have a weekly cadence for reviewing PRs - please allow for some latency before initial review feedback.