Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduce support for acronyms in Legislative Lists and Calls To Action #285

Merged
merged 7 commits into from
Aug 23, 2023

Conversation

brucebolt
Copy link
Member

@brucebolt brucebolt commented Aug 21, 2023

This adds support for acronyms in Legislative Lists and Calls to Action, which was removed in #229.

The removal was because acronyms that matched parts of the HTML tags/attributes were also being replaced with acronym markup.

Therefore resolving this by searching for acronyms before converting to HTML.

Trello card

This test is currently failing as abbreviations within Call To Action
elements do not currently work (as reported by a user).

This functionality will be added in later commits.
This test currently fails as the code to support this has been commented
out.

Work to fix the issue will be done in a later commit.
In a previous Zendesk ticket
(https://govuk.zendesk.com/agent/tickets/4819383), a problem was
reported where abbreviations within HTML tag elements were being
replaced with abbreviation tags.

The fix at the time was to revert the code for adding abbreviations to
legislative lists and calls to action.

This adds a test for the previously problematic scenario, so we can
implement the abbreviations functionality again in a later commit.
This was previously not working (before the code was commented out) as:
- multiple calls to `preprocess` meant `@acronyms` was empty on
  susbsequent runs (using `concat` solves that)
- `add_acronym_alt_text` was returning the output of the `each` method
  (which was some array) rather than the updated HTML

Support for other types of element was already commented out and that
behaviour has been retained here. It will be fixed in later commits.
This code was previously removed as it was considered unreliable as
acronyms within HTML tag attributes were also being replaced.

Adding it back in, but now parsing the acronyms before converting to
HTML.
@brucebolt brucebolt marked this pull request as ready for review August 22, 2023 15:48
Copy link

@CristinaRO CristinaRO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the detailed commit messages and step-by-step walkthrough! I think I understand enough to approve, but I won't be offended if you want to get a second opinion from someone with more experience in this area.

@brucebolt brucebolt merged commit 9ae30e4 into main Aug 23, 2023
@brucebolt brucebolt deleted the fix-abbr-cta branch August 23, 2023 10:19
jkempster34 added a commit that referenced this pull request Oct 11, 2023
Kramdown handles abbreviations and footnotes natively.

Call to action and legislative list are custom Govspeak components built as
extensions to Kramdown. These sections are parsed separately in preprocessing,
they are ignored in any subsequent parsing because Kramdown ignores HTML.

Because of this, we have added custom handling of abbreviations and footnotes
specifically for these components [1]. This custom implementation means that
abbreviations defined anywhere in the document will be applied to call to
action and legislative list components. Otherwise, only abbreviations defined
within these components would be applied.

This custom implementation has been shown to contain bugs through several
Zendesk tickets. The main issues reported are:

1. Acronyms are inserted into the produced HTML, even if undesirable. For
example, a link `<email@example.com>` with the acronym `*[email]: Electronic
mail is a method of transmitting and receiving messages` would produce
`<p>href="mailto:<abbr title="Electronic mail is a method of transmitting and
receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a
method of transmitting and receiving messages">email</abbr>@example.com</p>`
rather than the expected link.

2. Because the `add_acronym_alt_text` method runs through the text for each
acronym, acronyms can be inserted into other acronyms creating invalid HTML.

If we change tack, and instead allow Kramdown to parse these components as part
of the main conversion to HTML (instead of in preprocessing), Kramdown will
handle abbreviations and footnotes correctly, fixing both of these issues (as well
as some other undocumented issues). 

The call to action component requires a surrounding `div` with the
`call-to-action` class. Because Kramdown syntax is normally not processed inside
a HTML tag, we must toggle the `parse_block_html` option [2]. 

There is a small regression in that the `$CTA` block must be proceeded by a
blank line (demonstrated in `test/govspeak_test.rb:428`), but this seems okay
as it's standard markdown.

Some of the tests can likely be removed as we are now testing the functionality
of the library, however, for now these represent that there are no
regressions.

[1]: #285
[2]: https://kramdown.gettalong.org/quickref.html#html-elements
jkempster34 added a commit that referenced this pull request Oct 11, 2023
Kramdown handles abbreviations and footnotes natively.

Call to action and legislative list are custom Govspeak components built as
extensions to Kramdown. These sections are parsed separately in preprocessing,
they are ignored in any subsequent parsing because Kramdown ignores HTML.

Because of this, we have added custom handling of abbreviations and footnotes
specifically for these components [1]. This custom implementation means that
abbreviations defined anywhere in the document will be applied to call to
action and legislative list components. Otherwise, only abbreviations defined
within these components would be applied.

This custom implementation has been shown to contain bugs through several
Zendesk tickets. The main issues reported are:

1. Acronyms are inserted into the produced HTML, even if undesirable. For
example, a link `<email@example.com>` with the acronym `*[email]: Electronic
mail is a method of transmitting and receiving messages` would produce
`<p>href="mailto:<abbr title="Electronic mail is a method of transmitting and
receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a
method of transmitting and receiving messages">email</abbr>@example.com</p>`
rather than the expected link.

2. Because the `add_acronym_alt_text` method runs through the text for each
acronym, acronyms can be inserted into other acronyms creating invalid HTML.

If we change tack, and instead allow Kramdown to parse these components as part
of the main conversion to HTML (instead of in preprocessing), Kramdown will
handle abbreviations and footnotes correctly, fixing both of these issues (as
well as some other undocumented issues). 

The legislative list component currently works by overriding Kramdowns `list`
extension [2] and disabling ordered lists [3]. So that we can parse this
component as part of the main conversion, this applies two options:
`parse_block_html` [4], so that markdown inside the `legislative-list-wrapper`
`div` is parsed by Kramdown, and a new custom option of
`ordered_lists_disabled`, so that we can control flow inside our
`Parser::Govuk` class (subclass of the Kramdown parser). It is possible to
override `parse_list` instead of `parse_block_html`, but this means that the
outermost list will not be parsed (by the time `parse_list` is called the
outermost list is already being parsed)

The only difference between this new iteration of the legislative list and the
previous iteration, is that we now wrap the component in a
`legislative-list-wrapper` `div`. We could remove this in postprocessing, but
this requires additional modification of the Kramdown produced HTML which seems
unnecessary.

This also allows us to remove all remaining code which replicates the footnote
and acronym functions of Kramdown.

Some of the tests can likely be removed as we are now testing the functionality
of the library, however, for now these represent that there are no
regressions.

[1]: #285
[2]:
https://github.com/gettalong/kramdown/blob/bd678ecb59f70778fdb3b08bdcd39e2ab7379b45/lib/kramdown/parser/kramdown/list.rb#L54
[3]: #25
[4]: https://kramdown.gettalong.org/quickref.html#html-elements
jkempster34 added a commit that referenced this pull request Oct 12, 2023
Kramdown handles abbreviations and footnotes natively.

Call to action and legislative list are custom Govspeak components built as
extensions to Kramdown. These sections are parsed separately in preprocessing,
they are ignored in any subsequent parsing because Kramdown ignores HTML.

Because of this, we have added custom handling of abbreviations and footnotes
specifically for these components [1]. This custom implementation means that
abbreviations defined anywhere in the document will be applied to call to
action and legislative list components. Otherwise, only abbreviations defined
within these components would be applied.

This custom implementation has been shown to contain bugs through several
Zendesk tickets. The main issues reported are:

1. Acronyms are inserted into the produced HTML, even if undesirable. For
example, a link `<email@example.com>` with the acronym `*[email]: Electronic
mail is a method of transmitting and receiving messages` would produce
`<p>href="mailto:<abbr title="Electronic mail is a method of transmitting and
receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a
method of transmitting and receiving messages">email</abbr>@example.com</p>`
rather than the expected link.

2. Because the `add_acronym_alt_text` method runs through the text for each
acronym, acronyms can be inserted into other acronyms creating invalid HTML.

If we change tack, and instead allow Kramdown to parse these components as part
of the main conversion to HTML (instead of in preprocessing), Kramdown will
handle abbreviations and footnotes correctly, fixing both of these issues (as well
as some other undocumented issues). 

The call to action component requires a surrounding `div` with the
`call-to-action` class. Because Kramdown syntax is normally not processed inside
a HTML tag, we must toggle the `parse_block_html` option [2]. 

There is a small regression in that the `$CTA` block must be preceded by a
blank line (demonstrated in `test/govspeak_test.rb:428`), but this seems okay
as it's standard markdown.

Some of the tests can likely be removed as we are now testing the functionality
of the library, however, for now these represent that there are no
regressions.

[1]: #285
[2]: https://kramdown.gettalong.org/quickref.html#html-elements
jkempster34 added a commit that referenced this pull request Oct 12, 2023
Kramdown handles abbreviations and footnotes natively.

Call to action and legislative list are custom Govspeak components built as
extensions to Kramdown. These sections are parsed separately in preprocessing,
they are ignored in any subsequent parsing because Kramdown ignores HTML.

Because of this, we have added custom handling of abbreviations and footnotes
specifically for these components [1]. This custom implementation means that
abbreviations defined anywhere in the document will be applied to call to
action and legislative list components. Otherwise, only abbreviations defined
within these components would be applied.

This custom implementation has been shown to contain bugs through several
Zendesk tickets. The main issues reported are:

1. Acronyms are inserted into the produced HTML, even if undesirable. For
example, a link `<email@example.com>` with the acronym `*[email]: Electronic
mail is a method of transmitting and receiving messages` would produce
`<p>href="mailto:<abbr title="Electronic mail is a method of transmitting and
receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a
method of transmitting and receiving messages">email</abbr>@example.com</p>`
rather than the expected link.

2. Because the `add_acronym_alt_text` method runs through the text for each
acronym, acronyms can be inserted into other acronyms creating invalid HTML.

If we change tack, and instead allow Kramdown to parse these components as part
of the main conversion to HTML (instead of in preprocessing), Kramdown will
handle abbreviations and footnotes correctly, fixing both of these issues (as
well as some other undocumented issues). 

The legislative list component currently works by overriding Kramdowns `list`
extension [2] and disabling ordered lists [3]. So that we can parse this
component as part of the main conversion, this applies two options:
`parse_block_html` [4], so that markdown inside the `legislative-list-wrapper`
`div` is parsed by Kramdown, and a new custom option of
`ordered_lists_disabled`, so that we can control flow inside our
`Parser::Govuk` class (subclass of the Kramdown parser). It is possible to
override `parse_list` instead of `parse_block_html`, but this means that the
outermost list will not be parsed (by the time `parse_list` is called the
outermost list is already being parsed)

The only difference between this new iteration of the legislative list and the
previous iteration, is that we now wrap the component in a
`legislative-list-wrapper` `div`. We could remove this in postprocessing, but
this requires additional modification of the Kramdown produced HTML which seems
unnecessary.

This also allows us to remove all remaining code which replicates the footnote
and acronym functions of Kramdown.

Some of the tests can likely be removed as we are now testing the functionality
of the library, however, for now these represent that there are no
regressions.

[1]: #285
[2]:
https://github.com/gettalong/kramdown/blob/bd678ecb59f70778fdb3b08bdcd39e2ab7379b45/lib/kramdown/parser/kramdown/list.rb#L54
[3]: #25
[4]: https://kramdown.gettalong.org/quickref.html#html-elements
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants