Reintroduce support for acronyms in Legislative Lists and Calls To Action #285

brucebolt · 2023-08-21T12:46:06Z

This adds support for acronyms in Legislative Lists and Calls to Action, which was removed in #229.

The removal was because acronyms that matched parts of the HTML tags/attributes were also being replaced with acronym markup.

Therefore resolving this by searching for acronyms before converting to HTML.

Trello card

This test is currently failing as abbreviations within Call To Action elements do not currently work (as reported by a user). This functionality will be added in later commits.

This test currently fails as the code to support this has been commented out. Work to fix the issue will be done in a later commit.

In a previous Zendesk ticket (https://govuk.zendesk.com/agent/tickets/4819383), a problem was reported where abbreviations within HTML tag elements were being replaced with abbreviation tags. The fix at the time was to revert the code for adding abbreviations to legislative lists and calls to action. This adds a test for the previously problematic scenario, so we can implement the abbreviations functionality again in a later commit.

This was previously not working (before the code was commented out) as: - multiple calls to `preprocess` meant `@acronyms` was empty on susbsequent runs (using `concat` solves that) - `add_acronym_alt_text` was returning the output of the `each` method (which was some array) rather than the updated HTML Support for other types of element was already commented out and that behaviour has been retained here. It will be fixed in later commits.

This code was previously removed as it was considered unreliable as acronyms within HTML tag attributes were also being replaced. Adding it back in, but now parsing the acronyms before converting to HTML.

This adds the `span` that was added to buttons in alphagov/govuk_publishing_components#3478.

CristinaRO

Thank you for the detailed commit messages and step-by-step walkthrough! I think I understand enough to approve, but I won't be offended if you want to get a second opinion from someone with more experience in this area.

Kramdown handles abbreviations and footnotes natively. Call to action and legislative list are custom Govspeak components built as extensions to Kramdown. These sections are parsed separately in preprocessing, they are ignored in any subsequent parsing because Kramdown ignores HTML. Because of this, we have added custom handling of abbreviations and footnotes specifically for these components [1]. This custom implementation means that abbreviations defined anywhere in the document will be applied to call to action and legislative list components. Otherwise, only abbreviations defined within these components would be applied. This custom implementation has been shown to contain bugs through several Zendesk tickets. The main issues reported are: 1. Acronyms are inserted into the produced HTML, even if undesirable. For example, a link `<email@example.com>` with the acronym `*[email]: Electronic mail is a method of transmitting and receiving messages` would produce `href="mailto:<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com` rather than the expected link. 2. Because the `add_acronym_alt_text` method runs through the text for each acronym, acronyms can be inserted into other acronyms creating invalid HTML. If we change tack, and instead allow Kramdown to parse these components as part of the main conversion to HTML (instead of in preprocessing), Kramdown will handle abbreviations and footnotes correctly, fixing both of these issues (as well as some other undocumented issues). The call to action component requires a surrounding `div` with the `call-to-action` class. Because Kramdown syntax is normally not processed inside a HTML tag, we must toggle the `parse_block_html` option [2]. There is a small regression in that the `$CTA` block must be proceeded by a blank line (demonstrated in `test/govspeak_test.rb:428`), but this seems okay as it's standard markdown. Some of the tests can likely be removed as we are now testing the functionality of the library, however, for now these represent that there are no regressions. [1]: #285 [2]: https://kramdown.gettalong.org/quickref.html#html-elements

Kramdown handles abbreviations and footnotes natively. Call to action and legislative list are custom Govspeak components built as extensions to Kramdown. These sections are parsed separately in preprocessing, they are ignored in any subsequent parsing because Kramdown ignores HTML. Because of this, we have added custom handling of abbreviations and footnotes specifically for these components [1]. This custom implementation means that abbreviations defined anywhere in the document will be applied to call to action and legislative list components. Otherwise, only abbreviations defined within these components would be applied. This custom implementation has been shown to contain bugs through several Zendesk tickets. The main issues reported are: 1. Acronyms are inserted into the produced HTML, even if undesirable. For example, a link `<email@example.com>` with the acronym `*[email]: Electronic mail is a method of transmitting and receiving messages` would produce `href="mailto:<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com` rather than the expected link. 2. Because the `add_acronym_alt_text` method runs through the text for each acronym, acronyms can be inserted into other acronyms creating invalid HTML. If we change tack, and instead allow Kramdown to parse these components as part of the main conversion to HTML (instead of in preprocessing), Kramdown will handle abbreviations and footnotes correctly, fixing both of these issues (as well as some other undocumented issues). The legislative list component currently works by overriding Kramdowns `list` extension [2] and disabling ordered lists [3]. So that we can parse this component as part of the main conversion, this applies two options: `parse_block_html` [4], so that markdown inside the `legislative-list-wrapper` `div` is parsed by Kramdown, and a new custom option of `ordered_lists_disabled`, so that we can control flow inside our `Parser::Govuk` class (subclass of the Kramdown parser). It is possible to override `parse_list` instead of `parse_block_html`, but this means that the outermost list will not be parsed (by the time `parse_list` is called the outermost list is already being parsed) The only difference between this new iteration of the legislative list and the previous iteration, is that we now wrap the component in a `legislative-list-wrapper` `div`. We could remove this in postprocessing, but this requires additional modification of the Kramdown produced HTML which seems unnecessary. This also allows us to remove all remaining code which replicates the footnote and acronym functions of Kramdown. Some of the tests can likely be removed as we are now testing the functionality of the library, however, for now these represent that there are no regressions. [1]: #285 [2]: https://github.com/gettalong/kramdown/blob/bd678ecb59f70778fdb3b08bdcd39e2ab7379b45/lib/kramdown/parser/kramdown/list.rb#L54 [3]: #25 [4]: https://kramdown.gettalong.org/quickref.html#html-elements

Kramdown handles abbreviations and footnotes natively. Call to action and legislative list are custom Govspeak components built as extensions to Kramdown. These sections are parsed separately in preprocessing, they are ignored in any subsequent parsing because Kramdown ignores HTML. Because of this, we have added custom handling of abbreviations and footnotes specifically for these components [1]. This custom implementation means that abbreviations defined anywhere in the document will be applied to call to action and legislative list components. Otherwise, only abbreviations defined within these components would be applied. This custom implementation has been shown to contain bugs through several Zendesk tickets. The main issues reported are: 1. Acronyms are inserted into the produced HTML, even if undesirable. For example, a link `<email@example.com>` with the acronym `*[email]: Electronic mail is a method of transmitting and receiving messages` would produce `href="mailto:<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com` rather than the expected link. 2. Because the `add_acronym_alt_text` method runs through the text for each acronym, acronyms can be inserted into other acronyms creating invalid HTML. If we change tack, and instead allow Kramdown to parse these components as part of the main conversion to HTML (instead of in preprocessing), Kramdown will handle abbreviations and footnotes correctly, fixing both of these issues (as well as some other undocumented issues). The call to action component requires a surrounding `div` with the `call-to-action` class. Because Kramdown syntax is normally not processed inside a HTML tag, we must toggle the `parse_block_html` option [2]. There is a small regression in that the `$CTA` block must be preceded by a blank line (demonstrated in `test/govspeak_test.rb:428`), but this seems okay as it's standard markdown. Some of the tests can likely be removed as we are now testing the functionality of the library, however, for now these represent that there are no regressions. [1]: #285 [2]: https://kramdown.gettalong.org/quickref.html#html-elements

Kramdown handles abbreviations and footnotes natively. Call to action and legislative list are custom Govspeak components built as extensions to Kramdown. These sections are parsed separately in preprocessing, they are ignored in any subsequent parsing because Kramdown ignores HTML. Because of this, we have added custom handling of abbreviations and footnotes specifically for these components [1]. This custom implementation means that abbreviations defined anywhere in the document will be applied to call to action and legislative list components. Otherwise, only abbreviations defined within these components would be applied. This custom implementation has been shown to contain bugs through several Zendesk tickets. The main issues reported are: 1. Acronyms are inserted into the produced HTML, even if undesirable. For example, a link `<email@example.com>` with the acronym `*[email]: Electronic mail is a method of transmitting and receiving messages` would produce `href="mailto:<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com"<abbr title="Electronic mail is a method of transmitting and receiving messages">email</abbr>@example.com` rather than the expected link. 2. Because the `add_acronym_alt_text` method runs through the text for each acronym, acronyms can be inserted into other acronyms creating invalid HTML. If we change tack, and instead allow Kramdown to parse these components as part of the main conversion to HTML (instead of in preprocessing), Kramdown will handle abbreviations and footnotes correctly, fixing both of these issues (as well as some other undocumented issues). The legislative list component currently works by overriding Kramdowns `list` extension [2] and disabling ordered lists [3]. So that we can parse this component as part of the main conversion, this applies two options: `parse_block_html` [4], so that markdown inside the `legislative-list-wrapper` `div` is parsed by Kramdown, and a new custom option of `ordered_lists_disabled`, so that we can control flow inside our `Parser::Govuk` class (subclass of the Kramdown parser). It is possible to override `parse_list` instead of `parse_block_html`, but this means that the outermost list will not be parsed (by the time `parse_list` is called the outermost list is already being parsed) The only difference between this new iteration of the legislative list and the previous iteration, is that we now wrap the component in a `legislative-list-wrapper` `div`. We could remove this in postprocessing, but this requires additional modification of the Kramdown produced HTML which seems unnecessary. This also allows us to remove all remaining code which replicates the footnote and acronym functions of Kramdown. Some of the tests can likely be removed as we are now testing the functionality of the library, however, for now these represent that there are no regressions. [1]: #285 [2]: https://github.com/gettalong/kramdown/blob/bd678ecb59f70778fdb3b08bdcd39e2ab7379b45/lib/kramdown/parser/kramdown/list.rb#L54 [3]: #25 [4]: https://kramdown.gettalong.org/quickref.html#html-elements

brucebolt added 5 commits August 21, 2023 12:32

Add test for abbreviations in Call To Action elements

4f9a54b

This test is currently failing as abbreviations within Call To Action elements do not currently work (as reported by a user). This functionality will be added in later commits.

Reintroduce test for acronyms in Legislative Lists

9ca47bb

This test currently fails as the code to support this has been commented out. Work to fix the issue will be done in a later commit.

Add support for acronyms in Legislative Lists

7daa93c

This code was previously removed as it was considered unreliable as acronyms within HTML tag attributes were also being replaced. Adding it back in, but now parsing the acronyms before converting to HTML.

brucebolt force-pushed the fix-abbr-cta branch from 1b4be18 to 2088e81 Compare August 22, 2023 07:35

Update test to match change in govuk_publishing_components

ed07d0b

This adds the `span` that was added to buttons in alphagov/govuk_publishing_components#3478.

brucebolt force-pushed the fix-abbr-cta branch from 2088e81 to ed07d0b Compare August 22, 2023 10:36

Update CHANGELOG for addition of acronym feature

442b0a7

brucebolt force-pushed the fix-abbr-cta branch from d48b871 to 442b0a7 Compare August 22, 2023 15:05

brucebolt marked this pull request as ready for review August 22, 2023 15:48

CristinaRO approved these changes Aug 23, 2023

View reviewed changes

brucebolt merged commit 9ae30e4 into main Aug 23, 2023

brucebolt deleted the fix-abbr-cta branch August 23, 2023 10:19

jkempster34 mentioned this pull request Oct 11, 2023

Fix abbreviation bugs in CTA and LegislativeList #291

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reintroduce support for acronyms in Legislative Lists and Calls To Action #285

Reintroduce support for acronyms in Legislative Lists and Calls To Action #285

brucebolt commented Aug 21, 2023 •

edited

Loading

CristinaRO left a comment

Reintroduce support for acronyms in Legislative Lists and Calls To Action #285

Reintroduce support for acronyms in Legislative Lists and Calls To Action #285

Conversation

brucebolt commented Aug 21, 2023 • edited Loading

CristinaRO left a comment

Choose a reason for hiding this comment

brucebolt commented Aug 21, 2023 •

edited

Loading