Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for inline image Markdown - attribute fix #661

Merged
merged 4 commits into from
Mar 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 12 additions & 9 deletions __tests__/ExpensiMark-HTML-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -1779,14 +1779,14 @@ describe('when should keep raw input flag is enabled', () => {
});
});
});

test('Test code fence within inline code', () => {
let testString = 'Hello world `(```test```)` Hello world';
expect(parser.replace(testString)).toBe('Hello world &#x60;(<pre>test</pre>)&#x60; Hello world');

testString = 'Hello world `(```test\ntest```)` Hello world';
expect(parser.replace(testString)).toBe('Hello world &#x60;(<pre>test<br />test</pre>)&#x60; Hello world');

testString = 'Hello world ```(`test`)``` Hello world';
expect(parser.replace(testString)).toBe('Hello world <pre>(&#x60;test&#x60;)</pre> Hello world');

Expand Down Expand Up @@ -1893,12 +1893,9 @@ describe('Image markdown conversion to html tag', () => {
expect(parser.replace(testString)).toBe(resultString);
});

// Currently any markdown used inside the square brackets is converted to html string in the alt attribute
// The attributes should only contain plain text, but it doesn't seem possible to convert markdown to plain text
// or let the parser know not to convert markdown to html for html attributes
xtest('Image with alt text containing markdown', () => {
const testString = '![*bold* _italic_ ~strike~](https://example.com/image.png)';
const resultString = '<img src="https://example.com/image.png" alt="*bold* _italic_ ~strike~" />';
test('Image with alt text containing markdown', () => {
const testString = '![# fake-heading *bold* _italic_ ~strike~ [:-)]](https://example.com/image.png)';
const resultString = '<img src="https://example.com/image.png" alt="# fake-heading &ast;bold&ast; &lowbar;italic&lowbar; &#126;strike&#126; &lbrack;:-)&rbrack;" />';
expect(parser.replace(testString)).toBe(resultString);
});

Expand Down Expand Up @@ -1934,4 +1931,10 @@ describe('Image markdown conversion to html tag', () => {
const resultString = '<img src=\"https://example.com/image.png\" alt=\"test&quot; onerror=&quot;alert(&#x27;xss&#x27;)\" />';
expect(parser.replace(testString)).toBe(resultString);
});

test('No html inside the src attribute', () => {
const testString = '![`code`](https://example.com/image.png)';
const resultString = '<img src="https://example.com/image.png" alt="<code>code</code>" />';
expect(parser.replace(testString)).toBe(resultString);
})
Comment on lines +1935 to +1939
Copy link
Contributor

@tomekzaw tomekzaw Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidroca Shouldn't it be alt="&lt;code&gt;code&lt;/code&gt;" instead of alt="<code>code</code>"?

It doesn't seem to lead to anything breaking, but preferably should be escaped as I'm not sure it's valid

This breaks Live Markdown library 😢 Not sure if < and > characters are supported in HTML attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the text inside the alt attribute should be alt="`code`" or use escaped entities. However, achieving this proved challenging without significant changes. The code rules execute first, generating a string like [<code>code</code>](...), and subsequent rules then move the text into the alt attribute.

I introduced this test to address a scenario where using the syntax ![code](https://example.com/image.png) incorrectly generated an <img> tag with a src attribute like <img src="<a href="..." />" />". This format, obviously, failed to load the image.

How does this impact Live Markdown functionality?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this impact Live Markdown functionality?

Currently, Live Markdown uses pretty basic algorithm for parsing HTML that alternates between finding < and then >.

The output <img src="..." alt="<code>...</code>" ... /> breaks the parser as the > from <code> gets matched instead of the (self-)closing > from img tag.

According to the HTML standard, > is not a valid character in HTML attribute:

Attributes have a name and a value. Attribute names must consist of one or more characters other than controls, U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (>), U+002F (/), U+003D (=), and noncharacters.

Is there a chance we could somehow escape the alt attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
The text you're referring to in your quote is regarding the attribute name - an attribute name cannot contain the specified characters

According to the HTML standard, > is not a valid character in HTML attribute:

Attributes have a name and a value. Attribute names must consist of one or more characters other than controls, U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (>), U+002F (/), U+003D (=), and noncharacters.

Attribute values on the other hand, are less restricted as long as the value is wrapped in matching quotes. There's a specific text in the shared spec clearly stating that < and > are supported as long as the attribute value is wrapped in quotes.

The attribute value can remain unquoted if it doesn't contain ASCII whitespace or any of " ' ` = < or >. Otherwise, it has to be quoted using either single or double quotes.

In any case I'll look into it a bit further to see if there's anything I can do to mitigate the issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text you're referring to in your quote is regarding the attribute name

@kidroca oops, you're right, sorry for mistake!

In any case I'll look into it a bit further to see if there's anything I can do to mitigate the issue

I believe we could use a more complex HTML parser in Live Markdown.

cc @robertKozik who adapted ExpensiMark to Live Markdown.

Copy link
Contributor

@robertKozik robertKozik Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But does reverting back when editing work? When I've tested it yesterday I think typing ```test``` as img name, and trying to edit it leads to test string. I may be wrong tho, as I'm not remember exactly nwm I was wrong 😃

Yeah your alternative solution will indeed solve our issues as we wouldn't have an < or > inside attributes. What is ETA for merging it?

Copy link
Contributor Author

@kidroca kidroca Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be in a bit of a dead lock situation here. To advance my expensify-common PR, I need to demonstrate its effectiveness within the main branch of NewDot. Would the solution I proposed here be acceptable to you: Expensify/App#38952 (comment)?

Feel free to utilize this snapshot of expensify-common in your projects:

"expensify-common": "git+ssh://git@github.com/kidroca/expensify-common.git#3ad1c3322f4ec8999483955afe6f55c95e78172a",

This snapshot reflects the changes in my PR for expensify-common.

  1. Incorporating it into your PR would resolve your issue
  2. It would allow QA to test the MD Image feature.
    • But we should note to test this in the Tests section of your PR.
  3. This in turn would allow approve the merge of the expensify-common PR
  4. Letting you to finally switch back expensify-common to the Expensify's repo

I'll be ready to cover any follow-up work related to the MD image feature.

What are your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit tricky because we're incorporating the expensify-common version inside react-native-live-markdown as well. So we would need to point to your branch within the new library version, which isn't my preferred approach. If possible, merging your expensify-common PR as it is would be great. If not, we can go with a patch containing your changes.

Here's how I envision the process:

  1. Merge your expensify-common PR (if possible).
  2. Publish a new version of react-native-live-markdown with both our and your fixes.
  3. Create a joint PR for react-native-live-markdown and expensify-common in the E/App repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me just test your changes in live-markdown beforehand

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kidroca Apart from the one problem I've mentioned here everything looks great. I think we can proceed with the patch inside the react-native-live-markdown so it will unblock you - does it sound good for you?

});
6 changes: 6 additions & 0 deletions __tests__/ExpensiMark-Markdown-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -769,4 +769,10 @@ describe('Image tag conversion to markdown', () => {
const resultString = '![https://example.com/image.png](https://example.com/image.png)';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});

test('Image with alt text containing escaped markdown', () => {
const testString = '<img src="https://example.com/image.png" alt="&ast;bold&ast; &lowbar;italic&lowbar; &#126;strike&#126;" />';
const resultString = '![*bold* _italic_ ~strike~](https://example.com/image.png)';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
});
30 changes: 27 additions & 3 deletions lib/ExpensiMark.js
Original file line number Diff line number Diff line change
Expand Up @@ -113,12 +113,13 @@
* Converts markdown style images to img tags e.g. ![Expensify](https://www.expensify.com/attachment.png)
* We need to convert before linking rules since they will not try to create a link from an existing img
* tag.
* Additional sanitization is done to the alt attribute to prevent parsing it further to html by later rules.
*/
{
name: 'image',
regex: MARKDOWN_IMAGE_REGEX,
replacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${g1}" />`,
rawInputReplacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${g1}" data-raw-href="${g2}" data-link-variant="labeled" />`
replacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${this.escapeMarkdownEntities(g1)}" />`,
rawInputReplacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${this.escapeMarkdownEntities(g1)}" data-raw-href="${g2}" data-link-variant="labeled" />`
Copy link
Contributor

@marcaaron marcaaron Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Escaping is good for the "markdown entities" - but why display them at all? The alt tag will show a bunch of junk if the image fails to load or they hover over it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered keeping the text plain initially, but realized that it wouldn't allow us to revert to the original Markdown message from the HTML. This way, the original format is retained.

Regarding hover text, the alt attribute doesn't display text on hover. For that, Markdown has an extended syntax (![alt](src "title text")) that adds a title attribute to the resulting HTML. However, we don't support this feature, similarly to our approach with links.

If an image fails to load, the fallback is something like:
image

But NewDot and react-native-web don't display this fallback since they render a hidden <img> tag instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hmm. Quite weird. Perhaps we can filter it before rendering i.e. not actually modify the underlying html field. But we can look into that in the App PR. Thanks for the explanation.

},

/**
Expand Down Expand Up @@ -152,7 +153,7 @@
*/
{
name: 'hereMentions',
regex: /([a-zA-Z0-9.!$%&+/=?^`{|}_-]?)(@here)([.!$%&+/=?^`{|}_-]?)(?=\b)(?!([\w'#%+-]*@(?:[a-z\d-]+\.)+[a-z]{2,}(?:\s|$|@here))|((?:(?!<a).)+)?<\/a>|[^<]*(<\/pre>|<\/code>))/gm,

Check warning on line 156 in lib/ExpensiMark.js

View workflow job for this annotation

GitHub Actions / lint

This line has a length of 193. Maximum allowed is 190
replacement: (match, g1, g2, g3) => {
if (!Str.isValidMention(match)) {
return match;
Expand Down Expand Up @@ -197,7 +198,7 @@

process: (textToProcess, replacement) => {
const regex = new RegExp(
`(?![^<]*>|[^<>]*<\\/(?!h1>))([_*~]*?)${MARKDOWN_URL_REGEX}\\1(?!((?:(?!<a).)+)?<\\/a>|[^<]*(<\\/pre>|<\\/code>))`,
`(?![^<]*>|[^<>]*<\\/(?!h1>))([_*~]*?)${MARKDOWN_URL_REGEX}\\1(?!((?:(?!<a).)+)?<\\/a>|[^<]*(<\\/pre>|<\\/code>|.+\\/>))`,
'gi',
);
return this.modifyTextForUrlLinks(regex, textToProcess, replacement);
Expand Down Expand Up @@ -945,4 +946,27 @@
const linksInNew = this.extractLinksInMarkdownComment(newComment);
return linksInOld === undefined || linksInNew === undefined ? [] : _.difference(linksInOld, linksInNew);
}

/**
* Replace MD characters with their HTML entity equivalent
* @param {String} text
* @return {String}
*/
escapeMarkdownEntities(text) {
// A regex pattern matching special MD characters we'd like to escape
const pattern = /([*_{}[\]~])/g;

// A map of MD characters to their HTML entity equivalent
const entities = {
'*': '&ast;',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A backtick is missing here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've intentionally left out inline code and code blocks. These rules are applied first, converting anything that matches into HTML. Therefore, for input like:

![# fake-heading *bold* _italic_ ~strike~ `code`](https://example.com/image.png)

The result is:

<img src="<a href="https://example.com/image.png" target="_blank" rel="noreferrer noopener">https://example.com/image.png</a>" alt="# fake-heading &ast;bold&ast; &lowbar;italic&lowbar; &#126;strike&#126; <code>code</code>" />

This approach reveals a limitation in our rule application method, but I'm unable to cover it easily in the current PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aforementioned bug has been resolved by modifying the autolink pattern to recognize self-closing tags such as img. The absence of backticks is due to the fact that when escapeMarkdownEntities is executed, all backticks have already been transformed into <code> or <pre>

_: '&lowbar;',
'{': '&lbrace;',
'}': '&rbrace;',
'[': '&lbrack;',
']': '&rbrack;',
'~': '&#126;',
};

return text.replace(pattern, char => entities[char] || char);
}
}
Loading