Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for inline image Markdown #658

Merged
merged 7 commits into from
Mar 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 56 additions & 0 deletions __tests__/ExpensiMark-HTML-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -1879,3 +1879,59 @@ describe('multi-level blockquote', () => {
expect(parser.replace(quoteTestStartString)).toBe(quoteTestReplacedString);
});
});

describe('Image markdown conversion to html tag', () => {
test('Single image', () => {
const testString = '![test](https://example.com/image.png)';
const resultString = '<img src="https://example.com/image.png" alt="test" />';
expect(parser.replace(testString)).toBe(resultString);
});

test('Text containing images', () => {
const testString = 'An image of a banana: ![banana](https://example.com/banana.png) an image of a developer: ![dev](https://example.com/developer.png)';
const resultString = 'An image of a banana: <img src="https://example.com/banana.png" alt="banana" /> an image of a developer: <img src="https://example.com/developer.png" alt="dev" />';
expect(parser.replace(testString)).toBe(resultString);
});

// Currently any markdown used inside the square brackets is converted to html string in the alt attribute
// The attributes should only contain plain text, but it doesn't seem possible to convert markdown to plain text
// or let the parser know not to convert markdown to html for html attributes
xtest('Image with alt text containing markdown', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xtest ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xtest function in Jest is a way to skip a particular test. When you prefix a test with xtest instead of test, Jest will ignore this test during the test run. This is useful for temporarily disabling a test that might be failing due to unfinished features or bugs that are not yet resolved, without having to comment out the test code. It allows you to easily re-enable the test in the future by changing xtest back to test.

const testString = '![*bold* _italic_ ~strike~](https://example.com/image.png)';
const resultString = '<img src="https://example.com/image.png" alt="*bold* _italic_ ~strike~" />';
expect(parser.replace(testString)).toBe(resultString);
});
Comment on lines +1896 to +1903
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interaction of image Markdown with other Markdown features (e.g., bold, italic) currently exhibits partial functionality issues. Specifically, content intended for the alt attribute in images is being incorrectly parsed from Markdown to HTML. For instance, given an input like ![*bold*](https://example/img.jpg), the output produced is <img src="https://example/img.jpg" alt="<strong>bold</strong>" />.
Although this parsing behavior functions as designed and doesn't seem to cause issues in NewDot, the expected outcome for the alt attribute should be either alt="*bold*" or a plain text alt="bold", without HTML tags. This highlights a known limitation in the current implementation, where Markdown syntax within image alt text does not retain its original or simplified plain text form.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a fix for this in a separate PR: #661


test('Text containing image and autolink', () => {
const testString = 'An image of a banana: ![banana](https://example.com/banana.png) an autolink: example.com';
const resultString = 'An image of a banana: <img src="https://example.com/banana.png" alt="banana" /> an autolink: <a href="https://example.com" target="_blank" rel="noreferrer noopener">example.com</a>';
expect(parser.replace(testString)).toBe(resultString);
});

test('Image with raw data attributes', () => {
const testString = '![test](https://example.com/image.png)';
const resultString = '<img src="https://example.com/image.png" alt="test" data-raw-href="https://example.com/image.png" data-link-variant="labeled" />';
expect(parser.replace(testString, {shouldKeepRawInput: true})).toBe(resultString);
});

test('Image with invalid url should remain unchanged', () => {
const testString = '![test](invalid)';
expect(parser.replace(testString)).toBe(testString);
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test for something like:

![banana](https://example.com/banana.png" onerror="alert('xss')")

Tbh not sure if xss like that is even remotely possible based on how the markdown parser works.

Copy link
Contributor Author

@kidroca kidroca Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That particular example is converted into an anchor, presumably due to the use of brackets. However, I will add a simpler test to ensure that additional attributes are not captured.

Expected: "![banana](https://example.com/banana.png onerror=\"alert('xss')\")"
Received: "![banana](<a href=\"https://example.com/banana.png\" target=\"_blank\" rel=\"noreferrer noopener\">https://example.com/banana.png</a> onerror=&quot;alert(&#x27;xss&#x27;)&quot;)"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks!


test('Trying to pass additional attributes should not create an <img>', () => {
const testString = '![test](https://example.com/image.png "title" class="image")';

// It seems the autolink rule is applied. We might need to update this test if the autolink rule is changed
// Ideally this test should return the same string as the input, or an <img> tag with the alt attribute set to
// "test" and no other attributes
const resultString = '![test](<a href=\"https://example.com/image.png\" target=\"_blank\" rel=\"noreferrer noopener\">https://example.com/image.png</a> &quot;title&quot; class=&quot;image&quot;)';
expect(parser.replace(testString)).toBe(resultString);
});

test('Trying to inject additional attributes should not work', () => {
const testString = '![test" onerror="alert(\'xss\')](https://example.com/image.png)';
const resultString = '<img src=\"https://example.com/image.png\" alt=\"test&quot; onerror=&quot;alert(&#x27;xss&#x27;)\" />';
expect(parser.replace(testString)).toBe(resultString);
});
Comment on lines +1922 to +1936
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to include 2 more tests here, other updates are minor code style/formatting changes applied automatically

});
5 changes: 5 additions & 0 deletions __tests__/ExpensiMark-HTMLToText-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -146,3 +146,8 @@ test('Mention html to text', () => {
testString = '<mention-user>@user@DOMAIN.com</mention-user>';
expect(parser.htmlToText(testString)).toBe('@user@DOMAIN.com');
});

test('Test replacement for <img> tags', () => {
const testString = '<img src="https://example.com/image.png" alt="Image description" />';
expect(parser.htmlToText(testString)).toBe('(image of: Image description)');
});
14 changes: 14 additions & 0 deletions __tests__/ExpensiMark-Markdown-test.js
Original file line number Diff line number Diff line change
Expand Up @@ -756,3 +756,17 @@ test('Mention html to markdown', () => {
testString = '<mention-user>@user@DOMAIN.com</mention-user>';
expect(parser.htmlToMarkdown(testString)).toBe('@user@DOMAIN.com');
});

describe('Image tag conversion to markdown', () => {
test('Image with alt attribute', () => {
const testString = '<img src="https://example.com/image.png" alt="image" />';
const resultString = '![image](https://example.com/image.png)';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});

test('Image without alt attribute', () => {
const testString = '<img src="https://example.com/image.png" />';
const resultString = '![https://example.com/image.png](https://example.com/image.png)';
expect(parser.htmlToMarkdown(testString)).toBe(resultString);
});
});
42 changes: 34 additions & 8 deletions lib/ExpensiMark.js
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,9 @@
import {MARKDOWN_URL_REGEX, LOOSE_URL_REGEX, URL_REGEX} from './Url';
import {CONST} from './CONST';

const MARKDOWN_LINK_REGEX = new RegExp(`\\[([^\\][]*(?:\\[[^\\][]*][^\\][]*)*)]\\(${MARKDOWN_URL_REGEX}\\)(?![^<]*(<\\/pre>|<\\/code>))`, 'gi');
const MARKDOWN_LINK = `\\[([^\\][]*(?:\\[[^\\][]*][^\\][]*)*)]\\(${MARKDOWN_URL_REGEX}\\)(?![^<]*(<\\/pre>|<\\/code>))`;
const MARKDOWN_LINK_REGEX = new RegExp(MARKDOWN_LINK, 'gi');
const MARKDOWN_IMAGE_REGEX = new RegExp(`\\!${MARKDOWN_LINK}`, 'gi');

const SLACK_SPAN_NEW_LINE_TAG = '<span class="c-mrkdwn__br" data-stringify-type="paragraph-break" style="box-sizing: inherit; display: block; height: unset;"></span>';

Expand Down Expand Up @@ -67,8 +69,8 @@

/**
* Converts markdown style links to anchor tags e.g. [Expensify](concierge@expensify.com)
* We need to convert before the auto email link rule and the manual link rule since it will not try to create a link
* from an existing anchor tag.
* We need to convert before the auto email link rule and the manual link rule since it will not try to
* create a link from an existing anchor tag.
*/
{
name: 'email',
Expand Down Expand Up @@ -107,6 +109,18 @@
replacement: '<h1>$1</h1>',
},

/**
* Converts markdown style images to img tags e.g. ![Expensify](https://www.expensify.com/attachment.png)
* We need to convert before linking rules since they will not try to create a link from an existing img
* tag.
*/
{
name: 'image',
regex: MARKDOWN_IMAGE_REGEX,
replacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${g1}" />`,
rawInputReplacement: (match, g1, g2) => `<img src="${Str.sanitizeURL(g2)}" alt="${g1}" data-raw-href="${g2}" data-link-variant="labeled" />`
},

/**
* Converts markdown style links to anchor tags e.g. [Expensify](https://www.expensify.com)
* We need to convert before the autolink rule since it will not try to create a link
Expand All @@ -115,7 +129,6 @@
{
name: 'link',
process: (textToProcess, replacement) => this.modifyTextForUrlLinks(MARKDOWN_LINK_REGEX, textToProcess, replacement),

replacement: (match, g1, g2) => {
if (g1.match(CONST.REG_EXP.EMOJIS) || !g1.trim()) {
return match;
Expand All @@ -132,13 +145,14 @@

/**
* Apply the hereMention first because the string @here is still a valid mention for the userMention regex.
* This ensures that the hereMention is always considered first, even if it is followed by a valid userMention.
* This ensures that the hereMention is always considered first, even if it is followed by a valid
* userMention.
*
* Also, apply the mention rule after email/link to prevent mention appears in an email/link.
*/
{
name: 'hereMentions',
regex: /([a-zA-Z0-9.!$%&+/=?^`{|}_-]?)(@here)([.!$%&+/=?^`{|}_-]?)(?=\b)(?!([\w'#%+-]*@(?:[a-z\d-]+\.)+[a-z]{2,}(?:\s|$|@here))|((?:(?!<a).)+)?<\/a>|[^<]*(<\/pre>|<\/code>))/gm,

Check warning on line 155 in lib/ExpensiMark.js

View workflow job for this annotation

GitHub Actions / lint

This line has a length of 193. Maximum allowed is 190
replacement: (match, g1, g2, g3) => {
if (!Str.isValidMention(match)) {
return match;
Expand All @@ -149,10 +163,12 @@

/**
* This regex matches a valid user mention in a string.
* A user mention is a string that starts with the '@' symbol and is followed by a valid user's primary login
* A user mention is a string that starts with the '@' symbol and is followed by a valid user's primary
* login
*
* Note: currently we are only allowing mentions in a format of @+19728974297 (E.164 format phone number) and @username@example.com
* The username can contain any combination of alphanumeric letters, numbers, and underscores
* Note: currently we are only allowing mentions in a format of @+19728974297 (E.164 format phone number)
* and @username@example.com The username can contain any combination of alphanumeric letters, numbers, and
* underscores
*/
{
name: 'userMentions',
Expand Down Expand Up @@ -414,6 +430,11 @@
return `[${g4}](${email || g3})`;
},
},
{
name: 'image',
regex: /<img[^><]*src\s*=\s*(['"])(.*?)\1(?:[^><]*alt\s*=\s*(['"])(.*?)\3)?[^><]*>*(?![^<][\s\S]*?(<\/pre>|<\/code>))/gi,
replacement: (match, g1, g2, g3, g4) => `![${g4 || g2}](${g2})`
}
];

/**
Expand Down Expand Up @@ -452,6 +473,11 @@
regex: /<style>.*?<\/style>/gi,
replacement: '',
},
{
name: 'image',
regex: /<img[^><]*src\s*=\s*(['"])(.*?)\1(?:[^><]*alt\s*=\s*(['"])(.*?)\3)?[^><]*>*(?![^<][\s\S]*?(<\/pre>|<\/code>))/gi,
replacement: '(image of: $4)',
},
{
name: 'stripTag',
regex: /(<([^>]+)>)/gi,
Expand Down
Loading