Fix cubic ReDoS in fenced code and references #1130

Merged May 7, 2021 (1 commit)

@b-c-ds (Contributor) commented May 7, 2021

Two regular expressions were vulnerable to Regular Expression Denial of Service (ReDoS).

Crafted strings containing a long sequence of spaces could cause Denial of Service by making markdown take a long time to process.

This represents a vulnerability when untrusted user input is processed with the markdown package.

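`ReferenceProcessor` ([markdown/blockprocessors.py#L559-L563](https://github.com/Python-Markdown/markdown/blob/4acb949256adc535d6e6cd8/markdown/blockprocessors.py#L559-L563)):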

```python
class ReferenceProcessor(BlockProcessor):
    """ Process link references. """
    RE = re.compile(
        r'^[ ]{0,3}\[([^\]]*)\]:[ ]*\n?[ ]*([^\s]+)[ ]*\n?[ ]*((["\'])(.*)\4|\((.*)\))?[ ]*$', re.MULTILINE
    )
```

e.g.:

```python
import markdown
markdown.markdown('[]:0' + ' ' * 4321 + '0')
```

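`FencedBlockPreprocessor` (requires the `fenced_code` extension; [markdown/extensions/fenced_code.py#L43-L54](https://github.com/Python-Markdown/markdown/blob/a11431539d08e14b0bd821c/markdown/extensions/fenced_code.py#L43-L54)):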

```python
FENCED_BLOCK_RE = re.compile(
    dedent(r'''
        (?P<fence>^(?:~{3,}|`{3,}))[ ]*                       # opening fence
        ((\{(?P<attrs>[^\}\n]*)\})?|                          # (optional {attrs} or
        (\.?(?P<lang>[\w#.+-]*))?[ ]*                         # optional (.)lang
        (hl_lines=(?P<quot>"|')(?P<hl_lines>.*?)(?P=quot))?)  # optional hl_lines)
        [ ]*\n                                                # newline (end of opening fence)
        (?P<code>.*?)(?<=\n)                                  # the code block
        (?P=fence)[ ]*$                                       # closing fence
    '''),
    re.MULTILINE | re.DOTALL | re.VERBOSE
)
```

e.g.:

```python
import markdown
markdown.markdown('```' + ' ' * 4321, extensions=['fenced_code'])
```

Both regular expressions had cubic worst-case complexity, so doubling the number of spaces made processing take about 8 times as long (2³ = 8).
The cubic behaviour can be seen as follows:

```
$ time python -c "import markdown; markdown.markdown('[]:0' + ' ' * 1000 + '0')"
python -c "import markdown; markdown.markdown('[]:0' + ' ' * 1000 + '0')"  1.25s user 0.02s system 99% cpu 1.271 total
$ time python -c "import markdown; markdown.markdown('[]:0' + ' ' * 2000 + '0')"
python -c "import markdown; markdown.markdown('[]:0' + ' ' * 2000 + '0')"  9.01s user 0.02s system 99% cpu 9.040 total
$ time python -c "import markdown; markdown.markdown('[]:0' + ' ' * 4000 + '0')"
python -c "import markdown; markdown.markdown('[]:0' + ' ' * 4000 + '0')"  74.86s user 0.27s system 99% cpu 1:15.38 total
```

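Both regexes had three `[ ]*` groups separated only by optional groups, so each behaved in effect like `[ ]*[ ]*[ ]*`. When the overall match fails, a backtracking engine tries every way of dividing the run of spaces among the three groups, which is cubic in the length of the run.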
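
To make the cubic growth concrete, the sketch below counts the ways a run of `n` spaces can be divided among three adjacent `[ ]*` quantifiers before the rest of the pattern fails. It is a simplified model, not a trace of Python's `re` engine: it assumes each division is tried exactly once, and the function names are illustrative only.

```python
from math import comb


def count_splits_bruteforce(n: int) -> int:
    """Count triples (a, b, c) of non-negative ints with a + b + c <= n.

    Each triple models one attempt in which the three `[ ]*` quantifiers
    consume a, b and c spaces before the remainder of the pattern fails.
    """
    return sum(
        1
        for a in range(n + 1)
        for b in range(n + 1 - a)
        for c in range(n + 1 - a - b)
    )


def count_splits(n: int) -> int:
    """Closed form of the same count: C(n + 3, 3), which grows like n**3 / 6."""
    return comb(n + 3, 3)


if __name__ == "__main__":
    # The closed form agrees with brute force on small inputs.
    assert all(count_splits(n) == count_splits_bruteforce(n) for n in range(30))
    # Print the count for each n and the growth factor when n is doubled.
    for n in (1000, 2000, 4000):
        print(n, count_splits(n), round(count_splits(2 * n) / count_splits(n), 2))
```

Under this model, doubling `n` multiplies the count by a factor that approaches 8, which is in line with the measured timings of 1.25s, 9.01s and 74.86s.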

Discovered using [regexploit](https://github.com/doyensec/regexploit).
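
One general way to remove this kind of ambiguity is to restructure a pattern so that each space can be consumed by only one quantifier, for example by making every later run of spaces reachable only after a mandatory token. The sketch below illustrates the idea on a deliberately simplified toy pattern. `VULNERABLE`, `REWRITTEN`, `SAMPLES` and `agree` are hypothetical names, and neither pattern is the regex from the merged commit.

```python
import re

# Toy pattern with the same shape as the problem: three `[ ]*` runs separated
# only by optional pieces (a newline and a quoted title).
VULNERABLE = re.compile(r'^x[ ]*\n?[ ]*(?:"[^"\n]*")?[ ]*$')

# Rewritten so that each later run of spaces can only be reached after a
# mandatory token (the newline or the closing quote of the title).
REWRITTEN = re.compile(r'^x[ ]*(?:\n[ ]*)?(?:"[^"\n]*"[ ]*)?$')

SAMPLES = [
    'x',
    'x   ',
    'x \n  ',
    'x "title"  ',
    'x \n "title" ',
    'x  0',         # trailing junk: should be rejected
    'x "unclosed',  # unterminated title: should be rejected
]


def agree(samples=SAMPLES) -> bool:
    """Return True if both patterns accept and reject the same samples."""
    return all(
        bool(VULNERABLE.match(s)) == bool(REWRITTEN.match(s)) for s in samples
    )


if __name__ == "__main__":
    print(agree())
    # A long run of spaces followed by junk is rejected without trying
    # every division of the spaces among several quantifiers.
    print(REWRITTEN.match('x' + ' ' * 4321 + '0'))
```

In the rewritten toy pattern a failed match backtracks through a single `[ ]*`, so the work grows linearly with the number of spaces rather than cubically.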

@waylan merged commit eacff47 into Python-Markdown:master on May 7, 2021