Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-34155: Dont parse domains containing @ #13079

Merged
merged 1 commit into from
Jul 17, 2019

Conversation

jpic
Copy link
Contributor

@jpic jpic commented May 3, 2019

Before:

    >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
    (Address(display_name='', username='a', domain='malicious.org'),)

    >>> parseaddr('a@malicious.org@important.com')
    ('', 'a@malicious.org')

After:

    >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
    (Address(display_name='', username='', domain=''),)

    >>> parseaddr('a@malicious.org@important.com')
    ('', 'a@')

https://bugs.python.org/issue34155

Automerge-Triggered-By: @warsaw

@jpic jpic requested a review from a team as a code owner May 3, 2019 21:27
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@jpic jpic force-pushed the bpo-34155 branch 3 times, most recently from 41c6fe8 to 1b6269b Compare May 4, 2019 01:16
@jpic jpic changed the title bpo-34155: Check for conflicting atomends in parseaddr bpo-34155: Dont parse domains containing @ May 4, 2019
Copy link
Contributor

@maxking maxking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch!

I have some different ideas on how to do error recovery in this case, please see my comment on bpo-34155.

@@ -1559,6 +1559,8 @@ def get_domain(value):
token, value = get_dot_atom(value)
except errors.HeaderParseError:
token, value = get_atom(value)
if value and value[0] == '@':
raise errors.HeaderParseError('Multiple domains')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rename the error as Invalid Domain.

@@ -0,0 +1 @@
Don't parse email domain containing an at, ie. a@malicious.org@important.com
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest a slightly different wording for this:

Fix parsing of invalid email addresses with more than one ``@`` (e.g. a@b@c.com.) to not return the part before 2nd ``@`` as valid email address. Patch by jpic.

@jpic
Copy link
Contributor Author

jpic commented Jun 11, 2019

Pushed a couple of fixes in new commits in my branch, left a question for you.

Thank you very much for your review @maxking ! You're max kind ;)

@warsaw
Copy link
Member

warsaw commented Jul 2, 2019

As I mentioned in this comment on bpo I think that parseaddr('a@b@c') has to return ('', ''). To me, it's the only sane return value for illegal addresses.

@maxking
Copy link
Contributor

maxking commented Jul 2, 2019

I totally agree on the fact that we should return tuple of two empty strings on "failed to parse", whether on not we decide to be opinionated about what we fail to parse. In this case security issue should return the same value.

@jpic
Copy link
Contributor Author

jpic commented Jul 3, 2019

Understood, it was decided to deal with that later if it had to be dealt with at all anyway.

I have fixed my own tests, but other tests fail since i changed the implementation and i'm currently digging into it... I lost track and started over from master which means that some commit history is lost.

I could add maxking credit too in the news item and bpo commit message.

@jpic jpic force-pushed the bpo-34155 branch 3 times, most recently from 84ca1e1 to 85ec1a8 Compare July 3, 2019 05:42
@jpic
Copy link
Contributor Author

jpic commented Jul 3, 2019

Just one last question please, currently with this patch we get:

>>> parseaddr('a@b@c')                
('', '')

But when there is an invalid domain such as b., it doesn't return an empty string:

>>> parseaddr('a@b.')
('', 'a@b.')

Do you also want this patch to make parseaddr('a@b.') to return a tuple of empty strings ?

Or is it out of the scope of this patch ?

Or is a@b. not an illegal address at all ? I thought it was but I'm having a last minute doubt now (sorry about that)

Thanks a heap for your support ! Deeply appreciated ;)

@maxking
Copy link
Contributor

maxking commented Jul 3, 2019

Do you also want this patch to make parseaddr('a@b.') to return a tuple of empty strings ?
Or is it out of the scope of this patch ?

That seems to be a change of behaviour, even though it is going to align with what we say in docs. Perhaps for the the short-term, we should make the docs align a bit more with the reality.

In the long term, we can decide if we want to return empty string where the email address is technically invalid. As it stands today, "failed to parse" means there is no way to read the value at all, so that includes weird encodings, random chars etc. I am not sure if we do RFC level validation for what is and isn't a valid address (maybe we do, haven't read those parts of the code too closely, but that is my current impression) so it may or may not be tricky to weed out all invalid addresses.

That discussion can perhaps start on a new issue and follow the usual change-of-behavior process and should be fixed in a different patch. This being a security fix will go back to 3.6 as well, so I guess we shouldn't do that here.

Or is a@b. not an illegal address at all ? I thought it was but I'm having a last minute doubt now (sorry about that)

It is an invalid address AFAIK, I can't quote the right RFC page to corroborate that right now, will need some more time for that.

@jpic
Copy link
Contributor Author

jpic commented Jul 3, 2019

Thanks for your explanation. Please let me know if there's anything else you want me to do. If you want me to open the other issue I will start studying the corresponding contribution docs and try my best to match Python's standard expectations. Thanks for baring with me.

Copy link
Contributor

@maxking maxking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch looks good to me, just a very teeny tiny nitpick that I have added inline comment for.

@warsaw I think after the fix for the comment, this patch should be good to go!
Needs backport to 3.6, 3.7, 3.8 since this has security implications.

@maxking
Copy link
Contributor

maxking commented Jul 3, 2019

@jpic I think it would be good to create a new issue to discuss if we want to change behaviour of parseaddr to validate valid addresses.

@jpic
Copy link
Contributor Author

jpic commented Jul 3, 2019

There goes the new issue, turned out that domains ending with a dot are valid ! Good to know

@jpic
Copy link
Contributor Author

jpic commented Jul 4, 2019

Tests passed except MacOS job which failed on Azure, but it doesn't seem related to this patch.

2019-07-04-183254_1478x905_scrot

@jpic
Copy link
Contributor Author

jpic commented Jul 5, 2019

Do you also want PR against 3.6, 3.7, 3.8 since this has security implications ?

@maxking
Copy link
Contributor

maxking commented Jul 9, 2019

I am not sure about the failing tests, maybe we can rebase this against master and hope that it is fixed there? :)

I am currently travelling, so won't be able to help out with debugging much for this week.

And no, you don't need to manually create backports, I added the comment for whoever merges this to add right "needs backport to xxx" labels, which would then trigger the bot to create backport PRs.

@bedevere-bot
Copy link

GH-14824 is a backport of this pull request to the 3.8 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 17, 2019
Before:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')

    After:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')

https://bugs.python.org/issue34155
(cherry picked from commit 8cb65d1)

Co-authored-by: jpic <jpic@users.noreply.github.com>
@bedevere-bot
Copy link

GH-14825 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jul 17, 2019
Before:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')

    After:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')

https://bugs.python.org/issue34155
(cherry picked from commit 8cb65d1)

Co-authored-by: jpic <jpic@users.noreply.github.com>
@bedevere-bot
Copy link

GH-14826 is a backport of this pull request to the 3.6 branch.

@ned-deily
Copy link
Member

@warsaw, do you still want the backports of this PR to be merged? They are still awaiting review and merging.

@maxking
Copy link
Contributor

maxking commented Aug 9, 2019

@ned-deily It would be good to merge the backports for 3.6, I can merge the ones for 3.7 and 3.8.

miss-islington added a commit that referenced this pull request Aug 9, 2019
Before:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')

    After:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')

https://bugs.python.org/issue34155
(cherry picked from commit 8cb65d1)

Co-authored-by: jpic <jpic@users.noreply.github.com>
miss-islington added a commit that referenced this pull request Aug 9, 2019
Before:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')

    After:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')

https://bugs.python.org/issue34155
(cherry picked from commit 8cb65d1)

Co-authored-by: jpic <jpic@users.noreply.github.com>
ned-deily pushed a commit that referenced this pull request Aug 9, 2019
Before:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')

    After:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')

https://bugs.python.org/issue34155
(cherry picked from commit 8cb65d1)

Co-authored-by: jpic <jpic@users.noreply.github.com>
@miss-islington
Copy link
Contributor

Thanks @jpic for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

I'm having trouble backporting to 2.7. Reason: 'Error 110 while writing to socket. Connection timed out.'. Please retry by removing and re-adding the needs backport to 2.7 label.

@miss-islington
Copy link
Contributor

Thanks @jpic for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @jpic, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 8cb65d1381b027f0b09ee36bfed7f35bb4dec9a9 2.7

@miss-islington miss-islington self-assigned this Aug 15, 2019
maxking pushed a commit to maxking/cpython-1 that referenced this pull request Aug 17, 2019
Before:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')

    After:

        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)

        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')

https://bugs.python.org/issue34155
(cherry picked from commit 8cb65d1)

Co-authored-by: jpic <jpic@users.noreply.github.com>
larryhastings pushed a commit that referenced this pull request Sep 7, 2019
https://bugs.python.org/issue34155
(cherry picked from commit 8cb65d1)

Co-authored-by: jpic <jpic@users.noreply.github.com>
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
Before:
    
        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)
    
        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')
    
    After:
    
        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)
    
        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')




https://bugs.python.org/issue34155
@bedevere-bot
Copy link

GH-16006 is a backport of this pull request to the 2.7 branch.

miss-islington pushed a commit that referenced this pull request Sep 14, 2019
This change skips parsing of email addresses where domains include a "@" character, which can be maliciously used since the local part is returned as a complete address. 

(cherry picked from commit 8cb65d1)

Excludes changes to Lib/email/_header_value_parser.py, which did not
exist in 2.7.

Co-authored-by: jpic <jpic@users.noreply.github.com>


https://bugs.python.org/issue34155
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
Before:
    
        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)
    
        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')
    
    After:
    
        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)
    
        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')




https://bugs.python.org/issue34155
websurfer5 pushed a commit to websurfer5/cpython that referenced this pull request Jul 20, 2020
Before:
    
        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='a', domain='malicious.org'),)
    
        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@malicious.org')
    
    After:
    
        >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
        (Address(display_name='', username='', domain=''),)
    
        >>> parseaddr('a@malicious.org@important.com')
        ('', 'a@')




https://bugs.python.org/issue34155
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-security A security issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants