bpo-34155: Dont parse domains containing @ #13079

jpic · 2019-05-03T21:27:42Z

Before:

    >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
    (Address(display_name='', username='a', domain='malicious.org'),)

    >>> parseaddr('a@malicious.org@important.com')
    ('', 'a@malicious.org')

After:

    >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
    (Address(display_name='', username='', domain=''),)

    >>> parseaddr('a@malicious.org@important.com')
    ('', 'a@')

https://bugs.python.org/issue34155

Automerge-Triggered-By: @warsaw
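
A quick way to check which of the two behaviours a given interpreter has is sketched below. It assumes only `email.utils.parseaddr` from the standard library; the helper name `is_patched` and the constant `SPOOFED` are made up for illustration and are not part of the patch.

```python
from email.utils import parseaddr

# The spoofed address from the description above.
SPOOFED = 'a@malicious.org@important.com'


def is_patched(parse=parseaddr):
    """Return True if the part before the second '@' is NOT reported as the address."""
    _name, addr = parse(SPOOFED)
    return addr != 'a@malicious.org'


if __name__ == '__main__':
    print('patched' if is_patched() else 'vulnerable')
```

The `parse` parameter exists only so the check can be pointed at another parser with the same return shape.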

the-knights-who-say-ni · 2019-05-03T21:27:45Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

maxking

Thanks for the patch!

I have some different ideas on how to do error recovery in this case, please see my comment on bpo-34155.

maxking · 2019-05-31T05:51:55Z

Lib/email/_header_value_parser.py

@@ -1559,6 +1559,8 @@ def get_domain(value):
         token, value = get_dot_atom(value)
     except errors.HeaderParseError:
         token, value = get_atom(value)
+    if value and value[0] == '@':
+        raise errors.HeaderParseError('Multiple domains')


I'd rename the error as Invalid Domain.
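
To illustrate the check from this hunk outside of CPython's parser, here is a minimal standalone sketch. `take_domain`, `InvalidDomain`, and the simplified `_DOT_ATOM` pattern are hypothetical stand-ins, not the real `get_domain` or the full RFC 5322 grammar; the only part taken from the diff is the test on the first remaining character.

```python
import re

# Hypothetical, simplified dot-atom: labels of letters, digits and hyphens
# separated by single dots. The real parser accepts more than this.
_DOT_ATOM = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*")


class InvalidDomain(ValueError):
    """Raised when the input cannot be read as a single domain."""


def take_domain(value):
    """Split `value` into (domain, rest), rejecting a remainder that starts with '@'."""
    match = _DOT_ATOM.match(value)
    if match is None:
        raise InvalidDomain('expected a domain')
    domain, rest = match.group(), value[match.end():]
    if rest and rest[0] == '@':
        # Same idea as the added lines in the diff: a second '@' right after
        # the domain means the input is not a single addr-spec.
        raise InvalidDomain('Invalid Domain')
    return domain, rest
```

With this sketch, `take_domain('important.com')` yields the domain and an empty remainder, while `take_domain('malicious.org@important.com')` raises instead of silently stopping at the second `@`.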

Lib/test/test_email/test_email.py

maxking · 2019-05-31T06:28:56Z

Misc/NEWS.d/next/Security/2019-05-04-13-33-37.bpo-34155.MJll68.rst

@@ -0,0 +1 @@
+Don't parse email domain containing an at, ie. a@malicious.org@important.com


I would suggest a slightly different wording for this:

Fix parsing of invalid email addresses with more than one ``@`` (e.g. a@b@c.com.) to not return the part before 2nd ``@`` as valid email address. Patch by jpic.

jpic · 2019-06-11T13:25:13Z

Pushed a couple of fixes in new commits in my branch, left a question for you.

Thank you very much for your review @maxking ! You're max kind ;)

warsaw · 2019-07-02T21:44:28Z

As I mentioned in this comment on bpo, I think that parseaddr('a@b@c') has to return ('', ''). To me, it's the only sane return value for illegal addresses.

maxking · 2019-07-02T22:11:00Z

I totally agree that we should return a tuple of two empty strings on "failed to parse", whether or not we decide to be opinionated about what we fail to parse. In this case, the security issue should return the same value.
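
For callers, this contract means a failed parse can be detected by checking for an empty address. A minimal sketch, assuming the `('', '')` return value discussed here; `parse_or_none` is a hypothetical helper name, not something provided by `email.utils`:

```python
from email.utils import parseaddr


def parse_or_none(raw):
    """Return (display_name, address), or None when parseaddr reports a failed parse."""
    name, addr = parseaddr(raw)
    if not addr:
        # An empty address is how parseaddr signals "failed to parse".
        return None
    return name, addr
```

Returning `None` rather than the pair of empty strings is only a convenience for the caller; the underlying signal is still the empty address.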

jpic · 2019-07-03T04:58:50Z

Understood, it was decided to deal with that later if it had to be dealt with at all anyway.

I have fixed my own tests, but other tests fail since I changed the implementation and I'm currently digging into it... I lost track and started over from master, which means that some commit history is lost.

I could add maxking credit too in the news item and bpo commit message.

jpic · 2019-07-03T06:45:54Z

Just one last question please, currently with this patch we get:

    >>> parseaddr('a@b@c')
    ('', '')

But when there is an invalid domain such as b., it doesn't return an empty string:

    >>> parseaddr('a@b.')
    ('', 'a@b.')

Do you also want this patch to make parseaddr('a@b.') to return a tuple of empty strings ?

Or is it out of the scope of this patch ?

Or is a@b. not an illegal address at all ? I thought it was but I'm having a last minute doubt now (sorry about that)

Thanks a heap for your support ! Deeply appreciated ;)

maxking · 2019-07-03T07:33:45Z

> Do you also want this patch to make parseaddr('a@b.') to return a tuple of empty strings ?
> Or is it out of the scope of this patch ?

That seems to be a change of behaviour, even though it is going to align with what we say in the docs. Perhaps for the short term, we should make the docs align a bit more with reality.

In the long term, we can decide if we want to return empty string where the email address is technically invalid. As it stands today, "failed to parse" means there is no way to read the value at all, so that includes weird encodings, random chars etc. I am not sure if we do RFC level validation for what is and isn't a valid address (maybe we do, haven't read those parts of the code too closely, but that is my current impression) so it may or may not be tricky to weed out all invalid addresses.

That discussion can perhaps start on a new issue and follow the usual change-of-behavior process and should be fixed in a different patch. This being a security fix will go back to 3.6 as well, so I guess we shouldn't do that here.

> Or is a@b. not an illegal address at all ? I thought it was but I'm having a last minute doubt now (sorry about that)

It is an invalid address AFAIK, I can't quote the right RFC page to corroborate that right now, will need some more time for that.

jpic · 2019-07-03T07:55:49Z

Thanks for your explanation. Please let me know if there's anything else you want me to do. If you want me to open the other issue I will start studying the corresponding contribution docs and try my best to match Python's standard expectations. Thanks for bearing with me.

maxking

The patch looks good to me, just a very teeny tiny nitpick that I have added an inline comment for.

@warsaw I think after the fix for the comment, this patch should be good to go!
Needs backport to 3.6, 3.7, 3.8 since this has security implications.

Lib/email/_parseaddr.py

maxking · 2019-07-03T08:11:15Z

@jpic I think it would be good to create a new issue to discuss whether we want to change the behaviour of parseaddr so that it validates addresses.

jpic · 2019-07-03T10:13:16Z

There goes the new issue, turned out that domains ending with a dot are valid ! Good to know

jpic · 2019-07-04T16:39:24Z

Tests passed except MacOS job which failed on Azure, but it doesn't seem related to this patch.

jpic · 2019-07-05T22:11:24Z

Do you also want PR against 3.6, 3.7, 3.8 since this has security implications ?

maxking · 2019-07-09T09:04:18Z

I am not sure about the failing tests, maybe we can rebase this against master and hope that it is fixed there? :)

I am currently travelling, so won't be able to help out with debugging much for this week.

And no, you don't need to manually create backports, I added the comment for whoever merges this to add right "needs backport to xxx" labels, which would then trigger the bot to create backport PRs.

bedevere-bot · 2019-07-17T21:54:42Z

GH-14824 is a backport of this pull request to the 3.8 branch.

Before:

    >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
    (Address(display_name='', username='a', domain='malicious.org'),)

    >>> parseaddr('a@malicious.org@important.com')
    ('', 'a@malicious.org')

After:

    >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses
    (Address(display_name='', username='', domain=''),)

    >>> parseaddr('a@malicious.org@important.com')
    ('', 'a@')

https://bugs.python.org/issue34155
(cherry picked from commit 8cb65d1)

Co-authored-by: jpic <jpic@users.noreply.github.com>

bedevere-bot · 2019-07-17T21:54:49Z

GH-14825 is a backport of this pull request to the 3.7 branch.

bedevere-bot · 2019-07-17T21:54:56Z

GH-14826 is a backport of this pull request to the 3.6 branch.

ned-deily · 2019-08-01T16:41:31Z

@warsaw, do you still want the backports of this PR to be merged? They are still awaiting review and merging.

maxking · 2019-08-09T08:28:55Z

@ned-deily It would be good to merge the backports for 3.6, I can merge the ones for 3.7 and 3.8.

miss-islington · 2019-08-15T19:09:20Z

Thanks @jpic for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7.
🐍🍒⛏🤖

miss-islington · 2019-08-15T19:09:21Z

I'm having trouble backporting to 2.7. Reason: 'Error 110 while writing to socket. Connection timed out.'. Please retry by removing and re-adding the needs backport to 2.7 label.

miss-islington · 2019-08-15T19:11:23Z

Thanks @jpic for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7.
🐍🍒⛏🤖

miss-islington · 2019-08-15T19:11:31Z

Sorry, @jpic, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on command line.

    cherry_picker 8cb65d1381b027f0b09ee36bfed7f35bb4dec9a9 2.7

bedevere-bot · 2019-09-11T21:10:37Z

GH-16006 is a backport of this pull request to the 2.7 branch.

This change skips parsing of email addresses where domains include a "@" character, which can be maliciously used since the local part is returned as a complete address.

(cherry picked from commit 8cb65d1)

Excludes changes to Lib/email/_header_value_parser.py, which did not exist in 2.7.

Co-authored-by: jpic <jpic@users.noreply.github.com>

https://bugs.python.org/issue34155

jpic requested a review from a team as a code owner May 3, 2019 21:27

the-knights-who-say-ni added the CLA not signed label May 3, 2019

bedevere-bot added the awaiting review label May 3, 2019

the-knights-who-say-ni added CLA signed and removed CLA not signed labels May 4, 2019

jpic force-pushed the bpo-34155 branch 3 times, most recently from 41c6fe8 to 1b6269b Compare May 4, 2019 01:16

jpic changed the title ~~bpo-34155: Check for conflicting atomends in parseaddr~~ bpo-34155: Dont parse domains containing @ May 4, 2019

jpic force-pushed the bpo-34155 branch from 1b6269b to 41287e8 Compare May 4, 2019 01:32

maxking reviewed May 31, 2019

jpic force-pushed the bpo-34155 branch from 6d00315 to 07e5094 Compare June 25, 2019 19:26

jpic force-pushed the bpo-34155 branch 3 times, most recently from 84ca1e1 to 85ec1a8 Compare July 3, 2019 05:42

maxking reviewed Jul 3, 2019

jpic force-pushed the bpo-34155 branch from 85ec1a8 to 672f228 Compare July 4, 2019 11:11

bedevere-bot removed the needs backport to 3.7 label Jul 17, 2019

bedevere-bot removed the needs backport to 3.6 label Jul 17, 2019

maxking added the needs backport to 2.7 label Aug 15, 2019

maxking added needs backport to 2.7 and removed needs backport to 2.7 labels Aug 15, 2019

miss-islington self-assigned this Aug 15, 2019

larryhastings pushed a commit that referenced this pull request Sep 7, 2019

[3.5] bpo-34155: Dont parse domains containing @ (GH-13079) (#15317)

063eba2

https://bugs.python.org/issue34155 (cherry picked from commit 8cb65d1) Co-authored-by: jpic <jpic@users.noreply.github.com>

bedevere-bot removed the needs backport to 2.7 label Sep 11, 2019
