-
-
Notifications
You must be signed in to change notification settings - Fork 31.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-34155: Dont parse domains containing @ #13079
Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it! |
41c6fe8
to
1b6269b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch!
I have some different ideas on how to do error recovery in this case, please see my comment on bpo-34155.
Lib/email/_header_value_parser.py
Outdated
@@ -1559,6 +1559,8 @@ def get_domain(value): | |||
token, value = get_dot_atom(value) | |||
except errors.HeaderParseError: | |||
token, value = get_atom(value) | |||
if value and value[0] == '@': | |||
raise errors.HeaderParseError('Multiple domains') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rename the error as Invalid Domain
.
@@ -0,0 +1 @@ | |||
Don't parse email domain containing an at, ie. a@malicious.org@important.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest a slightly different wording for this:
Fix parsing of invalid email addresses with more than one ``@`` (e.g. a@b@c.com.) to not return the part before 2nd ``@`` as valid email address. Patch by jpic.
Pushed a couple of fixes in new commits in my branch, left a question for you. Thank you very much for your review @maxking ! You're max kind ;) |
As I mentioned in this comment on bpo I think that |
I totally agree on the fact that we should return tuple of two empty strings on "failed to parse", whether on not we decide to be opinionated about what we fail to parse. In this case security issue should return the same value. |
Understood, it was decided to deal with that later if it had to be dealt with at all anyway. I have fixed my own tests, but other tests fail since i changed the implementation and i'm currently digging into it... I lost track and started over from master which means that some commit history is lost. I could add maxking credit too in the news item and bpo commit message. |
84ca1e1
to
85ec1a8
Compare
Just one last question please, currently with this patch we get:
But when there is an invalid domain such as
Do you also want this patch to make Or is it out of the scope of this patch ? Or is Thanks a heap for your support ! Deeply appreciated ;) |
That seems to be a change of behaviour, even though it is going to align with what we say in docs. Perhaps for the the short-term, we should make the docs align a bit more with the reality. In the long term, we can decide if we want to return empty string where the email address is technically invalid. As it stands today, "failed to parse" means there is no way to read the value at all, so that includes weird encodings, random chars etc. I am not sure if we do RFC level validation for what is and isn't a valid address (maybe we do, haven't read those parts of the code too closely, but that is my current impression) so it may or may not be tricky to weed out all invalid addresses. That discussion can perhaps start on a new issue and follow the usual change-of-behavior process and should be fixed in a different patch. This being a security fix will go back to 3.6 as well, so I guess we shouldn't do that here.
It is an invalid address AFAIK, I can't quote the right RFC page to corroborate that right now, will need some more time for that. |
Thanks for your explanation. Please let me know if there's anything else you want me to do. If you want me to open the other issue I will start studying the corresponding contribution docs and try my best to match Python's standard expectations. Thanks for baring with me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks good to me, just a very teeny tiny nitpick that I have added inline comment for.
@warsaw I think after the fix for the comment, this patch should be good to go!
Needs backport to 3.6, 3.7, 3.8 since this has security implications.
@jpic I think it would be good to create a new issue to discuss if we want to change behaviour of |
There goes the new issue, turned out that domains ending with a dot are valid ! Good to know |
Do you also want PR against 3.6, 3.7, 3.8 since this has security implications ? |
I am not sure about the failing tests, maybe we can rebase this against master and hope that it is fixed there? :) I am currently travelling, so won't be able to help out with debugging much for this week. And no, you don't need to manually create backports, I added the comment for whoever merges this to add right "needs backport to xxx" labels, which would then trigger the bot to create backport PRs. |
GH-14824 is a backport of this pull request to the 3.8 branch. |
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155 (cherry picked from commit 8cb65d1) Co-authored-by: jpic <jpic@users.noreply.github.com>
GH-14825 is a backport of this pull request to the 3.7 branch. |
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155 (cherry picked from commit 8cb65d1) Co-authored-by: jpic <jpic@users.noreply.github.com>
GH-14826 is a backport of this pull request to the 3.6 branch. |
@warsaw, do you still want the backports of this PR to be merged? They are still awaiting review and merging. |
@ned-deily It would be good to merge the backports for 3.6, I can merge the ones for 3.7 and 3.8. |
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155 (cherry picked from commit 8cb65d1) Co-authored-by: jpic <jpic@users.noreply.github.com>
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155 (cherry picked from commit 8cb65d1) Co-authored-by: jpic <jpic@users.noreply.github.com>
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155 (cherry picked from commit 8cb65d1) Co-authored-by: jpic <jpic@users.noreply.github.com>
Thanks @jpic for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7. |
I'm having trouble backporting to |
Thanks @jpic for the PR 🌮🎉.. I'm working now to backport this PR to: 2.7. |
Sorry, @jpic, I could not cleanly backport this to |
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155 (cherry picked from commit 8cb65d1) Co-authored-by: jpic <jpic@users.noreply.github.com>
https://bugs.python.org/issue34155 (cherry picked from commit 8cb65d1) Co-authored-by: jpic <jpic@users.noreply.github.com>
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155
GH-16006 is a backport of this pull request to the 2.7 branch. |
This change skips parsing of email addresses where domains include a "@" character, which can be maliciously used since the local part is returned as a complete address. (cherry picked from commit 8cb65d1) Excludes changes to Lib/email/_header_value_parser.py, which did not exist in 2.7. Co-authored-by: jpic <jpic@users.noreply.github.com> https://bugs.python.org/issue34155
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155
Before: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='a', domain='malicious.org'),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@malicious.org') After: >>> email.message_from_string('From: a@malicious.org@important.com', policy=email.policy.default)['from'].addresses (Address(display_name='', username='', domain=''),) >>> parseaddr('a@malicious.org@important.com') ('', 'a@') https://bugs.python.org/issue34155
https://bugs.python.org/issue34155
Automerge-Triggered-By: @warsaw