Skip to content

Store strings as &[u8] instead of &str #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 23, 2019
Merged

Conversation

link2xt
Copy link
Contributor

@link2xt link2xt commented Nov 21, 2019

An attempt to fix #62

@djc
Copy link
Owner

djc commented Nov 21, 2019

Yes, this is straightforward, however it is a setback in usability. @jonhoo, any thoughts?

Strings can be literal and are not guaranteed to be valid UTF-8 strings.
It should not cause parsing errors.
@link2xt link2xt marked this pull request as ready for review November 21, 2019 12:52
@link2xt
Copy link
Contributor Author

link2xt commented Nov 21, 2019

Currently it causes Delta Chat to stop receiving messages due to spam messages with invalid-UTF8 subjects. Users can convert using from_utf8_lossy if needed, I don't think it is a major usability problem. &[u8] seems to be the only way to keep parser zero-copy.

@jonhoo
Copy link
Contributor

jonhoo commented Nov 21, 2019

Hmm, I'm curious how often invalid UTF-8 encodings actually come up? I'm not really opposed to the change, it's more that I wish it wasn't necessary. rust-imap can always do the decoding on its side I suppose.

@djc
Copy link
Owner

djc commented Nov 21, 2019

Yeah, exactly. This is the first time it's come up on this issue tracker, which I think at least means it's reasonably rare.

@link2xt
Copy link
Contributor Author

link2xt commented Nov 21, 2019

Hmm, I'm curious how often invalid UTF-8 encodings actually come up?

Even if normally it never happens, someone can do it on purpose to make an IMAP client unable to parse FETCH responses and stop operating. It's a security issue actually.

rust-imap can always do the decoding on its side I suppose.

Do you propose to parse FETCH responses manually, without imap_proto?

@jonhoo
Copy link
Contributor

jonhoo commented Nov 21, 2019

I think the real resolution is to provide both methods for operating directly on &[u8] and methods that expose &str. The former should be used for those who want low-level control over how stuff works (like Delta Chat if they worry about malicious messages), and the latter for users who just want convenient APIs (e.g., my system tray e-mail notifier).

Do you propose to parse FETCH responses manually, without imap_proto?

Ah, no, I was proposing that rust-imap would parse the &[u8] that imap-proto would output with this PR merged.

@link2xt
Copy link
Contributor Author

link2xt commented Nov 21, 2019

@jonhoo

I think the real resolution is to provide both methods for operating directly on &[u8] and methods that expose &str.

It is possible to add convenience methods to Envelope and Address, but they will be just thin from_utf8_lossy wrappers, returning Cow<str>. Returning more convenient &str requires that a String is allocated somewhere if the string happens to contain invalid UTF-8. That is outside the scope of the parser, IMO.

The former should be used for those who want low-level control over how stuff works (like Delta Chat if they worry about malicious messages), and the latter for users who just want convenient APIs (e.g., my system tray e-mail notifier).

Looks like rust-imap exposes fetches and envelopes directly to the user now. If you want more convenient API, you can convert byte slices to Cow<str> and store it in some auxiliary structure that will be exposed to the user with a stable documented .subject() API returning &str. This way users will have a convenient API that never fails and yet does not make them deal with &[u8] or Cow<str>.

Ah, no, I was proposing that rust-imap would parse the &[u8] that imap-proto would output with this PR merged.

Cool, so we agree here.

Feel free to ping me if any changes to this PR are needed.

@jonhoo
Copy link
Contributor

jonhoo commented Nov 21, 2019

@link2xt Yeah, again, my proposal is that we merge this, and that rust-imap (not this crate) provide the &str convenience methods (or, possibly, Cow<str> as you suggest).

@djc
Copy link
Owner

djc commented Nov 23, 2019

So did you assess whether any of the other uses of nstring_utf8 should also be converted?

@djc djc merged commit a260569 into djc:master Nov 23, 2019
@djc
Copy link
Owner

djc commented Nov 23, 2019

Thanks!

@link2xt link2xt deleted the u8-strings branch November 23, 2019 20:18
@link2xt
Copy link
Contributor Author

link2xt commented Nov 23, 2019

So did you assess whether any of the other uses of nstring_utf8 should also be converted?

I have converted envelope and address only, there may be more similar bugs, I did not check every nstring_utf8 in the crate.

@link2xt
Copy link
Contributor Author

link2xt commented Nov 23, 2019

@djc
Body fields seem to be wrong too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IMAP parser does not allow invalid UTF-8 in literal strings
3 participants