Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatting of email responses in NewDot isn't pretty #8088

Closed
mvtglobally opened this issue Mar 11, 2022 · 36 comments
Closed

Formatting of email responses in NewDot isn't pretty #8088

mvtglobally opened this issue Mar 11, 2022 · 36 comments
Assignees
Labels
Engineering Improvement Item broken or needs improvement. Internal Requires API changes or must be handled by Expensify staff Monthly KSv2 Reviewing Has a PR in review

Comments

@mvtglobally
Copy link

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Action Performed:

@greeny was sending messages back/forth to a business partner

Expected Result:

When the partner replied via email that the formatting would like nice and there wouldn't be extra line breaks

Actual Result:

Formatting is a bit janky, inc. extra line breaks

Workaround:

unknown

Platform:

Where is this issue occurring?

  • Web
  • iOS
  • Android
  • Desktop App
  • Mobile Web

Version Number: 1.1.41-3
Reproducible in staging?: need repro
Reproducible in production?: Y
Email or phone of affected tester (no customers):
Logs: https://stackoverflow.com/c/expensify/questions/4856
Notes/Photos/Videos: Any additional supporting documentation
image - 2022-03-11T000521 922

Expensify/Expensify Issue URL:
Issue reported by: @mallenexpensify
Slack conversation: https://expensify.slack.com/archives/C01GTK53T8Q/p1646931708047189

View all open jobs on GitHub

@MelvinBot
Copy link

Triggered auto assignment to @MariaHCD (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

@MariaHCD
Copy link
Contributor

I think this could be an issue on the backend when parsing the response (and thus, possibly not something an external contributor could work on)

@zsgreenwald Would you mind DM'ing me more details (eg. the user's email) so I can investigate further via the logs and possibly on Mailgun as well?

@zsgreenwald
Copy link
Contributor

On it!

@MariaHCD
Copy link
Contributor

Looks like we ran into something similar here: https://github.com/Expensify/Expensify/issues/142335

Looking at the logs for the user that @zsgreenwald sent over via DM, looks like we defaulted to mailgun's stripped HTML content instead of parsing it ourselves:

Email request is either using VML/XML namespaces or is from an unknown provider, relying on mailgun's stripped HTML content

cc: @thienlnam Not sure if we can do anything here?

@thienlnam
Copy link
Contributor

thienlnam commented Mar 15, 2022

Yeah, not much we can do here unless we try to strip it ourselves which leads to endless things we'd have to account for. Since mailgun email stripping has been unreliable we can look into using another library for this except we'd need to test multiple platforms to make sure it works and isn't stripping out text content. We can also strip out whitespace after stripped HTML but that also leads to some unintended changes in other email providers

@MariaHCD
Copy link
Contributor

Thanks, @thienlnam! So it sounds like:

Option 1: Strip the content ourselves (which would be tricky due to the endless number of mail providers that we'd have to account for)

Option 2: Investigate usage of another library (which would entail lots of tests across multiple platforms to ensure the text content isn't being stripped)

Option 3:

We can also strip out whitespace after stripped HTML but that also leads to some unintended changes in other email providers

Could you elaborate on those unintended changes? Maybe we can trim any whitespaces before and after the stripped HTML?

@thienlnam
Copy link
Contributor

Yup those options are correct, I think option 2 would actually be worthwhile to do

Could you elaborate on those unintended changes?

It's been a while so I might not be remembering correctly, but IIRC stripping whitespace messed up the email formatting inside the email and made (yahoo?) emails look especially bad. I think we had this enabled at one point but decided to revert it.

Maybe we can trim any whitespaces before and after the stripped HTML?

Yeah this could help a bit, though it looks like in the linked issue the spacing is bad between text and before/after the text too and so it would still look bad

@MariaHCD
Copy link
Contributor

Great, thanks, @thienlnam! So moving forward, sounds like the best option would be to investigate usage of another library and this is something that can be handled internally as it is a backend change.

@MelvinBot MelvinBot removed the Overdue label Mar 21, 2022
@MariaHCD MariaHCD added the Internal Requires API changes or must be handled by Expensify staff label Mar 21, 2022
@melvin-bot
Copy link

melvin-bot bot commented Mar 21, 2022

Triggered auto assignment to @NicMendonca (Internal) because issue was reported by a contributor who needs to be compensated if this issue is fixed

@MariaHCD MariaHCD added Weekly KSv2 and removed Daily KSv2 labels Mar 21, 2022
@MariaHCD
Copy link
Contributor

Triggered auto assignment to @NicMendonca (Internal) because issue was reported by a contributor who needs to be compensated if this issue is fixed

This issue was not reported by a contributor.

@aldo-expensify aldo-expensify self-assigned this Apr 8, 2022
@aldo-expensify
Copy link
Contributor

I worked in this in the past in this PR. I added a function to trim the html, but it missed to consider cases where there are spaces using &nsbp;. I'll do a PR to fix that.

@aldo-expensify aldo-expensify added the Reviewing Has a PR in review label Apr 14, 2022
@mallenexpensify mallenexpensify added the Improvement Item broken or needs improvement. label Apr 18, 2022
@melvin-bot melvin-bot bot added Monthly KSv2 and removed Weekly KSv2 labels May 9, 2022
@melvin-bot
Copy link

melvin-bot bot commented May 9, 2022

This issue has not been updated in over 15 days. @aldo-expensify eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!

@zsgreenwald
Copy link
Contributor

@aldo-expensify @MariaHCD what's the latest with this? I'm still noticing some heavily spaced formatting on incoming messages from email. Some screenshots below:

image

image

@aldo-expensify
Copy link
Contributor

Hey @zsgreenwald , I made a PR, but it seems like it didn't work in this case. I'll have a look and see what it may be

@zsgreenwald
Copy link
Contributor

Yes, the exact problem is the extra white space. I'm suggesting we tighten it so it looks more natural when we receive email responses.

image

Not pressing, but I figured it'd be a quick fix.

@aldo-expensify
Copy link
Contributor

image

@thienlnam
Copy link
Contributor

Ah this has been a PITA in the past, we've had a lot of trouble parsing / cleaning specially formatted emails from outlook which I believe this is - for these types of emails we delegate the html stripping to mailgun. In the past, we've just stripped all line breaks to solve this problem but it removed any line breaks intended within the email which also didn't look great

@aldo-expensify You could try removing all break tags before text content starts, but there's a lot of email formatting overhead that you would have to figure out.

@aldo-expensify
Copy link
Contributor

aldo-expensify commented May 17, 2022

It is indeed a PITA, but I think there is room from improvement.

I'm testing locally with a message with the following html (this is what we end up with after our API processes the email):

<div class="WordSection1"> <p class="MsoNormal">Hi Zach, </p><p></p> <p class="MsoNormal"></p><p> </p> <p class="MsoNormal">Will tomorrow at 3 est. work?</p><p></p> <p class="MsoNormal"></p><p> </p> <p class="MsoNormal">Thanks, Steve</p><p></p> <p class="MsoNormal"></p><p> </p> </div>Notice required by law: This e-mail may constitute an advertisement or solicitation under U.S. law, if its primary purpose is to advertise or promote a commercial product or service. You may choose not to receive advertising and promotional messages from Crowe LLP (except for the crowe.com website, which tracks e-mail preferences through a separate process) at this e-mail address by forwarding this message to CroweUnsubscribe@crowe.com. If you do so, the sender of this message will be notified promptly. Our principal postal address is 225 W Wacker Drive, Suite 2600, Chicago, IL 60602-4903. This email message is from Crowe LLP or one of its subsidiaries and may contain privileged or confidential information or other information exempt from disclosure under applicable law. If you are not the intended recipient, please notify the sender by reply email immediately and delete this message without reading further or forwarding to others. This email is not intended to be a contract or other legally binding obligation, and any tax advice expressed in this email should not be construed as a formal tax opinion unless expressly stated. Visit www.crowe.com/disclosure for more information about Crowe LLP and its subsidiaries.

image

The problem seems to be that whitespaces before the text "Hi Zach" are getting wrapped in divs in the front end. These are the white lines we see.

image

become these divs:

image

I think this is a problem to fix in the front end, but I'm not sure where the html gets transformed yet.

On the backend, we have some code that tries to trim the html by removing tags that don't have text content and that don't have text content after or before them. Applying the same logic, I can remove the TEXT nodes that are just blank spaces in the borders of the email that are getting converted to divs (see this draft PR). This works, but only for the white lines that are added at the top and bottom... but I think there are still some white lines added in between because of the spaces being wrapped in divs.

Result:

image

@aldo-expensify
Copy link
Contributor

Changing this prop dangerouslyDisableWhitespaceCollapsing to false seems to fix the problem of whitespaces getting wrapped in divs:

image

@aldo-expensify
Copy link
Contributor

aldo-expensify commented May 17, 2022

@parasharrajat this PR: #6975 changed dangerouslyDisableWhitespaceCollapsing to true, which seems to be causing the problem here: unnecessary extra empty lines. Maybe we should look for another solution to the problem that PR was trying to solve.

@parasharrajat
Copy link
Member

We are removing that flag in another issue. I will add the PR link as soon as I find it.

@parasharrajat
Copy link
Member

That PR is #9025

@aldo-expensify aldo-expensify added Weekly KSv2 and removed Monthly KSv2 labels May 17, 2022
@aldo-expensify
Copy link
Contributor

The solution in that PR seems to cause the exact same problem for this case:

Screen.Recording.2022-05-17.at.4.39.25.PM.mov

I don't think we should be changing how the whitespace works when rendering html outside a <pre>. If there is some indentation problems with lists.. maybe there is something to do with padding or margin?

@melvin-bot melvin-bot bot added the Overdue label May 26, 2022
@zsgreenwald
Copy link
Contributor

@aldo-expensify I noticed this PR merged. I'm waiting on some email responses to come in, but should that merge fix the issue?

@aldo-expensify
Copy link
Contributor

@aldo-expensify I noticed #9025 merged. I'm waiting on some email responses to come in, but should that merge fix the issue?

I tested locally that PR and it seems to cause the same problem.

@aldo-expensify
Copy link
Contributor

I investigate this a bit more today and I know better now when it happens, but I haven't been able to come up with a solution. I have a clearer way to reproduce it here: #9195 (comment)

@melvin-bot melvin-bot bot removed the Overdue label Jun 4, 2022
@aldo-expensify
Copy link
Contributor

aldo-expensify commented Jun 8, 2022

From what I can see, we have the following cases:

With white-space: pre Without `white-space: pre` Data
1 "0 spaces<br /> 1 spaces<br /> 2 spaces<br /> 3 spaces<br /> 4 spaces"
2 "0 spaces<br /> 1 spaces<br /> 2 spaces<br /> 3 spaces<br /> 4 spaces<br /> <pre>Test</pre>"
3 " <div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi SomeName, </p><p></p> <p class=\"MsoNormal\"></p><p> </p> <p class=\"MsoNormal\">Will tomorrow at 34 est. work?</p><p></p> <p class=\"MsoNormal\"></p><p> </p> <p class=\"MsoNormal\">Thanks, Test</p><p></p> <p class=\"MsoNormal\"></p><p> </p> </div>xxxxxx xxxxxxxx xx xxx: xxxx x-xxxx xxx xxxxxxxxxx xx xxxxxxxxxxxxx xx xxxxxxxxxxxx xxxxx x.x. xxx, xx xxx xxxxxxx xxxxxxx xx xx xxxxxxxxx xx xxxxxxx x xxx..."
  1. This case is NOT rendered as HTML because of this exception in handling it, so it works in both cases.
  2. and 3. There are other html tags than <br />, so we render these cases using the react-native-render-html library.

I think it makes sense to add the white-space: pre when we are rendering html messages typed by the user in our App because that makes them render as close as possible to what the user initially typed (preserving spacing). The problem appears when the message came from an email. The email client has a different way of rendering the HTML of the message, they use white-space: normal and &nbsp; to preserve spacing.

We could try one of the following fixes:

  1. Flag the messages that came from the email intake, don't force white-space: pre for the flagged messages
  2. Check if the message contains a &nbsp;, if it does, don't force white-space: pre
  3. Replace &nbsp; by normal space and keep forcing white-space: pre. This didn't work: Remove whitespace: pre because of extra empty lines bug #9195 (comment)
  4. Do heavier processing of the email in the backend to avoid the amount of weird structured html in them, but that sounds like an endless amount of work.

@aldo-expensify
Copy link
Contributor

Just mentioning that if you edit the message that came from the email reply, it "breaks" like this:

image

The result is the same with and without white-space:pre

@aldo-expensify
Copy link
Contributor

aldo-expensify commented Jun 20, 2022

I did some more testing today and found out that outlook likes to add a bunch of \n in the email's html. The \n with whitespace: pre makes a lot of empty lines appear.

The proposed solution of avoid using whitespace: pre for message comings from emails handles well this case, so I'm sticking with that.

I updated the PR's descriptions and made them ready for review.

@aldo-expensify aldo-expensify added Reviewing Has a PR in review and removed Overdue labels Jun 20, 2022
@melvin-bot melvin-bot bot removed the Weekly KSv2 label Jul 14, 2022
@melvin-bot
Copy link

melvin-bot bot commented Jul 14, 2022

This issue has not been updated in over 15 days. @aldo-expensify eroding to Monthly issue.

P.S. Is everyone reading this sure this is really a near-term priority? Be brave: if you disagree, go ahead and close it out. If someone disagrees, they'll reopen it, and if they don't: one less thing to do!

@melvin-bot melvin-bot bot added the Monthly KSv2 label Jul 14, 2022
@melvin-bot melvin-bot bot closed this as completed Jul 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Engineering Improvement Item broken or needs improvement. Internal Requires API changes or must be handled by Expensify staff Monthly KSv2 Reviewing Has a PR in review
Projects
None yet
Development

No branches or pull requests

10 participants