Improve upload speed #165

Merged
merged 2 commits into from
Apr 1, 2022

@tsibley (Member) commented Mar 31, 2022

See commit messages for details.

Tested manually. Dramatically improves upload speed to nextstrain.org. I performed a few ad-hoc benchmarks to ensure the compression level change was worth it.

tsibley added 2 commits March 31, 2022 14:46
…y lines

It's a binary stream where we don't care about lines, and iterating in
tiny line-wise chunks (via the requests package) resulted in extremely
slow `nextstrain remote upload` times when the destination was
nextstrain.org.¹  When the destination was S3, the s3transfer package
underlying boto3 took care of reading from the file handle in chunks
instead of lines.

¹ https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1648748118413089
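To illustrate the difference this commit describes, here is a minimal sketch of reading a binary stream in fixed-size chunks instead of line by line. It is not the code from this PR. The helper name `read_chunks` and the 64 KiB `CHUNK_SIZE` are hypothetical choices, and the random gzip payload stands in for a real compressed upload.

```python
import gzip
import io
import os
from functools import partial
from typing import BinaryIO, Iterator

# Hypothetical chunk size for illustration; the PR does not state one.
CHUNK_SIZE = 64 * 1024


def read_chunks(fh: BinaryIO, size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield fixed-size chunks from a binary file handle until EOF.

    Unlike ``for line in fh``, this never splits on b"\\n", so the number of
    chunks depends only on the payload length, not on how many newline bytes
    happen to occur in it.
    """
    # iter(callable, sentinel) calls fh.read(size) until it returns b"" (EOF).
    return iter(partial(fh.read, size), b"")


# Gzip output of incompressible input is effectively arbitrary bytes, so
# b"\n" (0x0A) turns up roughly once every 256 bytes.
payload = gzip.compress(os.urandom(1_000_000))

line_chunks = list(io.BytesIO(payload))  # what line-wise iteration yields
fixed_chunks = list(read_chunks(io.BytesIO(payload)))

print(f"line-wise: {len(line_chunks)} chunks, fixed-size: {len(fixed_chunks)} chunks")
```

Line-wise iteration produces thousands of tiny, irregular pieces here, while the fixed-size reader produces a handful of 64 KiB pieces with identical content when joined. In requests, an iterator without a known length passed as `data=` is sent with chunked transfer encoding, each yielded item as its own chunk. That is consistent with line-sized items being slow.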

On a ~200 MB example input (a real dataset JSON), the ~200 kB
difference in compressed size (6 MB vs. 6.2 MB) seems not worth the
~4 s difference in compression time (5.6 s vs. 1.5 s).  I assume
similar ratios hold for other inputs of different sizes but similar
composition (dataset JSONs and narrative Markdown files).
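An ad-hoc benchmark like the one described above could be sketched as follows. The commit message does not say which compression levels were compared or how they were timed. The levels 1, 6, and 9 below are assumptions, as is the synthetic JSON input, which is far smaller than the ~200 MB dataset. For reference, Python's `gzip.compress` defaults to `compresslevel=9`.

```python
import gzip
import json
import time
from typing import Dict, Iterable, Tuple


def benchmark_levels(
    data: bytes, levels: Iterable[int] = (1, 6, 9)
) -> Dict[int, Tuple[int, float]]:
    """Return {level: (compressed size in bytes, seconds to compress)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        # Sanity check: every level must round-trip to the original bytes.
        assert gzip.decompress(compressed) == data
        results[level] = (len(compressed), elapsed)
    return results


# Synthetic stand-in for a dataset JSON (repetitive keys and values).
nodes = [
    {"name": f"sample_{i}", "clade": f"clade_{i % 20}", "div": round(i * 0.0001, 4)}
    for i in range(20_000)
]
data = json.dumps({"tree": {"children": nodes}}).encode()

results = benchmark_levels(data)
for level, (size, secs) in results.items():
    print(f"level {level}: {size / 1e6:.2f} MB in {secs:.3f} s")
```

Running this on real dataset JSONs and narratives, rather than synthetic data, would be the way to check the assumption that the size/time ratio carries over to other inputs.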
@tsibley requested a review from a team March 31, 2022 22:02
@tsibley merged commit d7e5738 into master Apr 1, 2022
@tsibley deleted the trs/upload-speed branch April 1, 2022 17:25