Improve performance of the push operation #88

Merged
laurilehmijoki merged 29 commits into laurilehmijoki:master on May 6, 2014

pjanik
Contributor

@pjanik pjanik commented May 2, 2014

These changes improve the performance of the push operation (in particular the diff calculation). See #44.

Main changes:

  1. Use ignore_on_server to optimise the diff.
  2. Use GET Bucket instead of separate HEAD requests.
  3. Use an optimised version of the filey-diff gem, which:
    • doesn't send HEAD and GET requests during data source comparison.
    • has optimised versions of some slow methods (e.g. select_in_outer_array).
  4. The MD5 of gzipped S3 objects is now calculated from the gzipped content (previously it was calculated from the raw, non-gzipped content). As a result, whenever you change the gzip config option and push, every file affected by that setting is updated. We can treat this as either a breaking change or a kind of bug fix.

Item 3 is a separate pull request to filey-diff.
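To illustrate item 4, here is a minimal sketch of hashing the gzipped bytes rather than the raw content. `gzipped_md5` is a hypothetical helper written for this comment, not the gem's actual gzip helper, and the fixed `mtime` is an assumption made so that repeated runs produce identical bytes. For objects uploaded in a single PUT, the S3 ETag is normally the MD5 of the stored bytes, which is why the local digest has to be taken over the gzipped form.

```ruby
require 'zlib'
require 'digest'
require 'stringio'

# Hypothetical helper: gzip `raw` in memory and return the MD5 hex digest
# of the compressed bytes. The mtime is pinned (assumption) because gzip
# stores a timestamp in its header; without pinning it, two runs over the
# same input can yield different bytes and therefore different digests.
def gzipped_md5(raw, mtime: 1)
  io = StringIO.new(''.b)          # binary, mutable in-memory buffer
  gz = Zlib::GzipWriter.new(io)
  gz.mtime = mtime                 # fixed header timestamp
  gz.write(raw)
  gz.finish                        # flush the gzip stream, keep `io` open
  Digest::MD5.hexdigest(io.string)
end

raw = '<html><body>hello</body></html>'
puts "raw MD5:     #{Digest::MD5.hexdigest(raw)}"
puts "gzipped MD5: #{gzipped_md5(raw)}"
```

The two digests differ, so a diff that compares the raw MD5 against the ETag of a gzipped object would always consider the file changed.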

I also re-recorded almost all of the VCR tests and made minor changes to their configuration. New tests were also added:

  • spec for gzip_helper
  • large site update (2,100 files, since GET Bucket returns at most 1,000 objects per request)
  • update of a site after a gzip config change (related to item 4)
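The 1,000-object limit means the listing has to be paged. The following is a rough sketch of that loop, not the code in this branch: `list_all_keys` and `FAKE_FETCHER` are hypothetical names, and the fetcher is an in-memory stand-in for one GET Bucket request so the example runs without network access. It assumes the usual pattern of passing the last key of a truncated page as the marker for the next request.

```ruby
# Hypothetical pager: `page_fetcher` stands in for one GET Bucket request.
# It takes a marker (or nil) and returns { keys: [...], truncated: bool }.
def list_all_keys(page_fetcher)
  keys = []
  marker = nil
  loop do
    page = page_fetcher.call(marker)
    keys.concat(page[:keys])
    break unless page[:truncated]
    marker = page[:keys].last      # continue after the last key seen
  end
  keys
end

# In-memory fake bucket with 2,100 keys, served 1,000 at a time.
ALL_KEYS = (1..2100).map { |i| format('file-%04d.html', i) }
FAKE_FETCHER = lambda do |marker|
  start = marker ? ALL_KEYS.index(marker) + 1 : 0
  page = ALL_KEYS[start, 1000] || []
  { keys: page, truncated: start + page.size < ALL_KEYS.size }
end

puts list_all_keys(FAKE_FETCHER).size
```

With 2,100 objects the loop issues three listing requests, compared with one HEAD request per file in the old approach.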

I didn't update the gem version, as I'm not 100% sure how. These changes are mostly about performance, but the right version bump also depends on how we treat the change in behaviour described in item 4.

We are already testing this branch to deploy a site that consists of about 15,000 files, and it seems to work fine. Previously the diff operation couldn't be completed due to performance issues.
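For a sense of why the diff scales once the remote listing is available up front, here is a small sketch of a hash-based comparison. It is only an illustration under my own assumptions: `diff_by_md5` is a hypothetical function, not filey-diff's API, and both inputs are assumed to be plain hashes mapping an object key to its MD5 hex digest.

```ruby
# Hypothetical diff: `local` and `remote` map key => MD5 hex digest.
# Every membership check is a hash lookup, so the whole comparison is
# roughly linear in the number of files and needs no per-file request.
def diff_by_md5(local, remote)
  {
    to_create: local.keys.reject { |key| remote.key?(key) },
    to_update: local.select { |key, md5| remote.key?(key) && remote[key] != md5 }.keys,
    to_delete: remote.keys.reject { |key| local.key?(key) }
  }
end

local  = { 'index.html' => 'aaa', 'about.html' => 'bbb', 'new.html' => 'ccc' }
remote = { 'index.html' => 'aaa', 'about.html' => 'zzz', 'old.html' => 'ddd' }
p diff_by_md5(local, remote)
```

In this example `new.html` would be created, `about.html` updated, and `old.html` deleted, while `index.html` is left alone.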

scytacki and others added 29 commits March 25, 2014 23:37
It will be necessary to calculate the MD5 of gzipped files during diff calculation.
Now it's handled by the gzip helper.
.gitkeep is excluded from upload to make the cassette smaller and simpler.
Note that by default gzip includes a timestamp, so two gzipped files with the same source content are not guaranteed to be byte-identical.
@laurilehmijoki laurilehmijoki merged commit e77f24d into laurilehmijoki:master May 6, 2014
@laurilehmijoki
Owner

Excellent work, thank you!

I've released this pull request in version 1.7.5.

All s3_website users will benefit from this significant performance improvement.
