
Configuration options for Rack::Deflate #457

Merged: 2 commits merged on Aug 3, 2014


jakubpawlowicz
Contributor

Rack::Deflater currently does not support any configuration options, which makes it cumbersome to control.

This pull request adds support for the following options:

  • include - an array of content types enabled for compression
  • if - a lambda for choosing whether to deflate based on current execution scope (request, status, body, or headers)

Examples:
use Rack::Deflater, include: %w(text/json application/json)
use Rack::Deflater, if: lambda { |env, status, headers, body| body.length > 512 }

EDIT: description was updated based on the PR discussion

@maletor

maletor commented Nov 15, 2012

Was this created with the intention of skipping image files in Rack::Deflater? That would be an excellent use case according to https://developers.google.com/speed/docs/best-practices/payload#GzipCompression.

@jakubpawlowicz
Contributor Author

Yes, that's one use case I can think of. In general, it's a bad idea to compress responses smaller than a single TCP packet, because it only adds overhead on both ends.

This threshold can always be configured at the web server level (nginx, Apache). For services that do not proxy requests through a web server (e.g. Heroku Cedar), however, an option in Rack::Deflater is the only choice available.

@maletor

maletor commented Nov 15, 2012

Definitely important. Would love to see this get merged!

@maletor

maletor commented Nov 15, 2012

Isn't TCP packet size different for everybody? In general, though, it seems to be about 64K?

@jakubpawlowicz
Contributor Author

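64 kB is the theoretical upper limit for TCP/IP. In practice, the size depends on the connection type and is bounded by the link MTU: 1500 bytes on Ethernet, which leaves about 1460 bytes of TCP payload, so roughly 1.5 kB at most. Either way, there's no need to compress such small amounts of data.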

@bpinto

bpinto commented Dec 21, 2012

Is this ready to be merged?

@jakubpawlowicz
Contributor Author

Not yet, as some new specs are failing under 1.8.7 / JRuby / REE. It will be ready to merge within the next 24 hours.

@jakubpawlowicz
Contributor Author

@bpinto it's ready for merge. I've refactored the tests a bit more, turned off deflating to see if all relevant tests fail (they do!) and reverted to a working state. Should be great now!

@app = app

@min_content_length = options[:min_content_length] || options['min_content_length']
Member

  • Let's just support symbol option keys, not strings, for consistency with other middleware options.
  • Maybe min_size or min_length? Just a little shorter?

Member

Let's default min_content_length to 1 KB. Smaller entities can still benefit on mobile networks, but that domain has its own solutions and complexities. Shared dictionaries are a better fit there; let's hope for them in HTTP/2.

@raggi
Member

raggi commented Dec 30, 2012

In principle this looks great. I left you some line comments, thanks for your work so far!

@jakubpawlowicz
Contributor Author

Thanks @raggi for all the suggestions! I've just applied them, so it should be closer to ready for merging.

@raggi
Member

raggi commented Jan 22, 2013

This needs rebasing on top of rack master. I'll be happy to merge this in the next release, but I am out of time for 1.5.0.

@jakubpawlowicz
Contributor Author

That's a pity. I'll merge it as soon as possible to make it into 1.6.

@jakubpawlowicz
Contributor Author

@raggi That should be it!

@lilith

lilith commented Jan 30, 2013

To me, the safest default scenario is opt-in based on mime-type.

HTML5 Boilerplate includes server-side configuration to compress the following MIME types:

text/html
text/css
text/plain
text/x-component
application/javascript
application/json
application/xml
application/xhtml+xml
application/x-font-ttf
application/x-font-opentype
application/vnd.ms-fontobject
image/svg+xml
image/x-icon

I would consider the following list instead:

# All HTML, plain-text, CSS, and CSV content should be compressed
text/plain
text/html
text/csv
text/css

# Only vector graphics and uncompressed bitmaps can benefit from compression.
# GIF, JPEG, and PNG are already compressed (LZW, DCT plus Huffman coding, and DEFLATE respectively), and certain browsers can get confused.
image/x-icon
image/svg+xml
application/x-font-ttf
application/x-font-opentype
application/vnd.ms-fontobject

# All JavaScript should be compressed
text/javascript
application/ecmascript
application/json
application/javascript

# All XML should be compressed
text/xml
application/xml
application/xml-dtd
application/soap+xml
application/xhtml+xml
application/rdf+xml
application/rss+xml
application/atom+xml

If it's not too late, I would suggest mime-type evaluation instead of URL regexes. Providing a sane default set would be great; the current usage pattern is causing lots of issues with images and PDFs, since old browsers lie about Accept-Encoding.
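The opt-in approach above could be sketched in Ruby as a frozen Set built from the proposed categories. COMPRESSIBLE_TYPES and opt_in_compressible? are hypothetical names used only for illustration; they are not part of Rack. The check compares bare media types, so a header such as text/html; charset=utf-8 would need its parameters stripped first.

```ruby
require 'set'

# Hypothetical opt-in whitelist assembled from the categories proposed above.
COMPRESSIBLE_TYPES = {
  text:       %w[text/plain text/html text/csv text/css],
  graphics:   %w[image/x-icon image/svg+xml application/x-font-ttf
                 application/x-font-opentype application/vnd.ms-fontobject],
  javascript: %w[text/javascript application/ecmascript application/json
                 application/javascript],
  xml:        %w[text/xml application/xml application/xml-dtd application/soap+xml
                 application/xhtml+xml application/rdf+xml application/rss+xml
                 application/atom+xml]
}.values.flatten.to_set.freeze

# Opt-in: anything not on the list (PNG, JPEG, PDF, ...) is left uncompressed.
def opt_in_compressible?(media_type)
  COMPRESSIBLE_TYPES.include?(media_type)
end
```

A Set keeps the membership check constant-time. The grouping is only there so the list stays readable when someone adds a type later.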

Rack::Deflater has become very important, as Heroku's new Cedar stack removed automatic gzip support and requires that step to be moved into the application itself.

@maletor

maletor commented Jan 30, 2013

I definitely agree with you, Nathanael.


@jakubpawlowicz
Contributor Author

@raggi - any ideas about mime-type based evaluation?

@lilith

lilith commented Feb 4, 2013

Should I submit a pull request?

@lilith

lilith commented Feb 8, 2013

This is what I'm currently using: https://gist.github.com/nathanaeljones/4739210

@jakubpawlowicz
Contributor Author

Any ideas how to push it forward?

@masterkain

This looks awesome to have. HAProxy 1.5dev19 (temporarily) dropped support for transparent gzip compression of chunked responses, so we have to implement it at the app level. The more options, the better.

end

# Skip if response body is too short
if @min_length > headers['Content-Length'].to_i


I'm testing this with Rainbows!, Rails 3.2.14, and HTTP/1.1, and I don't see any Content-Length header at this point. The check therefore fails and compression is skipped.

Rainbows! was started with the -N option to not insert any default middleware, so my config.ru is:

use Rack::ContentLength
use Rack::Chunked
run Myapp::Application
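This behaviour follows from headers['Content-Length'].to_i turning a missing header into 0, which is always below the threshold. Below is a minimal sketch of a check that treats a missing Content-Length as unknown rather than as zero. The names below_min_length? and the 1024-byte default are hypothetical and are not part of the patch.

```ruby
# Hypothetical size guard: true only when a Content-Length is present and
# smaller than min_length. A missing header (chunked or streamed bodies)
# means "length unknown", which does not trigger the skip.
def below_min_length?(headers, min_length = 1024)
  raw = headers['Content-Length']
  return false if raw.nil? # unknown length: don't skip on size alone

  raw.to_i < min_length
end
```

Whether an unknown length should mean "compress" or "skip" is a policy choice. This sketch errs toward compressing, because skipping would silently disable compression for every chunked response.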

@jjb

jjb commented Nov 24, 2013

@jakubpawlowicz -- this is an awesome patch. It has a lot of features all together, which is probably part of why it's taking so long to merge. Here are some thoughts on how it can be simplified and improved.

include/exclude

It seems like folks (including me) think that

  • whitelist is superior
  • mime-type/Content-Type is superior

I recommend:

  • remove the exclude option entirely (I can't think of a use case; tell me if I'm missing something)
  • reimplement include to operate on the Content-Type header instead of the URL
  • thought: it would be nice to allow the Content-Type list to accept regexes, to allow for text/*, for example. But this could perhaps come in a subsequent pull request, to keep this one simpler.
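Regex support in the whitelist could look like the sketch below. Both String and Regexp implement ===, so one any? call handles exact types as well as patterns such as text/*. The names compressible_type? and ALLOWED are hypothetical. Stripping parameters such as ; charset=... is an added assumption, so that real-world Content-Type headers still match.

```ruby
# Hypothetical whitelist: exact media types (String) or patterns (Regexp).
ALLOWED = [%r{\Atext/}, 'application/json', 'application/javascript'].freeze

# String#=== tests equality and Regexp#=== tests a match, so both kinds of
# entry work through the same call.
def compressible_type?(content_type, allowed = ALLOWED)
  # Drop parameters such as "; charset=utf-8" and normalise case.
  media_type = content_type.to_s.split(';', 2).first.to_s.strip.downcase
  return false if media_type.empty?

  allowed.any? { |entry| entry === media_type }
end
```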

skip_if

Maybe a better name for this, consistent with other DSLs, would be unless. Also, IMO it's simpler to make it if. "skip"/"unless" make it feel like it's supposed to be a filter, whereas "if" is more generic and lets the user simply put in a conditional as they see fit. (Indeed, even the example you gave has a !=, which makes your overall example a double negative.)

min_content_length

tl;dr: this option should be removed entirely, because it can be trivially implemented with skip_if/unless/if

There are several problems with making a default value.

  • it's a behavior change, so it makes it more controversial to merge
  • you didn't explain how you chose 1024

There are 3 independent reasons I can think of to have a minimum value:

  1. The point at which compressing does not reduce the size, and in fact might increase it. This is an objective value that is relevant to all users. Some quick searching suggests the number is around 150 bytes.
  2. The point at which the size is smaller than a single packet anyway. To my understanding, this varies between environments. Maybe others could speak to this more.
  3. The point at which the time-complexity of compression is not worth the reduction in data-transmission time. This is highly dependent on hardware and applications. It can also be balanced with the gzip compression level: setting the level to 5 gives a very different time-complexity/compression-ratio tradeoff than 9. So a good solution to this tradeoff would also include adjusting the compression level based on content type. I just checked: Rack::Deflater uses zlib's default, which is 6.
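To see point 1 concretely, here is a small Ruby sketch using the standard zlib library that compares raw and gzipped sizes. It does not pin down an exact break-even point, since that depends on the content. It only shows that gzip's fixed framing (a 10-byte header and an 8-byte trailer) makes very small bodies larger, while repetitive text shrinks a lot.

```ruby
require 'zlib'

# Returns [raw_bytes, gzipped_bytes] for a payload.
def gzip_sizes(payload)
  [payload.bytesize, Zlib.gzip(payload).bytesize]
end

samples = {
  'tiny'       => '{"ok":true}',
  'repetitive' => 'hello world ' * 200
}

samples.each do |name, body|
  raw, gz = gzip_sizes(body)
  puts format('%-10s raw=%5d gzip=%5d', name, raw, gz)
end
```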

To increase the chances of your patch being merged, I think the default should be removed. Instead, there could be a recommended minimum in documentation.

Furthermore, regarding point 3, this makes me think it would be nice to be able to adjust compression level as well. So after your patch is merged, I'd like to experiment with augmenting it so your compression_level option could be passed either an integer or a lambda, which would dictate if something is compressed at all, and with what compression level.

->(size) {
  case size
  when 0...256
    nil
  when 256..1024
    6
  when 1025..Float::INFINITY
    9
  end
}

And as I typed that out I realized: this could be achieved with skip_if -- so, you could rewrite skip_if to expect either a boolean or integer to be returned. When it's an integer, it indicates compression level.
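Here is a minimal sketch of that idea. It assumes a hypothetical contract in which the callable returns false/nil to skip, true for zlib's default level, or an Integer level. The names compression_level_for and gzip_with_level are illustrative only and are not Rack::Deflater's API. The compression itself uses Ruby's standard Zlib::GzipWriter.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical contract: false/nil means skip, true means zlib's default
# level, and an Integer is clamped to 1..9.
def compression_level_for(result)
  case result
  when Integer then result.clamp(Zlib::BEST_SPEED, Zlib::BEST_COMPRESSION)
  when true    then Zlib::DEFAULT_COMPRESSION
  end # false or nil fall through and return nil: do not compress
end

# Gzip a string at the given level with the standard library.
def gzip_with_level(body, level)
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io, level)
  gz.write(body)
  gz.finish # writes the gzip trailer without closing io
  io.string
end

# The size-based callable from the comment above, returning false for small
# bodies so it also reads naturally as a plain condition.
by_size = lambda do |size|
  case size
  when 0...256   then false
  when 256..1024 then 6
  else                9
  end
end
```

With this design, the size threshold and the level table live in one callable. The cost is a slightly less obvious contract than a plain boolean.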

@jjb

jjb commented Nov 24, 2013

As soon as I posted that, I realized that include can also be implemented with if. So now I'm a bit conflicted:

  • on the one hand, implementing only if and nothing else would mean much simpler code overall and give the user greater flexibility
  • on the other hand, the point of offering the library is to make domain-specific things easy, so options like include and min_content_length make a lot of sense.

I've seen other libraries offer specific options plus an all-powerful if on the side, so maybe that's fine here too. My gut says that going with only include and if is a good approach to start with.

@jakubpawlowicz
Contributor Author

@jjb appreciate your awesome feedback!

  • to be honest I don't remember where 1024 came from (it's been a year since I wrote it) but I like your definition via lambda - let's have it that way.
  • whitelisting via include should work well (so exclude will be gone)
  • min_content_length was a convenience method but your idea is better

If you have a spare moment, can you let me know what you think about @nathanaeljones's comment: #457 (comment)? We could offer it as an option for include, e.g.

use Rack::Deflater, { include: Rack::Deflater::DEFAULTS }

Thoughts?

@jakubpawlowicz
Contributor Author

@jjb - that should be it!

I decided to use Rack::Mime::MIME_TYPES instead of MIME::Types to avoid adding a dependency. A regular expression can therefore be passed; it is matched first against the Content-Type header and then against the list of known MIME types. We should probably also add some sanity checks for a missing Content-Type header.

Once we are all set, I will squash all the commits into two: one for the refactored spec_deflater.rb and the other for the Deflater options.

Please let me know how it looks to you.

@jakubpawlowicz
Contributor Author

As for the failing specs, I noticed master fails on them too. That shouldn't be an excuse, though...

@jjb

jjb commented Dec 9, 2013

@jakubpawlowicz awesome. The discussion about MIME types was about where to get a list of defaults. It looks like you are using the list to make sure that the provided types are valid. My feeling is that this isn't very useful, and you can drop it entirely.

# Skip if :if lambda is given and evaluates to false
if @if &&
!@if.call(env, status, headers, body)
return false

return false if @if && !@if.call(env, status, headers, body)


(just my style opinion...)

Contributor Author

Sure, it reads better.

@jakubpawlowicz
Contributor Author

@jjb sorry my mistake! So let's make sure we are on the same page regarding :include:

  • it should be an array (assembled anyhow user wants)
  • we should default to all content types matching /^text/ - what about application/json?
  • if so, there's no need to make it the default explicitly via use Rack::Deflater, { include: Rack::Deflater::DEFAULTS }, as plain use Rack::Deflater will be enough

@jjb

jjb commented Dec 9, 2013

That is one way. But the current deflater approach is to deflate everything, so you could keep this default behavior and only change it if include is specified. I recommend this because it's more likely to be accepted by the maintainers and is generally a more conservative approach.

Your code already does this :-D. So if you agree, you can just remove the check that it's in Rack::Mime::MIME_TYPES.values, and IMO the code will be simpler.

(I think I overexplained this, but I'm just trying to be extra clear :-D)

!(@include.match(headers['Content-Type']) && Rack::Mime::MIME_TYPES.values.include?(headers['Content-Type']))
return false
end


return false if @include && !@include.match(headers['Content-Type'])

Contributor Author

Since @include should be an array, it should rather read:

return false if @include && !@include.include?(headers['Content-Type'])

or to make it more readable

return false if @include && @include.index(headers['Content-Type']).nil?


Sounds good. I find the first more readable; I'm not familiar with index, and the required .nil? at the end seems complicated.

Contributor Author

As long as it works as expected, I'm fine with the first too.

@jakubpawlowicz
Contributor Author

Cool, so we are on the same page. Will update the code shortly.

@app = app

@if = options[:if]
@include = options[:include]

The symbols are for the DSL, but you could use more descriptive instance variables within the class:

@condition = options[:if]
@compressible_types = options[:include]

Contributor Author

👍 nice one!

@jjb

jjb commented Dec 9, 2013

I asked this SO question to try to find a good list of types for us to recommend in the documentation: http://stackoverflow.com/questions/20477558/where-can-i-find-a-list-of-textual-mime-types

@jakubpawlowicz
Contributor Author

@jjb here you go! Let's see what your SO question brings.

@jakubpawlowicz
Contributor Author

@jjb It's now squashed into two commits, with better docs and an updated PR description.

@raggi @nathanaeljones You may like it much more right now!

end

# Skip if @compressible_types is given and does not include the response's content type
return false if @compressible_types && !@compressible_types.include?(headers['Content-Type'])

First, thanks for this code; it is a big help. One issue I ran into was this line: when the Content-Type has the optional parameter value (in my case the character encoding), the content-type check fails. See RFC 1341. I am using .split(";")[0] and it is working for my use cases.

Contributor Author

Thanks @jayschab for a valuable input. I'll add a test case and your solution and we should be all good. 👍

Contributor Author

@jayschab Solved in bd3723b


I found a bug in my suggested code while testing: Content-Type isn't always set.

I ended up changing my version to this:

return false if @compressible_types && !(headers.has_key?('Content-Type') && @compressible_types.include?(headers['Content-Type'][/[^;]*/]))

* Adds :if option, which should be given a lambda accepting env, status, headers, and body arguments.
* When :if evaluates to false, the response body won't be compressed.
* Adds :include option, which should be given an array of compressible content types.
* When :include does not contain the response's content type, the response body won't be compressed.
end

# Skip if @compressible_types is given and does not include the response's content type
return false if @compressible_types && !(headers.has_key?('Content-Type') && @compressible_types.include?(headers['Content-Type'][/[^;]*/]))
Contributor Author

@jayschab Updated it per your suggestion.

I found it problematic to test the scenario without the 'Content-Type' header, as it gets set every time in the specs.
Any ideas?
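One way to cover the missing-header case is to test the decision logic on its own, passing a plain headers hash that simply has no Content-Type. The sketch below is a standalone approximation of the two guard clauses discussed in this thread, not a copy of Rack::Deflater. DeflateDecision and should_deflate? are hypothetical names.

```ruby
# Hypothetical standalone version of the two guard clauses from this PR.
class DeflateDecision
  def initialize(options = {})
    @condition          = options[:if]      # callable(env, status, headers, body)
    @compressible_types = options[:include] # Array of media types, or nil
  end

  def should_deflate?(env, status, headers, body)
    # Skip if the :if callable is given and evaluates to false
    return false if @condition && !@condition.call(env, status, headers, body)

    # Skip if a whitelist is given and Content-Type is missing or not on it;
    # [/[^;]*/] keeps only the part before parameters such as "; charset=utf-8"
    if @compressible_types
      content_type = headers['Content-Type']
      return false unless content_type &&
                          @compressible_types.include?(content_type[/[^;]*/])
    end

    true
  end
end
```

In a spec, the negative case then becomes a one-liner: DeflateDecision.new(include: %w[text/html]).should_deflate?({}, 200, {}, ['x']) should be false.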

@jakubpawlowicz
Contributor Author

Is there anything we can do to push this forward after 1.5 years as an open PR?

@felixbuenemann
Contributor

This would be nice to have. I'm currently shuffling middleware around to keep Dragonfly jobs from being compressed.

raggi added a commit that referenced this pull request Aug 3, 2014
Configuration options for Rack::Deflate
raggi merged commit 62dcc83 into rack:master Aug 3, 2014
@jjb

jjb commented Aug 3, 2014

😹

@jakubpawlowicz
Copy link
Contributor Author

Wicked! Thanks @raggi!

@jjb

jjb commented Mar 19, 2015

In case someone finds this thread, here's what I'm using for MIME types:

include: Rack::Mime::MIME_TYPES.select{|k,v| v =~ /text|json|javascript/ }.values.uniq
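One caveat with an unanchored pattern like /text|json|javascript/ is that it matches anywhere in the type string. A type such as application/vnd.oasis.opendocument.text, an OpenDocument file that is already a ZIP container, would therefore also be selected if present. The sketch below uses a small hash standing in for Rack::Mime::MIME_TYPES; the real table is much larger, and these entries are chosen only for illustration. It compares the loose pattern with an anchored alternative.

```ruby
# Stand-in for Rack::Mime::MIME_TYPES (extension => media type).
# Illustrative entries only; the real table is much larger.
SAMPLE_MIME_TYPES = {
  '.html' => 'text/html',
  '.txt'  => 'text/plain',
  '.json' => 'application/json',
  '.js'   => 'application/javascript',
  '.png'  => 'image/png',
  '.odt'  => 'application/vnd.oasis.opendocument.text'
}.freeze

# The pattern from the comment above matches "text" anywhere in the string,
# so the OpenDocument type is selected too.
loose = SAMPLE_MIME_TYPES.select { |_ext, type| type =~ /text|json|javascript/ }.values.uniq

# Anchored alternative: text/* plus the exact JSON and JavaScript types.
strict = SAMPLE_MIME_TYPES.values.grep(%r{\Atext/|\Aapplication/(?:json|javascript)\z}).uniq
```

The anchored version gives up suffixed types such as application/vnd.api+json, so adjust the pattern to whatever your app actually serves.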

@campbecf

I had to write use Rack::Deflater, if: lambda { |env, status, headers, body| body.body.length > 512 } to get the example to work.

@jjb

jjb commented Mar 11, 2017

@campbecf looks like the example got fixed here 🎉 : c987ffa

@axelson

axelson commented May 10, 2018

Note: the example has been updated again in #1211

Now it is:

use Rack::Deflater, :if => lambda { |*, body| sum=0; body.each { |i| sum += i.length }; sum > 512 }

@AlessandroMinali
Contributor

AlessandroMinali commented Feb 20, 2021

Even though I made that last update, I looked at this again today, and this is a better solution:

use Rack::Deflater, if: lambda { |*, body| body.each { |i| return i.bytesize > 512 } }

The original change was needed to handle Rack::File::Iterator. Since the first chunk of the iterator is buffered, we can exit early after examining it instead of wasting time iterating over the rest of the body.

part = file.read([8192, remaining_len].min)

As long as your threshold is below the buffer size (i.e. 8192 bytes), we can use this quicker evaluation.

I won't update the code documentation as this is not a general solution.
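The early-return lambda above has one edge case. If the body yields no chunks at all, each finishes without reaching return, so the lambda returns whatever each returned. For an Array body, that is the (truthy) body itself, so an empty body would be deflated. The sketch below makes the empty case explicit. It assumes an Array-like body and, like the lambda, only inspects the first chunk, so the same buffer-size caveat applies. It uses bytesize rather than length because length counts characters, not bytes, for multibyte strings. first_chunk_over? is a hypothetical helper name.

```ruby
# Hypothetical helper: true if the first chunk of a Rack body is larger than
# threshold bytes. Empty bodies explicitly return false.
def first_chunk_over?(body, threshold)
  body.each { |chunk| return chunk.bytesize > threshold }
  false # no chunks at all: nothing worth compressing
end

# Possible use as a Rack::Deflater :if option in config.ru:
#   use Rack::Deflater, if: ->(*, body) { first_chunk_over?(body, 512) }
```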
