Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve resource caching #973

Merged
merged 21 commits into from
Oct 28, 2019
Merged

Improve resource caching #973

merged 21 commits into from
Oct 28, 2019

Conversation

Marc-Andre-Rivet
Copy link
Contributor

@Marc-Andre-Rivet Marc-Andre-Rivet commented Oct 22, 2019

Currently Dash uses caching on the browser side combined with query string cache busting (e.g. ?v=141&m=29384720349).

This is generally considered an anti-pattern as browsers have sometimes problems dealing with the query string correctly and cause the cache to always be bust. This is our case. When checking caching on dash.plot.ly it turns out that no JS from _dash-component-suites is ever cached (tested on latest Chrome). Apparently we do not set cache-control max-age.

To keep with the existing behavior but also providing a second caching mechanism, adding suport for ETag and If-None-Match provides the BE with the ability to tell the browser that the cached version of the resource is indeed the latest version. When the tag received matches the resource's tag, a 304 is returned instead of a 200+data, significantly reducing network traffic and load time. This won't impact debug/dev as the resource updating will update the etag.

Fingerprinting for async artifacts has also been added as part of this PR. ETag is now a fallback for resources that are not fingerprinted, whatever the reason.


(before) against dash.plot.ly 1st and 2nd run, all resources return 200 and are fetched, no caching

(after) against local dash-docs 1st run (200) and 2nd run (304, or loaded from cache with fingerprinting) -- the calls are still being made to the server but the server confirms the cached resource is not stale.

On local dash-docs this is enough to reduce the time to DOMContentLoaded for http://127.0.0.1:8000/datatable down from 600ms to 300ms without throttling, from ~11 seconds to ~2.5 seconds with throttling the network speed to Fast 3G and from ~35 to ~6 seconds on Slow 3G.

** Having trouble uploading animated gifs with the browser's behavior at the moment - will try again later

@Marc-Andre-Rivet Marc-Andre-Rivet marked this pull request as ready for review October 23, 2019 00:23
@chriddyp
Copy link
Member

Nice!

@alexcjohnson
Copy link
Collaborator

Wow, good catch! Did this change recently? I could swear I saw this working not too long ago...

I wonder though, wouldn't it be pretty easy to include the same mod time info we do now, just as part of the path rather than as a query string, and simply strip it back out when serving the file?

ie right now we take a path like:
dash_core_components/dash_core_components.min.js
and turn it into a url like:
/_dash-component-suites/dash_core_components/dash_core_components.min.js?v=1.3.1&m=1571759511

The query string gets automatically dropped by the route
"{}_dash-component-suites/<string:package_name>/<path:path_in_package_dist>"

But without changing the file names/paths on disk at all we could just construct urls like:
/_dash-component-suites/dash_core_components/dash_core_components.min.v1_3_1m1571759511.js
and have the route handler strip it out before looking for the file

path_parts = path_in_package_dist.split(".")
actual_path = ".".join(path_parts[:-2] + [path_parts[-1]])

@Marc-Andre-Rivet
Copy link
Contributor Author

Did this change recently?

Digging deeper we're apparently never cache-control max-age or anything like that -- except if it was removed or that some previous version of Flask handled that by default before, I don't see how this could have ever worked.

/_dash-component-suites/dash_core_components/dash_core_components.min.v1_3_1m1571759511.js

Yes. That will work fine for all resources that Dash is injecting into the html passed to the browser, but this strategy will not work for async resources.

@Marc-Andre-Rivet
Copy link
Contributor Author

Marc-Andre-Rivet commented Oct 23, 2019

Based on feedback / discussion with @alexcjohnson, expending the scope to cover:

  1. moving from query string to fingerprinting for both standard and async js resources
  2. updating @plotly/webpack-dash-dynamic-import to fingerprint the async resources in accordance with the DashPy expectations. This will require the DashR to implement these changes too @rpkyle
  3. DashPy checks for fingerprinting information through pattern matching - if there is a fingerprint, use cache-control max-age with a value of 1 year for caching purposes, otherwise, use ETag/If-None-Match as a general catch-all mechanism

Libraries using the old version of @plotly/webpack-dash-dynamic-import will need to update their usage to benefit from the cache-control, otherwise ETag will be used.

table and dcc will be updated in followup PRs once dynamic-import 1.1.0 is published.

Marc-André Rivet added 2 commits October 23, 2019 14:31
srcFragments.splice(-1, 0, ${getFingerprint()});

return srcFragments.join('.');
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Override webpack's async chunks path resolution; add fingerprint information.

dash/dash.py Outdated
modified,
'.'.join(relative_package_path.split('.')[-1:]),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new pattern!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh hmm, .js.map again...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use a specialized fingerprint building function that should handle these cases consistently in regards to the fingerprint check. https://github.com/plotly/dash/pull/973/files#diff-2b63b39f9669123b2341a57a37913ad8R10

# Resolve real resource name from fingerprinted resource path
path_in_package_dist = (
res[1] + res[3] if res is not None else path_in_package_dist
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the resource was requested with the correct pattern (from dash filling the html page or from async resources override) - use cache-control, otherwise use Etag

Marc-André Rivet added 2 commits October 23, 2019 15:23
Marc-André Rivet added 2 commits October 23, 2019 15:38
@Marc-Andre-Rivet Marc-Andre-Rivet changed the title Improve JS resource caching Improv resource caching Oct 23, 2019
@Marc-Andre-Rivet Marc-Andre-Rivet changed the title Improv resource caching Improve resource caching Oct 23, 2019
Copy link
Contributor

@byronz byronz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💃

dash/dash.py Outdated
@@ -676,6 +679,19 @@ def _generate_meta_html(self):

# Serve the JS bundles for each package
def serve_component_suites(self, package_name, path_in_package_dist):
# Check if the resource has a fingerprint
res = re.match(
r'^(.*)[.]v\d+_\d+_\d+(-\w+[.]?\d+)?m\d{10}([.]js)$',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this going to be used for .css and .js.map too? Anyway those are included in the mimetype dict below.

Also wondering if we really need to be that specific with the pattern, and if it really needs to have a . in the (-\w+[.]?\d+) group? What about:

r'^(.*)[.]v[\w_-]+m\d+([.][\w.]+)$'

Copy link
Contributor Author

@Marc-Andre-Rivet Marc-Andre-Rivet Oct 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about the various extensions.

.js.map is not really an issue here as the requested resource comes from the //# sourceMappingURL=bundle.js.map at the end of the bundle file, is handled by the browser and will never get fingerprinted in the front-end or be provided by the sever in the html. It will get an etag but I'm not really sure if that means anything for a source-map.

As per #973 (comment) this may indeed be problematic for other file extensions of the form xx.yy. Instead of operating under the assumption that only the last group is the extension, could be updated to only use the first group as the filename.

As for the regex being stronger vs. weaker, I was coming at it from the perspective that we have full control over the format that will be provided, so might as well make a strong check here too. That said, there's no obvious advantage to making a stronger check and maybe we end up losing some flexibility if edge-cases appear. Will update with a more general pattern.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK cool, I'll buy that about the maps - if you tell me you've seen it working 😅

Yeah my thinking about the regex was just it's safer to match anything we're sure won't be in the regular filename. Like, what if there's some system where mod time isn't 10 digits for whatever reason, or someone has a really nonstandard version identifier.

The mimetype dict as written right now is explicit - whether or not intentionally so - that the final extension can only be js, css, or map - anything else will throw a KeyError. Currently anything else you need would go through assets (which would be interesting to investigate for caching, though generally much less important than component js), not component suites.

Copy link
Contributor Author

@Marc-Andre-Rivet Marc-Andre-Rivet Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you tell me you've seen it working 😅

Yup. They work :) The "routing/fingerprint" done by Dash only affects the resources put in the HTML, in that sense it's more trickery than routing 😸

For the regex, generalized the pattern here https://github.com/plotly/dash/pull/973/files#diff-2b63b39f9669123b2341a57a37913ad8R6 and its inverse. They behave as follow for various valid / invalid strings:

image

image

These strings are now tested in unit tests within Dash.

The mimetype dict as written right now is explicit

My understanding is that this was intentional. Learned about it when I added the map extension. There's apparently a library that offers mimetype mapping but I was told at the time that it was sketchy. Needs to be revisited. If only for the woff/ddk issue.

- improve recognizing fingerprints
- add fingerprint tests
const __jsonpScriptSrc__ = jsonpScriptSrc;
jsonpScriptSrc = function(chunkId) {
const srcFragments = __jsonpScriptSrc__(chunkId).split('.');
srcFragments.splice(1, 0, ${getFingerprint()});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed so that only the first fragment is considered the filename.

relative_package_path,
importlib.import_module(namespace).__version__,
modified,
),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isolated the fingerprinting logic with its inverse in a easily testable file / function

Copy link
Collaborator

@alexcjohnson alexcjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic! 💃

Don't need the parens

Co-Authored-By: alexcjohnson <johnson.alex.c@gmail.com>
@Marc-Andre-Rivet
Copy link
Contributor Author

@alexcjohnson Good catch on plotly/dash-table#632 (comment) -- testing it out.

@Marc-Andre-Rivet
Copy link
Contributor Author

Marc-Andre-Rivet commented Oct 28, 2019

@alexcjohnson Non-async resources will work fine because Dash Py distinguishes the relative_package_path usage (https://github.com/plotly/dash/blob/dev/dash/dash.py#L544) from the external_url usage (https://github.com/plotly/dash/blob/dev/dash/dash.py#L564), causing only the relative path to get a fingerprint.

Async resources do not distinguish between being external or internal and will cause problems as currently implemented in the 1.1 version of the plugin.

@Marc-Andre-Rivet
Copy link
Contributor Author

I think the Dash Py code can be updated relatively easily to decorate the script tag with an attribute telling the piugin if it is local/external and allowing toggling fingerprinting that way.

@Marc-Andre-Rivet
Copy link
Contributor Author

Marc-Andre-Rivet commented Oct 28, 2019

I think the Dash Py code can be updated relatively easily to decorate the script tag with an attribute telling the plugin if it is local/external and allowing toggling fingerprinting that way.

Turns out there's a few scenarios for async resources. Amongst others, chaining import(...) has impact over the "currently executing script" and the webpack implementation for script creation can't be augmented/decorated - making it impossible to know if the async resource is local/remote vs. always possible for statically loaded resources in the initial html. Instead of some complex BE logic and weird webpack overrides, pushing this into the webpack plugin: if the script src contains _dash-components-suite it's local and will get fingerprinted, otherwise it's not.

f156ede#diff-223b8e12c9c65075d23c7d56cd3270d7R27 - if not local, just use the resource as-is, without fingerprinting.

let src = __jsonpScriptSrc__(chunkId);

if(!isLocal) {
return src;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't fingerprint if using an external resource!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants