-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve resource caching #973
Conversation
Nice! |
Wow, good catch! Did this change recently? I could swear I saw this working not too long ago... I wonder though, wouldn't it be pretty easy to include the same mod time info we do now, just as part of the path rather than as a query string, and simply strip it back out when serving the file? ie right now we take a path like: The query string gets automatically dropped by the route But without changing the file names/paths on disk at all we could just construct urls like: path_parts = path_in_package_dist.split(".")
actual_path = ".".join(path_parts[:-2] + [path_parts[-1]]) |
Digging deeper we're apparently never cache-control max-age or anything like that -- except if it was removed or that some previous version of Flask handled that by default before, I don't see how this could have ever worked.
Yes. That will work fine for all resources that Dash is injecting into the |
Based on feedback / discussion with @alexcjohnson, expending the scope to cover:
Libraries using the old version of table and dcc will be updated in followup PRs once dynamic-import |
srcFragments.splice(-1, 0, ${getFingerprint()}); | ||
|
||
return srcFragments.join('.'); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Override webpack's async chunks path resolution; add fingerprint information.
dash/dash.py
Outdated
modified, | ||
'.'.join(relative_package_path.split('.')[-1:]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new pattern!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh hmm, .js.map
again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use a specialized fingerprint building function that should handle these cases consistently in regards to the fingerprint check. https://github.com/plotly/dash/pull/973/files#diff-2b63b39f9669123b2341a57a37913ad8R10
# Resolve real resource name from fingerprinted resource path | ||
path_in_package_dist = ( | ||
res[1] + res[3] if res is not None else path_in_package_dist | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the resource was requested with the correct pattern (from dash filling the html page or from async resources override) - use cache-control, otherwise use Etag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💃
dash/dash.py
Outdated
@@ -676,6 +679,19 @@ def _generate_meta_html(self): | |||
|
|||
# Serve the JS bundles for each package | |||
def serve_component_suites(self, package_name, path_in_package_dist): | |||
# Check if the resource has a fingerprint | |||
res = re.match( | |||
r'^(.*)[.]v\d+_\d+_\d+(-\w+[.]?\d+)?m\d{10}([.]js)$', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this going to be used for .css
and .js.map
too? Anyway those are included in the mimetype
dict below.
Also wondering if we really need to be that specific with the pattern, and if it really needs to have a .
in the (-\w+[.]?\d+)
group? What about:
r'^(.*)[.]v[\w_-]+m\d+([.][\w.]+)$'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about the various extensions.
.js.map
is not really an issue here as the requested resource comes from the //# sourceMappingURL=bundle.js.map
at the end of the bundle file, is handled by the browser and will never get fingerprinted in the front-end or be provided by the sever in the html. It will get an etag
but I'm not really sure if that means anything for a source-map.
As per #973 (comment) this may indeed be problematic for other file extensions of the form xx.yy
. Instead of operating under the assumption that only the last group is the extension, could be updated to only use the first group as the filename.
As for the regex being stronger vs. weaker, I was coming at it from the perspective that we have full control over the format that will be provided, so might as well make a strong check here too. That said, there's no obvious advantage to making a stronger check and maybe we end up losing some flexibility if edge-cases appear. Will update with a more general pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK cool, I'll buy that about the maps - if you tell me you've seen it working 😅
Yeah my thinking about the regex was just it's safer to match anything we're sure won't be in the regular filename. Like, what if there's some system where mod time isn't 10 digits for whatever reason, or someone has a really nonstandard version identifier.
The mimetype
dict as written right now is explicit - whether or not intentionally so - that the final extension can only be js
, css
, or map
- anything else will throw a KeyError
. Currently anything else you need would go through assets (which would be interesting to investigate for caching, though generally much less important than component js), not component suites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you tell me you've seen it working 😅
Yup. They work :) The "routing/fingerprint" done by Dash only affects the resources put in the HTML, in that sense it's more trickery than routing 😸
For the regex, generalized the pattern here https://github.com/plotly/dash/pull/973/files#diff-2b63b39f9669123b2341a57a37913ad8R6 and its inverse. They behave as follow for various valid / invalid strings:
These strings are now tested in unit tests within Dash.
The mimetype dict as written right now is explicit
My understanding is that this was intentional. Learned about it when I added the map
extension. There's apparently a library that offers mimetype mapping but I was told at the time that it was sketchy. Needs to be revisited. If only for the woff/ddk issue.
- improve recognizing fingerprints - add fingerprint tests
const __jsonpScriptSrc__ = jsonpScriptSrc; | ||
jsonpScriptSrc = function(chunkId) { | ||
const srcFragments = __jsonpScriptSrc__(chunkId).split('.'); | ||
srcFragments.splice(1, 0, ${getFingerprint()}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed so that only the first fragment is considered the filename.
relative_package_path, | ||
importlib.import_module(namespace).__version__, | ||
modified, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isolated the fingerprinting logic with its inverse in a easily testable file / function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! 💃
Don't need the parens Co-Authored-By: alexcjohnson <johnson.alex.c@gmail.com>
@alexcjohnson Good catch on plotly/dash-table#632 (comment) -- testing it out. |
@alexcjohnson Non-async resources will work fine because Dash Py distinguishes the Async resources do not distinguish between being external or internal and will cause problems as currently implemented |
I think the Dash Py code can be updated relatively easily to decorate the script tag with an attribute telling the piugin if it is local/external and allowing toggling fingerprinting that way. |
Turns out there's a few scenarios for async resources. Amongst others, chaining f156ede#diff-223b8e12c9c65075d23c7d56cd3270d7R27 - if not local, just use the resource as-is, without fingerprinting. |
let src = __jsonpScriptSrc__(chunkId); | ||
|
||
if(!isLocal) { | ||
return src; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't fingerprint if using an external resource!
…ist-4.16.6 Bump browserslist from 4.5.1 to 4.16.6
Currently Dash uses caching on the browser side combined with query string cache busting (e.g.
?v=141&m=29384720349
).This is generally considered an anti-pattern as browsers have sometimes problems dealing with the query string correctly and cause the cache to always be bust. This is our case. When checking caching on
dash.plot.ly
it turns out that no JS from_dash-component-suites
is ever cached (tested on latest Chrome). Apparently we do not setcache-control max-age
.To keep with the existing behavior but also providing a second caching mechanism, adding suport for
ETag
andIf-None-Match
provides the BE with the ability to tell the browser that the cached version of the resource is indeed the latest version. When the tag received matches the resource's tag, a 304 is returned instead of a 200+data, significantly reducing network traffic and load time. This won't impact debug/dev as the resource updating will update the etag.Fingerprinting for async artifacts has also been added as part of this PR. ETag is now a fallback for resources that are not fingerprinted, whatever the reason.
(before) against
dash.plot.ly
1st and 2nd run, all resources return200
and are fetched, no caching(after) against local dash-docs 1st run (200) and 2nd run (304, or loaded from cache with fingerprinting) -- the calls are still being made to the server but the server confirms the cached resource is not stale.
On local dash-docs this is enough to reduce the time to DOMContentLoaded for
http://127.0.0.1:8000/datatable
down from 600ms to 300ms without throttling, from ~11 seconds to ~2.5 seconds with throttling the network speed toFast 3G
and from ~35 to ~6 seconds onSlow 3G
.** Having trouble uploading animated gifs with the browser's behavior at the moment - will try again later