Perf: avoid (un)premultiply when using overlayWith without alpha channel #573
Your experience is as expected, but perhaps not documented clearly enough. The discussion at #241 is the best place for this.
It's probably not as bad as you think, but happy to be corrected by evidence to the contrary.
I prepared a benchmark for sharp and canvas; in both cases image files are overlaid. The single sharp operations are nicely fast, but writing to a raw buffer and reading back into a new sharp instance every time, which is required for overlaying multiple image files, seems to be a bottleneck. I get 0.70 ops/sec ±1.79% (7 runs sampled) for the canvas test on the test system. Setup: Debian Jessie amd64, node v6.3.0, sharp@0.16.0 and canvas@1.5.0.
Thanks for these benchmark tests; the bottleneck is the premultiply/unpremultiply roundtrip per overlay. The "faded out using the alpha value" statement in the cairo_paint_with_alpha docs suggests cairo will suffer this also. A possible performance improvement to sharp would be to skip the (un)premultiply part of the process entirely when the overlay image does not contain an alpha channel. Let's use this issue to track that work as it will benefit everyone using overlayWith.
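To make the cost concrete, here is a plain-JS illustration (not sharp's or libvips' actual internals) of per-channel alpha premultiplication, and of why the roundtrip is both redundant for opaque pixels and lossy for translucent ones:

```javascript
// Compositing "A over B" is done on premultiplied values, so each overlay
// normally costs a premultiply pass and an unpremultiply pass over every pixel.

function premultiply(c, a) {
  // Scale the colour channel (0..255) by the alpha channel (0..255).
  return Math.round((c * a) / 255);
}

function unpremultiply(c, a) {
  // Reverse the scaling; undefined for a === 0, lossy for small alpha.
  return a === 0 ? 0 : Math.min(255, Math.round((c * 255) / a));
}

// For a fully opaque pixel (a = 255) the roundtrip is the identity,
// which is why it can safely be skipped when the overlay has no alpha channel.
console.log(unpremultiply(premultiply(200, 255), 255)); // 200

// For nearly transparent pixels the integer roundtrip loses information:
console.log(unpremultiply(premultiply(200, 3), 3));
```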
@lovell Any updates on this? I'm trying to move some tasks from running in a background queue (current) to being on-the-fly. I do about 10-100 overlayWith statements per job and it takes a considerable/non-negligible amount of time, even on a powerful machine!
@sraka1 Not yet, but I'd be happy to help with and accept a PR if you'd like to work on it.
Commit 35859fd on the avoid-premultiply-for-overlay-without-alpha branch should do this. @strarsis @sraka1 Are you able to verify this improves performance?
@lovell: I installed the patched sharp and ran the previously used benchmark with it, comparing against the current sharp release (unpatched). There is a definite improvement in memory consumption and stability. You can also try it out yourself here: https://github.com/strarsis/sharp-overlay-benchmark/tree/sharp-patch
@strarsis Thanks for checking. I've thought of a (hopefully) much faster approach for the non-alpha overlay case; leave it with me. |
Commit 5c5b708 on the avoid-premultiply-for-overlay-without-alpha branch should improve things, and even adds a test or two. If the overlay image does not contain an alpha channel, and premultiplication is not otherwise required, a faster image "insert" will take place instead. This will now always be the case when overlaying non-alpha onto non-alpha.
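The difference between the two code paths can be sketched in plain Node (a toy model, not libvips): an alpha composite must blend every pixel, while a non-alpha "insert" reduces to a row-by-row memory copy of the overlay into the base image.

```javascript
// Copy a 3-channel overlay into a 3-channel base at (left, top),
// one contiguous row at a time — no per-pixel blending required.
function insertRGB(base, baseWidth, overlay, overlayWidth, overlayHeight, left, top) {
  const ch = 3; // channels per pixel (no alpha)
  for (let y = 0; y < overlayHeight; y++) {
    const src = y * overlayWidth * ch;
    const dst = ((top + y) * baseWidth + left) * ch;
    overlay.copy(base, dst, src, src + overlayWidth * ch);
  }
  return base;
}

// 4x4 black base, 2x2 white overlay inserted at (1, 1).
const base = Buffer.alloc(4 * 4 * 3, 0);
const overlay = Buffer.alloc(2 * 2 * 3, 255);
insertRGB(base, 4, overlay, 2, 2, 1, 1);

console.log(base[(1 * 4 + 1) * 3]); // 255 — pixel (1,1) is now white
console.log(base[0]);               // 0   — pixel (0,0) untouched
```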
@strarsis You'll need to set |
Yes, it is faster now. Interestingly it also seems to cache the previous operations. But then in further test cycles it suddenly slows down a lot towards the middle/end. |
@strarsis Thanks for confirming; I noticed memory usage was high too. @sraka1 Are you able to test with the commit referred to above to see if it helps the performance of your use case?
The change here is slightly breaking in that it's possible for output images that used to have 4 channels to now end up with only 3, so this will have to wait until v0.18.0. |
Updated the sharp dependency and channel count, plus some minor improvements in the benchmark code: https://github.com/strarsis/sharp-overlay-benchmark With a larger number of images my system hangs because of too high memory consumption.
@lovell: With the ridge branch, performance increased by 0.07 ops/sec. It runs very fast at first, then hits sporadic slowdowns/halts towards the end; it is faster after the initial "warm up", with heavy memory issues after the 2nd consecutive run.
I'm attempting to get this technique to work, in order to do multiple overlayWith calls (to stitch together stripes from headless Chrome, which can only screenshot images less than 16384 pixels high). In my case, the overlay sources have 4 channels. Is there a way to skip the (un)premultiply (i.e. ignore the alpha channel), since I know in advance it will be full of zeros? My code, if it helps:
Each loop around the overlayWith takes around 1 sec (Mac OS X). On a side note (I know this is discussed in various issues), the variety and potential inefficiency in all this suggests supporting multiple overlays within sharp itself would be welcome. Specifically, whatever optimisations can be performed, such as buffer reuse or others like the (un)premultiply, are in general way beyond simple devs like me :)
@matAtWork: Are you using the ridge branch of sharp which contains these new optimizations?
Yep |
@matAtWork: Also set |
Yeah, I saw that. The issue is that, although I create the destination with channels: 3, it doesn't seem to take effect.
Actually, forget that last bit: channels: 3 doesn't work, as the PNG buffer always renders into RGBA. That is (probably) what's defeating the optimisation. So the question is a more general one: "How do I get 3-channel raw data from sharp?"
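One way to get 3-channel data in plain Node, independent of sharp's own options, is to strip the alpha channel from a raw RGBA buffer by dropping every fourth byte (the helper name here is hypothetical, not part of sharp's API):

```javascript
// Convert a raw RGBA buffer to RGB by dropping the alpha byte of each pixel,
// so downstream code sees a 3-channel image.
function stripAlpha(rgba) {
  const pixels = rgba.length / 4;
  const rgb = Buffer.alloc(pixels * 3);
  for (let i = 0; i < pixels; i++) {
    rgba.copy(rgb, i * 3, i * 4, i * 4 + 3); // copy R, G, B; skip A
  }
  return rgb;
}

// One red pixel followed by one green pixel, both fully opaque.
const rgba = Buffer.from([255, 0, 0, 255, 0, 255, 0, 255]);
const rgb = stripAlpha(rgba);
console.log(rgb.length);        // 6
console.log(Array.from(rgb));   // [ 255, 0, 0, 0, 255, 0 ]
```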
I've tried various combinations of |
@matAtWork: Also see #680 , #534 . |
Sorry if this has all gone a bit off-topic, but I found the fastest (by a long, long way) solution was to simply append the raw buffers (either 3 or 4 channel) produced by sharp(pngSlices) in memory using node, and then use sharp() again to generate an image based on the big buffer and write it to disk. When I say faster, I mean like 10x faster.
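The stitching trick above can be sketched in plain Node (the sharp decode/encode steps are only referenced in comments): vertical slices of equal width and channel count can be stitched by simple concatenation, because raw pixel data is stored row-major, top to bottom.

```javascript
const width = 2, channels = 3;

// Stand-ins for raw buffers as produced by e.g. sharp(slice).raw().toBuffer():
const slices = [
  Buffer.alloc(width * 1 * channels, 10), // a 2x1 slice
  Buffer.alloc(width * 2 * channels, 20), // a 2x2 slice
];

// Appending the buffers stacks the slices vertically.
const stitched = Buffer.concat(slices);
const totalHeight = stitched.length / (width * channels);

console.log(totalHeight); // 3
// The combined buffer could then be handed to sharp once for encoding, e.g.
// sharp(stitched, { raw: { width, height: totalHeight, channels } })
//   .png().toFile('stitched.png')  — a single encode instead of one per slice.
```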
So the solution here was to avoid repeated overlayWith calls altogether. Thanks for all your hints/tips; it definitely helped me understand the issues a lot better.
v0.18.0 now available with this improvement, thanks all for the feedback. |
It appears that overlayWith, used multiple times on the same sharp instance, results in only the last overlaid image actually being applied.
The workaround, writing to a raw buffer and loading it into a new sharp instance each time a new image is overlaid, would be wasteful.