ObserveOn performance improvements #2773
Benchmark (i7 4770K, Windows 7 x64, Java 1.8u31) PR up to 366598a
Strangely, adding an innocent isUnsubscribed check breaks 4 tests; I don't know why yet.
Switching to j.u.c.Lock in SubscriptionList benefits observeOn because of its spinning behavior (it is less likely to park/unpark a thread, which may take 3 ms on Windows). Unfortunately, it introduces higher variance on small subscribeOn runs.
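To make the j.u.c.Lock idea concrete, here is a minimal sketch of a subscription container guarded by a ReentrantLock instead of a synchronized block. It is not RxJava's SubscriptionList: the class name LockedSubscriptionList and the Cancellable interface are hypothetical stand-ins, and whether the lock avoids parking a thread depends on the JVM and on contention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a subscription container; NOT RxJava's SubscriptionList.
final class LockedSubscriptionList {
    // Hypothetical minimal "subscription" abstraction used only for this sketch.
    interface Cancellable { void cancel(); }

    private final ReentrantLock lock = new ReentrantLock();
    private List<Cancellable> items = new ArrayList<>();
    private volatile boolean cancelled;

    boolean isCancelled() { return cancelled; }

    void add(Cancellable c) {
        if (!cancelled) {
            lock.lock();
            try {
                // re-check under the lock: cancelAll() may have run in between
                if (!cancelled) {
                    items.add(c);
                    return;
                }
            } finally {
                lock.unlock();
            }
        }
        // the container was already cancelled: cancel the newcomer outside the lock
        c.cancel();
    }

    void cancelAll() {
        if (cancelled) return;
        List<Cancellable> toCancel;
        lock.lock();
        try {
            if (cancelled) return;
            cancelled = true;
            toCancel = items;
            items = null;
        } finally {
            lock.unlock();
        }
        // run the callbacks outside the lock so they cannot block other callers
        for (Cancellable c : toCancel) c.cancel();
    }
}
```

The volatile flag keeps the common "already cancelled" check lock-free, and the lock is held only for the short list mutation.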
These are the results if the range is replaced by a value repeater:
Note that size = 1 doesn't run the optimized scalar scheduling code.
Do you mind rebasing this so we don't have the development path in the log?
I see significant performance improvements when testing on my machine:
Sure.
@akarnokd great work on all those enhancements! |
Further optimizations to observeOn.
- RingBuffer to avoid the synchronization block
- EventLoopsScheduler which improves the sequential scheduling performance because a completing task's subscription will most likely be the first item in the underlying LinkedList.
Benchmark: (i7 920, Windows 7 x64, Java 1.8u31, 5x1s warmup, 5x5s iteration)
Notes:
- For size = 1, the throughput varies in a +/- 3000 range on each run, and since the changes don't touch the scalar optimization, there is no real improvement there.
- At size = 10.000, my system reached either the cache capacity or the OS scheduler's time resolution, so there is no improvement from there on.
- For size = 100.000 and size = 1.000.000, the throughput doubles if I introduce some extra delay (e.g., via sleep(1) or some extra work).
- subscribeOn(1.000.000): from 91 to 136.
Since it conflicts with #2772 anyway, this PR is meant to let others verify that the optimizations actually work on other OSes, because on my Windows machine I sometimes get significant variance in the throughput during iterations. Increased iteration time may be required as well.
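The LinkedList remark in the description above can be illustrated with a small sketch. It assumes task handles are appended to a java.util.LinkedList and removed with remove(Object) when they complete. HeadRemovalDemo and Handle are hypothetical names, not RxJava classes. The comparison counts rely on LinkedList scanning from the head, as the OpenJDK implementation does.

```java
import java.util.LinkedList;

final class HeadRemovalDemo {
    // Hypothetical task handle that counts how often equals() is called during list scans.
    static final class Handle {
        static int comparisons;
        @Override public boolean equals(Object o) { comparisons++; return this == o; }
        @Override public int hashCode() { return System.identityHashCode(this); }
    }

    // Tracks n handles, removes them all in the given order,
    // and returns the total number of equals() calls made by the removals.
    static int comparisonsFor(int n, boolean fifo) {
        LinkedList<Handle> tracked = new LinkedList<>();
        Handle[] handles = new Handle[n];
        for (int i = 0; i < n; i++) {
            handles[i] = new Handle();
            tracked.add(handles[i]);
        }
        Handle.comparisons = 0;
        for (int i = 0; i < n; i++) {
            // fifo: the completing handle is the oldest one, i.e. the list head
            // otherwise: the completing handle is the newest one, i.e. the list tail
            tracked.remove(fifo ? handles[i] : handles[n - 1 - i]);
        }
        return Handle.comparisons;
    }
}
```

Comparing comparisonsFor(100, true) with comparisonsFor(100, false) shows the effect. When tasks complete in the order they were scheduled, each removal stops at the head of the list. Removing from the tail instead scans the whole remaining list each time.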