ObserveOn performance improvements #2773

Closed

akarnokd
Member

Further optimizations to observeOn.

  • Using SpscArrayQueue directly in observeOn instead of RingBuffer to avoid the synchronization block
  • Split the tracking structure in EventLoopsScheduler into a serial part (SubscriptionList) and a timed part (CompositeSubscription). This improves sequential scheduling performance because a completing task's subscription will most likely be the first item in the underlying LinkedList.
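
To illustrate the first bullet: when exactly one thread enqueues and exactly one thread dequeues, a bounded array queue can hand values over without any synchronized block. The following is a minimal sketch of that idea only, not the SpscArrayQueue used in this PR; the class name `SpscRing` and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical bounded single-producer/single-consumer queue (illustration only). */
final class SpscRing<T> {
    private final AtomicReferenceArray<T> buffer;
    private final int mask;
    private long producerIndex; // touched by the producer thread only
    private long consumerIndex; // touched by the consumer thread only

    SpscRing(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a positive power of two");
        }
        buffer = new AtomicReferenceArray<T>(capacity);
        mask = capacity - 1;
    }

    /** Producer thread only. Returns false if the queue is full. */
    boolean offer(T value) {
        if (value == null) {
            throw new NullPointerException("null marks an empty slot");
        }
        int slot = (int) producerIndex & mask;
        if (buffer.get(slot) != null) {
            return false; // the consumer has not freed this slot yet
        }
        buffer.lazySet(slot, value); // ordered store publishes the value
        producerIndex++;
        return true;
    }

    /** Consumer thread only. Returns null if the queue is empty. */
    T poll() {
        int slot = (int) consumerIndex & mask;
        T value = buffer.get(slot);
        if (value == null) {
            return null; // nothing published in this slot yet
        }
        buffer.lazySet(slot, null); // hand the slot back to the producer
        consumerIndex++;
        return value;
    }
}
```

The slot content itself acts as the full/empty signal, so neither side ever takes a lock. The design is only safe under the one-producer, one-consumer assumption.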

Benchmark: (i7 920, Windows 7 x64, Java 1.8u31, 5x1s warmup, 5x5s iteration)

Benchmark      (size)         1.x    1.x error      this PR   this error
observeOn           1  162326,012     2458,085   166536,559     3154,174
observeOn          10  132471,205     1857,434   142517,407     3734,424 ++
observeOn         100   43282,527     2145,910   112238,179     2270,103 ++
observeOn        1000   11779,482      173,370    25726,564      309,193 ++
observeOn        2000    6756,211       89,196    12123,276      276,470 ++
observeOn        3000    4736,893      253,796     9342,673      263,667 ++
observeOn        4000    3661,874       51,359     7346,015      123,049 ++
observeOn       10000    1519,282      108,503     1546,547       21,885
observeOn      100000     151,193        2,569      156,160        1,974
observeOn     1000000      15,373        1,310       15,660        0,153
subscribeOn         1  161290,037     2867,882   164952,259      797,408
subscribeOn        10  151842,821     2448,734   147906,491     4373,682
subscribeOn       100  136418,065     1773,558   136889,052     2362,203
subscribeOn      1000   58389,066     4559,030    59482,225     1372,692
subscribeOn      2000   34089,152     9318,205    36581,203     1264,100
subscribeOn      3000   26712,331     1265,442    26519,320     1319,293
subscribeOn      4000   20118,326     2018,439    20163,395      839,709
subscribeOn     10000    8914,213      677,164     9059,934      200,158
subscribeOn    100000     958,038       43,349      965,663       60,708
subscribeOn   1000000      91,849        2,148       92,706        1,202
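
The scores above are printed with a decimal comma, so comparing the two columns takes a small parsing step. As a rough sketch (the class `Speedup` and its methods are hypothetical helpers, not part of the benchmark), the relative gain per row can be computed like this; for observeOn at size 100 it comes out at roughly 2.6x:

```java
/** Hypothetical helper for reading the table above (illustration only). */
final class Speedup {
    /** Parses a score printed with a decimal comma, e.g. "43282,527". */
    static double parseScore(String text) {
        return Double.parseDouble(text.trim().replace(',', '.'));
    }

    /** Ratio of the PR throughput to the 1.x baseline; above 1.0 means the PR is faster. */
    static double speedup(String baseline, String pr) {
        return parseScore(pr) / parseScore(baseline);
    }

    public static void main(String[] args) {
        // {size, 1.x score, PR score}, copied from observeOn rows of the table above.
        String[][] rows = {
            {"100", "43282,527", "112238,179"},
            {"1000", "11779,482", "25726,564"},
            {"10000", "1519,282", "1546,547"},
        };
        for (String[] r : rows) {
            System.out.printf("size=%s speedup=%.2f%n", r[0], speedup(r[1], r[2]));
        }
    }
}
```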

Notes:

  • At size = 1, the throughput varies in a +/- 3000 range on each run, and since the changes don't touch the scalar optimization, there is no real improvement there.
  • At size = 10.000 my system reached either the cache capacity or the OS scheduler's time resolution, so there is no improvement from there on.
  • At size = 100.000 and size = 1.000.000 the throughput doubles if I introduce some extra delay (i.e., via sleep(1) or some extra work).
  • The benchmark generates a lot of garbage due to boxing: switching to a constant emitter increases the subscribeOn(1.000.000) throughput from 91 to 136.
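
The boxing note can be illustrated without RxJava. A range-style source boxes every int into its own Integer object (only a small range, -128 to 127 by default, is served from a cache), while a constant emitter hands out the same object every time. The sketch below counts distinct objects by identity; `BoxingDemo` and its methods are hypothetical names used for illustration only.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical demo of per-element boxing versus a reused constant. */
final class BoxingDemo {
    /** Distinct Integer objects seen when boxing 0..count-1, as a range-style source would. */
    static int objectsFromRange(int count) {
        Set<Integer> seen = Collections.newSetFromMap(new IdentityHashMap<Integer, Boolean>());
        for (int i = 0; i < count; i++) {
            Integer boxed = i; // autoboxing, i.e. Integer.valueOf(i)
            seen.add(boxed);   // identity-based set: one entry per object
        }
        return seen.size();
    }

    /** Distinct Integer objects seen when the same boxed constant is emitted count times. */
    static int objectsFromConstant(int count) {
        Set<Integer> seen = Collections.newSetFromMap(new IdentityHashMap<Integer, Boolean>());
        Integer constant = 777; // boxed once, outside the loop
        for (int i = 0; i < count; i++) {
            seen.add(constant);
        }
        return seen.size();
    }
}
```

With a range, the number of objects grows with the element count, and values outside the cache are allocated again on every benchmark invocation; with a constant there is a single object regardless of size.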

Since it conflicts with #2772 anyway, this PR is meant to let others verify that the optimizations actually work on other OSes, because on my Windows machine I sometimes get significant variance in the throughput during iterations. Increased iteration time may be required as well.

@akarnokd
Member Author

Benchmark (i7 4770K, Windows 7 x64, Java 1.8u31) PR up to 366598a

Benchmark      (size)         1.x      1.x error      this PR    this error
observeOn           1    204372,986    45147,750    207462,343     3348,429
observeOn          10    170321,219    30519,528    180349,729     9635,029
observeOn         100     66150,820     3911,887    151773,980     8819,016
observeOn        1000     11387,782     3620,545     28427,477     8108,015
observeOn        2000      7180,268      899,369     15044,075     2107,491
observeOn        3000      4458,529     1949,745     10050,448     1945,057
observeOn        4000      3294,942     2865,810      4627,753      369,396
observeOn       10000      1509,448      646,732      3331,416      302,650
observeOn      100000       184,213       21,344       385,208        4,621
observeOn     1000000        18,447        1,594        21,572        0,221
subscribeOn         1    198566,731    26191,145    204882,731     7505,171
subscribeOn        10    194194,868     7907,757    193459,202     8645,835
subscribeOn       100    160472,849    75535,431    147738,528    61057,919
subscribeOn      1000     69123,783    51116,790     88955,619    25329,057
subscribeOn      2000     41765,423    58779,642     54281,820    25307,480
subscribeOn      3000     42094,519    14935,575     46571,429     3136,216
subscribeOn      4000     28593,237    31337,648     35484,209     5793,749
subscribeOn     10000     11492,688     7818,150     13295,895     5687,375
subscribeOn    100000       911,157      311,834       973,503       33,258
subscribeOn   1000000       169,743       37,696       176,479       22,568

Strangely, adding an innocent isUnsubscribed check breaks 4 tests; I don't know why yet.

@akarnokd
Member Author

Switching to j.u.c.Lock in SubscriptionList benefits observeOn because of its spinning behavior (it is less likely to park/unpark a thread, which may take 3 ms on Windows). Unfortunately, it introduces higher variance on small subscribeOn runs.

Benchmark      (size)         1.x      1.x error      this PR    this error
observeOn           1    204372,986    45147,750    202173,732    12320,313
observeOn          10    170321,219    30519,528    182154,095    11144,205
observeOn         100     66150,820     3911,887    153120,079    10437,195
observeOn        1000     11387,782     3620,545     29951,053     3853,397
observeOn        2000      7180,268      899,369     13866,119     4136,655
observeOn        3000      4458,529     1949,745      9109,964     2767,763
observeOn        4000      3294,942     2865,810      7439,672      781,102
observeOn       10000      1509,448      646,732      1893,761      165,063
observeOn      100000       184,213       21,344       221,382        7,028
observeOn     1000000        18,447        1,594        43,000        0,351
subscribeOn         1    198566,731    26191,145    204402,609     4919,448
subscribeOn        10    194194,868     7907,757    183619,836    28797,890
subscribeOn       100    160472,849    75535,431    147244,447   101925,905
subscribeOn      1000     69123,783    51116,790     84392,086    55782,068
subscribeOn      2000     41765,423    58779,642     60341,991    17596,950
subscribeOn      3000     42094,519    14935,575     42439,368    11841,639
subscribeOn      4000     28593,237    31337,648     35660,124     5625,456
subscribeOn     10000     11492,688     7818,150     13021,675     1460,180
subscribeOn    100000       911,157      311,834      1664,599      332,131
subscribeOn   1000000       169,743       37,696       180,759       21,844
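
For readers unfamiliar with the j.u.c.Lock switch mentioned in this comment, here is a minimal sketch of a subscription container guarded by a ReentrantLock instead of a synchronized block. It is an assumption-laden illustration, not the actual SubscriptionList code: `Cancellable` is a hypothetical stand-in for rx.Subscription, and `LockedSubscriptionList` is a made-up name.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for rx.Subscription. */
interface Cancellable {
    void cancel();
}

/** Hypothetical lock-guarded container of cancellable resources (illustration only). */
final class LockedSubscriptionList implements Cancellable {
    private final ReentrantLock lock = new ReentrantLock();
    private LinkedList<Cancellable> items = new LinkedList<Cancellable>(); // null once cancelled
    private volatile boolean cancelled;

    /** Adds a resource, or cancels it right away if the container is already cancelled. */
    void add(Cancellable c) {
        if (!cancelled) {
            lock.lock();
            try {
                if (!cancelled) {
                    items.add(c);
                    return;
                }
            } finally {
                lock.unlock();
            }
        }
        c.cancel();
    }

    /** Removes a resource without cancelling it; returns true if it was present. */
    boolean remove(Cancellable c) {
        if (cancelled) {
            return false;
        }
        lock.lock();
        try {
            // Cheap when c sits at the head of the list, i.e. tasks finish in FIFO order.
            return items != null && items.remove(c);
        } finally {
            lock.unlock();
        }
    }

    /** Cancels all contained resources exactly once. */
    @Override
    public void cancel() {
        if (cancelled) {
            return;
        }
        List<Cancellable> toCancel;
        lock.lock();
        try {
            if (cancelled) {
                return;
            }
            cancelled = true;
            toCancel = items;
            items = null;
        } finally {
            lock.unlock();
        }
        for (Cancellable c : toCancel) {
            c.cancel(); // run callbacks outside the lock
        }
    }
}
```

Whether the lock spins before parking, and how much that helps, depends on the JVM and OS, which matches the variance reported in the table above.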

@akarnokd
Member Author

These are the results if the range is replaced by a value repeater:

Benchmark      (size)         1.x      1.x error    1.x no box       error    this no box     error
observeOn           1    204372,986    45147,750    188819,849    15431,568    181595,983    14377,740
observeOn          10    170321,219    30519,528    177593,605    12877,222    175087,705    18256,765
observeOn         100     66150,820     3911,887     68652,045     1711,296    144814,588    20151,322
observeOn        1000     11387,782     3620,545     14779,159      989,993     19994,914     2982,607
observeOn        2000      7180,268      899,369      8075,593      929,669      9877,035      560,081
observeOn        3000      4458,529     1949,745      5235,137      463,936      6393,597      605,021
observeOn        4000      3294,942     2865,810      4017,062      247,366      5045,878      133,408
observeOn       10000      1509,448      646,732      1644,436      301,704      3714,585      270,400
observeOn      100000       184,213       21,344       185,625        7,409       235,961       18,194
observeOn     1000000        18,447        1,594        20,052        1,631        24,297        0,232
subscribeOn         1    198566,731    26191,145    190542,030    87491,106    194069,237    13380,697
subscribeOn        10    194194,868     7907,757    192794,548    30258,019    183716,766    22495,251
subscribeOn       100    160472,849    75535,431    148739,487    47300,117    154242,613    63197,068
subscribeOn      1000     69123,783    51116,790     98292,952    34497,158     90132,783    54405,704
subscribeOn      2000     41765,423    58779,642     72655,350     9218,318     72559,922    14591,547
subscribeOn      3000     42094,519    14935,575     55371,153    11954,736     53789,237    12727,973
subscribeOn      4000     28593,237    31337,648     46864,543     3127,864     45835,097     4896,779
subscribeOn     10000     11492,688     7818,150     19685,490     9422,445     18831,519     7730,286
subscribeOn    100000       911,157      311,834      2225,610      154,474      2201,344      318,462
subscribeOn   1000000       169,743       37,696       257,446        7,753       257,803       11,236

Note that size = 1 doesn't run the optimized scalar scheduling code.

@benjchristensen
Member

Do you mind rebasing this so we don't have the development path in the log?

@benjchristensen
Member

I see significant performance improvements when testing on my machine:

1.x

Benchmark                                         (size)   Mode   Samples        Score  Score error    Units
r.o.OperatorObserveOnPerf.observeOnComputation         1  thrpt         5   113223.051     9007.330    ops/s
r.o.OperatorObserveOnPerf.observeOnComputation      1000  thrpt         5    13108.671      740.532    ops/s
r.o.OperatorObserveOnPerf.observeOnComputation   1000000  thrpt         5       15.414        0.988    ops/s
r.o.OperatorObserveOnPerf.observeOnImmediate           1  thrpt         5 10813751.080   943281.316    ops/s
r.o.OperatorObserveOnPerf.observeOnImmediate        1000  thrpt         5   227083.165     9356.767    ops/s
r.o.OperatorObserveOnPerf.observeOnImmediate     1000000  thrpt         5      193.273       21.516    ops/s
r.o.OperatorObserveOnPerf.observeOnNewThread           1  thrpt         5    16307.144     1100.723    ops/s
r.o.OperatorObserveOnPerf.observeOnNewThread        1000  thrpt         5     8365.615      235.292    ops/s
r.o.OperatorObserveOnPerf.observeOnNewThread     1000000  thrpt         5       16.573        0.892    ops/s


This PR

Benchmark                                         (size)   Mode   Samples        Score  Score error    Units
r.o.OperatorObserveOnPerf.observeOnComputation         1  thrpt         5   113905.358    32165.659    ops/s
r.o.OperatorObserveOnPerf.observeOnComputation      1000  thrpt         5    28618.423      507.627    ops/s
r.o.OperatorObserveOnPerf.observeOnComputation   1000000  thrpt         5       32.166        2.736    ops/s
r.o.OperatorObserveOnPerf.observeOnImmediate           1  thrpt         5  8402487.179   456209.469    ops/s
r.o.OperatorObserveOnPerf.observeOnImmediate        1000  thrpt         5   217228.990     9486.298    ops/s
r.o.OperatorObserveOnPerf.observeOnImmediate     1000000  thrpt         5      198.274       13.425    ops/s
r.o.OperatorObserveOnPerf.observeOnNewThread           1  thrpt         5    16996.020     2524.557    ops/s
r.o.OperatorObserveOnPerf.observeOnNewThread        1000  thrpt         5    11612.989      487.775    ops/s
r.o.OperatorObserveOnPerf.observeOnNewThread     1000000  thrpt         5       34.498        1.914    ops/s

@akarnokd
Member Author

akarnokd commented Mar 4, 2015

Sure.

@daschl
Contributor

daschl commented Mar 5, 2015

@akarnokd great work on all those enhancements!
