Switch from gauge to tasty-bench #100

Merged 1 commit into haskell:master on Apr 15, 2021
Conversation

@Bodigrim (Contributor)

gauge still cannot be compiled with GHC 9.0 because of basement, and it will be broken once again by sized primitives in GHC 9.2. Switching to tasty-bench allows running the benchmarks against GHC 9.0 and 9.2, which reveals a pretty gruesome picture.

Baseline (GHC 8.10.4):
cabal bench -w ghc-8.10.4 --ghc-options '-fproc-alignment=64' --benchmark-options '--csv 8.10.4.csv --hide-successes' random:bench

1. GHC 8.10.4 vs. GHC 9.0.1:
cabal bench -w ghc-9.0.1  --ghc-options '-fproc-alignment=64' --benchmark-options '--baseline 8.10.4.csv --csv 9.0.1.csv --hide-successes --fail-if-slower 50' random:bench
All
  pure
    uniformR
      full
        Word:                       FAIL (0.26s)
          488 μs ±  22 μs, 879% slower than baseline
        Int:                        FAIL (0.13s)
          485 μs ±  43 μs, 885% slower than baseline
        Char:                       FAIL (0.18s)
           25 ms ± 1.7 ms, 8733% slower than baseline
      excludeMax
        Char:                       FAIL (0.39s)
           25 ms ± 2.2 ms, 6210% slower than baseline
      includeHalf
        Char:                       FAIL (0.18s)
           26 ms ± 1.6 ms, 7748% slower than baseline
      floating
        St
          uniformFloat01M:          FAIL (0.16s)
            623 μs ±  45 μs, 1169% slower than baseline
          uniformFloatPositive01M:  FAIL (0.33s)
            642 μs ±  22 μs, 1206% slower than baseline
          uniformDouble01M:         FAIL (0.17s)
            626 μs ±  44 μs, 1157% slower than baseline
          uniformDoublePositive01M: FAIL (0.17s)
            631 μs ±  46 μs, 1177% slower than baseline

9 out of 40 tests failed (9.74s)
2. GHC 8.10.4 vs. GHC 9.2.0 alpha:
cabal bench -w ghc-9.2.0.20210331 --allow-newer='split:base,splitmix:base,tagged:template-haskell' --ghc-options '-fproc-alignment=64' --benchmark-options '--baseline 8.10.4.csv --csv 9.2.0.csv --hide-successes --fail-if-slower 50' random:bench
All
  pure
    uniformR
      full
        Word:                       FAIL (0.24s)
          488 μs ±  38 μs, 879% slower than baseline
        Int:                        FAIL (0.11s)
          485 μs ±  47 μs, 886% slower than baseline
        Char:                       FAIL (0.26s)
           37 ms ± 2.1 ms, 13077% slower than baseline
      excludeMax
        Char:                       FAIL (0.25s)
           37 ms ± 1.7 ms, 9097% slower than baseline
      includeHalf
        Char:                       FAIL (0.27s)
           38 ms ± 3.1 ms, 11484% slower than baseline
      floating
        IO
          uniformFloatPositive01M:  FAIL (0.19s)
             27 ms ± 1.5 ms, 55090% slower than baseline
          uniformDoublePositive01M: FAIL (0.18s)
             26 ms ± 1.3 ms, 52708% slower than baseline
        St
          uniformFloat01M:          FAIL (0.17s)
            645 μs ±  52 μs, 1216% slower than baseline
          uniformFloatPositive01M:  FAIL (0.14s)
             19 ms ± 1.7 ms, 38263% slower than baseline
          uniformDouble01M:         FAIL (0.17s)
            646 μs ±  58 μs, 1197% slower than baseline
          uniformDoublePositive01M: FAIL (0.27s)
             17 ms ± 882 μs, 35324% slower than baseline
        pure
          uniformFloatPositive01M:  FAIL (0.14s)
             19 ms ± 1.8 ms, 38854% slower than baseline
          uniformDoublePositive01M: FAIL (0.12s)
             17 ms ± 1.5 ms, 34514% slower than baseline

13 out of 40 tests failed (10.43s)

It seems that inlining has changed significantly in GHC 9.0 (e.g., adding {-# INLINE unbiasedWordMult32RM #-} fixes a couple of regressions). I intend to relay this data to the GHC team once the branch is merged.
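For illustration, this is the shape of that kind of change: an INLINE pragma attached to a small helper so GHC keeps inlining it at call sites in other modules. The helper below is a made-up stand-in, not the actual unbiasedWordMult32RM from the library.

{-# INLINE clampedStep #-}
-- Hypothetical helper: without the pragma, GHC 9.0 may leave it
-- out-of-line at cross-module call sites, losing the specialisation
-- that made the 8.10 numbers fast.
clampedStep :: Word -> Word -> Word
clampedStep bound x =
  (x * 6364136223846793005 + 1442695040888963407) `rem` max 1 bound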

@lehins self-requested a review on Apr 15, 2021, 00:09
@lehins (Contributor) left a comment

PR looks good. The benchmark results, however, look very concerning.

@Bodigrim I'll merge it tomorrow, just in case you think of last-minute changes.

@idontgetoutmuch (Member)

@Bodigrim great work finding this before it's too late (I hope it's not too late).

@Shimuuar (Contributor)

@Bodigrim great (and scary) find. It also means that the optimizations aren't very robust.

@lehins merged commit ac3fbbb into haskell:master on Apr 15, 2021
@Bodigrim (Contributor, Author)

There is something weird going on. I'll be off for several days, so I'll just dump my observations here.

If I run the pure/uniformR/full/CUShort benchmark on my machine with GHC 8.10 (both with gauge and with tasty-bench), I see that 100000 random numbers are generated in around 30 microseconds. That means 3 random numbers per nanosecond, which is way too fast, right?

If I scrap everything else except

main :: IO ()
main = do
  let !sz = 100000
  defaultMain
    [ bgroup "pure"
      [ bgroup "uniformR"
        [ bgroup "full"
          [ pureUniformRFullBench (Proxy :: Proxy CUShort) sz
          ]
        ]
      ]
    ]

and look at the generated Core, there is no random number generation at all. The main routine looks like this:

main_$s$wgo
  :: State# RealWorld -> Int# -> Int -> (# State# RealWorld, () #)
main_$s$wgo
  = \ (sc_s7ND :: State# RealWorld)
      (sc1_s7NC :: Int#)
      (sc2_s7NB :: Int) ->
      case <=# sc1_s7NC 0# of {
        __DEFAULT ->
          case seq#
                 (case sc2_s7NB of { I# ww1_s7FD ->
                  joinrec {
                    $wgo_s7Fz :: Int# -> ()
                    $wgo_s7Fz (ww2_s7Fx :: Int#)
                      = case <# ww2_s7Fx ww1_s7FD of {
                          __DEFAULT -> ();
                          1# -> jump $wgo_s7Fz (+# ww2_s7Fx 1#)
                        }; } in
                  jump $wgo_s7Fz 0#
                  })
                 sc_s7ND
          of
          { (# ipv_a7cy, ipv1_a7cz #) ->
          main_$s$wgo ipv_a7cy (-# sc1_s7NC 1#) sc2_s7NB
          };
        1# -> (# sc_s7ND, () #)
      }

which is just an empty loop.

@Bodigrim deleted the tasty-bench branch on Apr 15, 2021, 19:34
@lehins (Contributor) commented Apr 15, 2021

@Bodigrim Yep, really good catch. All those benchmarks turned out to be bogus. GHC was "smart enough" to get rid of the "unneeded" computation, so all the benchmarks were actually measuring was the performance of the loop itself.

I'll have a fix for the suite later on today.
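As a sketch of the kind of fix this implies (not the actual change from #101; the names and the summing trick below are purely illustrative), the benchmarked expression can be made to depend on every generated value and forced with nf, so GHC cannot discard the generation loop:

{-# LANGUAGE BangPatterns #-}
module Main (main) where

import System.Random (mkStdGen, uniformR)
import Test.Tasty.Bench (bench, bgroup, defaultMain, nf)

-- Fold all generated values into one Word: the result now depends on
-- every call to uniformR, so the generator cannot be optimised away.
sumUniformR :: Int -> Word
sumUniformR n = go n 0 (mkStdGen 2021)
  where
    go 0 !acc _ = acc
    go k !acc g =
      let (w, g') = uniformR (minBound, maxBound) g
       in go (k - 1) (acc + w) g'

main :: IO ()
main = defaultMain
  [ bgroup "pure/uniformR/full"
      [ bench "Word" $ nf sumUniformR 100000 ]
  ]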

@lehins (Contributor) commented Apr 16, 2021

Fix for benchmarks and the major regression: #101

@Bodigrim mentioned this pull request on Apr 28, 2021