Benchmarker
Tortoise has a benchmarking tool that allows you to run benchmarks in Node.js using the V8 JavaScript runtime engine. Benchmarking results are dumped to the `engine-benchmarks.txt` file at the root of the repository. To run the benchmarker, simply run the `netLogoWeb/benchmark` task in sbt.
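
For example, a default run might look like either of the following (the `./sbt.sh` wrapper at the repository root is the form used in the examples further down; an interactive sbt session works just as well):

```sh
# One-shot run from the shell, using the wrapper script at the repository root;
# results are dumped to engine-benchmarks.txt
./sbt.sh netLogoWeb/benchmark

# Or, from inside an interactive sbt session started at the repository root:
#   sbt> netLogoWeb/benchmark
```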
The benchmarker also accepts some options to customize your benchmarking runs. They are as follows:
- `--comment`/`--comments` - Takes one argument, which is a string comment that will be associated with your benchmarking run. Useful for remembering why you ran the benchmark, or for marking the "before" and "after" of a series of benchmarks.
- `--quick` - Takes no arguments. Overrides all of the options listed below to force a fast benchmarking run (1 iteration of BZ Benchmark in V8).
- `--count`/`--iters`/`--num` - Takes one argument, which is the number of times you would like the benchmarks run in each engine. The default value is `3`.
- `--ticks` - Takes one argument, which is the number of ticks to run the `go` procedure of a model for, if that model doesn't have its own `benchmark` procedure. The default is `100`. Note that `setup` will also be assumed to exist and will be run first when using this option. A lot of models have a non-default number of ticks to run set in the `Model.scala` file.
- `--engine` - Takes one or more arguments, each of which indicates an engine in which the benchmarks should be run. `node`, `v8`, `google`, or `chrome` indicates V8; `mozilla`, `firefox`, or `spidermonkey` indicates SpiderMonkey; `graal`, `java`, `oracle`, `rhino`, or `nashorn` indicates the GraalVM JS engine. You can install SpiderMonkey locally on your machine and make sure it's on your `PATH` to try to get it working, too, but it hasn't been tested in a long time. Typically we only test V8 unless we have a good reason to check the others.
Here are some examples (assuming that these are being run from the root of the repository):
- `./sbt.sh netLogoWeb/benchmark` - Run the benchmarker with the default configuration (3 iterations, all models, all engines).
- `./sbt.sh 'netLogoWeb/benchmark --quick --comment "Redesigned turtle jumping"'` - Run the quick benchmarking mode (1 iteration, BZ Benchmark, only in V8), using the comment "Redesigned turtle jumping".
- `./sbt.sh 'netLogoWeb/benchmark --iters 5 --engine graal --models "Wealth Benchmark" "Heatbugs Benchmark" "Erosion Benchmark"'` - Run 5 iterations each of three different models in the GraalVM JS engine.
- `./sbt.sh 'netLogoWeb/benchmark --count 9001 --engine oracle mozilla'` - Run the benchmarks 9001 times each in both the GraalVM JS engine and SpiderMonkey. (P.S. This will take forever.)
Benchmarking should be done on a system with no other activity going on. Browsing the web, working on other code, or watching videos will impact the results due to processor and memory contention. Disabling networking can help ensure no background process fires up to download updates or do other idle work. For the best results, run a lot of `iters` and a high number of `ticks`. The faster a model finishes a single iteration, the more variance you're likely to see due to startup/warmup time. The benchmarker, by default, uses the same random seed for all runs. This means the variance between runs should be low, so seeing a high variance is a good indication that something was impacting the available resources for the run.
To put it another way, running a low number of iterations with a modest 100 ticks (the defaults) on a busy machine will only reliably show very large performance differences. Any other differences you see are likely to fall within the margin of error from noise and variance. To get results that can be trusted for low-impact changes (say, a 1-5% change in performance), run a higher number of iterations and ticks together on a very quiet machine.
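
For instance, a run intended to detect a small change might look something like the following (the numbers are purely illustrative; pick values that fit your model and your patience):

```sh
# Illustrative only: many iterations, a longer run per iteration, restricted to
# V8, with a comment recording why the run was made
./sbt.sh 'netLogoWeb/benchmark --iters 30 --ticks 1000 --engine v8 --comment "baseline before change"'
```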
Writing your benchmarking command(s) out as a simple shell script is really useful: it makes it easy to run the before and after sequentially (with the appropriate git commands in between) without user intervention, or simply to keep the benchmark repeatable as work is done.
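
A minimal sketch of such a script might look like this (the branch names, iteration count, and comments are placeholders; adjust them for your own workflow):

```sh
#!/usr/bin/env bash
# Hypothetical before/after benchmarking script; branch names and option values
# are placeholders. Benchmark output is dumped to engine-benchmarks.txt at the
# repository root, as described above.
set -e

cd "$(git rev-parse --show-toplevel)"

# "Before" run on the baseline branch
git checkout master
./sbt.sh 'netLogoWeb/benchmark --iters 10 --engine v8 --comment "before: master"'

# "After" run on the branch containing the change under test
git checkout my-perf-change
./sbt.sh 'netLogoWeb/benchmark --iters 10 --engine v8 --comment "after: my-perf-change"'
```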