Merge pull request #281 from automl/development

Release 1.3.3
automl · Jun 13, 2020 · 77bc0be
2 parents a3bad4b + 6fd0969
commit 77bc0be

Showing 21 changed files with 405 additions and 301 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -13,8 +13,9 @@ before_cache:
 
 env:
   matrix:
-    - DISTRIB='conda' COVERAGE="true" PYTHON_VERSION="3.5"
-    - DISTRIB="conda" COVERAGE="true" PYTHON_VERSION="3.6" DOCPUSH="true"
+    - DISTRIB="conda" COVERAGE="true" PYTHON_VERSION="3.6"
+    - DISTRIB="conda" COVERAGE="true" PYTHON_VERSION="3.7" DOCPUSH="true"
+    - DISTRIB="conda" COVERAGE="true" PYTHON_VERSION="3.8"
 
 install:
   - source ci_scripts/install.sh

diff --git a/README.md b/README.md
@@ -1,9 +1,11 @@
-Status for master branch / development branch:
+# CAVE
+## Configuration Assessment, Visualization and Evaluation
 
-[![Build Status](https://travis-ci.org/automl/CAVE.svg?branch=master)](https://travis-ci.org/automl/CAVE) / [![Build Status](https://travis-ci.org/automl/CAVE.svg?branch=development)](https://travis-ci.org/automl/CAVE)
+| master ([docs](https://automl.github.io/CAVE/stable/)) | development ([docs](https://automl.github.io/CAVE/dev/)) |
+| --- | --- |
+| [![Build Status](https://travis-ci.org/automl/CAVE.svg?branch=master)](https://travis-ci.org/automl/CAVE) | [![Build Status](https://travis-ci.org/automl/CAVE.svg?branch=development)](https://travis-ci.org/automl/CAVE) |
 
-# CAVE
-CAVE is a versatile analysis tool for automatic algorithm configurators. It generates comprehensive reports (e.g. http://ml.informatik.uni-freiburg.de/~biedenka/cave.html) to
+CAVE is a versatile analysis tool for automatic algorithm configurators. It generates comprehensive reports to
 give insights into the configured algorithm, the instance/feature set and also the configuration tool itself.
 
 The current version works out-of-the-box with [BOHB](https://github.com/automl/HpBandSter) and [SMAC3](https://github.com/automl/SMAC3), but can be easily adapted to other configurators: either add a custom reader or use [the CSV-Reader](https://automl.github.io/CAVE/stable/manualdoc/fileformats.html#csv) integrated in CAVE.
@@ -14,22 +16,29 @@ If you use this tool, please [cite us](#license).
 If you have feature requests or encounter bugs, feel free to contact us via the issue-tracker.
 
 # OVERVIEW 
-CAVE is an analysis tool.
-It is written in Python 3.6 and uses [SMAC3](https://github.com/automl/SMAC3), [pimp](https://github.com/automl/ParameterImportance),  and [ConfigSpace](https://github.com/automl/ConfigSpace).  
-CAVE generates performance-values (e.g. PAR10), scatter- and cdf-plots to compare the default and the optimized incumbent and provides further inside into the optimization process by quantifying the parameter- and feature-importance.  
-CAVE also generates configurator footprints to get a grip on the search behaviour of the configurator and many budget-based analyses.  
-CAVE integrates seamlessly with [jupyter-notebooks](https://github.com/automl/CAVE/blob/master/examples/cave_notebook.ipynb).
+CAVE is an analysis tool for algorithm configurators.
+The results of an algorithm configurator, e.g. SMAC or BOHB, are processed and visualized to elevate the understanding of the optimization.
+
+It is written in Python 3 and builds on [SMAC3](https://github.com/automl/SMAC3), [pyimp](https://github.com/automl/ParameterImportance),  and [ConfigSpace](https://github.com/automl/ConfigSpace).  
+
+Core features:
+  * insights into optimization process by comparison of evolution of configurations over time and budgets
+  * scatter- and cdf-plots to compare the default and the optimized incumbent and relate to the instance features
+  * quantifying parameter- and feature-importance using fANOVA, ablation or local parameter importance  
+  * interactive configurator footprints and parallel coordinate plots to get a grip on the search behaviour of the configurator
+  * using additional data generated in validation to improve performance estimations
+  * seamless integration with [jupyter-notebooks](https://github.com/automl/CAVE/blob/master/examples/cave_notebook.ipynb)
 
 # REQUIREMENTS
 - Python 3.6
-- SMAC3 and all its dependencies
-- ParameterImportance and all its dependencies
-- HpBandSter and all its dependencies
+- [SMAC3](https://github.com/automl/SMAC3)
+- [pyimp](https://github.com/automl/ParameterImportance)
+- [ConfigSpace](https://github.com/automl/ConfigSpace)
+- [HpBandSter](https://github.com/automl/HpBandSter)
 - everything specified in requirements.txt
 
 Some of the plots in the report are generated using [bokeh](https://bokeh.pydata.org/en/latest/). To automagically export them as `.png`s, you need to also install [phantomjs-prebuilt](https://www.npmjs.com/package/phantomjs-prebuilt). CAVE will run without it, but you will need to manually export the plots if you wish to use them (which is easily done through a button in the report).
 
-
 # INSTALLATION
 You can install CAVE via pip:
 ```
@@ -39,69 +48,104 @@ or clone the repository and install requirements into your virtual environment.
 ```
 git clone https://github.com/automl/CAVE.git && cd CAVE
 pip install -r requirements.txt
+python3 setup.py install  # (or: python3 setup.py develop)
 ```
-To have some `.png`s automagically available, you also need phantomjs.
+Optional: To have some `.png`s automagically available, you also need phantomjs.
 ```
 npm install phantomjs-prebuilt
 ```
 
 # USAGE
-Have a look at the [documentation](https://automl.github.io/CAVE/stable/) of CAVE. Here a little Quickstart-Guide for the CLI.
+Have a look at the [docs](https://automl.github.io/CAVE/stable/) of CAVE for details. Here is a short quickstart guide.
+
+There are two ways to use CAVE: via the commandline (CLI) or in a jupyter-notebook / python script.
 
-You can analyze results of an optimizer in one or multiple folders (that are generated with the same scenario, i.e. parallel runs).
-Provide paths to all the individual parallel results using `--folders`.
+## Jupyter-Notebooks / Python
 
-Some helpful commandline arguments:
-- `--folders`: path(s) to folder(s) containing the configurator-output (works with `output/run_*`)
+Using CAVE in your scripts is very similar to using CAVE in a jupyter-notebook.
+Take a look at [the demo](https://github.com/automl/CAVE/blob/master/examples/cave_notebook.ipynb).
+
+## CLI
+
+You can analyze results of an optimizer in one or multiple folders (multiple folders assume the same scenario, i.e. parallel runs within a single optimization).
+CAVE generates an HTML-report with all the specified analysis methods.
+Provide paths to all the individual parallel results.
+
+```
+cave /path/to/configurator/output
+```
 
-**NOTE:** *the keyword `--folders` is optional, CAVE interprets positional arguments in the commandline as folders of parallel runs*
+**NOTE:** *CAVE supports [glob](https://docs.python.org/3/library/glob.html)-like path expansion (as in `output/run_*` for multiple folders starting with `output/run(...)`)*
 
-Optional:
+**NOTE:** *the `--folders`-flag is optional, CAVE interprets positional arguments in the commandline as folders of parallel runs*
+
+Important optional flags:
 - `--output`: where to save the CAVE-output
-- `--file_format`: if the automatic file-detection fails for some reason, choose from [SMAC3](https://github.com/automl/SMAC3), [SMAC2](https://www.cs.ubc.ca/labs/beta/Projects/SMAC), [CSV](https://automl.github.io/CAVE/stable/quickstart.html#csv) or [BOHB](https://github.com/automl/HpBandSter)
-- `--validation_format`: of (optional) validation data (to enhance epm-quality where appropriate), choose from [SMAC3](https://github.com/automl/SMAC3), [SMAC2](https://www.cs.ubc.ca/labs/beta/Projects/SMAC), [CSV](https://automl.github.io/CAVE/stable/quickstart.html#csv) or NONE
 - `--ta_exec_dir`: target algorithm execution directories, this should be one or multiple path(s) to
-  the directories from which the configurator was run initially. not necessary for all configurators (BOHB doesn't need it). used to find instance-files and
-  if necessary execute the `algo`-parameter of the SMAC-scenario (DEFAULT: current working directory)
-- `--parameter_importance`: calculating parameter importance is expensive, so you can
-  specify which plots you desire: `ablation`, `forward_selection`, `fanova` and/or `lpi`.
-  either provide a combination of those or use `all` or `none`
-- `--feature_analysis`: analysis features is expensive, so you can specify which
-  algorithm to run: `box_violin`, `clustering`, `importance` and/or `feature_cdf`.
-  either provide a combination of those or use `all` or `none`
-- `--no_performance_table`: toggles the tabular analysis
-- `--no_ecdf`, `--no_scatter_plots`: toggle ecdf- and scatter-plots
-- `--no_cost_over_time`: toggles the cost-over-time plot
-- `--no_parallel_coordinates`: toggles the parallel-coordinates plot
-- `--no_configurator_footprint`: toggles the configurator-footprints
-- `--no_algorithm_footprints`: toggles the algorithm-footprints
+  the directories from which the configurator was run initially.
+  Not necessary for all configurators (mainly SMAC needs it).
+  Used to find instance-files and if necessary execute the `algo`-parameter of the SMAC-scenario (DEFAULT: current working directory)
+- `--skip` and `--only`: specify any number of analyzing methods here.
+  When using `--skip`, CAVE runs all *except* those; when using `--only`, CAVE runs *only* those specified.
+  `--skip` and `--only` are mutually exclusive.
+  Legal values include:
+   * ablation
+   * algorithm_footprints
+   * bohb_learning_curves
+   * box_violin
+   * budget_correlation
+   * clustering
+   * configurator_footprint
+   * correlation
+   * cost_over_time
+   * ecdf
+   * fanova
+   * forward_selection
+   * importance
+   * incumbents_over_budgets
+   * local_parameter_importance
+   * lpi
+   * parallel_coordinates
+   * performance_table
+   * scatter_plot
+
+Some flags provide additional fine-tuning of the analysis methods:
+
 - `--cfp_time_slider`: `on` will add a time-slider to the interactive configurator footprint which will result in longer loading times, `off` will generate static png's at the desired quantiles
 - `--cfp_number_quantiles`: determines how many time-steps to prerender from in the configurator footprint
 - `--cot_inc_traj`: how the incumbent trajectory for the cost-over-time plot will be generated if the optimizer is BOHB (from [`racing`, `minimum`, `prefer_higher_budget`])
 
-For further information on  to use CAVE, see:
-`cave -h`
+For a full list and further information on how to use CAVE, see:
+`cave --help`
 
-# EXAMPLE
-## SMAC3
-Run CAVE on SMAC3-data for the spear-qcp example:
+### EXAMPLE
+#### SMAC3
+Run CAVE on SMAC3-data for the spear-qcp example, skipping budget-correlation:
 ```
-cave examples/smac3/example_output/* --ta_exec_dir examples/smac3/ --output output/smac3_example
+cave examples/smac3/example_output/* --ta_exec_dir examples/smac3/ --output output/smac3_example --skip budget_correlation
 ```
-This will analyze the results located in `examples/smac3` in the dirs `example_output/run_1` and `example_output/run_2`.
-The report is located in `CAVE_results/report.html`.
+This analyzes the results located in `examples/smac3` in the directories `example_output/run_1` and `example_output/run_2`.
+The resulting report is located in `CAVE_results/report.html`. View it in your favourite browser.
 `--ta_exec_dir` corresponds to the folder from which the optimizer was originally executed (used to find the necessary files for loading the `scenario`).
-For other formats, e.g.:
+
+#### BOHB
+You can also use CAVE with configurators that use budgets to estimate the quality of an algorithm (e.g. epochs in neural networks).
+A good example of this behaviour is [BOHB](https://github.com/automl/HpBandSter).
+To run it on a selection of analyzers only (for illustration purposes), use:
 ```
-cave examples/smac2/ --ta_exec_dir examples/smac2/smac-output/aclib/state-run1/ --output output/smac2_example
-cave examples/csv_allinone/ --ta_exec_dir examples/csv_allinone/ --output output/csv_example
+cave examples/bohb --output output/bohb_example --only fanova ablation budget_correlation parallel_coordinates
+```
+
+#### CSV
+All your favourite configurators can be processed using [this simple CSV-format](https://automl.github.io/CAVE/stable/manualdoc/fileformats.html#csv).
+```
+cave examples/csv_allinone/run_* --ta_exec_dir examples/csv_allinone/ --output output/csv_example
 ```
 
-## BOHB
-You can also use CAVE with configurators that use budgets to estimate a quality of a certain algorithm (e.g. epochs in
-neural networks), a good example for this behaviour is [BOHB](https://github.com/automl/HpBandSter).
+#### SMAC2
+The legacy format of SMAC2 is still supported, though not extensively tested.
 ```
-cave examples/bohb --output output/bohb_example
+cave examples/smac2/ --ta_exec_dir examples/smac2/smac-output/aclib/state-run1/ --output output/smac2_example
 ```
 
 # LICENSE 
@@ -124,5 +168,3 @@ If you use out tool, please cite us:
 }
 ```
 
-
-
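Note on the README hunk above: the new `--skip` / `--only` flags are described as mutually exclusive lists of analyzer names. The following is a minimal sketch of how such a pair of flags could be wired up with `argparse`. It is not CAVE's actual CLI code; the parser name `cave-sketch`, the helper names `build_parser` and `select_analyzers`, and the shortened `ANALYZERS` list are assumptions made for illustration.

```python
import argparse

# Assumption: a shortened subset of the analyzer names listed in the README diff.
ANALYZERS = ["ablation", "budget_correlation", "cost_over_time",
             "fanova", "parallel_coordinates"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: positional folders plus exclusive --skip / --only."""
    parser = argparse.ArgumentParser(prog="cave-sketch")
    # Positional arguments are interpreted as folders of parallel runs.
    parser.add_argument("folders", nargs="+")
    # argparse rejects a command line that uses both flags of this group.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--skip", nargs="+", choices=ANALYZERS, default=[])
    group.add_argument("--only", nargs="+", choices=ANALYZERS, default=[])
    return parser


def select_analyzers(args: argparse.Namespace) -> list:
    """Return the analyzers to run, keeping the order of ANALYZERS."""
    if args.only:
        return [name for name in ANALYZERS if name in args.only]
    return [name for name in ANALYZERS if name not in args.skip]


if __name__ == "__main__":
    parsed = build_parser().parse_args()
    print(select_analyzers(parsed))
```

With this sketch, passing both `--skip` and `--only` makes `argparse` exit with a usage error, which matches the "mutually exclusive" wording in the README.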
diff --git a/cave/__version__.py b/cave/__version__.py
@@ -1 +1 @@
-__version__ = "1.3.2"
+__version__ = "1.3.3"
diff --git a/cave/analyzer/overview_table.py b/cave/analyzer/overview_table.py
@@ -19,14 +19,7 @@ def __init__(self, runscontainer):
         super().__init__(runscontainer)
         self.output_dir = runscontainer.output_dir
 
-        html_table_general, html_table_specific, html_table_cs = self.run()
-        self.result["General"] = {"table": html_table_general,
-                                  "tooltip": "General information about the optimization scenario."}
-        self.result["Run-Specific"] = {"table": html_table_specific,
-                                       "tooltip": "Information to specific runs (if there are multiple runs). Interesting "
-                                                  "for parallel optimizations or usage of budgets/fidelities."}
-        self.result["Configuration Space"] = {"table": html_table_cs,
-                                              "tooltip": "The parameter configuration space. (See github.com/automl/ConfigSpace)"}
+        self.run()
 
     def get_name(self):
         return "Meta Data"
@@ -40,21 +33,34 @@ def run(self):
         html_table_general = DataFrame(data=OrderedDict([('General', general_dict)]))
         html_table_general = html_table_general.reindex(list(general_dict.keys()))
         html_table_general = html_table_general.to_html(escape=False, header=False, justify='left')
+        self.result["General"] = {"table": html_table_general,
+                                  "tooltip": "General information about the optimization scenario."}
 
         # Run-specific / budget specific infos
-        runs = self.runscontainer.get_aggregated(keep_folders=True, keep_budgets=False)
-        runspec_dict = self._runspec_dict(runs)
-        order_spec = list(list(runspec_dict.values())[0].keys())  # Get keys of any sub-dict for order
-        html_table_specific = DataFrame(runspec_dict)
-        html_table_specific = html_table_specific.reindex(order_spec)
-        html_table_specific = html_table_specific.to_html(escape=False, justify='left')
+        for mode in ['parallel', 'budget']:
+            runspec_dict = self._runspec_dict(identify=mode)
+            if not runspec_dict:
+                continue
+            order_spec = list(list(runspec_dict.values())[0].keys())  # Get keys of any sub-dict for order
+            html_table_specific = DataFrame(runspec_dict)
+            html_table_specific = html_table_specific.reindex(order_spec)
+            html_table_specific = html_table_specific.to_html(escape=False, justify='left')
+            if mode == 'parallel':
+                self.result["Parallel Runs"] = {"table": html_table_specific,
+                                                "tooltip": "Information to individual parallel runs."}
+            if mode == 'budget':
+                self.result["Budgets"] = {"table": html_table_specific,
+                                          "tooltip": "Statistics related to the budgets used in this optimization."}
 
         # ConfigSpace in tabular form
         cs_dict = self._configspace(scenario.cs)
         cs_table = DataFrame(data=cs_dict)
         html_table_cs = cs_table.to_html(escape=False, justify='left', index=False)
+        self.result["Configuration Space"] = {"table": html_table_cs,
+                                              "tooltip": "The parameter configuration space. "
+                                                         "(See github.com/automl/ConfigSpace)"}
 
-        return html_table_general, html_table_specific, html_table_cs
+        return self.result
 
     def _general_dict(self, scenario):
         """ Generate the meta-information that holds for all runs (scenario info etc)
@@ -67,13 +73,8 @@ def _general_dict(self, scenario):
         # general stores information that holds for all runs, runspec holds information on a run-basis
         general = OrderedDict()
 
-        # TODO with multiple BOHB-run-integration
-        #    overview['Run with best incumbent'] = os.path.basename(best_run.folder)
-        #if num_conf_runs != 1:
-        #    overview['Number of configurator runs'] = num_conf_runs
-
         if len(self.runscontainer.get_budgets()) > 1:
-            general['# budgets'] = len(self.runscontainer.get_folders())
+            general['# budgets'] = len(self.runscontainer.get_budgets())
         if len(self.runscontainer.get_folders()) > 1:
             general['# parallel runs'] = len(self.runscontainer.get_folders())
 
@@ -100,14 +101,39 @@ def _general_dict(self, scenario):
         if num_feats > 0:
             general['# features (duplicates)'] = "{} ({})".format(num_feats, num_dup_feats)
 
+        general['----------'] = '----------'
+
+        combined_run = self.runscontainer.get_aggregated(False, False)[0]
+        combined_stats = self._stats_for_run(combined_run.original_runhistory,
+                                             combined_run.scenario,
+                                             combined_run.incumbent)
+        for k, v in combined_stats.items():
+            general[k] = v
+
         return general
 
-    def _runspec_dict(self, runs):
+    def _runspec_dict(self, identify='parallel'):
+        """
+        identify-keyword specifies whether to use path or budget for name
+        """
+        if identify not in ['parallel', 'budget']:
+            raise ValueError("illegal use of _runspec_dict")
+        if (identify == 'budget' and len(self.runscontainer.get_budgets()) <= 1 and
+            (self.runscontainer.get_budgets() is None or self.runscontainer.get_budgets()[0] == 0.0)):
+            return False
+        if (identify == 'parallel' and len(self.runscontainer.get_folders()) <= 1):
+            return False
+
         runspec = OrderedDict()
+        runs = self.runscontainer.get_aggregated(keep_folders=identify=='parallel',
+                                                 keep_budgets=identify=='budget')
 
         for idx, run in enumerate(runs):
+            if identify == 'budget' and len(set(run.reduced_to_budgets)) != 1:
+                raise ValueError("Runs processed here should only have a single budget specified (%s)." %
+                                 run.reduced_to_budgets)
             self.logger.debug("Path to folder for run no. {}: {}".format(idx, str(run.path_to_folder)))
-            name = os.path.basename(run.path_to_folder)
+            name = os.path.basename(run.path_to_folder) if identify == 'parallel' else str(run.reduced_to_budgets[0])
             runspec[name] = self._stats_for_run(run.original_runhistory,
                                                 run.scenario,
                                                 run.incumbent)

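Note on the `_runspec_dict` hunk above: each table column is now named either after the run folder (`parallel`) or after the single budget of the run (`budget`). Below is a minimal, standalone sketch of that naming logic. The `Run` dataclass and the function `runspec_names` are hypothetical stand-ins, not CAVE classes, and the sketch returns an empty list where the diff returns `False`. One deliberate difference: the sketch tests the budget list for `None` or emptiness before calling `len()` on it, whereas the condition in the diff calls `len()` first.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a CAVE run object (only the fields used here)."""
    path_to_folder: str
    reduced_to_budgets: List[float] = field(default_factory=list)


def runspec_names(runs: List[Run],
                  budgets: Optional[List[float]],
                  folders: List[str],
                  identify: str = "parallel") -> List[str]:
    """Return one column name per run, or [] if the table should be skipped."""
    if identify not in ("parallel", "budget"):
        raise ValueError("identify must be 'parallel' or 'budget'")
    # Skip the budget table if there are no budgets or only the dummy budget 0.0.
    # The None/empty check comes first so that len() is never called on None.
    if identify == "budget" and (not budgets or
                                 (len(budgets) <= 1 and budgets[0] == 0.0)):
        return []
    # Skip the parallel-runs table if there is at most one folder.
    if identify == "parallel" and len(folders) <= 1:
        return []

    names = []
    for run in runs:
        if identify == "budget":
            # Each run passed in budget mode must cover exactly one budget.
            if len(set(run.reduced_to_budgets)) != 1:
                raise ValueError("expected a single budget, got %s"
                                 % run.reduced_to_budgets)
            names.append(str(run.reduced_to_budgets[0]))
        else:
            names.append(os.path.basename(run.path_to_folder))
    return names
```

For example, two runs in `out/run_1` and `out/run_2` yield the column names `run_1` and `run_2` in `parallel` mode, while runs reduced to the budgets `1.0` and `3.0` yield `"1.0"` and `"3.0"` in `budget` mode.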
diff --git a/cave/analyzer/plot_scatter.py b/cave/analyzer/plot_scatter.py
@@ -43,7 +43,7 @@ def __init__(self,
             )
 
     def get_name(self):
-        return "Scatterplot"
+        return "Scatter Plot"
 
     def _plot_scatter(self,
                       default: Configuration,