Merge pull request #281 from automl/development
Release 1.3.3
shukon authored Jun 13, 2020
2 parents a3bad4b + 6fd0969 commit 77bc0be
Showing 21 changed files with 405 additions and 301 deletions.
5 changes: 3 additions & 2 deletions .travis.yml
@@ -13,8 +13,9 @@ before_cache:

env:
matrix:
- DISTRIB='conda' COVERAGE="true" PYTHON_VERSION="3.5"
- DISTRIB="conda" COVERAGE="true" PYTHON_VERSION="3.6" DOCPUSH="true"
- DISTRIB="conda" COVERAGE="true" PYTHON_VERSION="3.6"
- DISTRIB="conda" COVERAGE="true" PYTHON_VERSION="3.7" DOCPUSH="true"
- DISTRIB="conda" COVERAGE="true" PYTHON_VERSION="3.8"

install:
- source ci_scripts/install.sh
150 changes: 96 additions & 54 deletions README.md
@@ -1,9 +1,11 @@
Status for master branch / development branch:
# CAVE
## Configuration Assessment, Visualization and Evaluation

[![Build Status](https://travis-ci.org/automl/CAVE.svg?branch=master)](https://travis-ci.org/automl/CAVE) / [![Build Status](https://travis-ci.org/automl/CAVE.svg?branch=development)](https://travis-ci.org/automl/CAVE)
| master ([docs](https://automl.github.io/CAVE/stable/)) | development ([docs](https://automl.github.io/CAVE/dev/)) |
| --- | --- |
| [![Build Status](https://travis-ci.org/automl/CAVE.svg?branch=master)](https://travis-ci.org/automl/CAVE) | [![Build Status](https://travis-ci.org/automl/CAVE.svg?branch=development)](https://travis-ci.org/automl/CAVE) |

# CAVE
CAVE is a versatile analysis tool for automatic algorithm configurators. It generates comprehensive reports (e.g. http://ml.informatik.uni-freiburg.de/~biedenka/cave.html) to
CAVE is a versatile analysis tool for automatic algorithm configurators. It generates comprehensive reports to
give insights into the configured algorithm, the instance/feature set and also the configuration tool itself.

The current version works out-of-the-box with [BOHB](https://github.com/automl/HpBandSter) and [SMAC3](https://github.com/automl/SMAC3), but can be easily adapted to other configurators: either add a custom reader or use [the CSV-Reader](https://automl.github.io/CAVE/stable/manualdoc/fileformats.html#csv) integrated in CAVE.
@@ -14,22 +16,29 @@ If you use this tool, please [cite us](#license).
If you have feature requests or encounter bugs, feel free to contact us via the issue-tracker.

# OVERVIEW
CAVE is an analysis tool.
It is written in Python 3.6 and uses [SMAC3](https://github.com/automl/SMAC3), [pimp](https://github.com/automl/ParameterImportance), and [ConfigSpace](https://github.com/automl/ConfigSpace).
CAVE generates performance-values (e.g. PAR10), scatter- and cdf-plots to compare the default and the optimized incumbent and provides further inside into the optimization process by quantifying the parameter- and feature-importance.
CAVE also generates configurator footprints to get a grip on the search behaviour of the configurator and many budget-based analyses.
CAVE integrates seamlessly with [jupyter-notebooks](https://github.com/automl/CAVE/blob/master/examples/cave_notebook.ipynb).
CAVE is an analysis tool for algorithm configurators.
The results of an algorithm configurator, e.g. SMAC or BOHB, are processed and visualized to elevate the understanding of the optimization.

It is written in Python 3 and builds on [SMAC3](https://github.com/automl/SMAC3), [pyimp](https://github.com/automl/ParameterImportance), and [ConfigSpace](https://github.com/automl/ConfigSpace).

Core features:
* insights into optimization process by comparison of evolution of configurations over time and budgets
* scatter- and cdf-plots to compare the default and the optimized incumbent and relate to the instance features
* quantifying parameter- and feature-importance using fANOVA, ablation or local parameter importance
* interactive configurator footprints and parallel coordinate plots to get a grip on the search behaviour of the configurator
* using additional data generated in validation to improve performance estimations
* seamless integration with [jupyter-notebooks](https://github.com/automl/CAVE/blob/master/examples/cave_notebook.ipynb)

# REQUIREMENTS
- Python 3.6
- SMAC3 and all its dependencies
- ParameterImportance and all its dependencies
- HpBandSter and all its dependencies
- [SMAC3](https://github.com/automl/SMAC3)
- [pyimp](https://github.com/automl/ParameterImportance)
- [ConfigSpace](https://github.com/automl/ConfigSpace)
- [HpBandSter](https://github.com/automl/HpBandSter)
- everything specified in requirements.txt

Some of the plots in the report are generated using [bokeh](https://bokeh.pydata.org/en/latest/). To automagically export them as `.png`s, you need to also install [phantomjs-prebuilt](https://www.npmjs.com/package/phantomjs-prebuilt). CAVE will run without it, but you will need to manually export the plots if you wish to use them (which is easily done through a button in the report).


# INSTALLATION
You can install CAVE via pip:
```
@@ -39,69 +48,104 @@ or clone the repository and install requirements into your virtual environment.
```
git clone https://github.com/automl/CAVE.git && cd CAVE
pip install -r requirements.txt
python3 setup.py install # (or: python3 setup.py develop)
```
To have some `.png`s automagically available, you also need phantomjs.
Optional: To have some `.png`s automagically available, you also need phantomjs.
```
npm install phantomjs-prebuilt
```

# USAGE
Have a look at the [documentation](https://automl.github.io/CAVE/stable/) of CAVE. Here a little Quickstart-Guide for the CLI.
Have a look at the [docs](https://automl.github.io/CAVE/stable/) of CAVE for details. Here is a short quickstart guide.

There are two ways to use CAVE: via the commandline (CLI) or in a jupyter-notebook / python script.

You can analyze results of an optimizer in one or multiple folders (that are generated with the same scenario, i.e. parallel runs).
Provide paths to all the individual parallel results using `--folders`.
## Jupyter-Notebooks / Python

Some helpful commandline arguments:
- `--folders`: path(s) to folder(s) containing the configurator-output (works with `output/run_*`)
Using CAVE in your scripts is very similar to using CAVE in a jupyter-notebook.
Take a look at [the demo](https://github.com/automl/CAVE/blob/master/examples/cave_notebook.ipynb).

## CLI

You can analyze results of an optimizer in one or multiple folders (multiple folders assume the same scenario, i.e. parallel runs within a single optimization).
CAVE generates a HTML-report with all the specified analysis methods.
Provide paths to all the individual parallel results.

```
cave /path/to/configurator/output
```

**NOTE:** *the keyword `--folders` is optional, CAVE interprets positional arguments in the commandline as folders of parallel runs*
**NOTE:** *CAVE supports [glob](https://docs.python.org/3/library/glob.html)-like path expansion (as in `output/run_*` for multiple folders starting with `output/run`)*

Optional:
**NOTE:** *the `--folders`-flag is optional, CAVE interprets positional arguments in the commandline as folders of parallel runs*
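
The glob-like path expansion mentioned in the note above can be illustrated with Python's standard `glob` module. This is only a sketch of the general technique, not CAVE's own code: the helper name `expand_folders` and the temporary directory layout are assumptions made for the example.

```python
import glob
import os
import tempfile


def expand_folders(patterns):
    """Expand glob patterns into a sorted, de-duplicated list of directories.

    Hypothetical helper for illustration; not part of CAVE.
    """
    folders = []
    for pattern in patterns:
        # Keep only matches that are directories, since each run lives in a folder.
        folders.extend(p for p in glob.glob(pattern) if os.path.isdir(p))
    return sorted(set(folders))


# Build a throwaway layout resembling the output of parallel configurator runs.
root = tempfile.mkdtemp()
for name in ("run_1", "run_2", "other"):
    os.makedirs(os.path.join(root, "output", name))

folders = expand_folders([os.path.join(root, "output", "run_*")])
print([os.path.basename(f) for f in folders])
```

Only the directories whose names start with `run_` are selected; the `other` directory is ignored because it does not match the pattern.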

Important optional flags:
- `--output`: where to save the CAVE-output
- `--file_format`: if the automatic file-detection fails for some reason, choose from [SMAC3](https://github.com/automl/SMAC3), [SMAC2](https://www.cs.ubc.ca/labs/beta/Projects/SMAC), [CSV](https://automl.github.io/CAVE/stable/quickstart.html#csv) or [BOHB](https://github.com/automl/HpBandSter)
- `--validation_format`: of (optional) validation data (to enhance epm-quality where appropriate), choose from [SMAC3](https://github.com/automl/SMAC3), [SMAC2](https://www.cs.ubc.ca/labs/beta/Projects/SMAC), [CSV](https://automl.github.io/CAVE/stable/quickstart.html#csv) or NONE
- `--ta_exec_dir`: target algorithm execution directories, this should be one or multiple path(s) to
the directories from which the configurator was run initially. not necessary for all configurators (BOHB doesn't need it). used to find instance-files and
if necessary execute the `algo`-parameter of the SMAC-scenario (DEFAULT: current working directory)
- `--parameter_importance`: calculating parameter importance is expensive, so you can
specify which plots you desire: `ablation`, `forward_selection`, `fanova` and/or `lpi`.
either provide a combination of those or use `all` or `none`
- `--feature_analysis`: analysis features is expensive, so you can specify which
algorithm to run: `box_violin`, `clustering`, `importance` and/or `feature_cdf`.
either provide a combination of those or use `all` or `none`
- `--no_performance_table`: toggles the tabular analysis
- `--no_ecdf`, `--no_scatter_plots`: toggle ecdf- and scatter-plots
- `--no_cost_over_time`: toggles the cost-over-time plot
- `--no_parallel_coordinates`: toggles the parallel-coordinates plot
- `--no_configurator_footprint`: toggles the configurator-footprints
- `--no_algorithm_footprints`: toggles the algorithm-footprints
the directories from which the configurator was run initially.
Not necessary for all configurators (mainly SMAC needs it).
Used to find instance-files and if necessary execute the `algo`-parameter of the SMAC-scenario (DEFAULT: current working directory)
- `--skip` and `--only`: specify any number of analyzing methods here.
when using `--skip` CAVE runs all *except* those, when using `--only` CAVE runs *only* those specified.
`--skip` and `--only` are mutually exclusive.
Legal values include:
* ablation
* algorithm_footprints
* bohb_learning_curves
* box_violin
* budget_correlation
* clustering
* configurator_footprint
* correlation
* cost_over_time
* ecdf
* fanova
* forward_selection
* importance
* incumbents_over_budgets
* local_parameter_importance
* lpi
* parallel_coordinates
* performance_table
* scatter_plot
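
The mutually exclusive `--skip` / `--only` behaviour described above can be sketched with `argparse`. This is a minimal illustration under stated assumptions, not CAVE's actual parser: the program name `cave-sketch`, the functions `build_parser` and `select_analyzers`, and the shortened analyzer list are all hypothetical.

```python
import argparse

# Hypothetical subset of the analyzer names listed above.
ANALYZERS = ["ablation", "budget_correlation", "fanova", "parallel_coordinates"]


def build_parser():
    """Build a parser in which --skip and --only cannot be combined."""
    parser = argparse.ArgumentParser(prog="cave-sketch")
    parser.add_argument("folders", nargs="+", help="folders of parallel runs")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--skip", nargs="+", choices=ANALYZERS, default=[])
    group.add_argument("--only", nargs="+", choices=ANALYZERS, default=[])
    return parser


def select_analyzers(args):
    """Return the analyzers to run, in the order of ANALYZERS."""
    if args.only:
        return [a for a in ANALYZERS if a in args.only]
    return [a for a in ANALYZERS if a not in args.skip]


args = build_parser().parse_args(["examples/bohb", "--only", "fanova", "ablation"])
print(select_analyzers(args))
```

Passing both flags at once makes `argparse` exit with a usage error, which matches the "mutually exclusive" wording of the README.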

Some flags provide additional fine-tuning of the analysis methods:

- `--cfp_time_slider`: `on` will add a time-slider to the interactive configurator footprint which will result in longer loading times, `off` will generate static png's at the desired quantiles
- `--cfp_number_quantiles`: determines how many time-steps to prerender from in the configurator footprint
- `--cot_inc_traj`: how the incumbent trajectory for the cost-over-time plot will be generated if the optimizer is BOHB (from [`racing`, `minimum`, `prefer_higher_budget`])

For further information on to use CAVE, see:
`cave -h`
For a full list and further information on how to use CAVE, see:
`cave --help`

# EXAMPLE
## SMAC3
Run CAVE on SMAC3-data for the spear-qcp example:
### EXAMPLE
#### SMAC3
Run CAVE on SMAC3-data for the spear-qcp example, skipping budget-correlation:
```
cave examples/smac3/example_output/* --ta_exec_dir examples/smac3/ --output output/smac3_example
cave examples/smac3/example_output/* --ta_exec_dir examples/smac3/ --output output/smac3_example --skip budget_correlation
```
This will analyze the results located in `examples/smac3` in the dirs `example_output/run_1` and `example_output/run_2`.
The report is located in `CAVE_results/report.html`.
This analyzes the results located in `examples/smac3` in the directories `example_output/run_1` and `example_output/run_2`.
The resulting report is located in `CAVE_results/report.html`. View it in your favourite browser.
`--ta_exec_dir` corresponds to the folder from which the optimizer was originally executed (used to find the necessary files for loading the `scenario`).
For other formats, e.g.:

#### BOHB
You can also use CAVE with configurators that use budgets to estimate a quality of a certain algorithm (e.g. epochs in neural networks).
A good example for this behaviour is [BOHB](https://github.com/automl/HpBandSter).
To call it, for exemplary purposes only on a selection of analyzers, run:
```
cave examples/smac2/ --ta_exec_dir examples/smac2/smac-output/aclib/state-run1/ --output output/smac2_example
cave examples/csv_allinone/ --ta_exec_dir examples/csv_allinone/ --output output/csv_example
cave examples/bohb --output output/bohb_example --only fanova ablation budget_correlation parallel_coordinates
```

#### CSV
All your favourite configurators can be processed using [this simple CSV-format](https://automl.github.io/CAVE/stable/manualdoc/fileformats.html#csv).
```
cave examples/csv_allinone/run_* --ta_exec_dir examples/csv_allinone/ --output output/csv_example
```

## BOHB
You can also use CAVE with configurators that use budgets to estimate a quality of a certain algorithm (e.g. epochs in
neural networks), a good example for this behaviour is [BOHB](https://github.com/automl/HpBandSter).
#### SMAC2
The legacy format of SMAC2 is still supported, though not extensively tested.
```
cave examples/bohb --output output/bohb_example
cave examples/smac2/ --ta_exec_dir examples/smac2/smac-output/aclib/state-run1/ --output output/smac2_example
```

# LICENSE
@@ -124,5 +168,3 @@ If you use out tool, please cite us:
}
```



2 changes: 1 addition & 1 deletion cave/__version__.py
@@ -1 +1 @@
__version__ = "1.3.2"
__version__ = "1.3.3"
72 changes: 49 additions & 23 deletions cave/analyzer/overview_table.py
@@ -19,14 +19,7 @@ def __init__(self, runscontainer):
super().__init__(runscontainer)
self.output_dir = runscontainer.output_dir

html_table_general, html_table_specific, html_table_cs = self.run()
self.result["General"] = {"table": html_table_general,
"tooltip": "General information about the optimization scenario."}
self.result["Run-Specific"] = {"table": html_table_specific,
"tooltip": "Information to specific runs (if there are multiple runs). Interesting "
"for parallel optimizations or usage of budgets/fidelities."}
self.result["Configuration Space"] = {"table": html_table_cs,
"tooltip": "The parameter configuration space. (See github.com/automl/ConfigSpace)"}
self.run()

def get_name(self):
return "Meta Data"
@@ -40,21 +33,34 @@ def run(self):
html_table_general = DataFrame(data=OrderedDict([('General', general_dict)]))
html_table_general = html_table_general.reindex(list(general_dict.keys()))
html_table_general = html_table_general.to_html(escape=False, header=False, justify='left')
self.result["General"] = {"table": html_table_general,
"tooltip": "General information about the optimization scenario."}

# Run-specific / budget specific infos
runs = self.runscontainer.get_aggregated(keep_folders=True, keep_budgets=False)
runspec_dict = self._runspec_dict(runs)
order_spec = list(list(runspec_dict.values())[0].keys()) # Get keys of any sub-dict for order
html_table_specific = DataFrame(runspec_dict)
html_table_specific = html_table_specific.reindex(order_spec)
html_table_specific = html_table_specific.to_html(escape=False, justify='left')
for mode in ['parallel', 'budget']:
runspec_dict = self._runspec_dict(identify=mode)
if not runspec_dict:
continue
order_spec = list(list(runspec_dict.values())[0].keys()) # Get keys of any sub-dict for order
html_table_specific = DataFrame(runspec_dict)
html_table_specific = html_table_specific.reindex(order_spec)
html_table_specific = html_table_specific.to_html(escape=False, justify='left')
if mode == 'parallel':
self.result["Parallel Runs"] = {"table": html_table_specific,
"tooltip": "Information to individual parallel runs."}
if mode == 'budget':
self.result["Budgets"] = {"table": html_table_specific,
"tooltip": "Statistics related to the budgets used in this optimization."}

# ConfigSpace in tabular form
cs_dict = self._configspace(scenario.cs)
cs_table = DataFrame(data=cs_dict)
html_table_cs = cs_table.to_html(escape=False, justify='left', index=False)
self.result["Configuration Space"] = {"table": html_table_cs,
"tooltip": "The parameter configuration space. "
"(See github.com/automl/ConfigSpace)"}

return html_table_general, html_table_specific, html_table_cs
return self.result

def _general_dict(self, scenario):
""" Generate the meta-information that holds for all runs (scenario info etc)
@@ -67,13 +73,8 @@ def _general_dict(self, scenario):
# general stores information that holds for all runs, runspec holds information on a run-basis
general = OrderedDict()

# TODO with multiple BOHB-run-integration
# overview['Run with best incumbent'] = os.path.basename(best_run.folder)
#if num_conf_runs != 1:
# overview['Number of configurator runs'] = num_conf_runs

if len(self.runscontainer.get_budgets()) > 1:
general['# budgets'] = len(self.runscontainer.get_folders())
general['# budgets'] = len(self.runscontainer.get_budgets())
if len(self.runscontainer.get_folders()) > 1:
general['# parallel runs'] = len(self.runscontainer.get_folders())

@@ -100,14 +101,39 @@ def _general_dict(self, scenario):
if num_feats > 0:
general['# features (duplicates)'] = "{} ({})".format(num_feats, num_dup_feats)

general['----------'] = '----------'

combined_run = self.runscontainer.get_aggregated(False, False)[0]
combined_stats = self._stats_for_run(combined_run.original_runhistory,
combined_run.scenario,
combined_run.incumbent)
for k, v in combined_stats.items():
general[k] = v

return general

def _runspec_dict(self, runs):
def _runspec_dict(self, identify='parallel'):
"""
identify-keyword specifies whether to use path or budget for name
"""
if identify not in ['parallel', 'budget']:
raise ValueError("illegal use of _runspec_dict")
if (identify == 'budget' and len(self.runscontainer.get_budgets()) <= 1 and
(self.runscontainer.get_budgets() is None or self.runscontainer.get_budgets()[0] == 0.0)):
return False
if (identify == 'parallel' and len(self.runscontainer.get_folders()) <= 1):
return False

runspec = OrderedDict()
runs = self.runscontainer.get_aggregated(keep_folders=identify=='parallel',
keep_budgets=identify=='budget')

for idx, run in enumerate(runs):
if identify == 'budget' and len(set(run.reduced_to_budgets)) != 1:
raise ValueError("Runs processed here should only have a single budget specified (%s)." %
run.reduced_to_budgets)
self.logger.debug("Path to folder for run no. {}: {}".format(idx, str(run.path_to_folder)))
name = os.path.basename(run.path_to_folder)
name = os.path.basename(run.path_to_folder) if identify == 'parallel' else str(run.reduced_to_budgets[0])
runspec[name] = self._stats_for_run(run.original_runhistory,
run.scenario,
run.incumbent)
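
The guard at the top of the new `_runspec_dict` calls `len(...)` on the budgets before testing them for `None`, so it would raise if `get_budgets()` ever returned `None` or an empty list (whether it can do so is not shown in this diff). A minimal stand-alone sketch of the same decision with the checks reordered, assuming budgets and folders are passed in as plain lists; the function name `should_build_table` is hypothetical and not part of CAVE:

```python
def should_build_table(identify, budgets, folders):
    """Decide whether a run-specific table should be generated.

    Hypothetical stand-alone variant of the guard in _runspec_dict.
    """
    if identify not in ("parallel", "budget"):
        raise ValueError("identify must be 'parallel' or 'budget'")
    if identify == "parallel":
        # A per-run table only makes sense with more than one parallel run.
        return len(folders) > 1
    # Test for None / empty first so that len() and indexing below are safe.
    if not budgets:
        return False
    # A single budget of 0.0 is treated as "no budgets used".
    if len(budgets) <= 1 and budgets[0] == 0.0:
        return False
    return True


print(should_build_table("budget", [1.0, 3.0, 9.0], ["run_1"]))
print(should_build_table("parallel", None, ["run_1"]))
```

As in the diff, a single non-zero budget still produces a table; only the missing, empty, and single-zero cases are filtered out.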
2 changes: 1 addition & 1 deletion cave/analyzer/plot_scatter.py
@@ -43,7 +43,7 @@ def __init__(self,
)

def get_name(self):
return "Scatterplot"
return "Scatter Plot"

def _plot_scatter(self,
default: Configuration,