Merge pull request #241 from facebookresearch/stable

Release v0.1.8
facebookresearch · Apr 30, 2021 · 8803ad8
2 parents 5473e5e + e53ad52
commit 8803ad8

Showing 5 changed files with 124 additions and 36 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,90 @@
+## Release 0.1.8 (2021-04-30)
+
+This release introduces some significant changes to the way that benchmarks are
+managed, including a new dataset API. This enabled us to add support for
+millions of new benchmarks and a more efficient implementation for the LLVM
+environment, but it will require migrating old code to the new
+interfaces (see "Migration Checklist" below). Some of the key changes of this
+release are:
+
+- **[Core API change]** We have added a Python
+  [Benchmark](https://facebookresearch.github.io/CompilerGym/compiler_gym/datasets.html#compiler_gym.datasets.Benchmark)
+  class ([#190](https://github.com/facebookresearch/CompilerGym/pull/190)). The
+  `env.benchmark` attribute is now an instance of this class rather than a
+  string ([#222](https://github.com/facebookresearch/CompilerGym/pull/222)).
+- **[Core behavior change]** Environments will no longer select benchmarks
+  randomly. `env.reset()` will now always select the last-used benchmark,
+  unless the `benchmark` argument is provided or `env.benchmark` has been set.
+  If no benchmark is specified, a default is used.
+- **[API deprecations]** We have added a new
+  [Dataset](https://facebookresearch.github.io/CompilerGym/compiler_gym/datasets.html#compiler_gym.datasets.Dataset)
+  class hierarchy
+  ([#191](https://github.com/facebookresearch/CompilerGym/pull/191),
+  [#192](https://github.com/facebookresearch/CompilerGym/pull/192)). All
+  datasets are now available without needing to be downloaded first, and a new
+  [Datasets](https://facebookresearch.github.io/CompilerGym/compiler_gym/datasets.html#compiler_gym.datasets.Datasets)
+  class can be used to iterate over them
+  ([#200](https://github.com/facebookresearch/CompilerGym/pull/200)). We have
+  deprecated the old dataset management operations, the
+  `compiler_gym.bin.datasets` script, and removed the `--dataset` and
+  `--ls_benchmark` flags from the command line tools.
+- **[RPC interface change]** The `StartSession` RPC endpoint now accepts a list
+  of initial observations to compute. This removes the need for an immediate
+  call to `Step`, reducing environment reset time by 15-21%
+  ([#189](https://github.com/facebookresearch/CompilerGym/pull/189)).
+- [LLVM] We have added several new datasets of benchmarks, including the Csmith
+  and llvm-stress program generators
+  ([#207](https://github.com/facebookresearch/CompilerGym/pull/207)), a dataset
+  of OpenCL kernels
+  ([#208](https://github.com/facebookresearch/CompilerGym/pull/208)), and a
+  dataset of compilable C functions
+  ([#210](https://github.com/facebookresearch/CompilerGym/pull/210)). See [the
+  docs](https://facebookresearch.github.io/CompilerGym/llvm/index.html#datasets)
+  for an overview.
+- `CompilerEnv` now takes an optional `Logger` instance at construction time for
+  fine-grained control over logging output
+  ([#187](https://github.com/facebookresearch/CompilerGym/pull/187)).
+- [LLVM] The ModuleID and source_filename of LLVM-IR modules are now anonymized
+  to prevent unintentional overfitting to benchmarks by name
+  ([#171](https://github.com/facebookresearch/CompilerGym/pull/171)).
+- [docs] We have added a [Feature
+  Stability](https://facebookresearch.github.io/CompilerGym/about.html#feature-stability)
+  section to the documentation
+  ([#196](https://github.com/facebookresearch/CompilerGym/pull/196)).
+- Numerous bug fixes and improvements.
+
+Please use this checklist when updating code written for the previous CompilerGym release:
+
+* Review code that accesses the `env.benchmark` property and update to
+  `env.benchmark.uri` if a string name is required. Setting this attribute by
+  string (`env.benchmark = "benchmark://a-v0/b"`) and comparison to string types
+  (`env.benchmark == "benchmark://a-v0/b"`) still work.
+* Review code that calls `env.reset()` without first setting a benchmark.
+  Previously, calling `env.reset()` would select a random benchmark. Now,
+  `env.reset()` always selects the last used benchmark, or a predetermined
+  default if none is specified.
+* Review code that relies on `env.benchmark` being `None` to select benchmarks
+  randomly. Now, `env.benchmark` is always set to the previously used benchmark,
+  or a predetermined default benchmark if none has been specified. Setting
+  `env.benchmark = None` will raise an error. Select a benchmark randomly by
+  sampling from the `env.datasets.benchmark_uris()` iterator.
+* Remove calls to `env.require_dataset()` and related operations. These are no
+  longer required.
+* Remove accesses to `env.benchmarks`. An iterator over available benchmark URIs
+  is now available at `env.datasets.benchmark_uris()`, but the list of URIs
+  cannot be relied on to be fully enumerable (the LLVM environments have over
+  2^32 URIs).
+* Review code that accesses `env.observation_space` and update to
+  `env.observation_space_spec` where necessary
+  ([#228](https://github.com/facebookresearch/CompilerGym/pull/228)).
+* Update compiler service implementations to support the updated RPC interface
+  by removing the deprecated `GetBenchmarks` RPC endpoint and replacing it with
+  `Dataset` classes. See the [example
+  service](https://github.com/facebookresearch/CompilerGym/tree/development/examples/example_compiler_gym_service)
+  for details.
+* [LLVM] Update references to the `poj104-v0` dataset to `poj104-v1`.
+* [LLVM] Update references to the `cBench-v1` dataset to `cbench-v1`.
+
 ## Release 0.1.7 (2021-04-01)
 
 This release introduces [public

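The migration checklist above suggests choosing a random benchmark by sampling from the `env.datasets.benchmark_uris()` iterator, while warning that the iterator may not be fully enumerable. A minimal sketch of one way to do that is shown below. It is an illustration, not the project's own code: `sample_uri`, its `limit` parameter, and `fake_benchmark_uris` are hypothetical names, and the generator is only a stand-in for the real iterator so that the example runs without CompilerGym installed.

```python
import random
from itertools import islice
from typing import Iterable, Iterator


def sample_uri(uris: Iterable[str], limit: int, rng: random.Random) -> str:
    """Pick one URI uniformly at random from the first `limit` items of `uris`.

    Only a bounded prefix is consumed, so this is safe on an unbounded iterator.
    """
    chosen = None
    for count, uri in enumerate(islice(uris, limit), start=1):
        # Reservoir sampling with a reservoir of one: the item seen at
        # position `count` replaces the current choice with probability 1/count.
        if rng.randrange(count) == 0:
            chosen = uri
    if chosen is None:
        raise ValueError("no benchmark URIs available")
    return chosen


def fake_benchmark_uris() -> Iterator[str]:
    """Hypothetical stand-in for env.datasets.benchmark_uris(): never ends."""
    seed = 0
    while True:
        yield f"benchmark://example-v0/{seed}"
        seed += 1


uri = sample_uri(fake_benchmark_uris(), limit=1000, rng=random.Random(0))
print(uri)
```

With a real environment, the same helper could presumably be fed `env.datasets.benchmark_uris()`, and the chosen URI passed on through the `benchmark` argument of `env.reset()` or assigned to `env.benchmark`, both of which the changelog describes as accepting a benchmark.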
diff --git a/VERSION b/VERSION
@@ -1 +1 @@
-0.1.7
+0.1.8
diff --git a/compiler_gym/envs/compiler_env.py b/compiler_gym/envs/compiler_env.py
@@ -585,12 +585,12 @@ def close(self):
 
             Internally, CompilerGym environments may launch subprocesses and use
             temporary files to communicate between the environment and the
-            underlying compiler (see :doc:`compiler_gym.service
-            <compiler_gym/service>` for details). This means it is important to
-            call :meth:`env.close() <compiler_gym.envs.CompilerEnv.close>` after
-            use to free up resources and prevent orphan subprocesses or files.
-            We recommend using the :code:`with` statement pattern for creating
-            environments:
+            underlying compiler (see :ref:`compiler_gym.service
+            <compiler_gym/service:compiler_gym.service>` for details). This
+            means it is important to call :meth:`env.close()
+            <compiler_gym.envs.CompilerEnv.close>` after use to free up
+            resources and prevent orphan subprocesses or files. We recommend
+            using the :code:`with`-statement pattern for creating environments:
 
                 >>> with gym.make("llvm-autophase-ic-v0") as env:
                 ...    env.reset()

diff --git a/compiler_gym/envs/llvm/datasets/csmith.py b/compiler_gym/envs/llvm/datasets/csmith.py
@@ -5,6 +5,7 @@
 import io
 import logging
 import subprocess
+import sys
 import tarfile
 import tempfile
 from pathlib import Path
@@ -52,6 +53,31 @@ def source(self) -> str:
         return self._src.decode("utf-8")
 
 
+class CsmithBuildError(DatasetInitError):
+    """Error raised if :meth:`CsmithDataset.install()
+    <compiler_gym.datasets.CsmithDataset.install>` fails."""
+
+    def __init__(self, failing_stage: str, stdout: str, stderr: str):
+        install_instructions = {
+            "linux": "sudo apt install g++ m4",
+            "darwin": "brew install m4",
+        }[sys.platform]
+
+        super().__init__(
+            "\n".join(
+                [
+                    f"Failed to build Csmith from source, `{failing_stage}` failed.",
+                    "You may be missing installation dependencies. Install them using:",
+                    f"    {install_instructions}",
+                    "See https://github.com/csmith-project/csmith#install-csmith for more details",
+                    f"--- Start `{failing_stage}` logs: ---\n",
+                    stdout,
+                    stderr,
+                ]
+            )
+        )
+
+
 class CsmithDataset(Dataset):
     """A dataset which uses Csmith to generate programs.
 
@@ -175,20 +201,7 @@ def _build_csmith(install_root: Path, logger: logging.Logger):
             )
             stdout, stderr = configure.communicate(timeout=600)
             if configure.returncode:
-                raise DatasetInitError(
-                    "\n".join(
-                        [
-                            "Failed to build Csmith from source, `./configure` failed.",
-                            "You may be missing installation dependencies. Install them using:",
-                            "     linux: `sudo apt install g++ m4`",
-                            "     macOS: `brew install m4`",
-                            "See https://github.com/csmith-project/csmith#install-csmith for more details",
-                            "--- Start `./configure` logs: ---\n",
-                            stdout,
-                            stderr,
-                        ]
-                    )
-                )
+                raise CsmithBuildError("./configure", stdout, stderr)
 
             logger.debug("Installing Csmith to %s", install_root)
             make = subprocess.Popen(
@@ -200,20 +213,7 @@ def _build_csmith(install_root: Path, logger: logging.Logger):
             )
             stdout, stderr = make.communicate(timeout=600)
             if make.returncode:
-                raise DatasetInitError(
-                    "\n".join(
-                        [
-                            "Failed to build Csmith from source, `make install` failed.",
-                            "You may be missing installation dependencies. Install them using:",
-                            "     linux: `sudo apt install g++ m4`",
-                            "     macOS: `brew install m4`",
-                            "See https://github.com/csmith-project/csmith#install-csmith for more details",
-                            "--- Start `make install` logs: ---\n",
-                            stdout,
-                            stderr,
-                        ]
-                    )
-                )
+                raise CsmithBuildError("make install", stdout, stderr)
 
     @property
     def size(self) -> int:

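One thing to note about the new `CsmithBuildError`: the install hint is looked up with `{...}[sys.platform]`, which raises `KeyError` on any platform other than `linux` or `darwin`, hiding the build failure behind a lookup error. The sketch below shows one possible hardening using `dict.get` with a fallback. It is an assumption-laden illustration rather than the project's code: the `DatasetInitError` base class here is a local stand-in, and the `install_hint` helper, the `platform` parameter, and the fallback wording are all hypothetical.

```python
import sys


class DatasetInitError(Exception):
    """Local stand-in for compiler_gym's DatasetInitError (real base unknown here)."""


# The two hints are copied from the diff; the fallback text is an assumption.
INSTALL_HINTS = {
    "linux": "sudo apt install g++ m4",
    "darwin": "brew install m4",
}
FALLBACK_HINT = "install a C++ compiler and m4 using your system package manager"


def install_hint(platform: str = sys.platform) -> str:
    """Return the dependency install hint, falling back on unknown platforms."""
    return INSTALL_HINTS.get(platform, FALLBACK_HINT)


class CsmithBuildError(DatasetInitError):
    """Same message layout as in the diff, but never fails on the lookup."""

    def __init__(self, failing_stage: str, stdout: str, stderr: str,
                 platform: str = sys.platform):
        super().__init__(
            "\n".join(
                [
                    f"Failed to build Csmith from source, `{failing_stage}` failed.",
                    "You may be missing installation dependencies. Install them using:",
                    f"    {install_hint(platform)}",
                    "See https://github.com/csmith-project/csmith#install-csmith for more details",
                    f"--- Start `{failing_stage}` logs: ---\n",
                    stdout,
                    stderr,
                ]
            )
        )


err = CsmithBuildError("./configure", "stdout text", "stderr text", platform="win32")
print(err)
```

The explicit `platform` parameter exists only to make the fallback path easy to exercise in a test; callers would normally rely on the `sys.platform` default.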
diff --git a/tests/llvm/datasets/llvm_stress_test.py b/tests/llvm/datasets/llvm_stress_test.py
@@ -3,6 +3,7 @@
 # This source code is licensed under the MIT license found in the
 # LICENSE file in the root directory of this source tree.
 """Tests for the AnghaBench dataset."""
+import sys
 from itertools import islice
 
 import gym
@@ -45,7 +46,7 @@ def test_llvm_stress_random_select(
     # As of the current version (LLVM 10.0.0), programs generated with the
     # following seeds emit an error when compiled: "Cannot emit physreg copy
     # instruction".
-    FAILING_SEEDS = {173, 239}
+    FAILING_SEEDS = {"linux": {173, 239}, "darwin": {173}}[sys.platform]
 
     if index in FAILING_SEEDS:
         with pytest.raises(