Regularization API for preference comparisons #481

Rocamonde · 2022-07-20T17:41:19Z

Fixes #461.

Support seeding via generator
Implement base class update params
Add regularization param change to logger
Add loss_regularize
Add weight_regularize
Add support for passing general regularization classes in preference_comparisons
Add tests
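As a rough illustration of the two flavours named above, the sketch below contrasts a penalty added to the loss with a shrinkage applied directly to the weights. It is written in plain Python and does not reproduce this PR's code: the function names `l2_loss_penalty` and `weight_decay_step`, the L2 penalty, and the decay rule are all assumptions chosen for illustration.

```python
from typing import List


def l2_loss_penalty(loss: float, params: List[float], lambda_: float) -> float:
    """Loss-style regularization (assumed form): loss + lambda * sum(p^2)."""
    return loss + lambda_ * sum(p * p for p in params)


def weight_decay_step(params: List[float], lambda_: float, lr: float) -> List[float]:
    """Weight-style regularization (assumed form): shrink each parameter in place of a penalty."""
    return [p - lr * lambda_ * p for p in params]


params = [1.0, -2.0]
# The penalty changes the scalar that would be backpropagated.
penalized = l2_loss_penalty(loss=0.5, params=params, lambda_=0.1)
# The decay changes the parameters themselves and leaves the loss untouched.
decayed = weight_decay_step(params, lambda_=0.1, lr=0.5)
```

The first variant acts through the gradient of the loss, while the second modifies the parameters directly. That difference is one reason a shared base class with two subclasses can be a natural design.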

AdamGleave

Factory approach seems good! It's an easy-to-use API. The actual implementation is a little trickier to follow than direct instantiation. But overall I think it's the least-bad option.

The main alternative would be to make the caller of preference comparisons responsible for passing in a non-empty custom_logger and creating the optimizer. This isn't crazy -- the scripts are already creating a custom logger, and making the caller create the optimizer does at least give flexibility as to what optimization class to use. But it seems like it's adding a lot of friction for people who just want to use the Python API programmatically, whereas the regularization factory is easy to use.

Another approach would have been to let you instantiate the regularizer without specifying the optimizer or logger, and have these set after instantiation (with some check that they have been set before calling .regularize()). But this seems more error-prone: it's nice to have instantiation mean the object is actually ready to use.
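The trade-off discussed in this comment can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the PR's implementation: `SketchRegularizer`, `make_factory`, the list-based logger, and the multiplicative penalty are all hypothetical. The idea is that the user chooses hyperparameters up front, and the trainer later supplies the optimizer and logger, so the regularizer is never in a half-initialized state.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class SketchRegularizer:
    """Hypothetical regularizer that is fully usable once constructed."""

    optimizer: Any      # stand-in for the trainer's optimizer
    logger: List[str]   # stand-in for the trainer's logger
    lambda_: float

    def regularize(self, loss: float) -> float:
        # Record the current strength, then return a toy penalized loss.
        self.logger.append(f"lambda={self.lambda_}")
        return loss * (1.0 + self.lambda_)


def make_factory(lambda_: float) -> Callable[[Any, List[str]], SketchRegularizer]:
    """User picks hyperparameters; the trainer supplies optimizer and logger."""
    if lambda_ < 0:
        # Validate eagerly so bad settings fail before training starts.
        raise ValueError("lambda_ must be non-negative")

    def factory(optimizer: Any, logger: List[str]) -> SketchRegularizer:
        return SketchRegularizer(optimizer=optimizer, logger=logger, lambda_=lambda_)

    return factory


# User code: no optimizer or logger is needed at this point.
factory = make_factory(lambda_=0.5)

# Trainer code: it owns the optimizer and logger and finishes construction.
trainer_logger: List[str] = []
regularizer = factory(optimizer=object(), logger=trainer_logger)
penalized = regularizer.regularize(2.0)
```

Compared with setting the optimizer and logger after instantiation, this keeps the "constructed means ready to use" property, at the cost of one extra level of indirection.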

There are a few small comments outstanding (e.g. adding tests), but once those are addressed I think we should be good to go -- just request a re-review once it's ready to look at again.

Rocamonde · 2022-09-13T11:41:39Z

Added all the requested changes, except the tests for .backward(), which I moved to a new issue, #562. I read through the tests and found that this is a problem we should address more generally, and it probably deserves a separate PR. As far as I could find, we only really check for progress in 3 algorithms.

AdamGleave

LGTM. Note I made a couple of minor comments to adjust phrasing of docstring -- please review those and merge if happy.

levmckinney and others added 30 commits July 8, 2022 23:05

first draft of reward ensembles

44aa95e

Fixed doc string

3edde81

Co-authored-by: Adam Gleave <adam@gleave.me>

addressed most of reviewers' comments

65f6a82

Renamed UncertainRewardNet to RewardNetWithVariance

fa9147f

moved implementation of make_reward_net to reward_nets.py and rewrote EnsembleRewardNet initializer

8d490a4

fixed conservative reward wrapper

7b6e285

added test for reward_moments

2ec296e

switched to a nn.ModuleList not sure how serialize identity was passing before this ...

eae4477

added test for conservative reward function

83b971d

pulled loss calculation out of reward trainer

2f7b4c3

created reward ensemble trainer

42378f8

added documentation for cross_entropy_loss_kwarg

59b7644

Merge branch 'master' into reward_ensemble

f591631

fixed tests and implementation of ensemble trainer

82048ac

modified assert so that it is actually always true

53fbd14

added loss to preference comparison notebook

4577771

add named config to reward.py and integrated tests

01add55

added logging of standard deviation

7edb59d

changed conservative reward wrapper into reward function that adds standard deviation

9a909ff

added validate reward structure function and the ability to pass kwargs to predict processed

6dc6463

fixed test_validate_wrapper_structure

67e3909

Added option to create and load reward functions that add std. Now by default when using a reward ensemble with preference comparison it will wrap it in an add std wrapper.

2d1547b

fixed test_validate_wrapper_structure again

0ae83d8

removed failure_rate.sh

a35778b

predict_processed in normalization wrapper now calls base classes predict processed rather than predict_th.

9dbc75e

fixed test coverage

197512a

added test that normalized reward net passes along its kwargs and that the initializations of the ensemble members are different.

d0e0ad9

addressed reviewers' comments.

4b42892

now testing that all basic wrappers pass along kwargs when calling predict processed. RewardNetWrapper now implements predict_processed to ensure this is the default behaviour.

23b1f4d

added del kwargs where appropriate to improve readability

0246fb2

Rocamonde added 12 commits September 12, 2022 11:55

Replace assert with ValueError

0636e5d

Check for lambda being negative in the scaler

0a3351e

Guard against losses being negative in interval param scaler

2b36e62

Split tests up to cover more cases

e95f4b3

Clean up repetitive code for readability

918e283

Remove old TODO message

9c108d9

Merge RewardTrainer seeds into one

24c5729

Fix interval param tests to new error messages

6ed0b9b

Move regularization input validation to factory class

64e38a0

Remake the regularizer factory for better API design

e3d3f6e

Fix bugs and tests in new factory design

78efe8f

Added docstring to regularizer factory.

19f3518

AdamGleave reviewed Sep 13, 2022

Rocamonde and others added 4 commits September 13, 2022 12:48

Merge branch 'master' of github.com:HumanCompatibleAI/imitation into dynamic-l2-regularization

a293322

Update src/imitation/regularization/regularizers.py

e9218c6

Co-authored-by: Adam Gleave <adam@gleave.me>

Update src/imitation/regularization/regularizers.py

0e2c997

Co-authored-by: Adam Gleave <adam@gleave.me>

Add todo to refactor once #529 is merged.

b6cb2cc

Rocamonde mentioned this pull request Sep 13, 2022

Add tests for ensuring loss goes down #562

Closed

Rocamonde added 3 commits September 13, 2022 13:25

Rename regularize to regularize_and_backward

b29b7ef

Fix bug in tests and docstrings

0145066

Rename mode to prefix

1052398

Added exceptions to docstrings

c16a0ef

Rocamonde requested a review from AdamGleave September 13, 2022 13:34

Rocamonde and others added 3 commits September 13, 2022 18:33

Make type ignore only specific to pytype

7ebec14

Add verbatim double-`` to some docstrings

d1574c3

Change phrasing in docstring

8d0b0a9

AdamGleave approved these changes Sep 14, 2022

Rocamonde merged commit d453247 into master Sep 14, 2022

Rocamonde deleted the dynamic-l2-regularization branch September 14, 2022 13:46
