Task-Log.md

180321 --> 250321

Open Trails:

Potential Trails:

- CLEAN MDP files: separate out the classes and use abstract base classes
- INVESTIGATE rewriting the agent and estimator classes to fully exploit NumPy (not urgent)
- UNIFY estimators and approximators in my code (rough interface sketch below)
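
As a rough sketch of where these three items could land, assuming hypothetical names (`Estimator`, `TabularEstimator`, `n_states`, `n_actions`) rather than the current classes:

```python
from abc import ABC, abstractmethod

import numpy as np


class Estimator(ABC):
    """Shared interface for estimators and approximators (the UNIFY item)."""

    @abstractmethod
    def predict(self, state):
        """Return action-value estimates for `state`."""

    @abstractmethod
    def update(self, state, action, target):
        """Move the estimate for (state, action) towards `target`."""


class TabularEstimator(Estimator):
    """Tabular estimator backed by one NumPy array, so lookups and
    updates can later be vectorised over batches (the INVESTIGATE item)."""

    def __init__(self, n_states, n_actions, step_size=0.1):
        self.values = np.zeros((n_states, n_actions))
        self.step_size = step_size

    def predict(self, state):
        return self.values[state]

    def update(self, state, action, target):
        self.values[state, action] += self.step_size * (
            target - self.values[state, action]
        )
```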

Closed Trails:

- ADJUST the learning plots so that there is a fixed maximum number of steps
- ADJUST the learning plots so that the greedy evaluation starts from the initial state
- FRESH agent or estimator function to calculate bias with respect to the action-value function
- FRESH agent train method so that training of estimators is separate from action selection (15 min; see the loop sketch after this list)
- TRY learning plots with LambChop using RMax estimators 
    - FIX a bug where the agent takes only one exploratory step before the episode terminates (the bug seems to have resolved itself after a restart)
- TRY learning plots with LambChop using novelty estimators (only)
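
For reference, a minimal sketch of what the ADJUST/FRESH items above amount to: a fixed step cap, greedy evaluation from the initial state, training kept separate from action selection, and the bias helper. `agent`, `env`, and all method names here are placeholders rather than the actual classes in the repo.

```python
import numpy as np


def greedy_return(agent, env, max_steps):
    """Greedy evaluation from the initial state, capped at a fixed step count."""
    state = env.reset()                           # always start from the initial state
    total = 0.0
    for _ in range(max_steps):                    # fixed maximum number of steps
        action = agent.act(state, greedy=True)    # pure action selection, no learning
        state, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


def train_step(agent, env, state):
    """One exploratory step: select an action, then train the estimators separately."""
    action = agent.act(state)                                # exploration policy
    next_state, reward, done = env.step(action)
    agent.train(state, action, reward, next_state, done)     # estimator updates only
    return next_state, done


def estimator_bias(estimator, true_q):
    """Mean signed error of the estimates w.r.t. a known action-value function."""
    return np.mean(estimator.values - true_q)
```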

Notes:

- THEORY can reactive Turing machines help in the analysis of reinforcement learning?
- LINK on customizing JupyterLab keyboard shortcuts: https://towardsdatascience.com/how-to-customize-jupyterlab-keyboard-shortcuts-72321f73753d