180321 --> 250321
- CLEAN up the MDP files: separate out the classes and use abstract base classes (sketch below)
- INVESTIGATE rewriting the agent and estimator classes to fully exploit numpy (not urgent; sketch below)
- UNIFY estimators and approximators in my code
- ADJUST the learning plots so that there is a fixed maximum number of steps
- ADJUST the learning plots so that the greedy evaluation starts from the initial state (both plot adjustments are sketched below)
- FRESH agent or estimator function to calculate the bias with respect to the action-value function (sketch below)
- FRESH agent train method so that estimator training is separate from action selection (15 min; sketch below)
- TRY learning plots with LambChop using RMax estimators
- FIX a bug where the agent takes only one step when exploring and the episode terminates... (the bug seems to have fixed itself after a restart)
- TRY learning plots with LambChop using novelty estimators (only)
- THEORY can reactive Turing machines help in the analysis of reinforcement learning?
- LINK for custom keyboard shortcuts in JupyterLab: https://towardsdatascience.com/how-to-customize-jupyterlab-keyboard-shortcuts-72321f73753d
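
A minimal sketch of the abstract-base-class clean-up for the MDP files. `MDP`, `GridWorld`, and the method names are hypothetical stand-ins rather than the repo's actual classes; the point is that `abc` turns a missing method into a `TypeError` at instantiation instead of a silent gap.

```python
from abc import ABC, abstractmethod


class MDP(ABC):
    """Abstract interface that every concrete MDP class implements."""

    @abstractmethod
    def initial_state(self):
        """Return the initial state of the MDP."""

    @abstractmethod
    def actions(self, state):
        """Return the actions available in `state`."""

    @abstractmethod
    def step(self, state, action):
        """Return (next_state, reward, done) for taking `action` in `state`."""


class GridWorld(MDP):
    """Toy concrete MDP; omitting any abstract method makes instantiation fail."""

    def initial_state(self):
        return (0, 0)

    def actions(self, state):
        return ["up", "down", "left", "right"]

    def step(self, state, action):
        # Real environment dynamics would go here; this stub stays put.
        return state, 0.0, False
```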
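
A small illustration of the numpy rewrite idea: the same greedy-action computation done with a per-state Python loop and with one vectorized `argmax`. The array `q` is a hypothetical tabular action-value estimate, not the actual estimator class.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 100, 4
q = rng.normal(size=(n_states, n_actions))  # hypothetical tabular estimates

# Loop version: pick the greedy action one state at a time.
greedy_loop = [int(np.argmax(q[s])) for s in range(n_states)]

# Vectorized version: a single argmax over the action axis.
greedy_vec = q.argmax(axis=1)

assert (greedy_vec == np.array(greedy_loop)).all()
```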
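
A sketch of the two plot adjustments on a toy tabular setup: every run shares one fixed step budget, and each greedy evaluation episode is rolled out from the initial state. The update rule, dynamics, and rewards below are placeholders, not the real agent or MDPs.

```python
import numpy as np

rng = np.random.default_rng(0)
MAX_STEPS = 10_000   # fixed maximum number of training steps, shared by all runs
EVAL_EVERY = 1_000   # evaluate the greedy policy at a fixed cadence


def run_greedy_episode(q, initial_state=0, horizon=50):
    """Roll out the greedy policy from the initial state and sum the rewards."""
    state, total = initial_state, 0.0
    for _ in range(horizon):
        action = int(q[state].argmax())        # greedy action selection
        state = (state + action) % q.shape[0]  # toy deterministic dynamics
        total += 1.0 if state == 0 else 0.0    # toy reward signal
    return total


q = np.zeros((10, 2))  # toy tabular action-value estimates
curve = []             # (step, greedy return) points for the learning plot
for step in range(1, MAX_STEPS + 1):
    s, a = rng.integers(10), rng.integers(2)
    q[s, a] += 0.1 * (1.0 - q[s, a])  # stand-in update rule
    if step % EVAL_EVERY == 0:
        curve.append((step, run_greedy_episode(q)))
```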
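
One way the bias calculation could look: compare an estimator's action values against a reference action-value function (the reference would come from, e.g., dynamic programming on the MDP). `action_value_bias` and both arrays are hypothetical.

```python
import numpy as np


def action_value_bias(q_estimate, q_true):
    """Mean signed deviation of the estimates from the reference action values."""
    return float(np.mean(np.asarray(q_estimate) - np.asarray(q_true)))


# Example: a uniformly optimistic estimator has positive bias.
q_true = np.zeros((5, 2))
q_estimate = q_true + 0.5
print(action_value_bias(q_estimate, q_true))  # 0.5
```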
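
A sketch of the train-method split: estimator training and action selection live in separate methods, so each can be tested or swapped on its own. `Agent`, `env.step`, and the estimator/policy interfaces are hypothetical placeholders.

```python
class Agent:
    def __init__(self, estimator, policy):
        self.estimator = estimator  # e.g. an RMax or novelty estimator
        self.policy = policy        # e.g. epsilon-greedy over the estimates

    def select_action(self, state):
        """Action selection only; no learning happens here."""
        return self.policy(self.estimator, state)

    def update(self, state, action, reward, next_state, done):
        """Estimator training only; no action is chosen here."""
        self.estimator.update(state, action, reward, next_state, done)

    def train_step(self, env, state):
        """One interaction step composed from the two separate pieces."""
        action = self.select_action(state)
        next_state, reward, done = env.step(state, action)
        self.update(state, action, reward, next_state, done)
        return next_state, done
```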