# deep-monkey
A deep reinforcement learning agent that catches yellow bananas and avoids blue ones in a large square world simulated
with Unity ML-Agents.

This project is my solution to the navigation project of Udacity's
[Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

## Project Details
The agent is a monkey moving in a 2D arena where blue and yellow bananas lie on the floor. Each time the agent
collects a banana it is rewarded as follows:
- for a yellow banana it receives a reward of +1
- for a blue banana it receives a reward of -1

The state space has 37 continuous dimensions, including:
- the agent's velocity
- ray-based perception of objects around the agent's forward direction

The action space has 1 discrete dimension with four possible actions:
- **`0`** - move forward
- **`1`** - move backward
- **`2`** - turn left
- **`3`** - turn right

The task is episodic.

The task is considered solved when the agent achieves an average score of +13 over 100 consecutive episodes.

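For concreteness, here is a minimal sketch of how these spaces can be inspected with the `unityagents` Python package
(the executable path is a placeholder; use whichever build you download in Getting Started below):

```python
from unityagents import UnityEnvironment

# Placeholder path: point this at the environment executable you downloaded.
env = UnityEnvironment(file_name="./Banana_Linux/Banana.x86_64")

# The Banana environment exposes a single "brain" that the agent controls.
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
state = env_info.vector_observations[0]

print("State dimensions:", len(state))                        # 37
print("Number of actions:", brain.vector_action_space_size)   # 4

env.close()
```
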
## Getting Started
1. Download the pre-compiled Unity environment for your platform:
   - Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip)
   - Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana.app.zip)
   - Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86.zip)
   - Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86_64.zip)
1. Decompress the archive at your preferred location (e.g. in this repository's working copy).
1. Open `getting-started.ipynb`. This notebook installs the dependencies and explores the environment, concluding
with a demonstration of an agent that chooses actions randomly (a condensed sketch of that loop follows this list).
1. Follow the instructions in the `getting-started.ipynb` notebook. You will need to specify the path to the
environment executable that you downloaded in step 1.

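As a reference for what the notebook walks through, this is roughly what the random-agent loop looks like (the path is
again a placeholder for the executable you decompressed, and the `unityagents` package is assumed to be installed):

```python
import numpy as np
from unityagents import UnityEnvironment

# Placeholder path: use the executable decompressed in step 2.
env = UnityEnvironment(file_name="./Banana_Linux/Banana.x86_64")
brain_name = env.brain_names[0]
action_size = env.brains[brain_name].vector_action_space_size

env_info = env.reset(train_mode=False)[brain_name]  # train_mode=False renders in real time
score = 0
while True:
    action = np.random.randint(action_size)   # uniformly random action in {0, 1, 2, 3}
    env_info = env.step(action)[brain_name]   # send the action to the environment
    score += env_info.rewards[0]              # +1 yellow banana, -1 blue banana
    if env_info.local_done[0]:                # episode finished
        break

print("Episode score:", score)
env.close()
```
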
### Code organization
The code is organized as follows (in hierarchical order, from abstract to detailed):
- `Report.ipynb`: notebook illustrating the results of this project.
- `deep_monkey.py`: high-level functions used in the notebook for training, plotting results and saving model
checkpoints.
- `agent.py`: the classes that model the agent and its dependencies. Implements a high-level Agent that generalizes
over the variants of the original DQN algorithm.
- `model.py`: the neural network class, implemented with PyTorch, which the agent uses to approximate the Q-function
(an illustrative sketch follows this list).

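The actual network in `model.py` may differ, but as an illustration, a minimal PyTorch Q-network for this state/action
layout could look like the following (layer sizes here are assumptions, not the repository's hyperparameters):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a 37-dimensional state to one Q-value per action."""

    def __init__(self, state_size=37, action_size=4, hidden=64, seed=0):
        super().__init__()
        torch.manual_seed(seed)
        self.fc1 = nn.Linear(state_size, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # raw Q-values; the agent takes the argmax or explores
```
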
## Instructions

### Prerequisites
A working Python 3 environment is required. You can easily set one up by installing
[Anaconda](https://www.anaconda.com/download/).

### Installation
If you are using Anaconda, it is suggested to create a new environment as follows:
```bash
conda create --name deepmonkey python=3.6
```
activate the environment:
```bash
source activate deepmonkey
```
and start the Jupyter server:
```bash
jupyter notebook --no-browser --ip 127.0.0.1 --port 8888 --port-retries=0
```

### Future development
The agent's learning performance (the number of episodes needed to solve the task) should improve by implementing the
following:
- Prioritized experience replay
- Dueling DQN (a sketch of the dueling idea follows below)

In addition, the same task should be solved starting from raw pixel observations.
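For the dueling variant, the idea is to split the network into a state-value stream and an advantage stream and
recombine them into Q-values. A minimal sketch under the same assumptions as the network above (sizes are
illustrative, not the planned implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DuelingQNetwork(nn.Module):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.feature = nn.Linear(state_size, hidden)
        self.value = nn.Linear(hidden, 1)                 # state-value stream V(s)
        self.advantage = nn.Linear(hidden, action_size)   # advantage stream A(s, a)

    def forward(self, states):
        # Expects a batch of states with shape (batch_size, state_size).
        x = F.relu(self.feature(states))
        v = self.value(x)          # shape (batch, 1)
        a = self.advantage(x)      # shape (batch, action_size)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```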