
Commit 089dace

committed
[up] added first readme.md file and getting-started.ipynb
1 parent f11ac10 commit 089dace

2 files changed: +418 −2 lines changed

README.md

+74 −2
@@ -1,2 +1,74 @@
-# deep-banana
-Implementation of a deep reinforcement learning agent that should catch yellow bananas and avoid blue ones.

# deep-monkey

A deep reinforcement learning agent that catches yellow bananas and avoids blue ones in a large square world
simulated with Unity ML-Agents.

This project is my solution to the navigation project of Udacity's
[Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893).

## Project Details

The agent is a monkey moving in a 2D arena with blue and yellow bananas scattered on the floor. Each time the agent
picks up a banana it is rewarded as follows:
- a yellow banana yields a reward of +1
- a blue banana yields a reward of -1

The state space has 37 continuous dimensions, including:
- the agent's velocity
- ray-based perception of objects around the agent's forward direction

The action space has a single discrete dimension with four possible values:
- **`0`** - move forward
- **`1`** - move backward
- **`2`** - turn left
- **`3`** - turn right

The task is episodic.

The task is considered solved when the agent achieves an average score of +13 over 100 consecutive episodes.
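
For concreteness, here is a minimal sketch of how that criterion can be checked; the helper name and the `scores`
list are illustrative, not part of this repository:
```python
import numpy as np

def is_solved(scores, window=100, target=13.0):
    """Return True once the mean of the last `window` episode scores reaches `target`."""
    return len(scores) >= window and np.mean(scores[-window:]) >= target
```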

## Getting Started

1. Download the pre-compiled Unity environment for your platform:
   - Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip)
   - Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana.app.zip)
   - Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86.zip)
   - Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86_64.zip)
1. Decompress the archive at your preferred location (e.g. in this repository's working copy).
1. Open getting-started.ipynb. This notebook installs the dependencies and explores the environment, concluding
   with a demonstration of an agent that chooses actions randomly (similar to the sketch after this list).
1. Follow the instructions in the getting-started.ipynb notebook. You will need to specify the path to the
   environment executable that you downloaded in the first step.
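
As a rough preview of what the notebook demonstrates, here is a minimal sketch of loading the environment and
running a randomly acting agent. It assumes the `unityagents` package used by the nanodegree and a Linux build
extracted to `Banana_Linux/Banana.x86_64`; adjust the path for your platform:
```python
import numpy as np
from unityagents import UnityEnvironment

# Path to the executable extracted from the downloaded archive (Linux example).
env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")
brain_name = env.brain_names[0]          # the default brain controls the agent
brain = env.brains[brain_name]

env_info = env.reset(train_mode=False)[brain_name]
score = 0
while True:
    action = np.random.randint(brain.vector_action_space_size)  # random action in {0, 1, 2, 3}
    env_info = env.step(action)[brain_name]
    score += env_info.rewards[0]
    if env_info.local_done[0]:           # episode finished
        break
print("Episode score:", score)
env.close()
```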

### Code organization

The code is organized as follows (in hierarchical order, from abstract to detailed):
- Report.ipynb: notebook illustrating the results of this project.
- deep_monkey.py: high-level functions used in the notebooks for training, plotting results, and saving model
  checkpoints.
- agent.py: the classes that model the agent and its dependencies; implements a high-level Agent that generalizes
  each variant of the original DQN algorithm.
- model.py: the neural network class, implemented with PyTorch, which the agent uses to approximate the Q-function
  (see the sketch after this list).
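
For orientation, here is a minimal sketch of the kind of network model.py might define; the layer sizes are
assumptions, not necessarily the architecture used in this repository:
```python
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a 37-dimensional state to one Q-value per discrete action."""

    def __init__(self, state_size=37, action_size=4, hidden=64):  # hidden size is an assumption
        super().__init__()
        self.fc1 = nn.Linear(state_size, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.fc3 = nn.Linear(hidden, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return self.fc3(x)  # Q-values for the 4 actions
```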

## Instructions

### Prerequisites

A working Python 3 environment is required. You can easily set one up by installing
[Anaconda](https://www.anaconda.com/download/).

### Installation

If you are using Anaconda, it is suggested to create a new environment as follows:
```bash
conda create --name deepmonkey python=3.6
```
Activate the environment:
```bash
source activate deepmonkey
```
Start the Jupyter server:
```bash
jupyter notebook --no-browser --ip 127.0.0.1 --port 8888 --port-retries=0
```
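
If you prefer installing the Python dependencies from the shell rather than through the notebook, something along
these lines should work (the exact package list is an assumption based on the imports described above):
```bash
pip install unityagents torch numpy matplotlib
```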

### Future development

The agent's learning performance (the number of episodes needed to solve the task) could be improved by
implementing the following:
- Prioritized experience replay
- Dueling DQN (see the sketch below)
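
As a pointer, here is a minimal sketch of what the dueling variant could look like (an illustration, not existing
code): the network splits into a state-value stream and an advantage stream that are recombined into Q-values:
```python
import torch.nn as nn
import torch.nn.functional as F

class DuelingQNetwork(nn.Module):
    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.feature = nn.Linear(state_size, hidden)
        self.value = nn.Linear(hidden, 1)                # state value V(s)
        self.advantage = nn.Linear(hidden, action_size)  # advantages A(s, a)

    def forward(self, state):
        x = F.relu(self.feature(state))
        v = self.value(x)
        a = self.advantage(x)
        # Subtract the mean advantage so the V/A decomposition is identifiable.
        return v + a - a.mean(dim=-1, keepdim=True)
```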

In addition, the same task could be solved starting from raw pixel observations.
