Find 5 treasures and reach the EXIT in the minimum number of steps (Q-learning)

- wall positions: [4,5,7,9,22,23,25,30,31,35,39,43,45,47,49,50,51,53,55,57,58,59,61,65,71,74,80,85,88,90,94,97,100,101,102,104,109,110,111,113,114,119,120,127,128,129,132,134,136,141,142,143,145,151,153,155,157,158,164,166,169,172,176,178,181,183,186,187,190,191,193,195,196,206,211,214,226,229]
- treasure positions: [6, 79, 170, 212, 227]
- exit position: 230
- hyperparameter settings
-> EPSILON = 0.9 (randomness of ACTIONS)
-> ALPHA = 0.1 (learning rate)
-> GAMMA = 1 (discount factor, i.e. desire for future rewards: 0 -> ignore future rewards, 1 -> look for high rewards in the long term)
-> MAX_EPISODES = 1000 (number of times the agent walks through the maze)
- reward settings
-> goal_reward (exit found): 5000
-> wall_punish (hitting a wall): -300
-> out_punishment (leaving the map): -200
-> treasure_found: 1000
-> normal_reward (normal path): -100
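
The settings above can be tied together in a small sketch of tabular Q-learning. This is an illustration under stated assumptions, not the project's actual code. The names `ACTIONS`, `reward_for`, `choose_action` and `q_update` are hypothetical. The sketch assumes that each treasure pays out only once, that reaching the exit always pays `goal_reward`, and that `EPSILON` is the probability of taking a random action (some implementations use it the other way round, as the probability of acting greedily). The grid dimensions are not given here, so the caller is expected to pass `None` for a move that leaves the map.

```python
import random

# Values copied from the settings above.
WALLS = frozenset([
    4, 5, 7, 9, 22, 23, 25, 30, 31, 35, 39, 43, 45, 47, 49, 50, 51, 53, 55,
    57, 58, 59, 61, 65, 71, 74, 80, 85, 88, 90, 94, 97, 100, 101, 102, 104,
    109, 110, 111, 113, 114, 119, 120, 127, 128, 129, 132, 134, 136, 141,
    142, 143, 145, 151, 153, 155, 157, 158, 164, 166, 169, 172, 176, 178,
    181, 183, 186, 187, 190, 191, 193, 195, 196, 206, 211, 214, 226, 229,
])
TREASURES = frozenset([6, 79, 170, 212, 227])
EXIT = 230

EPSILON, ALPHA, GAMMA = 0.9, 0.1, 1
GOAL_REWARD, WALL_PUNISH, OUT_PUNISHMENT = 5000, -300, -200
TREASURE_FOUND, NORMAL_REWARD = 1000, -100

# Assumption: four moves; the real action set is not listed above.
ACTIONS = ("up", "down", "left", "right")


def reward_for(next_pos, collected):
    """Reward for attempting to move to next_pos.

    next_pos is None when the move would leave the map (assumption: the
    caller does the bounds check). collected is the set of treasure
    positions already picked up in this episode.
    """
    if next_pos is None:
        return OUT_PUNISHMENT
    if next_pos in WALLS:
        return WALL_PUNISH
    if next_pos == EXIT:
        return GOAL_REWARD
    if next_pos in TREASURES and next_pos not in collected:
        return TREASURE_FOUND
    return NORMAL_REWARD


def choose_action(q_row, rng=random):
    """Epsilon-greedy choice; EPSILON is read as the random-action probability."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q_row.values())
    # Break ties between equally good actions at random.
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_update(q, state, action, reward, next_state, done):
    """One Q-learning step: Q(s,a) += ALPHA * (r + GAMMA * max Q(s',.) - Q(s,a))."""
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + GAMMA * best_next
    q[state][action] += ALPHA * (target - q[state][action])


# Usage: one ordinary step from cell 0 to cell 1 on a fresh Q-table.
q = {s: {a: 0.0 for a in ACTIONS} for s in (0, 1)}
q_update(q, 0, "right", reward_for(1, set()), 1, done=False)
```

With GAMMA = 1 nothing is discounted, so the -100 cost of every ordinary step is the only thing that pushes the agent toward shorter paths. Because the same cell can be visited before and after a treasure is collected, a full implementation would probably need to include the set of collected treasures in the state, which this sketch leaves out.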