This assignment is about Markov Decision Processes (MDPs) as a way of formalizing what it means to make optimal decisions in probabilistic domains. MDPs also generalize the idea of having a single goal state to instead having reward, positive or negative, that can accumulate at various states.
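As a reminder of the standard formalism (not part of the original prompt, but the usual textbook definition), an MDP is a tuple

```latex
\langle S, A, P, R, \gamma \rangle,
\qquad P(s' \mid s, a) = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a),
\qquad \gamma \in [0, 1],
```

where S is the set of states, A the set of actions, P the transition probabilities, R the reward function, and gamma the discount factor. Writing down each of these pieces for your environment is what "modeling it as an MDP" means below.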
Take the grid world from last week’s Colab notebook (or another domain if you have something you really prefer).
Add some amount of probabilistic behavior and reward to this environment and model it as an MDP. See this week’s Colab notebook for an example of a betting game modeled as an MDP.
For example: maybe the environment is slippery, and actions sometimes don’t have the desired effects. Maybe some squares give negative reward some percentage of the time (traps?). Maybe all squares give negative reward some percentage of the time (meteorite?). Maybe some walls are electrified? Etc.
You are required to write down how this would be modeled as an MDP: the states, actions, transition probabilities, and rewards.
You don’t have to code this, just model the problem.
Do you have a guess what the optimal value function and policy should look like?
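You don't have to code anything, but for concreteness, here is one way a slippery grid world like the one described above could be written down. This is only a sketch: the grid size, slip probability, trap/goal squares, and reward values are all invented for illustration, not taken from the Colab notebook.

```python
# A minimal sketch of a slippery 3x3 grid world modeled as an MDP.
# All constants (SIZE, SLIP, GOAL, TRAP, reward values) are illustrative.
import itertools

SIZE = 3
STATES = list(itertools.product(range(SIZE), range(SIZE)))
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SLIP = 0.2          # chance the action "slips" and the agent stays put
GOAL = (2, 2)       # terminal square with positive reward
TRAP = (1, 1)       # square with negative reward

def move(state, action):
    """Deterministic result of an action, clipped to stay on the grid."""
    dr, dc = ACTIONS[action]
    r, c = state
    return (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))

def transition(state, action):
    """P(s' | s, a): with prob 1 - SLIP the move succeeds; with prob SLIP we stay."""
    probs = {}
    nxt = move(state, action)
    probs[nxt] = probs.get(nxt, 0.0) + (1 - SLIP)
    probs[state] = probs.get(state, 0.0) + SLIP
    return probs

def reward(state):
    """R(s): +10 at the goal, -5 at the trap, -1 step cost everywhere else."""
    if state == GOAL:
        return 10.0
    if state == TRAP:
        return -5.0
    return -1.0
```

For your write-up, the analogous pieces are the state set, the action set, a transition table like `transition`, and a reward table like `reward`; a guess at the optimal policy would then be something like "head toward the goal while detouring around the trap."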
Format: 12-point Times New Roman, single-spaced, as many pages as needed.