This assignment introduces Markov Decision Processes (MDPs) as a way of formalizing what it means to make optimal decisions in probabilistic domains. MDPs also generalize the idea of a single goal state: instead, rewards (positive or negative) can accumulate at various states.
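One standard way to write an MDP down is as a tuple of states $S$, actions $A$, a transition function $T$ and a reward function $R$. A policy $\pi$ is then judged by the reward it accumulates. This is a sketch only: the discount factor $\gamma \in [0, 1)$ is an assumption that the assignment does not mention.

```latex
\[
\mathcal{M} = (S, A, T, R, \gamma), \qquad
T(s, a, s') = P(s_{t+1} = s' \mid s_t = s,\ a_t = a)
\]
\[
V^{\pi}(s) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, \pi(s_t), s_{t+1}) \;\middle|\; s_0 = s\right],
\qquad
V^{*}(s) = \max_{\pi} V^{\pi}(s)
\]
```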
For example: maybe the environment is slippery, and actions sometimes don’t have the desired effects. Maybe some squares give negative reward some percentage of the time (traps?). Maybe all squares give negative reward some percentage of the time (meteorite?). Maybe some walls are electrified? Etc.
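A "slippery" environment means that an action yields a probability distribution over successor squares rather than one fixed square. The sketch below assumes a common noise model that the assignment does not specify: the intended move happens with probability `1 - slip`, and each perpendicular move happens with probability `slip / 2`. The grid layout, the name `slippery_transition` and the default `slip=0.2` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical grid of (row, col) cells; moving off the grid or into a wall leaves the agent in place.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
# Perpendicular directions the agent may slip into instead of the intended one.
SLIPS = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}


def step(state, direction, rows, cols, walls):
    """Deterministic result of moving one cell in `direction`."""
    dr, dc = MOVES[direction]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < rows and 0 <= c < cols) or (r, c) in walls:
        return state  # bumped into an edge or wall: stay put
    return (r, c)


def slippery_transition(state, action, rows, cols, walls, slip=0.2):
    """Return {successor: probability} for a slippery action (assumed noise model)."""
    dist = defaultdict(float)
    dist[step(state, action, rows, cols, walls)] += 1.0 - slip
    for side in SLIPS[action]:
        # Several outcomes can land on the same square (e.g. two bumps), so probabilities are summed.
        dist[step(state, side, rows, cols, walls)] += slip / 2
    return dict(dist)
```

For example, `slippery_transition((0, 0), "E", 3, 4, walls={(1, 1)})` moves east to `(0, 1)` with probability 0.8. A slip north hits the edge and stays at `(0, 0)` with probability 0.1, and a slip south reaches `(1, 0)` with probability 0.1.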
Requirements:
Write down how this would be modeled as an MDP:
Actions in each state
Transition function, i.e., the probability that taking an action in a state produces a given successor state
Reward function, i.e., which transitions produce a reward, and how much?
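The three items above can be sketched in code for one hypothetical variant: a 3x4 slippery grid with a terminal goal square and a trap. Every constant here is an illustrative assumption, including the layout, `SLIP`, `TRAP_PROB` and `LIVING_REWARD`. The trap's random penalty is folded into its expected value in `reward`. That is one possible modeling choice: it does not change expected returns, so it also leaves the optimal policy unchanged.

```python
# Hypothetical 3x4 slippery gridworld; every constant below is an illustrative assumption.
ROWS, COLS = 3, 4
WALLS = {(1, 1)}
GOAL = (0, 3)          # terminal square: +1 on entry
TRAP = (1, 3)          # trap: -1 on entry, but only TRAP_PROB of the time
TRAP_PROB = 0.5
LIVING_REWARD = -0.04  # small cost for every move
SLIP = 0.2             # total probability of slipping sideways

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SLIPS = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}
STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in WALLS]


def actions(s):
    """Actions in each state: none in the terminal goal, the four moves elsewhere."""
    return [] if s == GOAL else list(MOVES)


def _move(s, d):
    """Deterministic one-cell move; bumping into a wall or the edge leaves s unchanged."""
    r, c = s[0] + MOVES[d][0], s[1] + MOVES[d][1]
    if 0 <= r < ROWS and 0 <= c < COLS and (r, c) not in WALLS:
        return (r, c)
    return s


def transition(s, a):
    """T(s, a, .) as {s': P(s' | s, a)}: intended move with 1 - SLIP, each side with SLIP / 2."""
    dist = {}
    for d, p in [(a, 1 - SLIP)] + [(side, SLIP / 2) for side in SLIPS[a]]:
        s2 = _move(s, d)
        dist[s2] = dist.get(s2, 0.0) + p
    return dist


def reward(s, a, s2):
    """R(s, a, s'): living cost, plus the goal bonus, plus the trap's expected penalty."""
    r = LIVING_REWARD
    if s2 == GOAL:
        r += 1.0
    if s2 == TRAP:
        r += TRAP_PROB * -1.0  # random penalty replaced by its expected value
    return r
```

Here `transition((0, 2), "E")` puts probability 0.8 on `GOAL`, and `reward((0, 2), "E", GOAL)` is 0.96. An electrified wall could be added in a similar way, by giving a negative reward whenever `s2 == s` for a non-terminal `s`.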
You don’t have to code this; just model the problem.
Do you have a guess as to what the optimal value function and policy should look like?
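One way to check a guess is value iteration. It repeats the Bellman optimality update V(s) ← max_a Σ_{s'} T(s, a, s') [R(s, a, s') + γ V(s')] until the values stop changing, then reads off the greedy policy. The sketch below reuses a hypothetical slippery 3x4 grid. The layout, rewards and discount `GAMMA = 0.9` are all assumptions, and the names `outcomes`, `q_value` and `value_iteration` are illustrative.

```python
# Hypothetical slippery 3x4 gridworld; every constant is an illustrative assumption.
ROWS, COLS = 3, 4
WALLS = {(1, 1)}
GOAL, TRAP = (0, 3), (1, 3)      # GOAL is terminal; TRAP is not
TRAP_PROB, LIVING_REWARD = 0.5, -0.04
SLIP, GAMMA = 0.2, 0.9           # GAMMA: assumed discount factor

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SLIPS = {"N": "EW", "S": "EW", "E": "NS", "W": "NS"}
STATES = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in WALLS]


def move(s, d):
    """One-cell move in direction d; walls and grid edges leave s unchanged."""
    r, c = s[0] + MOVES[d][0], s[1] + MOVES[d][1]
    if 0 <= r < ROWS and 0 <= c < COLS and (r, c) not in WALLS:
        return (r, c)
    return s


def outcomes(s, a):
    """(probability, successor, expected reward) for each way action a can play out in s."""
    result = []
    for d, p in [(a, 1 - SLIP)] + [(side, SLIP / 2) for side in SLIPS[a]]:
        s2 = move(s, d)
        r = LIVING_REWARD
        if s2 == GOAL:
            r += 1.0
        if s2 == TRAP:
            r -= TRAP_PROB * 1.0  # expected trap penalty
        result.append((p, s2, r))
    return result


def q_value(s, a, V):
    """Expected return of taking a in s, then continuing with values V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes(s, a))


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality update until the largest change falls below tol."""
    V = {s: 0.0 for s in STATES}  # the terminal GOAL keeps value 0
    while True:
        delta = 0.0
        for s in STATES:
            if s == GOAL:
                continue
            best = max(q_value(s, a, V) for a in MOVES)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(MOVES, key=lambda a: q_value(s, a, V)) for s in STATES if s != GOAL}
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    for r in range(ROWS):
        row = []
        for c in range(COLS):
            s = (r, c)
            row.append("#" if s in WALLS else "G" if s == GOAL else f"{policy[s]}:{V[s]:+.2f}")
        print("  ".join(row))
```

The update is applied in place (a Gauss-Seidel style sweep), which also converges for γ < 1. Under these assumptions one would expect values to rise toward the goal, with the policy steering away from the trap where a detour is cheap.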
Formatting: 12-point Times New Roman, single-spaced; length as needed.
All Rights Reserved, Dataedy.com 2020