The zipped file contains the Python answers you provided for my previous order, and the PDF file contains the theoretical answers I received from other students. Please combine both: merge all the Python programs and their solutions together with my PDF solutions into a single PDF file containing everything.


Ma 661 Dynamic Programming and Reinforcement Learning
Darinka Dentcheva, [email protected]
Homework 2, due Wednesday, March 3, 2021

Problem 1. Assume that you play a chess match with a friend. If you play timid, your probability of making a draw is p = 0.9, the probability to win is 0, and the probability to lose is 0.1. If you play bold, you either win with probability q = 0.45, or you lose. Each win brings one point to the score of the winner. The match consists of 5 games. If the score is a tie after the fifth game, then a "sudden death" rule is adopted; that is, whoever wins the next game is the winner of the match; if it is a draw, then the game is repeated with the same rule. Formulate a Markov decision problem to determine the optimal strategy of your play (to maximize the probability of winning the match) and solve it. Clearly describe the state space, control space, transition probabilities, and the reward function.

Problem 2. A software manufacturer can be in one of two states. In state 1 its software sells well, and in state 2 the product sells poorly. While in state 1, the company can invest in the development of an upgraded version of the software, in which case the one-stage reward is 4 units and the probability of degrading to state 2 is 0.2. If no investment in new development occurs, then the reward is 6 units, but the probability of transition to state 2 is 0.5. While in state 2, if the company invests in software development, then the reward is -2 units, but the probability of transition to state 1 is 0.7. Without special efforts to improve, the reward is 1 and the probability of upgrading to state 1 is 0. Formulate a dynamic programming problem to determine an optimal research and development policy. Solve the problem for a time horizon of 12 time intervals.

Problem 3. Consider the equipment replacement problem discussed in class, with the following data:
• operating cost per period c0 + c1·x, x = 0, 1, 2, ...;
• revenue per period R;
• replacement cost K;
• salvage value γ·K·e^(−µx);
where c0 > 0, c1 > 0, 0 < γ < 1, and µ > 0. Assume that the salvage value can be collected whenever the item is replaced. The probabilities of deterioration by j steps in one period are given by the Poisson distribution p_j = (λ^j / j!)·e^(−λ), j = 0, 1, ....
(3.1) Formulate the corresponding Markov decision problem. Clearly define the state space, action space, transition probabilities, and the reward function.
(3.2) Solve the problem numerically for c0 = 1, c1 = 1, R = 5, K = 10, γ = 0.8, µ = 0.2, λ = 1, and time horizon T = 20. To this end, argue that a constant value x̄ exists such that for all x ≥ x̄ replacement is always profitable. Then you will know that the value function for all x ≥ x̄ is the same as at x̄. This will allow you to have finite tables of the value function for each time t.
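Problem 2 is a small finite-horizon Markov decision problem that can be solved by backward induction over the 12 periods. The following is a minimal sketch, assuming a zero terminal value after the last period (the problem statement does not specify one); the variable names and the dictionary layout of the model are illustrative.

```python
# Sketch: backward induction for the two-state R&D problem (Problem 2).
# State 1 = sells well, state 2 = sells poorly.
# Actions: 0 = no investment, 1 = invest.
# model[(state, action)] = (one-stage reward, [P(next=1), P(next=2)])
model = {
    (1, 1): (4, [0.8, 0.2]),   # state 1, invest
    (1, 0): (6, [0.5, 0.5]),   # state 1, no investment
    (2, 1): (-2, [0.7, 0.3]),  # state 2, invest
    (2, 0): (1, [0.0, 1.0]),   # state 2, no investment
}

T = 12
V = {1: 0.0, 2: 0.0}  # terminal values (assumption: zero)
policy = []
for t in range(T):
    newV, pi = {}, {}
    for s in (1, 2):
        # Q-value of each action: reward plus expected continuation value
        qs = [r + p1 * V[1] + p2 * V[2]
              for a in (0, 1)
              for (r, (p1, p2)) in [model[(s, a)]]]
        newV[s] = max(qs)
        pi[s] = int(qs[1] > qs[0])  # 1 if investing is strictly better
    V = newV
    policy.append(pi)

print("Values with 12 periods to go:", V)
print("Policy at the first decision epoch:", policy[-1])
```

With the stated data, the state-1 versus state-2 value gap grows with the horizon, so investing in state 2 becomes optimal once more than one period remains; in state 1 the comparison is nearly a tie at long horizons.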


Sandeep Kumar answered on Mar 19 2021
1)
import numpy as np
import pandas as pd


def markov_chess():
    # Rows index the score differential n - 5 in {-5, ..., 5};
    # columns index the stage (column 0 is the end of game 5).
    matrix_v = np.zeros((11, 6))   # value function
    matrix_u = np.zeros((11, 6))   # policy
    matrix_p = np.zeros((3, 2))    # rows: lose/draw/win, cols: timid/bold
    matrix_p[0][0] = 0.1           # timid: lose
    matrix_p[1][0] = 0.9           # timid: draw
    matrix_p[0][1] = 0.55          # bold: lose
    matrix_p[2][1] = 0.45          # bold: win
    # Terminal values after game 5: the match is won if ahead, lost if
    # behind, and worth 0.45 if tied (the optimal sudden-death value,
    # achieved by playing bold).
    for j in range(11):
        if j < 5:
            matrix_v[j][0] = 0
        elif j > 5:
            matrix_v[j][0] = 1
        else:
            matrix_v[j][0] = 0.45
    matrix_u[5][0] = 2
    # Backward induction over the remaining games: compare the value of
    # timid play (draw keeps the score, a loss drops it) against bold
    # play (a loss drops the score, a win raises it).
    for m in range(1, 6):
        for n in range(m, 11 - m):
            vs = [matrix_v[n, m-1] * matrix_p[1, 0]
                  + matrix_v[n-1, m-1] * matrix_p[0, 0],
                  matrix_v[n-1, m-1] * matrix_p[0, 1]
                  + matrix_v[n+1, m-1] * matrix_p[2, 1]]
            matrix_v[n][m] = round(np.max(vs), 5)
            matrix_u[n][m] = np.argmax(vs) + 1
            if vs[0] == vs[1]:
                matrix_u[n][m] = 3
    return matrix_v, matrix_u


def display_table(v):
    v = pd.DataFrame(v, index=[f"n={n-5}" for n in range(11)])
    v.columns = [f"m={6 - j}" for j in range(6)]
    return v


v, u = markov_chess()
print("The value function matrix is ")
print(display_table(v))
print("Policy Matrix: 1 for timid, 2 for bold, 3 for any")
print(display_table(u))

Explanation:
Timid play: probability of a draw p = 0.9, probability to win = 0, probability to lose = 0.1.
Bold play: probability to win q = 0.45, probability to lose = 0.55.
In the sudden-death state (tied after game 5), playing bold wins the match with probability 0.45, while playing timid can only draw (repeating the game) or lose; the sudden-death value v therefore satisfies v = max(0.45, 0.9v), which gives v = 0.45. This is the terminal value assigned to the tied state in the code above.
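For Problem 3, the replacement MDP of part (3.2) can also be solved by backward induction once the state space is truncated at a level x̄ beyond which replacement is always profitable. The sketch below makes several modeling assumptions that should be checked against the class formulation: the state x is the deterioration level; "keep" earns R − (c0 + c1·x) and the item deteriorates by a Poisson(λ) number of steps; "replace" collects the salvage γ·K·e^(−µx), pays K, and operates a new item from x = 0 in the same period; and the salvage of the current item is collected at the final time T. The truncation level x_bar is an illustrative choice.

```python
# Sketch of the Problem 3 replacement MDP under the assumptions above.
import math

c0, c1, R, K, gamma, mu, lam, T = 1.0, 1.0, 5.0, 10.0, 0.8, 0.2, 1.0, 20
x_bar = 60  # truncation level (assumption; generous for lambda = 1)

# Poisson step probabilities, with the tail mass lumped into the last entry
p = [math.exp(-lam) * lam**j / math.factorial(j) for j in range(x_bar)]
p.append(max(0.0, 1.0 - sum(p)))


def salvage(x):
    return gamma * K * math.exp(-mu * x)


# Terminal values: assume the salvage is collected at time T
V = [salvage(x) for x in range(x_bar + 1)]

for t in range(T):
    newV = []
    for x in range(x_bar + 1):
        # keep: operate the current item and let it deteriorate
        keep = R - (c0 + c1 * x) + sum(
            p[j] * V[min(x + j, x_bar)] for j in range(x_bar + 1))
        # replace: salvage the old item, buy a new one, operate it from x = 0
        rep = salvage(x) - K + R - c0 + sum(
            p[j] * V[min(j, x_bar)] for j in range(x_bar + 1))
        newV.append(max(keep, rep))
    V = newV

print("Value of a new item over 20 periods:", round(V[0], 3))
```

Since both the "keep" and "replace" payoffs are nonincreasing in x, the computed value function is nonincreasing in x, and for all x ≥ x̄ the "replace" branch dominates, which is what justifies the finite tables asked for in the assignment.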