ORIE 3510 Engineering Stochastic Processes, Spring 2021
Jim Dai, Qianmin Xie, & Annie Huang

Course Project [1]
Due: April 13, 11:59 PM, 2021

This project will be done using a Python Jupyter notebook [2] and Python packages for neural networks [3]. You should submit your .ipynb Jupyter notebook file, with each question clearly marked. Each of you is expected to do this project independently. You may discuss this project with your classmates, TAs, and instructors; document any specific help from others in your submission.

Consider a perishable product with a maximum lifetime of L periods. Thus, at the end of period t, there may be leftover inventory of this product with remaining lifetime j, where 1 ≤ j ≤ L − 1. Any leftover item with remaining lifetime 0 is considered "expired" and is disposed of. Let X_{tj} be the amount of inventory with remaining lifetime less than or equal to j (after the expired items are disposed of); then the system state at the end of period t is X_t = (X_{t1}, X_{t2}, ..., X_{t,L−1}) [4]. Note that the sequence (X_{t1}, X_{t2}, ..., X_{t,L−1}) is non-decreasing: X_{t1} is the number of items with remaining lifetime 1, and X_{tj} − X_{t,j−1} is the number of items with remaining lifetime exactly j, for j ∈ {2, 3, ..., L − 1}.

At the end of period t − 1, after disposing of expired items, an order for Q − X_{t−1,L−1} units of new items is placed. At the beginning of period t, these Q − X_{t−1,L−1} new units arrive, bringing the total inventory level to Q. A variable cost of c_v per unit is paid when new inventory arrives. A random demand D_t then occurs and is met immediately, as much as possible, with inventory from the oldest to the newest. Demands across different periods are assumed to be i.i.d., following a Poisson distribution with mean λ. Unmet demand is lost. At the end of period t, expired inventory is disposed of at a cost of c_d per unit, and leftover inventory incurs a holding cost of h per unit. It can be checked that X = {X_t, t = 0, 1, ...} is a DTMC.
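The one-period dynamics described above can be sketched in code. This is a minimal illustration only: the function name, the use of the cumulative-state representation, and the coding choices below are made here for clarity and are not taken from the course demo code.

```python
import numpy as np

# Parameters from the project statement
L, Q = 7, 90
CP, CV, CD, H = 1.0, 0.4, 0.1, 0.1  # price, variable, disposal, holding
LAM = 17.0                          # Poisson demand rate

def step(x, demand):
    """Simulate one period of the perishable-inventory DTMC.

    x: tuple of length L-1 with x[j-1] = units of remaining lifetime <= j
       (cumulative, non-decreasing), the state at the end of the previous
       period. demand: realized demand D_t for this period.
    Returns (next_state, profit), where profit is the period's reward g.
    """
    # Units with remaining lifetime exactly j, for j = 1, ..., L-1
    by_life = np.diff(np.concatenate(([0], x)))
    order = Q - x[-1]                    # order-up-to-Q replenishment
    by_life = np.append(by_life, order)  # new units have lifetime L

    # Serve demand oldest-first (smallest remaining lifetime first)
    remaining = demand
    sold = 0
    for j in range(L):
        s = min(by_life[j], remaining)
        by_life[j] -= s
        remaining -= s
        sold += s

    # End of period: lifetimes decrease by 1; lifetime-1 units expire
    expired = by_life[0]
    leftover = by_life[1:]               # now have lifetimes 1, ..., L-1
    profit = CP * sold - CV * order - CD * expired - H * leftover.sum()
    next_state = tuple(np.cumsum(leftover))
    return next_state, profit

rng = np.random.default_rng(0)
x = (0,) * (L - 1)
x, g = step(x, rng.poisson(LAM))
```

For example, starting from the empty state, 90 units are ordered; if demand is 10, the period's profit is 1.0·10 − 0.4·90 − 0.1·80 = −34, and the 80 leftover units appear in the new state with lifetime 6.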
We use S to denote the state space of the Markov chain. The objective is to compute the discounted value function

    v(x) = E[ Σ_{n=0}^{∞} β^n g(X_n) | X_0 = x ]

for different initial states x ∈ S, where g(X_n) is the expected profit during period n + 1 (i.e., X_n is the system state at the end of period n) and β denotes the discount factor. In practice, we use

    E[ Σ_{n=0}^{t} β^n g(X_n) | X_0 = x ]

to approximate the discounted value function v(x). Here t is the length of an episode, and its value depends on the discount factor β.

Footnotes:
[1] Adapted from the work of Jim Dai, Mark Gluzman, Lily Liu, & Hailun Zhang.
[2] You can launch Jupyter notebook from Anaconda Navigator. Please refer to the installation tutorial "Install TensorFlow based on Anaconda" for details.
[3] Please refer to the demo code and "Regression with neural networks in Python" for an example.
[4] You may instead let X_{tj} be the amount of inventory with remaining lifetime exactly j; it is easy to check that the two representations of the system state are equivalent.

Some parameters: selling price c_p = $1.0, variable cost c_v = $0.4, disposal cost c_d = $0.1, holding cost h = $0.1, and demand rate λ = 17.

1. In Homeworks 2, 3, and 4 we considered only L = 2; in recitations, we considered examples with L ≥ 3. Now consider a problem with parameters Q = 90, L = 7, β = 0.8. Sample a subset X̄ = {x̄_k, k = 1, ..., K} of the whole state space S as follows: starting from the initial state x_0 = (0, ..., 0), simulate a full path of the DTMC {X_0 = x_0, X_1 = x_1, ..., X_T = x_T} for some appropriately chosen T. Then choose K unique states of interest as the sampled subset. After obtaining the subset X̄ of K states, for each state x ∈ X̄ we use the Monte Carlo method to estimate the corresponding discounted value function v̂(x), including its 95%-level confidence interval, where x is the initial state. For each state x ∈ X̄, we run N episodes, each of length t, to obtain an estimate v̂(x) and its associated confidence interval.
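Both steps of this recipe, sampling states from one long path and Monte Carlo estimation with a confidence interval, can be sketched generically. The `step_fn` interface and the constant-reward toy chain below are placeholders invented for illustration; in the project, the inventory DTMC would be plugged in instead:

```python
import numpy as np

def sample_states(x0, step_fn, K, rng, spacing=1000):
    """Simulate one path of length spacing*K and keep every spacing-th
    state, i.e. x_bar_k = x_{spacing*k} for k = 1, ..., K."""
    x, kept = x0, []
    for n in range(1, spacing * K + 1):
        x, _ = step_fn(x, rng)
        if n % spacing == 0:
            kept.append(x)
    return kept

def mc_value(x0, step_fn, beta, t, n_episodes, rng):
    """Monte Carlo estimate of E[sum_{n=0}^t beta^n g(X_n) | X_0 = x0],
    returned with a 95% confidence half-width."""
    returns = np.empty(n_episodes)
    for ep in range(n_episodes):
        x, total, disc = x0, 0.0, 1.0
        for _ in range(t + 1):          # periods n = 0, 1, ..., t
            x, g = step_fn(x, rng)      # g is the reward g(X_n)
            total += disc * g
            disc *= beta
        returns[ep] = total
    est = returns.mean()
    half = 1.96 * returns.std(ddof=1) / np.sqrt(n_episodes)
    return est, half

# Trivial stand-in chain: the state never changes and the reward is
# always 1, so the truncated value is the geometric sum
# (1 - beta^(t+1)) / (1 - beta), with zero variance.
def toy_step(x, rng):
    return x, 1.0

rng = np.random.default_rng(0)
states = sample_states(0, toy_step, K=5, rng=rng, spacing=10)
est, half = mc_value(0, toy_step, beta=0.8, t=100, n_episodes=50, rng=rng)
```

On choosing the episode length: a common rule is to truncate once β^{t+1} is negligible; with β = 0.8, β^101 ≈ 2 × 10⁻¹⁰, so t around 100 is ample. Also, since the assignment asks for K unique states, in practice you may need to deduplicate the sampled states (e.g., via a set).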
Specifically, you need to show your work for the following:

(a) For each K = 20, 200, sample a full path of the DTMC {X_0 = x_0, X_1 = x_1, ..., X_T = x_T} with T = 10³ K. Choose K states to form the subset X̄ = {x̄_k, k = 1, ..., K}, where x̄_k = x_{10³ k} for k = 1, ..., K.

(b) For K = 20 and N = 100, use the 20 sampled states from part (a) to estimate v̂(x̄_k), where x̄_k is the initial state, for k = 1, ..., 20. Clearly show each state x̄_k, the estimated value v̂(x̄_k), and its 95%-level confidence interval. (You can present these in a table with three columns, or print the values in the Jupyter notebook; either way, make clear which values belong to which state.) Please specify the length of the episode t and briefly explain your choice. (Hint: the length of the episode depends on the discount factor β.)

(c) For each value of K = 20, 200 (using N = 100 episodes in one case and N = 1000 in the other), plot the estimated values v̂(x̄_k) with their 95%-level confidence intervals, using k = 1, ..., K as the x-coordinates. (You should plot 2 × 2 = 4 figures; see matplotlib.pyplot.errorbar for plotting points with confidence intervals.)

2. As you can see, the state space of this problem is large. In many applications, the state space is even larger. Therefore, storing v̂(x) for all x ∈ S in computer memory can be infeasible. Instead of remembering pre-computed values v̂(x), one can compute an estimate in real time whenever a state x is observed. However, the Monte Carlo method used above can be too slow for this purpose. One possible solution to this challenge is to find a family of functions f_θ : x ∈ S → f_θ(x) ∈ ℝ, where θ ∈ ℝ^d is a vector of d parameters to be computed in advance (known as offline computation) and d is some positive integer. Assume that θ has been computed and is fixed. For each given x ∈ S, f_θ(x) can be computed quickly (online, i.e., in real time). We use f_θ(x) to estimate v(x).
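The offline/online split can be made concrete with the simplest possible family: a linear model f_θ(x) = θ·φ(x) fit by least squares. The project itself uses a neural network for f_θ; the feature map, weights, and synthetic training pairs below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    """Feature map: the raw state plus a constant term (a choice made
    purely for this sketch; any feature map could be used)."""
    return np.concatenate(([1.0], x))

# Synthetic training pairs standing in for (x_k, v_hat(x_k))
states = rng.integers(0, 90, size=(200, 6)).astype(float)
true_w = np.array([5.0, -0.1, -0.05, 0.02, 0.0, 0.01, -0.03])
values = np.array([phi(x) @ true_w for x in states])

# Offline: compute theta once, minimizing sum_k (f_theta(x_k) - v_hat(x_k))^2
Phi = np.array([phi(x) for x in states])
theta, *_ = np.linalg.lstsq(Phi, values, rcond=None)

# Online: evaluating f_theta at a new state is a single dot product
v_tilde = phi(np.array([10, 20, 30, 40, 50, 80], dtype=float)) @ theta
```

Once θ is stored, each online evaluation costs one feature map plus one dot product, independent of the size of S.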
Namely, ṽ(x) = f_θ(x), x ∈ S, provides another estimate of v(x). Unlike the Monte Carlo estimate v̂(x), ṽ(x) can be computed much faster. Conceptually, θ ∈ ℝ^d can be computed in advance, on a cloud computing facility, over many days or weeks; with θ given, f_θ(x) can then be computed quickly for each input x, for example on the dashboard computer of a car.

One can use regression to find a value of θ ∈ ℝ^d. For that, we assume the pairs (x_k ∈ S, v̂(x_k) ∈ ℝ), k = 1, ..., m, are given. Then θ ∈ ℝ^d can be computed to minimize

    Σ_{k=1}^{m} ( f_θ(x_k) − v̂(x_k) )²     (1)

In this project, you will need to use a neural network with weight vector θ (see the demo code for the construction of a neural network) to represent f_θ(x). Specifically, we choose m = K, where K is the number of initial states in Problem 1; thus x_k and the Monte Carlo estimates v̂(x_k), k = 1, ..., K, are available from Problem 1. We use (x̄_k, v̂(x̄_k)), k = 1, ..., K, to train the neural network parameter θ by solving the optimization problem (1). Once θ is computed:

(a) Build a neural network and train it using the 200 sampled states from Question 1(a). For the state x = (10, 20, 30, 40, 50, 80), list ṽ(x), v̂(x), and the 95% confidence interval (CI) of v̂(x). To estimate v̂(x), use N = 1000 episodes, each of length t = 100. You may directly use the simulated results from Question 1 if you find them helpful.

(b) Run the DTMC again for 50000 periods, starting from state (10, 20, 30, 40, 50, 80). Record X_1000, X_2000, ..., X_50000 as a test set of 50 states. Plot ṽ(x) and v̂(x) for all states x in this test set.

(c) Report the mean squared error on the test set for K = 20, 200. Again, you may directly use the simulated results from Question 1 if you find them helpful.

A demo code (demo.ipynb) will be posted for reference, together with a detailed document, "Regression with neural networks in Python".
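As a package-free illustration of solving (1), here is a minimal one-hidden-layer network trained by full-batch gradient descent on synthetic data. The architecture, learning rate, and synthetic targets are all invented for this sketch; in the project you would instead use the neural-network package from the demo code and the pairs (x̄_k, v̂(x̄_k)) from Question 1:

```python
import numpy as np

def init_theta(d_in, hidden, rng):
    """Random initial weights for a one-hidden-layer tanh network."""
    return {
        "W1": rng.normal(0.0, 0.5, (d_in, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.5, (hidden, 1)),
        "b2": np.zeros(1),
    }

def f_theta(theta, X):
    """Network output f_theta(x) for each row x of X."""
    h = np.tanh(X @ theta["W1"] + theta["b1"])
    return (h @ theta["W2"] + theta["b2"]).ravel()

def train(theta, X, y, lr=0.05, steps=3000):
    """Full-batch gradient descent on the mean of objective (1)."""
    n = len(y)
    for _ in range(steps):
        h = np.tanh(X @ theta["W1"] + theta["b1"])
        err = (h @ theta["W2"] + theta["b2"]).ravel() - y
        # Backpropagate: dh is the gradient flowing into the hidden layer
        dh = (err[:, None] @ theta["W2"].T) * (1.0 - h ** 2)
        theta["W2"] -= lr * (h.T @ err[:, None]) * (2.0 / n)
        theta["b2"] -= lr * np.array([err.sum()]) * (2.0 / n)
        theta["W1"] -= lr * (X.T @ dh) * (2.0 / n)
        theta["b1"] -= lr * dh.sum(axis=0) * (2.0 / n)
    return theta

# Synthetic stand-in for the training pairs (x_bar_k, v_hat(x_bar_k)):
# a smooth target the small network can fit accurately.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (200, 6))
y = X.sum(axis=1) - 3.0
theta = train(init_theta(6, 32, rng), X, y)
mse = np.mean((f_theta(theta, X) - y) ** 2)
```

For Question 2(c), the reported test error is the analogous mean of squared differences between ṽ(x) and v̂(x) over the 50 recorded test states.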