THE ASSIGNMENT: This assignment consists of one section: 1. Written problems to be turned in. (You are free and encouraged to use programming for these problems, particularly the use of Matlab for...

THE ASSIGNMENT: This assignment consists of one section: 1. Written problems to be turned in. (You are free and encouraged to use programming for these problems, particularly the use of Matlab for Problem C. You should include any of your computational artifacts with your submission.) Click here to return to top of this page. 1. Written Problems to be turned in : A. Markov Chain (from Sheldon Ross) (15 points): Three white and three black balls are distributed in two urns in such a way that each contains three balls. We say that the system is in state i, i=0,1,2,3, if the first urn contains i white balls. At each step, we simultaneously draw one ball from each urn and place the ball drawn from the first urn into the second, and conversely with the ball from the second urn. Let Xn denote the state of the system after the nth step. Explain why {Xn , n=0,1,2,...} is a Markov chain. Calculate its transition probability matrix and stationary probabilities π0 , π1 , π2 , π3 for the states 0,1,2,3. B. Markov Process - Chutes and Ladders (25 points): The Hasbro game Chutes and Ladders (https://shop.hasbro.com/enus/product/chutes-and-ladders-game:1095F835-5056-9047-F548- 2F4D0AEF4ACC) has delighted young children for decades, because it is easy to play. It has also delighted at least some of their parents because they can play it without paying attention or even being awake. You are lucky enough to have an instructor who was delighted to see a game that could be modeled as a Markov Process. Such numerous delights! The basic idea is to start at square 0 (not on the board) and on alternating turns, spin a spinner/roll a die to show how many spaces to advance your token (note: in either case, values are 1..6). If your move ends on a square featuring the bottom of a ladder, or the top of a sliding board (chute), you are to follow the ladder/chute to its end and your token is placed in the corresponding square. The game ends when one lucky player lands in square 100 exactly. That is, if you are in square 99, you need to spin/roll a "1" in order to reach square 100. If you obtain any other value, you simply stay in square 99. Although the view of the game board from the Hasbro site may be sufficient to determine the position of all chutes and ladders on the board, a better rendering may be found by clicking the image above. C. Numerical Methods - Fixed Point Iteration - 30 points. You will compute stationary probabilities for the Markov chains you found in Problems A & B, using both fixed point iteration, and fixed point iteration with acceleration. We recommend using Matlab. Background: i. Fixed Point Iteration: One way of solving an equation or system of equations of the form f(x)=0 is to rewrite the system in terms of a fixed-point equation of the form g(x)=x , then solve it iteratively by beginning with an initial guess x0 , and repeatedly computing xn+1=g(xn ) until the values | xn+1 - g(xn ) | converge to the solution, x*. This will work under certain conditions. (We will consider the sequence to have converged when | xn+1 - g(xn ) |< ε for ε sufficiently small, say ε=10-6 .) for instance, although it is trivial to solve linear equations in one variable anyway, it is instructive to study solving the equation ax=b iteratively. this equation can be rewritten as (a-1+1)x=b, or x=b+(1-a)x . thus, the iteration xn+1=b+(1-a)xn is expected to converge to x*. ii. accelerating fixed point iteration. given a repetitive and slow-converging sequence, it is often possible to accelerate the convergence of the method. one technique uses the basic philosophy that if xn+1 is better than xn , then perhaps moving farther in the same direction is even better, i.e., x'n+1= xn+1+(1-α)(xn - xn+1 ) for some value α . after computing xn+1 and x'n+1 , use the value of x'n+1 to compute xn+2 , etc. iii. fixed point iteration in n dimensions. the principles of fixed point iteration and acceleration easily extend to systems of linear equations. it just depends on what you mean by "easily". the objective is to solve ax=b iteratively, where a is an n×n matrix, and x and b are n-dimensional vectors. the vector equation can be rewritten as (a-i+i)x=b, or x=b+(i-a)x , where i is the n×n identity matrix, and the iteration xn+1=b+(i-a)xn is expected to converge to the vector x*. details: the stationary probabilities can be found by solving the equation p tπ= π , (or (pt -i)π=0 ), where π is a n×1 (column) vector, and π1+ π2+ ... + πn = 1. remember that one (any) row of (pt -i) is redundant for well-behaved markov transition matrices, and can be replaced with the constraint that says the πi values sum to 1. d. reinforcement learning - hexapawn - 30 points: consider a system that allows the user repeated opportunities to observe its state and perform an action. the action taken may influence future evolution of the system. a game is an example of such a system. a policy may be defined as a mapping from the set of possible states of the system to the set of possible actions. in other words, a policy states what action is to be taken for each state. read martin gardner's mathematical games (https://www.cs.drexel.edu/~jpopyack/courses/ai/advsp21/assignments/hw2/hexapawn.pdf) column from the march 1962 issue of scientific american. the 24 states of hexapawn in the example constitute a complete set of states for which the game's second player has a choice of actions, with the exception of symmetric variants of the opponent's first move. notice that states arising in which the player has already lost are not included. (note: because the article has been reproduced in grayscale, the color information for the moves has been lost. it is sufficient to select your own colors; simply make sure each possible move for a given state has a unique color.) follow the instructions for reinforcement learning. how many games did you need to play before before learning was complete? specify the policy that has been learned by giving an action for each of the 24 states. indicate which states, if any have multiple options remaining. click here to return to top of this page. what to submit: all homework for this course must be submitted electronically using blackboard. do not e-mail your assignment to a ta or instructor! if you are having difficulty with your blackboard account, you are responsible for resolving these problems with a ta, an instructor, or someone from irt, before the assignment it due. it is suggested you complete your work early so that your instructor or ta can help you if you have difficulty with this process. for this assignment, you must submit: a pdf document with your answers to the "written problems to be turned in", written documentation for your programs, results of your testing and analysis. your source code for matlab and/or other computational scripts you use. ε="" for="" ε="" sufficiently="" small,="" say="" ε="10-6" .)="" for="" instance,="" although="" it="" is="" trivial="" to="" solve="" linear="" equations="" in="" one="" variable="" anyway,="" it="" is="" instructive="" to="" study="" solving="" the="" equation="" ax="b" iteratively.="" this="" equation="" can="" be="" rewritten="" as="" (a-1+1)x="b," or="" x="b+(1-a)x" .="" thus,="" the="" iteration="" xn+1="b+(1-a)xn" is="" expected="" to="" converge="" to="" x*.="" ii.="" accelerating="" fixed="" point="" iteration.="" given="" a="" repetitive="" and="" slow-converging="" sequence,="" it="" is="" often="" possible="" to="" accelerate="" the="" convergence="" of="" the="" method.="" one="" technique="" uses="" the="" basic="" philosophy="" that="" if="" xn+1="" is="" better="" than="" xn="" ,="" then="" perhaps="" moving="" farther="" in="" the="" same="" direction="" is="" even="" better,="" i.e.,="" x'n+1="xn+1+(1-α)(xn" -="" xn+1="" )="" for="" some="" value="" α="" .="" after="" computing="" xn+1="" and="" x'n+1="" ,="" use="" the="" value="" of="" x'n+1="" to="" compute="" xn+2="" ,="" etc.="" iii.="" fixed="" point="" iteration="" in="" n="" dimensions.="" the="" principles="" of="" fixed="" point="" iteration="" and="" acceleration="" easily="" extend="" to="" systems="" of="" linear="" equations.="" it="" just="" depends="" on="" what="" you="" mean="" by="" "easily".="" the="" objective="" is="" to="" solve="" ax="b" iteratively,="" where="" a="" is="" an="" n×n="" matrix,="" and="" x="" and="" b="" are="" n-dimensional="" vectors.="" the="" vector="" equation="" can="" be="" rewritten="" as="" (a-i+i)x="b," or="" x="b+(I-A)x" ,="" where="" i="" is="" the="" n×n="" identity="" matrix,="" and="" the="" iteration="" xn+1="b+(I-A)xn" is="" expected="" to="" converge="" to="" the="" vector="" x*.="" details:="" the="" stationary="" probabilities="" can="" be="" found="" by="" solving="" the="" equation="" p="" tπ="π" ,="" (or="" (pt="" -i)π="0" ),="" where="" π="" is="" a="" n×1="" (column)="" vector,="" and="" π1+="" π2+="" ...="" +="" πn="1." remember="" that="" one="" (any)="" row="" of="" (pt="" -i)="" is="" redundant="" for="" well-behaved="" markov="" transition="" matrices,="" and="" can="" be="" replaced="" with="" the="" constraint="" that="" says="" the="" πi="" values="" sum="" to="" 1.="" d.="" reinforcement="" learning="" -="" hexapawn="" -="" 30="" points:="" consider="" a="" system="" that="" allows="" the="" user="" repeated="" opportunities="" to="" observe="" its="" state="" and="" perform="" an="" action.="" the="" action="" taken="" may="" influence="" future="" evolution="" of="" the="" system.="" a="" game="" is="" an="" example="" of="" such="" a="" system.="" a="" policy="" may="" be="" defined="" as="" a="" mapping="" from="" the="" set="" of="" possible="" states="" of="" the="" system="" to="" the="" set="" of="" possible="" actions.="" in="" other="" words,="" a="" policy="" states="" what="" action="" is="" to="" be="" taken="" for="" each="" state.="" read="" martin="" gardner's="" mathematical="" games="" (https://www.cs.drexel.edu/~jpopyack/courses/ai/advsp21/assignments/hw2/hexapawn.pdf)="" column="" from="" the="" march="" 1962="" issue="" of="" scientific="" american.="" the="" 24="" states="" of="" hexapawn="" in="" the="" example="" constitute="" a="" complete="" set="" of="" states="" for="" which="" the="" game's="" second="" player="" has="" a="" choice="" of="" actions,="" with="" the="" exception="" of="" symmetric="" variants="" of="" the="" opponent's="" first="" move.="" notice="" that="" states="" arising="" in="" which="" the="" player="" has="" already="" lost="" are="" not="" included.="" (note:="" because="" the="" article="" has="" been="" reproduced="" in="" grayscale,="" the="" color="" information="" for="" the="" moves="" has="" been="" lost.="" it="" is="" sufficient="" to="" select="" your="" own="" colors;="" simply="" make="" sure="" each="" possible="" move="" for="" a="" given="" state="" has="" a="" unique="" color.)="" follow="" the="" instructions="" for="" reinforcement="" learning.="" how="" many="" games="" did="" you="" need="" to="" play="" before="" before="" learning="" was="" complete?="" specify="" the="" policy="" that="" has="" been="" learned="" by="" giving="" an="" action="" for="" each="" of="" the="" 24="" states.="" indicate="" which="" states,="" if="" any="" have="" multiple="" options="" remaining.="" click="" here="" to="" return="" to="" top="" of="" this="" page.="" what="" to="" submit:="" all="" homework="" for="" this="" course="" must="" be="" submitted="" electronically="" using="" blackboard.="" do="" not="" e-mail="" your="" assignment="" to="" a="" ta="" or="" instructor!="" if="" you="" are="" having="" difficulty="" with="" your="" blackboard="" account,="" you="" are="" responsible="" for="" resolving="" these="" problems="" with="" a="" ta,="" an="" instructor,="" or="" someone="" from="" irt,="" before="" the="" assignment="" it="" due.="" it="" is="" suggested="" you="" complete="" your="" work="" early="" so="" that="" your="" instructor="" or="" ta="" can="" help="" you="" if="" you="" have="" difficulty="" with="" this="" process.="" for="" this="" assignment,="" you="" must="" submit:="" a="" pdf="" document="" with="" your="" answers="" to="" the="" "written="" problems="" to="" be="" turned="" in",="" written="" documentation="" for="" your="" programs,="" results="" of="" your="" testing="" and="" analysis.="" your="" source="" code="" for="" matlab="" and/or="" other="" computational="" scripts="" you="">
Apr 18, 2021
SOLUTION.PDF

Get Answer To This Question

Related Questions & Answers

More Questions »

Submit New Assignment

Copy and Paste Your Assignment Here