A hidden Markov model (HMM) can be used to describe the joint probability of a sequence of unobserved (hidden) discrete-state variables, $\mathbf{H} = (H_0, \dots, H_n)$, and a sequence of corresponding observed variables $\mathbf{O} = (O_0, \dots, O_n)$ for which $O_i$ is dependent on $H_i$ for each $i$. We say that $H_i$ emits $O_i$; consideration here is limited to discrete emission variables. Let the state spaces for elements of $\mathbf{H}$ and $\mathbf{O}$ be $\mathcal{H}$ and $\mathcal{E}$, respectively. Let $\mathbf{O}_{\le j}$ and $\mathbf{O}_{>j}$ denote the portions of $\mathbf{O}$ with indices not exceeding $j$ and exceeding $j$, respectively, and define the analogous partial sequences for $\mathbf{H}$. Under an HMM, the $H_i$ have the Markov property

$$P[H_i \mid \mathbf{H}_{\le i-1}, \mathbf{O}_{\le i-1}] = P[H_i \mid H_{i-1}] \qquad (4.90)$$

and the emissions are conditionally independent, so



$$P[O_i \mid \mathbf{H}, \mathbf{O}_{\le i-1}, \mathbf{O}_{>i}] = P[O_i \mid H_i]. \qquad (4.91)$$

Time-homogeneous transitions of the hidden states are governed by transition probabilities $p(h, h^*) = P[H_{i+1} = h^* \mid H_i = h]$ for $h, h^* \in \mathcal{H}$. The distribution for $H_0$ is parameterized by $\pi(h) = P[H_0 = h]$ for $h \in \mathcal{H}$. Finally, define emission probabilities $e(h, o) = P[O_i = o \mid H_i = h]$ for $h \in \mathcal{H}$ and $o \in \mathcal{E}$. Then the parameter set $\theta = (\boldsymbol{\pi}, \mathbf{P}, \mathbf{E})$ completely parameterizes the model, where $\boldsymbol{\pi}$ is a vector of initial-state probabilities, $\mathbf{P}$ is a matrix of transition probabilities, and $\mathbf{E}$ is a matrix of emission probabilities.

For an observed sequence $\mathbf{o}$, define the forward variables to be

$$\alpha(i, h) = P[\mathbf{O}_{\le i} = \mathbf{o}_{\le i}, H_i = h] \qquad (4.92)$$

and the backward variables to be

$$\beta(i, h) = P[\mathbf{O}_{>i} = \mathbf{o}_{>i} \mid H_i = h] \qquad (4.93)$$

for $i = 1, \dots, n$ and each $h \in \mathcal{H}$. Our notation suppresses the dependence of the forward and backward variables on $\theta$. Note that $P[\mathbf{O} = \mathbf{o} \mid \theta] = \sum_{h \in \mathcal{H}} \alpha(n, h)$. The forward and backward variables are also useful for computing the probability that state $h$ occurred at the $i$th position of the sequence given $\mathbf{O} = \mathbf{o}$, according to $P[H_i = h \mid \mathbf{O} = \mathbf{o}, \theta] = \alpha(i, h)\beta(i, h) / P[\mathbf{O} = \mathbf{o} \mid \theta]$, and for computing expectations of functions of the states with respect to these probabilities.
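To fix ideas before turning to the algorithms, here is a minimal sketch of how $\theta = (\boldsymbol{\pi}, \mathbf{P}, \mathbf{E})$ might be stored as arrays. Python with NumPy is an assumption on my part (the exercise prescribes no language), and the particular numbers are illustrative only.

```python
import numpy as np

# Illustrative two-state, two-symbol HMM; all numbers are arbitrary examples.
pi = np.array([0.5, 0.5])            # pi[h]    = P[H_0 = h]
P  = np.array([[0.9, 0.1],
               [0.1, 0.9]])          # P[h, h*] = P[H_{i+1} = h* | H_i = h]
E  = np.array([[0.25, 0.75],
               [0.85, 0.15]])        # E[h, o]  = P[O_i = o | H_i = h]

# Each row of P and E is a probability distribution, so rows must sum to 1.
assert np.allclose(P.sum(axis=1), 1) and np.allclose(E.sum(axis=1), 1)
```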



a. Show that the following algorithms can be used to calculate $\alpha(i, h)$ and $\beta(i, h)$. The forward algorithm is

• Initialize $\alpha(0, h) = \pi(h)\, e(h, o_0)$.
• For $i = 0, \dots, n - 1$, let $\alpha(i + 1, h) = e(h, o_{i+1}) \sum_{h^* \in \mathcal{H}} \alpha(i, h^*)\, p(h^*, h)$.

The backward algorithm is

• Initialize $\beta(n, h) = 1$.
• For $i = n, \dots, 1$, let $\beta(i - 1, h) = \sum_{h^* \in \mathcal{H}} p(h, h^*)\, e(h^*, o_i)\, \beta(i, h^*)$.

These algorithms provide very efficient methods for finding $P[\mathbf{O} = \mathbf{o} \mid \theta]$ and other useful probabilities, compared to naively summing over all possible sequences of states; a sketch implementation follows below.
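The two recursions translate directly into array operations. Below is a minimal NumPy sketch of both algorithms, using the array encoding of $\theta$ from the earlier sketch; the function names are my own conventions, not prescribed by the exercise.

```python
import numpy as np

def forward(pi, P, E, o):
    """Forward algorithm: alpha[i, h] = P[O_{<=i} = o_{<=i}, H_i = h]."""
    n = len(o)
    alpha = np.zeros((n, len(pi)))
    alpha[0] = pi * E[:, o[0]]                    # alpha(0,h) = pi(h) e(h, o_0)
    for i in range(n - 1):
        # alpha(i+1,h) = e(h, o_{i+1}) * sum_{h*} alpha(i,h*) p(h*,h)
        alpha[i + 1] = E[:, o[i + 1]] * (alpha[i] @ P)
    return alpha

def backward(pi, P, E, o):
    """Backward algorithm: beta[i, h] = P[O_{>i} = o_{>i} | H_i = h]."""
    n = len(o)
    beta = np.ones((n, P.shape[0]))               # beta(n,h) = 1
    for i in range(n - 1, 0, -1):
        # beta(i-1,h) = sum_{h*} p(h,h*) e(h*, o_i) beta(i,h*)
        beta[i - 1] = P @ (E[:, o[i]] * beta[i])
    return beta

# P[O = o | theta] can then be computed as alpha[-1].sum().
```

One practical note: for long sequences the $\alpha$'s can underflow. A standard remedy, not shown here, is to rescale each row of $\alpha$ to sum to 1 and accumulate $\log P[\mathbf{O} = \mathbf{o} \mid \theta]$ from the scale factors; for the 200-toss sequence in part (d), unscaled doubles are still adequate.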



b. Let $N(h)$ denote the number of times $H_0 = h$, let $N(h, h^*)$ denote the number of transitions from $h$ to $h^*$, and let $N(h, o)$ denote the number of emissions of $o$ when the underlying state is $h$. Prove that these random variables have the following expectations:

$$E\{N(h) \mid \mathbf{o}, \theta\} = \frac{\alpha(0, h)\, \beta(0, h)}{P[\mathbf{O} = \mathbf{o} \mid \theta]},$$

$$E\{N(h, h^*) \mid \mathbf{o}, \theta\} = \sum_{i=0}^{n-1} \frac{\alpha(i, h)\, p(h, h^*)\, e(h^*, o_{i+1})\, \beta(i + 1, h^*)}{P[\mathbf{O} = \mathbf{o} \mid \theta]},$$

$$E\{N(h, o) \mid \mathbf{o}, \theta\} = \sum_{i \,:\, o_i = o} \frac{\alpha(i, h)\, \beta(i, h)}{P[\mathbf{O} = \mathbf{o} \mid \theta]}.$$
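These three expectations are exactly the E-step quantities needed for Baum–Welch in part (c). A minimal NumPy sketch, reusing `forward` and `backward` from the previous sketch (function and variable names are again my own):

```python
import numpy as np

def expected_counts(pi, P, E, o):
    """E-step: expected initial-state, transition, and emission counts given o."""
    alpha, beta = forward(pi, P, E, o), backward(pi, P, E, o)
    prob_o = alpha[-1].sum()                      # P[O = o | theta]
    n = len(o)

    N_init = alpha[0] * beta[0] / prob_o          # E{N(h)}: posterior for H_0
    # E{N(h,h*)} = sum_i alpha(i,h) p(h,h*) e(h*,o_{i+1}) beta(i+1,h*) / P[O=o]
    N_trans = sum(np.outer(alpha[i], E[:, o[i + 1]] * beta[i + 1]) * P
                  for i in range(n - 1)) / prob_o
    # E{N(h,o)} = sum over positions i with o_i = o of alpha(i,h) beta(i,h) / P[O=o]
    N_emit = np.zeros_like(E)
    for i in range(n):
        N_emit[:, o[i]] += alpha[i] * beta[i] / prob_o
    return N_init, N_trans, N_emit
```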



c. The Baum–Welch algorithm efficiently estimates the parameters of an HMM [25]. Fitting these models has proven extremely useful in diverse applications, including statistical genetics, signal processing and speech recognition, problems involving environmental time series, and Bayesian graphical networks [172, 236, 361, 392, 523]. Starting from some initial values $\theta^{(0)}$, the Baum–Welch algorithm proceeds via iterative application of the following update formulas:

$$\pi(h)^{(t+1)} = E\{N(h) \mid \mathbf{o}, \theta^{(t)}\},$$

$$p(h, h^*)^{(t+1)} = \frac{E\{N(h, h^*) \mid \mathbf{o}, \theta^{(t)}\}}{\sum_{h^{**} \in \mathcal{H}} E\{N(h, h^{**}) \mid \mathbf{o}, \theta^{(t)}\}},$$

$$e(h, o)^{(t+1)} = \frac{E\{N(h, o) \mid \mathbf{o}, \theta^{(t)}\}}{\sum_{o^* \in \mathcal{E}} E\{N(h, o^*) \mid \mathbf{o}, \theta^{(t)}\}}.$$

Prove that the Baum–Welch algorithm is an EM algorithm. It is useful to begin by noting that the complete data likelihood is given by

$$L(\theta \mid \mathbf{h}, \mathbf{o}) = \pi(h_0) \prod_{i=0}^{n-1} p(h_i, h_{i+1}) \prod_{i=0}^{n} e(h_i, o_i).$$
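In code, the M-step is just the three normalizations displayed above, so one full Baum–Welch iteration takes a few lines. A minimal sketch reusing `expected_counts` from part (b); the stopping rule and iteration cap are arbitrary choices of mine, not part of the exercise:

```python
import numpy as np

def baum_welch(pi, P, E, o, max_iter=500, tol=1e-8):
    """Iterate the Baum-Welch updates starting from theta^(0) = (pi, P, E)."""
    for _ in range(max_iter):
        N_init, N_trans, N_emit = expected_counts(pi, P, E, o)
        pi = N_init                                          # pi(h)^(t+1)
        P_new = N_trans / N_trans.sum(axis=1, keepdims=True) # p(h,h*)^(t+1)
        E_new = N_emit / N_emit.sum(axis=1, keepdims=True)   # e(h,o)^(t+1)
        converged = (np.abs(P_new - P).max() < tol and
                     np.abs(E_new - E).max() < tol)
        P, E = P_new, E_new
        if converged:
            break
    return pi, P, E
```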



d. Consider the following scenario. In Flip's left pocket is a penny; in his right pocket is a dime. On a fair toss, the probability of showing a head is $p$ for the penny and $d$ for the dime. Flip randomly chooses a coin to begin, tosses it, and reports the outcome (heads or tails) without revealing which coin was tossed. Then Flip decides whether to use the same coin for the next toss or to switch to the other coin. He switches coins with probability $s$ and retains the same coin with probability $1 - s$. The outcome of the second toss is reported, again without revealing the coin used. This process is continued for a total of 200 coin tosses. The resulting sequence of heads and tails is available from the website for this book. Use the Baum–Welch algorithm to estimate $p$, $d$, and $s$.
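The scenario pins down the HMM structure: the hidden state is the coin in use, $\boldsymbol{\pi} = (1/2, 1/2)$, the transition matrix has off-diagonal entries $s$, and the emission rows are $(p, 1 - p)$ and $(d, 1 - d)$. Since the website data are not reproduced here, the sketch below runs `baum_welch` from part (c) on a simulated stand-in sequence; encoding heads as 0 and tails as 1 is my convention.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_flips(p, d, s, n=200):
    """Simulate Flip's game: state 0 = penny (P[head] = p), state 1 = dime (P[head] = d)."""
    head_prob = (p, d)
    state = rng.integers(2)                       # fair initial choice of coin
    o = []
    for _ in range(n):
        o.append(0 if rng.random() < head_prob[state] else 1)  # 0 = head, 1 = tail
        if rng.random() < s:                      # switch coins with probability s
            state = 1 - state
    return np.array(o)

o = simulate_flips(p=0.25, d=0.85, s=0.1)         # stand-in for the website data

# Initial guesses; any non-degenerate values should do.
pi0 = np.array([0.5, 0.5])
P0  = np.array([[0.8, 0.2], [0.2, 0.8]])
E0  = np.array([[0.3, 0.7], [0.7, 0.3]])
pi_hat, P_hat, E_hat = baum_welch(pi0, P0, E0, o)
print("s ~", P_hat[0, 1], "  p ~", E_hat[0, 0], "  d ~", E_hat[1, 0])
```

Two caveats: the generic update estimates a full transition matrix rather than the single parameter $s$, so the two off-diagonal estimates can be averaged (a constrained M-step); and the state labels are not identified, so the fitted "penny" and "dime" roles may come out swapped.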



e. Only for students seeking extra challenge: Derive the Baum–Welch algorithm for the case when the dataset consists of $M$ independent observation sequences arising from an HMM. Simulate such data, following the coin example above. (You may wish to mimic the single-sequence data, which were simulated using $p = 0.25$, $d = 0.85$, and $s = 0.1$.) Code the Baum–Welch algorithm, and test it on your simulated data. In addition to handling multiple sequences, HMMs and the Baum–Welch algorithm can be generalized for estimation based on more general emission variables and on emission and transition probabilities with more complex parameterizations, including time inhomogeneity. A sketch for simulating and fitting multiple sequences appears below.
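Independent sequences contribute additively to the complete-data log likelihood, so the E-step expected counts are summed across the $M$ sequences before the same normalizations are applied. This pooled update follows the standard multi-sequence derivation, not text supplied by the exercise; the sketch reuses the pieces defined earlier.

```python
import numpy as np

def baum_welch_multi(pi, P, E, sequences, max_iter=500, tol=1e-8):
    """Baum-Welch for M independent observation sequences: pool expected counts."""
    for _ in range(max_iter):
        # Sum each of the three expected-count arrays across the M sequences.
        N_init, N_trans, N_emit = (sum(parts) for parts in
                                   zip(*(expected_counts(pi, P, E, o)
                                         for o in sequences)))
        pi = N_init / N_init.sum()                # average posterior for H_0
        P_new = N_trans / N_trans.sum(axis=1, keepdims=True)
        E_new = N_emit / N_emit.sum(axis=1, keepdims=True)
        converged = (np.abs(P_new - P).max() < tol and
                     np.abs(E_new - E).max() < tol)
        P, E = P_new, E_new
        if converged:
            break
    return pi, P, E

# For example, M = 10 simulated sequences mimicking the single-sequence data:
sequences = [simulate_flips(p=0.25, d=0.85, s=0.1) for _ in range(10)]
pi_hat, P_hat, E_hat = baum_welch_multi(pi0, P0, E0, sequences)
```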







