Motif representation using position weight matrix
Xiaohui Xie
University of California, Irvine
Position weight matrix
Position weight matrix representation of a motif with width $w$:
$$
\theta =
\begin{pmatrix}
\theta_{11} & \theta_{12} & \theta_{13} & \theta_{14} \\
\theta_{21} & \theta_{22} & \theta_{23} & \theta_{24} \\
\vdots & \vdots & \vdots & \vdots \\
\theta_{w1} & \theta_{w2} & \theta_{w3} & \theta_{w4}
\end{pmatrix}
\qquad (1)
$$
where each row represents one position of the motif and is normalized:

$$
\sum_{j=1}^{4} \theta_{ij} = 1 \qquad (2)
$$

for all $i = 1, 2, \cdots, w$.
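As a concrete illustration (not part of the original slides), a PWM of width $w$ can be stored as a $w \times 4$ array whose rows sum to one. The sketch below uses a hypothetical 3-position motif and checks the normalization constraint of equation (2).

```python
import numpy as np

# Hypothetical position weight matrix for a motif of width w = 3.
# Columns correspond to the letters {1, 2, 3, 4}, i.e. {A, C, G, T}.
theta = np.array([
    [0.80, 0.10, 0.05, 0.05],   # position 1
    [0.10, 0.10, 0.70, 0.10],   # position 2
    [0.25, 0.25, 0.25, 0.25],   # position 3 (uninformative)
])

# Equation (2): every row of theta must sum to 1.
assert np.allclose(theta.sum(axis=1), 1.0)
```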
Likelihood
Given the position weight matrix $\theta$, the probability of generating a sequence $S = (S_1, S_2, \cdots, S_w)$ from $\theta$ is

$$
P(S|\theta) = \prod_{i=1}^{w} P(S_i) \qquad (3)
$$
$$
\phantom{P(S|\theta)} = \prod_{i=1}^{w} \theta_{i,S_i} \qquad (4)
$$
For convenience, we have converted $S$ from a string of $\{A, C, G, T\}$ to a string of $\{1, 2, 3, 4\}$.
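A minimal sketch of equation (4), assuming the sequence has already been mapped from $\{A, C, G, T\}$ to integers (0-based indices below) and reusing the hypothetical `theta` from the previous sketch:

```python
import numpy as np

def sequence_likelihood(theta, seq):
    """Equation (4): P(S | theta) = prod_i theta[i, S_i]."""
    w = theta.shape[0]
    # seq holds 0-based letter indices, one per motif position.
    return float(np.prod(theta[np.arange(w), seq]))

theta = np.array([[0.80, 0.10, 0.05, 0.05],
                  [0.10, 0.10, 0.70, 0.10],
                  [0.25, 0.25, 0.25, 0.25]])

# The sequence A, G, T encoded 0-based as (0, 2, 3).
print(sequence_likelihood(theta, np.array([0, 2, 3])))  # 0.8 * 0.7 * 0.25 = 0.14
```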
Likelihood
Suppose we observe not just one, but a set of sequences $S^1, S^2, \cdots, S^n$, each of which contains exactly $w$ letters. Assume each of them is generated independently from the model $\theta$. Then the likelihood of observing these $n$ sequences is
$$
P(S^1, S^2, \cdots, S^n|\theta) = \prod_{k=1}^{n} P(S^k|\theta) = \prod_{k=1}^{n} \prod_{i=1}^{w} \theta_{i,S_{ki}} \qquad (5)
$$
$$
= \prod_{i=1}^{w} \prod_{j=1}^{4} \theta_{ij}^{\,c_{ij}} \qquad (6)
$$
where $c_{ij}$ is the number of letter $j$ at position $i$ (note that $\sum_{j=1}^{4} c_{ij} = n$ for all $i$).
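To illustrate equations (5) and (6), the sketch below (same hypothetical `theta` as before, sequences given as 0-based integer arrays) builds the count matrix $c_{ij}$ and verifies that the product over counts equals the product over individual sequences:

```python
import numpy as np

def count_matrix(seqs, w, alphabet_size=4):
    """c[i, j] = number of sequences carrying letter j at position i."""
    c = np.zeros((w, alphabet_size), dtype=int)
    for s in seqs:
        for i, letter in enumerate(s):
            c[i, letter] += 1
    return c

theta = np.array([[0.80, 0.10, 0.05, 0.05],
                  [0.10, 0.10, 0.70, 0.10],
                  [0.25, 0.25, 0.25, 0.25]])

seqs = np.array([[0, 2, 3], [0, 2, 0], [1, 2, 3]])  # n = 3 sequences of width w = 3
c = count_matrix(seqs, w=3)
assert (c.sum(axis=1) == len(seqs)).all()           # each row of counts sums to n

lik_from_counts = np.prod(theta ** c)                                     # equation (6)
lik_from_seqs = np.prod([np.prod(theta[np.arange(3), s]) for s in seqs])  # equation (5)
assert np.isclose(lik_from_counts, lik_from_seqs)
```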
Parameter estimation
Now suppose we do not know $\theta$. How do we estimate it from the observed sequence data $S^1, S^2, \cdots, S^n$?
One solution: calculate the likelihood of observing the provided $n$ sequences for different values of $\theta$,

$$
L(\theta) = P(S^1, S^2, \cdots, S^n|\theta) = \prod_{k=1}^{n} \prod_{i=1}^{w} \theta_{i,S_{ki}} \qquad (7)
$$
Pick the one with the largest likelihood, that is, find the $\theta$ that achieves

$$
\max_{\theta} P(S^1, S^2, \cdots, S^n|\theta) \qquad (8)
$$
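A minimal sketch of this "try several $\theta$ and keep the best" idea, under the same toy setup as above; the candidate matrices are hypothetical, and the likelihood is evaluated in log space to avoid underflow when many probabilities are multiplied:

```python
import numpy as np

def log_likelihood(theta, seqs):
    """Equation (7) in log space: sum_k sum_i log theta[i, S_ki]."""
    w = theta.shape[0]
    return float(sum(np.log(theta[np.arange(w), s]).sum() for s in seqs))

seqs = np.array([[0, 2, 3], [0, 2, 0], [1, 2, 3]])

# Hypothetical candidate PWMs; equation (8) keeps the one with the largest likelihood.
candidates = [
    np.full((3, 4), 0.25),               # uniform background model
    np.array([[0.70, 0.10, 0.10, 0.10],
              [0.10, 0.10, 0.70, 0.10],
              [0.20, 0.10, 0.10, 0.60]]),
]
best = max(candidates, key=lambda t: log_likelihood(t, seqs))
print(best, log_likelihood(best, seqs))
```

Because the logarithm is monotone, the $\theta$ with the largest log-likelihood is also the maximizer of equation (8).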