Evolutionary distance
Contents of the lecture
Evolutionary distance
Models of DNA evolution
Models of protein divergence
The need for a measure of evolution
Recognizing orthology, paralogy and xenology requires studying the relatedness and divergence of genes or proteins.
A measure of relatedness applies directly to analysing the history of genes and genomes (phylogenetic analysis),
and to studying speciation and the evolution of entire organisms.
Distance vs (dis)similarity
For very similar sequences, the number of point changes is a good measure of distance,
but it stops being one when mutations accumulate and the same site is hit more than once.
Let D denote the number of differences between the present-day sequences.
Conclusions?
The number of differences D between sequences is not proportional to the evolutionary distance.
The evolutionary distance is not (necessarily) proportional to time (the rate of evolution is not constant!).
Drawbacks of D:
Nonlinear growth, e.g. assume 20 mutations → D = 20, then 3 × 20 mutations → D = 51.
D is not additive: $D_{1 \to 2} + D_{2 \to 3} \neq D_{1 \to 3}$
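The nonlinear growth of D can be illustrated with a small simulation. This is only a sketch: the sequence length of 100, the random seed and the helper names `mutate` and `count_differences` are assumptions made for the example and do not come from the lecture. Mutations are placed at random sites, so later mutations may hit a site that has already changed, and D grows more slowly than the number of mutations.

```python
import random

BASES = "ACGT"

def mutate(seq, n_mutations, rng):
    """Apply n_mutations point substitutions at randomly chosen sites."""
    seq = list(seq)
    for _ in range(n_mutations):
        site = rng.randrange(len(seq))
        # Replace the current base with one of the three other bases.
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
    return "".join(seq)

def count_differences(a, b):
    """Number of sites at which two equal-length sequences differ (D)."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)  # fixed seed (arbitrary) so that reruns are reproducible
ancestor = "".join(rng.choice(BASES) for _ in range(100))  # length 100 is an assumption

for n in (20, 60, 200):
    descendant = mutate(ancestor, n, rng)
    print(n, "mutations -> D =", count_differences(ancestor, descendant))
```

D can never exceed the number of mutations or the sequence length, which is why it saturates for distant sequences.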
Models proposed for DNA
Jukes-Cantor (JC, 1969)
assumes equal base frequencies and equal mutation rates
Kimura two-parameter model (K2P, 1980, or Felsenstein 1984)
distinguishes transitions from transversions
Tamura model (T92, 1992)
accounts for transitions, transversions and GC content
Hasegawa-Kishino-Yano (HKY, 1985)
supports (observed) base frequencies
General time-reversible model (GTR)
accounts for all variables but is not generally solvable.
None of the proposed models is aware of reading frames or of
evolutionary pressure; they are therefore recommended for noncoding
sequences.
Jukes-Cantor
Assumptions:
Equal base frequencies (uniform composition).
Constant mutation rate.
A direct consequence is that all descendants accumulate a similar number of
independent mutations.
Jukes-Cantor (cont.)
When two sequences are compared, each has accumulated mutations independently over time $t$, so the time separating the descendants is counted as $2t$.
Every base can change to any of the 3 other bases with equal rate and probability.
With $\alpha$ as the mutation rate (per unit of time), we get:
$d = 2t \cdot 3\alpha$ (definition)
Jukes-Cantor (cont.)
The definition is impractical: neither $t$ nor $\alpha$ is known.
However, the model does allow one to calculate the distance $d$ from the observed fraction of differences $D$:
$d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} D\right)$
The factors $\frac{3}{4}$ and $\frac{4}{3}$ are a consequence of the assumptions, that is, equal mutation rate and equal probability of bases.
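The Jukes-Cantor correction can be written as a short function. This is a minimal sketch; the function name `jukes_cantor_distance` and the sample values of D are chosen for illustration only. The range check reflects the fact that the logarithm is undefined once D reaches 3/4.

```python
import math

def jukes_cantor_distance(D):
    """Jukes-Cantor distance d from the observed fraction of differences D."""
    if not 0.0 <= D < 0.75:
        # The argument of the logarithm reaches zero at D = 3/4 (saturation).
        raise ValueError("D must satisfy 0 <= D < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * D / 3.0)

# Illustrative values of D (not taken from the lecture).
for D in (0.05, 0.2, 0.4, 0.6):
    print(f"D = {D:.2f} -> d = {jukes_cantor_distance(D):.3f}")
```

For small D the corrected distance stays close to D; as D approaches 3/4 it grows without bound.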
Jukes-Cantor (cont.)
[Figure: evolutionary distance d (vertical axis, 0 to 2.5) plotted against the fraction of differences D (horizontal axis, 0 to 0.7). Curves: D and Jukes-Cantor; marked: 25% conserved residues.]
Jukes-Cantor (cont.)
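The JC model can be summarized by the following rate matrix $Q$ of changes after one unit of time (rows and columns ordered A, G, C, T):
$$
Q = \begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix}
$$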
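To see how the rate matrix turns into substitution probabilities, one can use the standard Markov-chain relation $P(t) = e^{Qt}$, which the handout does not state explicitly. The sketch below builds the JC matrix in plain Python and sums a truncated Taylor series; the values of `alpha` and `t`, the number of series terms and all function names are assumptions chosen for illustration. The result is compared with the well-known closed form for JC, $P_{ii}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}$.

```python
import math

def jc_rate_matrix(alpha):
    """4x4 Jukes-Cantor rate matrix: alpha off the diagonal, -3*alpha on it."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]

def mat_mul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transition_probabilities(q, t, terms=40):
    """P(t) = exp(Q t), approximated by the first `terms` Taylor terms."""
    qt = [[q[i][j] * t for j in range(4)] for i in range(4)]
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]  # current term, starts as the identity
    for k in range(1, terms):
        # term becomes (Qt)^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, qt)]
        result = [[result[i][j] + term[i][j] for j in range(4)] for i in range(4)]
    return result

alpha, t = 0.1, 2.0  # illustrative values only
Q = jc_rate_matrix(alpha)
P = transition_probabilities(Q, t)
closed_form = 0.25 + 0.75 * math.exp(-4.0 * alpha * t)
print("P(no change), series     :", P[0][0])
print("P(no change), closed form:", closed_form)
```

Each row of $Q$ sums to zero and each row of $P(t)$ sums to one, which is a quick sanity check on any rate matrix of this kind.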
K2P model by Kimura
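The main difference is different rates of transitions, $\alpha$, and transversions, $\beta$ (rows and columns ordered A, G, C, T):
$$
Q = \begin{pmatrix}
-\alpha - 2\beta & \alpha & \beta & \beta \\
\alpha & -\alpha - 2\beta & \beta & \beta \\
\beta & \beta & -\alpha - 2\beta & \alpha \\
\beta & \beta & \alpha & -\alpha - 2\beta
\end{pmatrix}
$$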
K2P model by Kimura (cont.)
The model allows $d$ to be calculated from the proportions of transitions $S$ and transversions $V$ observed in the pairwise alignment ($D = S + V$):
$d = -\frac{1}{2} \ln(1 - 2S - V) - \frac{1}{4} \ln(1 - 2V)$
Note that there is no notion of DNA composition (e.g. GC content).
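The K2P calculation can be sketched in two steps: count transitions and transversions in a pairwise alignment, then apply the formula. The function names and the example sequences are hypothetical, and the sketch assumes gap-free, equal-length sequences containing only A, C, G and T.

```python
import math

PURINES = {"A", "G"}

def transition_transversion_fractions(seq1, seq2):
    """Return (S, V): fractions of aligned sites showing a transition or a transversion."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # A<->G and C<->T stay within purines or within pyrimidines: transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n

def k2p_distance(S, V):
    """Kimura two-parameter distance from the fractions S and V."""
    # math.log raises ValueError if either argument is not positive (saturation).
    return -0.5 * math.log(1.0 - 2.0 * S - V) - 0.25 * math.log(1.0 - 2.0 * V)

# Hypothetical aligned sequences, for illustration only.
S, V = transition_transversion_fractions("AGCTAGCTAG", "AGTTAGCAAG")
print("S =", S, "V =", V, "d =", k2p_distance(S, V))
```

When transitions make up exactly one third of the differences ($S = D/3$, $V = 2D/3$), the K2P formula reduces to the Jukes-Cantor one.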
Tamura 1992: GC content
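Using $\kappa$ as the transition to transversion frequency ratio.
The frequencies of bases (DNA composition) are derived from the GC content $\pi_{GC}$ (rows and columns ordered A, G, C, T):
$$
Q = \begin{pmatrix}
-\sum_{\text{column}} & \kappa\pi_{GC}/2 & \pi_{GC}/2 & (1 - \pi_{GC})/2 \\
\kappa(1 - \pi_{GC})/2 & -\sum_{\text{column}} & \pi_{GC}/2 & (1 - \pi_{GC})/2 \\
(1 - \pi_{GC})/2 & \pi_{GC}/2 & -\sum_{\text{column}} & \kappa(1 - \pi_{GC})/2 \\
(1 - \pi_{GC})/2 & \pi_{GC}/2 & \kappa\pi_{GC}/2 & -\sum_{\text{column}}
\end{pmatrix}
$$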
Not suitable for ssDNA or RNA.
Tamura 1992: GC content
$d = -h \ln\left(1 - \frac{S}{h} - V\right) - \frac{1}{2}(1 - h) \ln(1 - 2V)$
Where:
$h = 2\pi_{GC}(1 - \pi_{GC})$
$S$: fraction of transitions in the alignment,
$V$: fraction of transversions.
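The Tamura distance can be sketched in the same way. The function name, the argument name `gc_content` (standing for $\pi_{GC}$) and the sample values are assumptions made for illustration. For $\pi_{GC} = 0.5$ we get $h = 0.5$ and the formula reduces to the K2P one, which the example uses as a consistency check.

```python
import math

def tamura92_distance(S, V, gc_content):
    """Tamura (1992) distance from transition fraction S, transversion
    fraction V and GC content (pi_GC)."""
    if not 0.0 < gc_content < 1.0:
        # h would be zero for a GC content of 0 or 1, making S / h undefined.
        raise ValueError("gc_content must lie strictly between 0 and 1")
    h = 2.0 * gc_content * (1.0 - gc_content)
    # math.log raises ValueError if either argument is not positive (saturation).
    return (-h * math.log(1.0 - S / h - V)
            - 0.5 * (1.0 - h) * math.log(1.0 - 2.0 * V))

# Illustrative values only.
S, V = 0.10, 0.05
k2p = -0.5 * math.log(1.0 - 2.0 * S - V) - 0.25 * math.log(1.0 - 2.0 * V)
print("T92, GC = 0.5:", tamura92_distance(S, V, 0.5))
print("K2P          :", k2p)
print("T92, GC = 0.3:", tamura92_distance(S, V, 0.3))
```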
Other models
Hasegawa, Kishino and Yano proposed a model that allows four
independent base frequencies and distinguishes between
transitions and transversions.
The HKY model does not provide a formula for $d$ calculable from
observed differences only; it does, however, provide probabilities
for Bayesian statistics.
The most general model is GTR (general time-reversible
model), but it is not solvable without a priori knowledge of its
parameters.
Other, more specific models can be found in the literature.