0544.pdf

(4024 KB) Pobierz
Nonequispaced Fourier Transforms for Protein-Protein Docking
Julio E. Castrill´ n-Cand´ s
o
a
Vinay Siddavanahalli
Chandrajit Bajaj
§
Department of Computer Sciences, & Institute of Computational Engineering and Sciences,
Computational Visualization Center
University of Texas at Austin
Austin, TX 78712
ICES REPORT 05-44
October 26, 2005
Abstract
In this paper we introduce a grid free approximate Fast Convolution algorithm base on Nonequispaced
Fourier Transforms for accurately predicting rigid body protein-protein docking sites. Of the many docking
approaches, grid based Fast Fourier Transform (FFT) approaches have been shown to produce the best bal-
ance between computational complexity and accuracy of the correlation profiles of complex protein-protein
interactions over the six dimensional search space. However, these uniform sampling methods are still com-
putationally intractable and highly memory intensive for predicting large protein-protein docking sites. In this
paper we introduce an error bounded FFT for nonequispaced data approach that significantly improves compu-
tational complexity and storage. We are able to produce efficiently, highly compressed, but accurate, docking
correlation profiles.
This work was supported in part by NSF grants ACI-0220037, EIA-0325550, and NIH grants 0P20 RR020647, and R01 GM074258
julio@ices.utexas.edu
skvinay@cs.utexas.edu
§
bajaj@cs.utexas.edu
1 Introduction
Efforts in structural proteomics have lead to a rapid increase in the number of three-dimensional (3-D) struc-
tures of individual proteins. Moreover, knowledge of networks of interactions and signaling pathways is also
expanding rapidly through genomic and proteomics approaches. Still, our picture of the structures of both stable
and transient protein interactions lags behind. Efforts in crystallizing macromolecular complexes have met with
limited success, and hybrid experimental approaches, utilizing cryo-electron microscopy and crystallography
or NMR to give structural details of complex assemblies are evolving. However, along with these experimental
methods, there is a growing need for efficient and robust computational approaches to predicting the complexed
viable structures in protein-protein interactions. These approaches are also known as protein-protein docking.
Protein-protein docking or in general molecular docking usually consists of two primary selections. One is
the choice of goodness of fit measure (sometimes called the scoring function) while the other is the choice of
the search algorithm. Both of these decisions are based on an assumed molecular model. The scoring function
includes consideration for molecular properties in addition to a representation of molecular shape. Grid based
Fast Fourier Transform (FFT) approaches have been shown to produce highly accurate correlation profiles
of complex protein-protein docking making them a popular choice for solving the above docking site search
problem. However, they are time consuming, and in particular, highly memory intensive for large molecules
due to the large size of the grid needed. In this paper we introduce an adaptive grid-free irregularly spaced
Fourier approach for accurately predicting rigid body protein-protein docking sites.
Problem Description
For molecule
A,
let
V
iA
:
R
3
R
be the
i
th
associated density map for
i
= 1
. . . m,
where each map
represents a molecular shape or property. Similarly for molecule
B
we have
V
iB
:
R
3
R
maps for
i
=
ˆ
1
. . . m.
Let
S
i
(V
i
) :
R
3
C
and
S
i
(V
i
) :
R
3
C
be the scoring functions defined on
V
i
. For a rotation
R
in
the 3D rotation group
SO(3),
the rotation operator
Λ
R
is defined as
Λ
R
S(x)
:=
S(R
−1
(x))
∀x ∈
R
3
,
where
x
= (x,
y, z).
Similarly, the translator operator
T
j,k,l
is defined as
T
j,k,l
(S
i
(x,
y, z))
=
S
i
(x
j, y
k, z
l)
for
j, k, l
R.
The six dimensional search docking problem, can be posed as the following correlation problem
m
arg max
j,k,l,R
i=1
x∈R
3
y∈R
3
ˆ
Re T
j,k,l
R
(S
i
(V
iA
(x))))
S
i
(V
iB
(y))
dxdy,
(1)
where
Re
corresponds to the real part. This problem is also equivalent the following fitting minimization
problem (see section 4):
m
arg min
j,k,l,R
i=1
ˆ
T
j,k,l
R
(S
i
(V
iA
(x))))
S
i
(V
iB
(y))
,
(2)
1
where
·
corresponds to the
L
2
(R
3
)
norm. In section 3 an approximation to the correlation map in equation 1
ˆ
is obtained by an approximate convolution map
P
(x,
y, z).
To match these two maps the rotation
R
is modified
to include the
π
rad flip around each axis. For a rotation
R
in the 3D rotation group
SO(3),
the rotation operator
Λ
R
is defined as
R
:=
RZ
1
Z
2
Z
3
,
where
Z
1
, Z
2
and
Z
3
are the rotations around each axis in
R
3
, then
Λ
R
S(x)
:=
S(R
−1
(x))
∀x ∈
R
3
.
Note that alternate formulations lead tothe same search space. Indeed, in [35] split the search problem into 5D
rotations and a 1D translation.
Molecular shapes have a natural smooth particle atomistic or quasi-atomistic representation. By taking ad-
vantage of the adaptive smooth particle representation, we eliminate the underlying grid thus producing highly
compressed, but accurate, correlation profiles based on an adaptive irregularly spaced FFT algorithm. Our
docking method primarily consists of three steps: First, we select an adaptive smooth particle representation
for proteins which is also compatible with our initial shape-complementarity based scoring function. Second,
we calculate the frequency profiles directly from the smooth particle representation, and search effectively over
six dimensional translation and rotational space, utilizing the irregularly spaced FFT, and finally, we evaluate a
compressed correlation profile which captures the rigid body protein-protein docking sites.
The rest of the paper is as follows. In section 2, we summarize the main Fourier based approaches to
the rigid body protein-protein docking problem. Moreover, the different approaches to the FFT over irregularly
sampled domains are described. A complexity analysis of grid and spherical harmonic Fourier based algorithms
for docking and matching is given in Appendix A. In section 3 the main part of the algorithm is described. In
section 3.1 a smooth particle representation of molecular maps and affinity functions is introduced alongwith
the corresponding shape complementarity based scoring function to capture rigid body protein-protein docking.
With a suitable shape complementarity based scoring function defined, the search algorithm is separated
into two parts: the translational Fourier based search and the rotational search. In section 3.3 we show how
to reduce the computational and storage costs of the translation search algorithm with our method. Traditional
grid based Fourier approach embeds the two molecular maps in a
N
3
grid and convolve them using the FFT
leading to
O(N
3
log
N
)
time. In our irregularly spaced Fourier method, we assume both molecules to contain
M
atoms (you can take the maximum number of atoms from both molecules). An accurate approximate
correlation profile is derived in
O(M
log
M
)
computational steps and
O(M
)
storage. In practice
M
is much
smaller that
N
3
. In section 4 error estimates for the convolution profile obtained with our method are derived.
Finally, in section 5 we describe our implementation and report on a few docking results including the
actual timing and the accuracy of our correlation profiles.
We point out that during the writing of this tech report a similar work by Potts et al [33] was made aware
to us. The purpose of that work is to build a fast summation algorithm of radially symmetric functions with
general kernels as an alternative to Multi-pole fast summation methods. Our research involves the development
of a more general method for fast convolution of radially symmetric functions with the purpose of predicting
molecular docking sites.
2
2 Prior Work
In this section we briefly review past docking approaches with an emphasis on techniques applying Fourier
search. A review on irregularly sampled Fourier transforms in also presented.
2.1 Molecular Shape and Affinity Functions
Solvent molecule (probe )
Solvent
Accessible
Surface (SAS)
Solvent
Excluded
Surface (SES)
Figure 1: Solvent Accessible and Solvent Excluded Surfaces.
Various molecular surfaces have been defined (Figure 1) using a spherical representation of individual
atoms and a spherical probe representing a solvent molecule. The SAS is outlined by the center of the probe
sphere as it “rolls” over the atoms constituting a molecule [11]. The SES [25], [11], is defined as the inner
boundary of the volume that can be occupied by solvent in contact with the molecule. A number of algorithms
have been developed to compute these surfaces [1, 3, 4, 11, 29, 36, 37, 41–43] for the purpose of visualization
and various computations. It is interesting to observe that the SES of proteins forming molecular complexes
exhibits a very high level of geometrical complementarity. These surfaces are used extensively for visualizing
and studying molecular properties and interactions. However, these surfaces are approximations of a somewhat
fuzzy boundary of the molecule’s electron density. Surfaces are also used to visualize molecular properties
associated with molecular shape (e.g. charge density, electrostatic potential, hydrophobicity,
etc.)(
[12, 22]).
Such surfaces are usually level sets of scalar fields and their gradient or Laplacian ( [11, 29, 37, 42, 43]).
Molecular shape (surface and volumetric) are also derived from approximations of an appropriate level set
of electron density [6, 13, 28, 34]. The accurate computation of electron density representations for molecules
from the PDB requires computations at the quantum mechanical level [7]. One usually approximates the
electron density distribution of the
i
th
atom with a Gaussian function ( [3, 6, 7, 20, 30, 31, 38]) as
3
K
i
(r) = exp
Br
2
2
−B
,
R
i
where
B
<
0 is the rate of decay parameter,
R
i
is the Van der Waals radius of the
i
th
atom and
r
2
= (x
x
ci
)
2
+
(y
y
ci
)
2
+(z
z
ci
)
2
({x
ci
, y
ci
, z
ci
}
is the center of the
i
th
atom). A volumetric representation of the molecule
may now be obtained by summing the contributions from each single atom, thus the electron density
I(x)
for
M
atoms is described as
M
M
(
Br
2
−B)
R
i
2
I(x)
=
i=1
K
i
(r) =
e
i=1
.
(3)
Notice that for protein structures,
R
i
can be grouped into a set of about 15 distinct values.
A critical component of all docking approaches is defining a suitable measure for the affinity functions in
the scoring calculations. Paper [23] separates the affinity functions into core and a surface skin with the objec-
tive to penalize core-core clashes, but add positively surface skin-surface skin overlaps. By assigning different
affinities to the core and the molecular surface skin of each atom and performing a convolution between these
weighted maps, a profile is obtained where the largest values conform to the best translational overlap. Modifi-
cations of this approach have been developed in ( [9,10,19,26]). They define the core and the skin regions using
the molecular surfaces like the
solvent accessible surface
(SAS) and
solvent excluded surface
(SES). Other ap-
proaches include adapting scoring functions for molecular
matching
[8, 21]. These scoring functions are also
designed to match molecular functional properties, such as electrostatics potential. They can be modified for
docking by forming a function
f
for molecule
A
and
g
for the complementary volume for molecule
B.
2.2 Grid Based Fourier Methods
Katchalski-Katzir et. al.’s [23] use coarse grids and rotational angles to reduce the combinatorics of the search.
Gabb et.
al.
[19] use the a priori knowledge of suitable binding site locations on the proteins to reduce the
combinatorics of possible relative conformations. Fast Fourier Transforms are used in each of [19, 23, 35] to
additionally speed up the cumulative scoring function computations and hence the search. Moreover, in [9, 10]
Chen et.
al.
improve on FFT Grid based methods [19] with better scoring functions and additional molecular
properties.
2.3 Spherical Harmonic Fourier Methods
Several groups [13, 28, 35] studied the problem of representing molecular surfaces with expansions of spherical
harmonic functions and its application to fast computations of the protein docking problem.
Efficiency is additionally gained from the fast rotation and cumulative correlation function computations
involving coefficients of spherical harmonic polynomials. To combat the numerical intensive trigonometric
computations in these methods, many values are precomputed and cached in a direct trade-off of memory for
increased speed. For example, most of the sine and cosine terms of the spherical harmonic expansion are
4
Zgłoś jeśli naruszono regulamin