Nonequispaced Fourier Transforms for Protein-Protein Docking∗

Julio E. Castrillón-Candás†

Vinay Siddavanahalli‡

Chandrajit Bajaj

Department of Computer Sciences, & Institute of Computational Engineering and Sciences,

Computational Visualization Center

University of Texas at Austin

Austin, TX 78712

ICES REPORT 05-44

October 26, 2005

Abstract

In this paper we introduce a grid-free approximate Fast Convolution algorithm based on Nonequispaced Fourier Transforms for accurately predicting rigid body protein-protein docking sites. Of the many docking approaches, grid-based Fast Fourier Transform (FFT) approaches have been shown to produce the best balance between computational complexity and accuracy of the correlation profiles of complex protein-protein interactions over the six-dimensional search space. However, these uniform sampling methods are still computationally intractable and highly memory intensive for predicting large protein-protein docking sites. In this paper we introduce an error-bounded FFT approach for nonequispaced data that significantly improves computational complexity and storage. We are able to efficiently produce highly compressed, yet accurate, docking correlation profiles.

∗ This work was supported in part by NSF grants ACI-0220037 and EIA-0325550, and NIH grants 0P20 RR020647 and R01 GM074258.

† julio@ices.utexas.edu

‡ skvinay@cs.utexas.edu

bajaj@cs.utexas.edu

1 Introduction

Efforts in structural proteomics have led to a rapid increase in the number of three-dimensional (3-D) structures of individual proteins. Moreover, knowledge of networks of interactions and signaling pathways is also expanding rapidly through genomic and proteomic approaches. Still, our picture of the structures of both stable and transient protein interactions lags behind. Efforts in crystallizing macromolecular complexes have met with limited success, and hybrid experimental approaches, utilizing cryo-electron microscopy and crystallography or NMR to give structural details of complex assemblies, are evolving. However, along with these experimental methods, there is a growing need for efficient and robust computational approaches to predicting the viable complexed structures in protein-protein interactions. These approaches are also known as protein-protein docking.

Protein-protein docking, or molecular docking in general, usually consists of two primary selections. One is the choice of goodness of fit measure (sometimes called the scoring function), while the other is the choice of the search algorithm. Both of these decisions are based on an assumed molecular model. The scoring function includes consideration for molecular properties in addition to a representation of molecular shape. Grid based Fast Fourier Transform (FFT) approaches have been shown to produce highly accurate correlation profiles of complex protein-protein docking, making them a popular choice for solving the above docking site search problem. However, they are time consuming and, in particular, highly memory intensive for large molecules due to the large size of the grid needed. In this paper we introduce an adaptive grid-free irregularly spaced Fourier approach for accurately predicting rigid body protein-protein docking sites.

Problem Description

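For molecule $A$ let $f_i^A : \mathbb{R}^3 \to \mathbb{C}$ be the associated density map for $i = 1 \dots m$, where each map represents a molecular shape or property. Similarly, for molecule $B$ we have $f_i^B : \mathbb{R}^3 \to \mathbb{C}$ maps for $i = 1 \dots m$. Let $\Lambda_i(f_i^A) : \mathbb{R}^3 \to \mathbb{C}$ and $\Gamma_i(f_i^B) : \mathbb{R}^3 \to \mathbb{C}$ be the scoring functions defined on $\mathbb{R}^3$. For a rotation $R$ in the 3D rotation group SO(3), the rotation operator $\Delta_R$ is defined as

$$\Delta_R S(\mathbf{x}) = S(R^{-1}(\mathbf{x})) \quad \forall \mathbf{x} \in \mathbb{R}^3,$$

where $\mathbf{x} = (x, y, z)$. Similarly, the translation operator $T_{j,k,l}$ is defined as

$$T_{j,k,l}(S(x, y, z)) = S(x - j,\, y - k,\, z - l)$$

for $j, k, l \in \mathbb{R}$. The six-dimensional docking search problem can be posed as the following correlation problem:

$$\arg\max_{j,k,l,R} \sum_{i=1}^{m} \int_{\mathbf{x} \in \mathbb{R}^3} \int_{\mathbf{y} \in \mathbb{R}^3} \mathrm{Re}\, T_{j,k,l}(\Delta_R(\Lambda_i(f_i^A(\mathbf{x}))))\, \Gamma_i(f_i^B(\mathbf{y})) \, d\mathbf{x}\, d\mathbf{y}, \tag{1}$$

where $\mathrm{Re}$ corresponds to the real part. This problem is also equivalent to the following fitting minimization problem (see section 4):

$$\arg\min_{j,k,l,R} \sum_{i=1}^{m} \left\| T_{j,k,l}(\Delta_R(\Lambda_i(f_i^A(\mathbf{x})))) - \Gamma_i(f_i^B(\mathbf{y})) \right\|, \tag{2}$$

where $\|\cdot\|$ corresponds to the $L^2(\mathbb{R}^3)$ norm. In section 3 an approximation to the correlation map in equation 1 is obtained by an approximate convolution map over $(x, y, z)$.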

To match these two maps, the rotation $R$ is modified to include the $\pi$ rad flip around each axis. For a rotation $R$ in the 3D rotation group SO(3), the rotation operator $\Delta_R$ is defined with $R$ composed with $X$, $Y$ and $Z$, where $X$, $Y$ and $Z$ are the rotations around each axis in $\mathbb{R}^3$; then

$$\Delta_R S(\mathbf{x}) = S(R^{-1}(\mathbf{x})) \quad \forall \mathbf{x} \in \mathbb{R}^3.$$

Note that alternate formulations lead to the same search space. Indeed, [35] splits the search problem into 5D rotations and a 1D translation.
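The action of the rotation and translation operators on a map can be illustrated with a few lines of Python. This is only a minimal sketch and not the implementation used in this report: the helper names `rotate`, `translate` and `blob`, and the toy Gaussian map, are assumptions introduced here for illustration.

```python
import numpy as np

def rotate(S, R):
    """Return the map x -> S(R^{-1} x) for a 3x3 rotation matrix R."""
    R_inv = np.asarray(R, dtype=float).T  # inverse of a rotation is its transpose
    return lambda x: S(R_inv @ np.asarray(x, dtype=float))

def translate(S, j, k, l):
    """Return the map (x, y, z) -> S(x - j, y - k, z - l)."""
    shift = np.array([j, k, l], dtype=float)
    return lambda x: S(np.asarray(x, dtype=float) - shift)

def blob(x):
    """Toy scalar map (assumption): a Gaussian bump centred at (1, 0, 0)."""
    center = np.array([1.0, 0.0, 0.0])
    return float(np.exp(-np.sum((np.asarray(x, dtype=float) - center) ** 2)))

# Rotation by 90 degrees about the z axis: sends (1, 0, 0) to (0, 1, 0).
ROT_Z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 1.0]])

# Rotate first, then translate by (0, 0, 2): the bump moves to (0, 1, 2).
moved = translate(rotate(blob, ROT_Z_90), 0.0, 0.0, 2.0)
```

Because both helpers act on the argument of the map, the transformed map takes its peak value at the rotated and shifted position of the original peak, which is the behaviour the operator definitions describe.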

Molecular shapes have a natural smooth particle atomistic or quasi-atomistic representation. By taking advantage of the adaptive smooth particle representation, we eliminate the underlying grid, thus producing highly compressed, but accurate, correlation profiles based on an adaptive irregularly spaced FFT algorithm. Our docking method consists of three primary steps. First, we select an adaptive smooth particle representation for proteins that is also compatible with our initial shape-complementarity based scoring function. Second, we calculate the frequency profiles directly from the smooth particle representation and search effectively over the six-dimensional translational and rotational space, utilizing the irregularly spaced FFT. Finally, we evaluate a compressed correlation profile that captures the rigid body protein-protein docking sites.
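To make the second step concrete, the sketch below evaluates Fourier coefficients directly from a set of irregularly placed particle centers. It is the slow reference sum that a fast nonequispaced transform approximates, not the fast algorithm of this report. The function name `adjoint_ndft`, the frequency range, the absence of normalization and the scaling of the centers into the unit cube are all assumptions made for illustration.

```python
import numpy as np

def adjoint_ndft(points, weights, n):
    """Directly evaluate h[k] = sum_j weights[j] * exp(-2*pi*i * k . points[j]).

    points  : (M, 3) array of particle centers, scaled into [-0.5, 0.5)^3
    weights : (M,) array of particle weights
    n       : even number of frequencies per axis; each component of k
              ranges over -n/2, ..., n/2 - 1
    Returns an (n, n, n) complex array. The cost is O(M * n^3).
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights)
    freqs = np.arange(-(n // 2), n // 2)
    # The 3D phase factor separates into one (M, n) factor per axis.
    ex = np.exp(-2j * np.pi * np.outer(points[:, 0], freqs))
    ey = np.exp(-2j * np.pi * np.outer(points[:, 1], freqs))
    ez = np.exp(-2j * np.pi * np.outer(points[:, 2], freqs))
    # Sum over particles j, keeping one frequency index per axis.
    return np.einsum("j,ja,jb,jc->abc", weights, ex, ey, ez)
```

When the centers happen to lie on an equispaced grid, the result coincides with an ordinary FFT after `np.fft.fftshift`, which gives a convenient correctness check.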

The rest of the paper is organized as follows. In section 2, we summarize the main Fourier based approaches to the rigid body protein-protein docking problem. Moreover, the different approaches to the FFT over irregularly sampled domains are described. A complexity analysis of grid and spherical harmonic Fourier based algorithms for docking and matching is given in Appendix A. In section 3 the main part of the algorithm is described. In section 3.1 a smooth particle representation of molecular maps and affinity functions is introduced, along with the corresponding shape complementarity based scoring function to capture rigid body protein-protein docking.

With a suitable shape complementarity based scoring function defined, the search algorithm is separated into two parts: the translational Fourier based search and the rotational search. In section 3.3 we show how to reduce the computational and storage costs of the translational search algorithm with our method. The traditional grid based Fourier approach embeds the two molecular maps in an $N^3$ grid and convolves them using the FFT, leading to $O(N^3 \log N)$ time. In our irregularly spaced Fourier method, we assume both molecules to contain $M$ atoms (one can take $M$ to be the maximum number of atoms over both molecules). An accurate approximate correlation profile is derived in $O(M \log M)$ computational steps and $O(M)$ storage. In practice $M$ is much smaller than $N^3$. In section 4 error estimates for the convolution profile obtained with our method are derived.

Finally, in section 5 we describe our implementation and report on a few docking results including the

actual timing and the accuracy of our correlation proﬁles.

We point out that during the writing of this technical report a similar work by Potts et al. [33] was brought to our attention. The purpose of that work is to build a fast summation algorithm for radially symmetric functions with general kernels as an alternative to multipole fast summation methods. Our research involves the development of a more general method for fast convolution of radially symmetric functions with the purpose of predicting molecular docking sites.

2 Prior Work

In this section we briefly review past docking approaches with an emphasis on techniques applying Fourier search. A review of irregularly sampled Fourier transforms is also presented.

2.1 Molecular Shape and Afﬁnity Functions

Figure 1: Solvent Accessible and Solvent Excluded Surfaces. The figure labels the solvent molecule (probe), the Solvent Accessible Surface (SAS) and the Solvent Excluded Surface (SES).

Various molecular surfaces have been defined (Figure 1) using a spherical representation of individual atoms and a spherical probe representing a solvent molecule. The SAS is outlined by the center of the probe sphere as it “rolls” over the atoms constituting a molecule [11]. The SES [11, 25] is defined as the inner boundary of the volume that can be occupied by solvent in contact with the molecule. A number of algorithms have been developed to compute these surfaces [1, 3, 4, 11, 29, 36, 37, 41–43] for the purpose of visualization and various computations. It is interesting to observe that the SES of proteins forming molecular complexes exhibits a very high level of geometrical complementarity. These surfaces are used extensively for visualizing and studying molecular properties and interactions. However, these surfaces are approximations of a somewhat fuzzy boundary of the molecule’s electron density. Surfaces are also used to visualize molecular properties associated with molecular shape (e.g. charge density, electrostatic potential, hydrophobicity, etc.) [12, 22]. Such surfaces are usually level sets of scalar fields and their gradient or Laplacian [11, 29, 37, 42, 43].

Molecular shapes (surface and volumetric) are also derived from approximations of an appropriate level set of electron density [6, 13, 28, 34]. The accurate computation of electron density representations for molecules from the PDB requires computations at the quantum mechanical level [7]. One usually approximates the electron density distribution of the $i$-th atom with a Gaussian function [3, 6, 7, 20, 30, 31, 38] as

$$G_i(r) = \exp\left(\frac{B r^2}{r_i^2} - B\right),$$

where $B < 0$ is the rate of decay parameter, $r_i$ is the Van der Waals radius of the $i$-th atom and $r^2 = (x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2$ ($\{x_i, y_i, z_i\}$ is the center of the $i$-th atom). A volumetric representation of the molecule may now be obtained by summing the contributions from each single atom; thus the electron density $I(\mathbf{x})$ for $M$ atoms is described as

$$I(\mathbf{x}) = \sum_{i=1}^{M} G_i(r) = \sum_{i=1}^{M} e^{\left(\frac{B r^2}{r_i^2} - B\right)}. \tag{3}$$

Notice that for protein structures, $r_i$ can be grouped into a set of about 15 distinct values.
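The summed Gaussian density can be evaluated directly from the atom centers and radii. The following is a minimal sketch under stated assumptions: the function name `electron_density` and the default decay value `B = -2.3` are illustrative choices made here, not values taken from this report.

```python
import numpy as np

def electron_density(points, centers, radii, B=-2.3):
    """Evaluate I(x) = sum_i exp(B * r^2 / r_i^2 - B) at each query point.

    points  : (P, 3) array of query positions
    centers : (M, 3) array of atom centers (x_i, y_i, z_i)
    radii   : (M,) array of Van der Waals radii r_i
    B       : negative decay parameter (the default is an illustrative assumption)
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    centers = np.atleast_2d(np.asarray(centers, dtype=float))
    radii = np.asarray(radii, dtype=float)
    # Squared distance from every query point to every atom center: shape (P, M).
    diff = points[:, None, :] - centers[None, :, :]
    r2 = np.sum(diff ** 2, axis=-1)
    # Sum the per-atom Gaussians over the atom axis.
    return np.exp(B * r2 / radii ** 2 - B).sum(axis=1)
```

For an isolated atom, a point on its Van der Waals sphere ($r = r_i$) has density exactly 1 whatever the value of $B$, so the level set at 1 reproduces the sphere, and more negative values of $B$ make the density fall off faster away from it.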

A critical component of all docking approaches is defining a suitable measure for the affinity functions in the scoring calculations. The approach in [23] separates the affinity functions into a core and a surface skin, with the objective of penalizing core-core clashes while rewarding surface skin-surface skin overlaps. By assigning different affinities to the core and the molecular surface skin of each atom and performing a convolution between these weighted maps, a profile is obtained whose largest values correspond to the best translational overlap. Modifications of this approach have been developed in [9, 10, 19, 26]. They define the core and the skin regions using molecular surfaces such as the solvent accessible surface (SAS) and the solvent excluded surface (SES). Other approaches include adapting scoring functions for molecular matching [8, 21]. These scoring functions are also designed to match molecular functional properties, such as electrostatic potential. They can be modified for docking by forming a function for one molecule and a function for the complementary volume of the other molecule.

2.2 Grid Based Fourier Methods

Katchalski-Katzir et al. [23] use coarse grids and rotational angles to reduce the combinatorics of the search. Gabb et al. [19] use a priori knowledge of suitable binding site locations on the proteins to reduce the combinatorics of possible relative conformations. Fast Fourier Transforms are used in each of [19, 23, 35] to additionally speed up the cumulative scoring function computations and hence the search. Moreover, in [9, 10] Chen et al. improve on FFT grid based methods [19] with better scoring functions and additional molecular properties.
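The grid based translational search can be sketched in a few lines. The example below scores every cyclic shift of one weighted grid against another with a single FFT-based correlation and takes the real part. It is a toy illustration rather than the method of any of the cited papers: the function names and the skin and core weights (`SKIN`, `CORE`) are assumptions chosen so that skin-skin overlaps add to the score and core-core overlaps subtract from it.

```python
import numpy as np

SKIN = 1.0   # illustrative skin weight (assumption): skin * skin = +1
CORE = 3.0j  # illustrative core weight (assumption): core * core = -9

def translational_scores(grid_a, grid_b):
    """scores[t] = Re sum_x grid_a[x] * grid_b[x + t], with cyclic shifts t.

    Equivalently, grid_a translated by t is multiplied pointwise with grid_b.
    """
    grid_a = np.asarray(grid_a, dtype=complex)
    grid_b = np.asarray(grid_b, dtype=complex)
    # The DFT of this (unconjugated) correlation is A[-k] * B[k],
    # and A[-k] equals grid_a.size * ifftn(grid_a)[k].
    a_neg = np.fft.ifftn(grid_a) * grid_a.size
    return np.real(np.fft.ifftn(a_neg * np.fft.fftn(grid_b)))

def best_translation(grid_a, grid_b):
    """Return the shift with the highest score, and that score."""
    scores = translational_scores(grid_a, grid_b)
    shift = np.unravel_index(np.argmax(scores), scores.shape)
    shift = tuple(int(s) for s in shift)
    return shift, float(scores[shift])
```

With a single skin voxel in each grid, the best shift returned is the offset between the two voxels. With the weights above, each overlapping pair of core voxels contributes −9 to the score, while a skin voxel overlapping a core voxel contributes nothing to the real part.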

2.3 Spherical Harmonic Fourier Methods

Several groups [13, 28, 35] have studied the problem of representing molecular surfaces with expansions of spherical harmonic functions and its application to fast computations for the protein docking problem.

Efficiency is additionally gained from the fast rotation and cumulative correlation function computations involving coefficients of spherical harmonic polynomials. To combat the numerically intensive trigonometric computations in these methods, many values are precomputed and cached in a direct trade-off of memory for increased speed. For example, most of the sine and cosine terms of the spherical harmonic expansion are
