Next: The FAMUSAMM algorithm Up: FAMUSAMM: An Algorithm for Previous: FAMUSAMM: An Algorithm for

Introduction

In many cases dynamical processes essentially contribute to protein function, e.g., in ligand binding or in enzymatic reactions. Representing a microscopic description of protein dynamics in the framework of classical mechanics, molecular dynamics (MD) simulations [1, 2, 3] can serve as a tool to interpret experimental data and to provide predictions on hitherto unobserved processes.

MD simulations of protein dynamics pose a computational challenge, since suitable simulation systems have to include the native environment (e.g., water and lipid molecules) of these biological macromolecules [4, 5, 6, 7] and, therefore, typically have to comprise several 10,000 atoms. Furthermore, they have to use femtosecond integration time steps to enable smooth descriptions of the fastest degrees of freedom. Simulations of such systems are currently limited to nanoseconds even if the most powerful supercomputers are employed. Although there are biochemically important processes which actually occur on a nanosecond time scale [8] and have been successfully described by MD simulations [9, 10], most biochemical processes occur on time scales well above nanoseconds and, thus, are inaccessible to conventional MD methods.

In the MD approach the classical Newtonian equations are solved numerically to obtain the motion of all atoms [11, 12, 13]. For each numerical integration step the Coulomb sum

between all pairs of atoms (i, j) with partial charges and positions has to be evaluated. In common force fields for protein and solvent molecules most atoms carry partial charges; therefore the evaluation of U dominates the computational effort in MD simulations as it scales quadratically with the number N of charged particles.

In truncation methods [14], which simply neglect long range electrostatic interactions beyond a cutoff distance of typically 8 - 10Å, the computation of U scales with N instead of and, consequently, such methods are widely used in MD simulations. However, the truncation of electrostatic interactions leads to serious errors in forces, energies and other observables. These errors are generally too large to be acceptable for descriptions of protein dynamics [15, 16, 17, 18]. For instance, using a cutoff distance of 10Å the relative errors in the computation of electrostatic forces are of the order of 10% [19]. One of the major artifacts concerning the description of protein dynamics, that is entailed by these sizable errors, is a suppression of dynamical correlations between slowly fluctuating protein modes, as these are mainly coupled by the neglected long-range Coulomb forces [16, 18]. Thus, truncation of electrostatics trades a description of protein dynamics which is computationally prohibitive but adequate in the sense of basic physics for a description which is computationally manageable but physically inadequate.

As a result of the above considerations, one of the major tasks which has to be tackled in the strive for improved MD methods is the design of approximate algorithms for computationally efficient and sufficiently accurate evaluation of long-range Coulomb interactions and integration of the Newtonian equations.

However, any attempt to solve the problems stated above is confronted with two difficulties: First, requiring efficiency and accuracy on equal footings typically represents a contradiction, since approximate algorithmic schemes tend to gain computational efficiency at the cost of accuracy; that contradiction already has shown up in the above discussion of truncation methods and has to be taken care of in the design of any new approximation scheme. Second, it is much simpler to estimate the computational complexity of an approximate MD algorithm, since this is just a matter of counting floating point operations, than to judge its accuracy. In fact, as discussed in refs. [16] and [20], the latter task is highly nontrivial due to the complicated, nonlinear, and chaotic dynamics of macromolecular systems and requires careful comparisons of a variety of statistical observables on structural and dynamical properties which have to be extracted from extensive sample simulations. For instance, the particular artifact of the truncation method noted above, i.e., the suppression of dynamical correlations, became apparent only after the progress of computer technology had enabled simulations of sufficiently long duration. To the extent that quality assessments of approximate MD algorithms will become necessary in the present paper, we will adopt the problem adapted approach developed in refs. [16] and [20] and will employ the procedures and some of the statistical observables which have been suggested there as useful and suitable tools to judge algorithmic accuracy (see also further below).

Up to now various approaches have been suggested which aim at a more efficient approximate treatment of the long-range electrostatic interactions in MD simulations. These approaches may be classified into two categories: multipole [21, 22, 23, 24, 25, 26, 18, 19] and multiple-time-step methods [27, 28, 29, 30, 31, 32, 33, 16].

Multipole methods approximate the long-range forces originating from a group of point charges by truncated multipole expansions of their electrostatic potential. Using a hierarchy of grids for subdivision of space, nested on multiple scales, and a corresponding hierarchical organization of charge groups and multipole expansions [22] a computational complexity of O(N log N) is achieved. By additionally using a hierarchy of local Taylor expansions for the evaluation of the electrostatic forces acting on particles Greengard and Rokhlin have constructed the so-called fast multipole method (FMM) that even scales with O(N) for large systems [23, 24, 25]. Several implementations of this method suited for MD simulations have been reported [34, 35, 36].

For MD simulations of biomolecules the FMM-type grouping of charges, defined by a fixed and regular subdivision of space, requires multipole expansions of rather high order as to achieve sufficient numerical accuracy [24]. If, instead, charge grouping is adapted to structural and dynamical properties of the simulated biomolecules, multipole expansions can be truncated at low orders without large sacrifices of accuracy [19, 26, 18]. Accordingly, such a structure adapted multipole method (SAMM) provides a further substantial speed-up for MD simulations.

Multiple-time-step methods are based on the observation that forces between distant atoms generally exhibit slower fluctuations than forces between close atoms. Therefore, without significant loss of accuracy, the more slowly fluctuating forces can be treated with longer integration step sizes. The required separation of forces can be implemented, e.g., by grouping atom pairs into distance classes. Simple multiple-time-step methods define only two distance classes, an inner and an outer one [27], whereas more advanced methods employ a hierarchy of such classes [28, 29, 30, 31, 32, 33, 16].

As one can expect from the general considerations presented further above, the enhanced efficiency of multipole and multiple-time-step methods is obtained at the cost of various kinds of algorithmic artifacts. A typical example is the artificial introduction of ``algorithmic noise'', which arises from discontinuous and discretized representations of continuous and smooth functions and processes. Such noise effectively can act like an uncontrolled stochastic force on to the atomic motion, and can turn the simulated dynamics into a Langevin-type dynamics with the nasty property, that the algorithmic noise is not properly balanced by a corresponding friction term and, therefore, constantly adds heat to the simulated system. By thorough studies of sample simulations these artifacts have been shown to be negligible in the case of the SAMM method [19, 26, 18] and to be optimally small in the case of a particular multiple-time-step procedure, the so-called DC-1d extrapolation scheme, which has been suggested and carefully compared with other methods in ref. [16].

From a general point of view, multipole methods and multiple-time-step methods exploit different regularities of Coulomb interactions as to obtain enhanced computational efficiency: the former make use of regularities in space, whereas the latter exploit regularities in time. This complementarity led us to propose [16] that a combination of both concepts should be capable to render additional speed-ups. In view of the proven superiority of the SAMM method as compared to fixed grid FMM procedures on the one hand, and of the DC-1d scheme as compared to other multiple-time-step schemes on the other hand, these two methods appear to be the natural candidates for the envisaged combination. It is the purpose of the present paper to show how a suitable combination of these methods can be achieved and to demonstrate that the resulting fast multiple-time-step structure-adapted multipole method (FAMUSAMM) actually preserves and combines the advantages of the parent methods with respect to enhanced computational efficiency and to the lack of substantial algorithmic artifacts. In particular we will demonstrate by comparison of a combination of SAMM with other multiple-time-step schemes, that the DC-1d procedure actually is the method of choice.

Windemuth [37] has recently presented a first step towards a combination of multipole and time-step methods; using a simple variant of a multiple-time-step method with only two distance classes and a conventional FMM algorithm he achieves considerable speed-ups. Unfortunately, a quality assessment is lacking in Windemuth's paper. Zhou and Berne [38] have reported a more advanced combination of the multiple-time-step algorithm RESPA [31] with a slightly modified fast multipole method using a rectangular subdivision of the simulation volume. Here, quality assessments are based on short MD simulations in the ps range. For MD simulations with periodic boundaries [39, 40, 41] a fast method based on a combination of RESPA and the smooth particle mesh Ewald method [42] has been described [43]. The latter method is particularly useful for simulations of crystalline structures, e.g., protein crystals. Also, for the simulation of non-crystalline systems, like in lipid membranes soluted proteins [4], periodic boundaries have been used as an approximation due to the availability of efficient algorithms. In the latter case, however, it is a priori not clear whether that approximation is justified (for a discussion of possible artifacts see, e.g., ref. [44, 45, 46, 47]). In our paper we develop an efficient algorithm to speed up simulations of non-periodic systems, in which, typically, the environment is effectively included through the use of appropriate boundary forces (see, e.g., refs [44, 48]). We provide a careful analysis of possible algorithmic artifacts and their influence on structural and dynamical properties of a test system.

We intend to achieve high efficiency by combining SAMM and a multiple-time-step scheme. To that aim, we first outline the basic features of the parent methods and, subsequently, present our new algorithm FAMUSAMM. As will be explained in the following sections, in our multipole method charge grouping will be based on the specific structural features of macromolecules rather than on fixed subdivisions of space as used in the methods cited above. In particular, our combination of SAMM with a multiple-time-step method is expected to be more efficient and less memory consuming, since we intend to apply the multiple-time-step scheme to local Taylor expansions of the electrostatic potential rather than to Coulomb forces acting on individual atoms. Having defined suitable methods to evaluate FAMUSAMM's accuracy, we present the results of our accuracy assessment. After documenting the enhanced computational performance of our method, we discuss and summarize our results.

Next: The FAMUSAMM algorithm Up: FAMUSAMM: An Algorithm for Previous: FAMUSAMM: An Algorithm for

Helmut Grubmueller
Wed Apr 30 15:40:09 MET DST 1997