
Conceived and designed the experiments: FD IA AL PB DB. Wrote the paper: FD IA AL.

The authors have declared that no competing interests exist.

Symmetric protein assemblies play important roles in many biochemical processes. However, the large size of such systems is challenging for traditional structure modeling methods. This paper describes the implementation of a general framework for modeling arbitrary symmetric systems in Rosetta3. We describe the various types of symmetries relevant to the study of protein structure that may be modeled using Rosetta's symmetric framework. We then describe how this symmetric framework is efficiently implemented within Rosetta, which restricts the conformational search space by sampling only symmetric degrees of freedom, and explicitly simulates only a subset of the interacting monomers. Finally, we describe structure prediction and design applications that utilize the Rosetta3 symmetric modeling capabilities, and provide a guide to running simulations on symmetric systems.

Homomeric protein assemblies are ubiquitous in nature, playing many key roles in biochemical processes. These assemblies are built up by the repetition of a single structural unit, the most common example being homodimers with two protein subunits. Homomeric assemblies often play a morphological role by forming channels, containers and molecular rulers. Almost all homomeric assemblies have a symmetrical arrangement of their subunits in three-dimensional space. Symmetry is a central concept in understanding the structural organization of many protein complexes and is fundamental to the field of crystallography.

Due to the biological importance of symmetric protein assemblies, there is a need to model the structures of symmetric protein systems. Symmetry imposes fundamental constraints on the organization of these assemblies, which makes the computational treatment of very large systems possible. In this work, we describe a general framework for modeling arbitrarily complex symmetries in Rosetta3. First, we give a short background on the different types of symmetry that are relevant to the study of protein structures and crystallography. Then we describe how this symmetry machinery is implemented within Rosetta: we restrict the conformational search space by sampling only symmetric degrees of freedom, and we limit system size by explicitly simulating only a subset of the interacting monomers. We also describe optimizations that allow efficient scoring and minimization of symmetric systems. We then provide a guide to running symmetric simulations with Rosetta, describing several tools with which one may define a symmetric system and how several Rosetta protocols may be run in the context of symmetric partners. These protocols include docking, ab initio structure prediction, comparative modeling, and protein design. Finally, we compare the performance of the symmetry machinery in Rosetta3 with the implementation in Rosetta2.

Regular symmetries include point, helical and crystal symmetries.


There are five basic types of point symmetry, denoted by the Schoenflies symbols C, D, T, O, and I. The most common type of symmetry is cyclic, or C_{n}, symmetry, in which n subunits are related by a single n-fold rotation axis (homodimers, with C_{2} symmetry, are a special case). Complexes with high-order cyclic symmetry are used as ring structures in pores and chambers. Dihedral, or D_{n}, symmetry combines an n-fold rotation axis with n twofold axes perpendicular to it, giving 2n subunits; D symmetry provides additional interface variety that leads to greater stability as well as improved allosteric control. The higher-order symmetries T, O, and I consist of threefold symmetric units arranged with the symmetry of a tetrahedron, octahedron, and icosahedron, respectively. Icosahedral symmetry is very commonly observed in viral structures, as it produces roughly spherical assemblies suitable for storage and transport. Tetrahedral and octahedral symmetries are less common, but have been observed in ferritin structures.

Helical symmetries are produced by rotation and translation along a single symmetry axis and have been observed in microtubules, flagella, and actin filaments. Amyloid fibers displaying helical symmetry are also associated with a number of diseases, such as Creutzfeldt-Jakob disease and Alzheimer's disease. In simple helical symmetries, only three parameters (aside from the orientation of a reference subunit) are required to uniquely define the symmetric system: an angle of rotation between subunits, a translation (or “rise”) along the axis, and the radial distance of the subunits from the axis.

Wallpaper and crystal symmetries occur when a subunit forms a repeating two-dimensional or three-dimensional pattern. There are 17 possible two-dimensional repeats and, ignoring cases made impossible by the chirality of proteins, 65 possible three-dimensional repeats. These are referred to as wallpaper groups and space groups, respectively.

The presence of symmetry leads to a large reduction in the number of parameters required to describe the relative orientations of protein subunits in coordinate space. For an asymmetric system, the number of degrees of freedom required to specify an oligomer is 6×(number of subunits − 1), while a symmetric system can typically be described with 3 to 6 degrees of freedom.
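As a small worked example of the count above, the rigid-body parameter count of an asymmetric oligomer grows linearly with the number of subunits. The function name below is illustrative only.

```python
def asymmetric_rigid_body_dofs(n_subunits: int) -> int:
    """Rigid-body dofs needed to place n subunits when no symmetry is assumed."""
    # One subunit fixes the coordinate frame; every other subunit needs
    # 3 rotations and 3 translations relative to it.
    return 6 * (n_subunits - 1)


for n in (2, 4, 17):
    print(f"{n:>2} subunits: {asymmetric_rigid_body_dofs(n):>2} dofs without symmetry")
```

A 17-subunit ring therefore needs 96 rigid-body parameters when modeled asymmetrically, whereas a symmetric description of the same ring typically needs only 3 to 6, independent of the number of subunits.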

All modeling tasks in Rosetta consist of two general components:

The conformation of a macromolecule in Rosetta is represented by a tree-like structure, with either atom-level (

The conformational degrees of freedom (dofs) of a molecular system are the torsion angles of the backbone and side chains, along with the rigid-body transformations between peptide segments. Maintaining perfect symmetry with regard to the internal structure of protein subunits is straightforward: when a torsion angle is set in one subunit, it is simultaneously set in all other subunits. For implementation purposes, we describe the internal degrees of freedom with respect to a

Maintaining rigid-body symmetry between subunits is more challenging. The representation affects both energy evaluation and minimization, and it must be general enough to model arbitrarily complex symmetries. In Rosetta, we have opted for a system in which the rigid-body configuration of each subunit is controlled by its own reference frame. These reference frames are related to one another by the symmetry operations defined by the symmetry group. Analogous to how identical internal structure is maintained between subunits, a change in the coordinates of one subunit relative to its reference frame is replicated in all other subunits and coordinate systems, enforcing rigid-body symmetry. The position of a subunit relative to its reference frame is controlled by jumps. Each jump is described by 6 variables, three rotational and three translational, that define the rigid-body transformation between the start and end coordinates of the jump. These reference frames are implemented in Rosetta by introducing non-amino-acid pseudo-residues, called virtual residues.

To maintain the overall symmetry of the system, for many symmetry groups only a subset of translations and rotations are allowed to move. The reference frames are set up such that, if rigid-body movement is restricted, the allowed direction of movement coincides with one of the principal axes of the virtual residue. For example, a C_{2} symmetric system may be set up such that only translation along

In many applications (such as

The setup of virtual residues (which act as reference frames) is described in a tree-like hierarchy, like that shown in

Circles represent virtual residues, and arrows between them indicate jumps.

The framework outlined in the previous section describes how kinematics are enforced and propagated in symmetric poses. In this section, we briefly introduce how the energy of a structure is evaluated in Rosetta, and describe several modifications to energy evaluation that increase efficiency when evaluating structures known to be symmetric. Since scoring takes the majority of the time in most Rosetta full-atom protocols, these enhancements result in a significant increase in speed in almost all modeling and design protocols.

Rosetta's full-atom energy function is a linear combination of terms. For implementation purposes, these energy terms are divided into four separate classes: one-body energy terms, distance-dependent two-body energy terms, distance-independent two-body energy terms, and whole-structure (or “many-body”) energy terms.

When scoring symmetric structures, we quickly notice that a majority of these interactions are duplicated multiple times throughout the complex. For example, if we consider the C_{4} system of

Of the six interfaces in the symmetric complex, only two are unique: the energy of interface AB is identical to that of BC, CD, and DA, and the energy of interface AC is identical to that of BD. The energies internal to one subunit are identical to those in each of the symmetric copies. Thus, to compute the energy of the entire system, we only need to consider the internal energy of A and the energies of interfaces AB and AC.
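The decomposition for the C_{4} example can be checked numerically. The sketch below uses invented energies (the functions `internal_energy` and `pair_energy` are hypothetical) whose only property is that interface energy depends on ring separation, which is what C_{4} symmetry guarantees.

```python
import itertools

N = 4  # C_4: subunits A, B, C, D = 0, 1, 2, 3 placed around a ring


def internal_energy(_subunit: int) -> float:
    # Toy internal energy; identical for every symmetric copy.
    return -10.0


def pair_energy(i: int, j: int) -> float:
    # Toy interface energy depending only on ring separation (1 or 2).
    sep = min((i - j) % N, (j - i) % N)
    return {1: -3.0, 2: -0.5}[sep]


# Explicit sum over all four subunits and all six interfaces.
full = sum(internal_energy(k) for k in range(N)) + sum(
    pair_energy(i, j) for i, j in itertools.combinations(range(N), 2)
)

# Master-only evaluation: 4 E(A) + 4 E(A,B) + 2 E(A,C).
A, B, C = 0, 1, 2
master_only = 4 * internal_energy(A) + 4 * pair_energy(A, B) + 2 * pair_energy(A, C)

print(full, master_only)
```

Only three energy evaluations (one subunit and two interfaces) are needed on the master-only side, regardless of how the toy energies are chosen, as long as they respect the symmetry.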

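Thus, ignoring whole-structure energies, we see that in order to evaluate the energy of a symmetric complex, we only need to consider the energy of one subunit (the master subunit), plus the interactions that subunit makes with each of the other subunits. Revisiting the C_{4} system, the energy of the complex is

$$E = 4\,E(A) + 4\,E(A,B) + 2\,E(A,C)$$

Here E(A) is the internal energy of the master subunit A, and E(A,B) and E(A,C) are the interface energies between A and subunits B and C, respectively.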

Furthermore, if we assume that there is a maximum interaction distance of any two-body energy function, then we only need to explicitly model subunits whose residues will possibly approach to within this maximum interaction distance during simulation. For example, when modeling a large ring, like the C_{17} structure shown in

In the C_{17} system shown here (PDB id 3kml), if we assume interactions at a distance of more than 10 Å contribute a negligible amount of energy, then we only need to model the three colored subunits in Rosetta. The entire system's energy (and gradients) may be described in terms of the energy of the master subunit (red) and the interactions between the master and the adjacent slave subunits (orange).
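The choice of which ring members to model explicitly can be sketched with simple geometry. This is a minimal sketch under stated assumptions: each subunit is approximated as a sphere centred on a ring, and the ring radius and subunit radius used below are hypothetical values, not measurements of PDB 3kml.

```python
import math


def subunits_to_model(n: int, ring_radius: float, subunit_radius: float,
                      cutoff: float) -> list:
    """Indices of C_n ring subunits that may interact with master subunit 0.

    Subunit k is kept if the gap between its bounding sphere and the
    master's bounding sphere could be smaller than `cutoff`.
    """
    keep = []
    for k in range(n):
        # Centre-to-centre distance between subunits 0 and k (a chord of the ring).
        centre_dist = 2.0 * ring_radius * math.sin(math.pi * k / n)
        if centre_dist - 2.0 * subunit_radius <= cutoff:
            keep.append(k)
    return keep


# Hypothetical dimensions (Å) for a 17-membered ring, 10 Å interaction cutoff.
print(subunits_to_model(17, ring_radius=60.0, subunit_radius=12.0, cutoff=10.0))
```

With these numbers only the master (index 0) and its two ring neighbours (indices 1 and 16) are kept, matching the three explicitly modeled subunits described above.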

Note that if we are calculating agreement with experimental data that are dependent on the conformation of the entire complex, such as residual dipolar coupling (RDC) data or small-angle X-ray scattering (SAXS) data, then all subunits must be explicitly included in order to correctly evaluate these whole-structure energies.

One time-consuming step in scoring a structure is computing the energy graph for the distance-dependent two-body energies. Here, we must find all pairs of residues containing atoms within some cutoff distance of one another. For asymmetric structures, Rosetta represents this cloud of atoms with an octree. Using an octree, the energy graph of a protein with N residues is computed in two steps: first the octree is constructed from the “atom cloud”; then, for each residue in the protein, the nearby residues are found. With a symmetric structure, we only need to consider edges in this energy graph with at least one vertex in the master subunit. Assuming we are explicitly modeling S subunits, we only need to query the octree N/S times instead of N times (the time spent constructing the octree is the same). This speedup is particularly noticeable in cases where experimental data require that a large number of subunits be explicitly modeled.
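The two-step neighbor search restricted to the master subunit can be sketched as follows. As a stated substitution, a uniform grid with cell size equal to the cutoff stands in for Rosetta's octree, and each residue is represented by a single point; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import product


def neighbor_edges(coords, master, cutoff):
    """Residue pairs within `cutoff` that have at least one master residue.

    coords: list of (x, y, z) points, one per explicitly modeled residue.
    master: set of residue indices belonging to the master subunit.
    """
    def cell(p):
        # Grid cell containing p; with cell size == cutoff, any neighbor
        # within the cutoff lies in the same or an adjacent cell.
        return tuple(int(c // cutoff) for c in p)

    grid = defaultdict(list)
    # Build step: every residue is inserted, exactly as without symmetry.
    for idx, p in enumerate(coords):
        grid[cell(p)].append(idx)

    edges = set()
    # Query step: only master residues are looked up (N/S queries, not N).
    for i in sorted(master):
        cx, cy, cz = cell(coords[i])
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j == i:
                    continue
                d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                if d2 <= cutoff * cutoff:
                    edges.add((min(i, j), max(i, j)))
    return edges


# Two toy subunits of three residues each; residues 0-2 form the master.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0),
          (0.0, 4.0, 0.0), (3.0, 4.0, 0.0), (20.0, 4.0, 0.0)]
master = {0, 1, 2}
print(sorted(neighbor_edges(coords, master, cutoff=5.0)))
```

The pair (3, 4) lies within the cutoff but is never reported, because neither residue belongs to the master; by symmetry its energy equals that of an edge already found.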

The total energy of a symmetric system is given in terms of interface energies as a line in the symmetry definition file. For example, the symmetry definition file for the C_{4} system in

Here,

When scoring a symmetric structure, Rosetta attaches a weight to each interaction edge. For one-body energies and two-body energies within the master subunit, this weight is simply the weight on the master subunit (‘4’ in the example above). For two-body energies between the master and some other subunit, the weight is the corresponding weight from the symmetry definition file: in this case, ‘4’ for interactions with the slave subunit controlled by virtual residue

Whole-structure energies are slightly trickier to handle within the symmetric framework. In many cases, it is not clear whether a more suitable interpretation is to compute the energy over one subunit and scale this energy by the number of subunits, or to compute these energies over the whole symmetric complex and leave it unscaled. We have opted for the latter, with the justification that scoring a complex with point symmetry should give the same results using symmetric scoring as asymmetric scoring. However, with lattice symmetry, or cases where only some subset of the complete system is explicitly modeled, these whole-structure energies may not make much sense. Therefore, this behavior may be modified for particular score functions by making the appropriate corrections in

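Rosetta's sidechain optimization module, the packer, can also take advantage of the same efficiencies that make scoring rapid. In asymmetric sidechain optimization, the packer builds a discrete set of rotamers R_{i} for each residue i and seeks the assignment of one rotamer r_{i} ∈ R_{i} to each residue that minimizes the total energy: the sum of each rotamer's one-body energy (E_{1}) and its pairwise interaction energies (E_{2}) with the background (the residues whose sidechains are held fixed), plus the pairwise energies E_{2}(r_{i}, r_{j}) between the rotamers assigned to each pair of residues i and j:

$$E = \sum_i \Big[ E_1(r_i) + E_2(r_i, \mathrm{bg}) \Big] + \sum_{i<j} E_2(r_i, r_j)$$

This is an instance of the state assignment problem, in which each residue is a node whose possible states are its rotamers; node energies replace the one-body and background terms, and edge energies replace E_{2}(r_{i}, r_{j}). For speed, Rosetta precalculates and stores the node and edge energies in a sparse interaction graph for rapid retrieval.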

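The state assignment problem is also a good model for the symmetric packing task, where assigning rotamer r_{i} to residue i in the master subunit also assigns its symmetric image r_{i}^{c_{k}} to residue i in each symmetric copy c_{k}. The node energy for r_{i} must therefore also include the weighted interaction energies between r_{i} and its own images r_{i}^{c_{k}}.

Similarly, the edge energy for states r_{i} and r_{j} includes, in addition to the weighted interaction between r_{i} and r_{j} within the master subunit, the weighted interactions between r_{i} and the images r_{j}^{c_{k}}, and between r_{j} and the images r_{i}^{c_{k}}.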

With these equations for calculating the node and edge energies, the same interaction-graph data structure and the same discrete-optimization algorithm used to solve the asymmetric sidechain placement problem may be used.
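A toy C_{2} example can illustrate why master-only node and edge energies are sufficient. This is a minimal sketch under stated assumptions: all energies are invented, the weights are taken from an energy line of the form E = 2*master + 1*(master:slave), background terms are omitted, and exhaustive enumeration replaces Rosetta's stochastic packer.

```python
from itertools import product

# Toy C_2 packing problem: two residues per subunit, two rotamers each.
RESIDUES = (0, 1)
ROTAMERS = (0, 1)
W_MASTER, W_INTERFACE = 2.0, 1.0  # assumed weights for a C_2 dimer


def e1(i, r):
    """One-body energy of rotamer r at residue i (invented values)."""
    return ((0.0, 0.4), (0.3, -0.2))[i][r]


def e_intra(i, ri, j, rj):
    """Pair energy between residues i and j of the same subunit."""
    return -0.5 if ri == rj else 0.2


def e_inter(i, ri, j, rj):
    """Pair energy between residue i of one subunit and j of the other."""
    return -1.0 if ri + rj == 1 else 0.1


def full_energy(assign):
    """Explicit dimer energy: both subunits carry the same rotamers."""
    e = 0.0
    for _subunit in range(2):
        e += sum(e1(i, assign[i]) for i in RESIDUES)
        e += e_intra(0, assign[0], 1, assign[1])
    # Every residue of subunit A against every residue of subunit B.
    e += sum(e_inter(i, assign[i], j, assign[j]) for i in RESIDUES for j in RESIDUES)
    return e


def node_energy(i, r):
    # Weighted one-body term plus the rotamer's interaction with its own image.
    return W_MASTER * e1(i, r) + W_INTERFACE * e_inter(i, r, i, r)


def edge_energy(i, ri, j, rj):
    # Weighted intra-subunit pair plus both cross-interface pairings
    # (i in A with j in B, and j in A with i in B).
    return (W_MASTER * e_intra(i, ri, j, rj)
            + W_INTERFACE * (e_inter(i, ri, j, rj) + e_inter(j, rj, i, ri)))


def graph_energy(assign):
    """Energy assembled only from master-subunit node and edge terms."""
    return (sum(node_energy(i, assign[i]) for i in RESIDUES)
            + edge_energy(0, assign[0], 1, assign[1]))


best = min(product(ROTAMERS, repeat=2), key=graph_energy)
print(best, graph_energy(best), full_energy(best))
```

For every assignment the graph energy equals the explicit dimer energy, so minimizing over the master-only interaction graph finds the same optimum as scoring the whole complex.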

In this section, we describe the basic framework we use when minimizing symmetric systems. We discuss a few implementation issues and describe two cases that require special treatment: lattice symmetries, such as helical symmetry or 2D and 3D crystal tilings, and asymmetric whole-structure energies.

As with kinematics and scoring, minimization of symmetric complexes is done with respect to a master subunit. For each backbone torsion and rigid-body degree of freedom in the master subunit, we compute the derivative of Rosetta's all-atom energy with respect to the corresponding degree of freedom. Consider the C_{4} case in

Since every subunit is in the same symmetric context, we only need to consider gradients with respect to the master subunit. Thus, when computing derivatives with respect to the motion of

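Formally, we compute the partial derivative of the energy E of a system with point symmetry with respect to a degree of freedom θ_{i} of the master subunit (where θ_{i}^{k} denotes the copy of θ_{i} in subunit k, which is constrained to equal θ_{i}) as the sum of the partial derivatives with respect to each copy:

$$\frac{\partial E}{\partial \theta_i} = \sum_k \frac{\partial E}{\partial \theta_i^{k}}$$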

As with asymmetric minimization, the recursive formulation of Abe et al. is used to accumulate these derivatives along the fold tree.
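The rule that the derivative with respect to a shared symmetric dof is the sum of the per-copy partial derivatives can be checked numerically. This sketch assumes a hypothetical C_{5} ring in which each subunit's only dof is its distance from the symmetry axis, scored by a toy harmonic neighbor-distance energy, with all derivatives taken by central differences.

```python
import math

N = 5          # C_5 ring, all subunits in the xy-plane
D0 = 20.0      # hypothetical ideal distance between ring neighbours (Å)
COS = math.cos(2.0 * math.pi / N)


def energy(radii):
    """Toy energy: harmonic penalty on each neighbour-neighbour distance.

    radii[k] is the distance of subunit k from the symmetry axis; in a
    symmetric pose every entry is a copy of the master's value.
    """
    e = 0.0
    for k in range(N):
        a, b = radii[k], radii[(k + 1) % N]
        d = math.sqrt(a * a + b * b - 2.0 * a * b * COS)
        e += (d - D0) ** 2
    return e


def partial(radii, k, h=1e-6):
    """Central-difference dE/dx_k, moving only subunit k's copy of the dof."""
    up, dn = list(radii), list(radii)
    up[k] += h
    dn[k] -= h
    return (energy(up) - energy(dn)) / (2.0 * h)


x, h = 15.0, 1e-6
tied = [x] * N
# Derivative w.r.t. the single symmetric dof: every copy moves together.
d_symmetric = (energy([x + h] * N) - energy([x - h] * N)) / (2.0 * h)
# Sum of per-copy partials, and N times the master's partial.
d_sum = sum(partial(tied, k) for k in range(N))
d_master = N * partial(tied, 0)
print(d_symmetric, d_sum, d_master)
```

Because every copy sits in the same symmetric context, all per-copy partials are equal, so the sum can also be obtained as N times the master's partial derivative.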

When minimizing with respect to lattice symmetry, an additional complication arises when minimizing the degree of freedom corresponding to the rise between subunits. One issue is that gradients along the rise of the helix should be computed in one direction only. For example, consider a helix containing seven subunits,


A second issue has to do with gradients across an interface that spans multiple copies of the helical rise “jump.” Consider the interactions of subunits

To account for this, each cloned jump has a weight associated with it. This weight specifies a scaling factor that is applied to derivatives coming into the jump before they are remapped to the master jump. Thus, derivatives computed in an interface that spans two copies of the cloned rise jump are scaled by a factor of 2, corresponding to the number of copies of the cloned jump between the two interacting subunits.
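The effect of the per-jump scaling factor can be seen in a one-dimensional toy model. This is a sketch under stated assumptions: the energy of an interface depends only on the axial separation between subunits, the quadratic interface energy `f` and its parameters are hypothetical, and interfaces are counted looking up the helix from the master only.

```python
A0 = 9.0  # hypothetical preferred axial spacing between neighbours (Å)
K = 2     # the master interacts with subunits up to two steps up the helix


def f(z):
    """Toy interface energy as a function of axial separation z (Å)."""
    return (z - A0) ** 2


def df(z):
    return 2.0 * (z - A0)


def energy_per_subunit(rise):
    # Interfaces are counted in one direction only (towards subunits above
    # the master), so each interface of the helix is counted exactly once.
    return sum(f(k * rise) for k in range(1, K + 1))


def rise_derivative(rise):
    # The interface with the subunit k steps away spans k cloned rise jumps,
    # so its derivative is scaled by k before being mapped to the master jump.
    return sum(k * df(k * rise) for k in range(1, K + 1))


def rise_derivative_unweighted(rise):
    # The same sum without the per-jump scaling factor (incorrect).
    return sum(df(k * rise) for k in range(1, K + 1))


rise, h = 5.0, 1e-6
numeric = (energy_per_subunit(rise + h) - energy_per_subunit(rise - h)) / (2 * h)
print(rise_derivative(rise), rise_derivative_unweighted(rise), round(numeric, 4))
# -4.0 -6.0 -4.0
```

Only the weighted sum agrees with the finite-difference derivative; dropping the factor of 2 on the interface that spans two jumps gives the wrong gradient.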

Another difficult case arises when a whole-structure energy is applied asymmetrically, that is, when the energies for each subunit are not equal. This commonly arises with experimental electron density data that have not been symmetrically averaged, but may also arise with coordinate constraints or other types of experimental data, where gradients are

Unfortunately, this case is not directly handled by Rosetta's symmetry machinery, as the symmetric modeling is built on the idea that each energy term only differs by a symmetric transformation between subunits. However, one may get around this limitation by making the score function “symmetry-aware”. The basic idea is to map all the derivatives to the master subunit. For every atom in the symmetric complex, the gradient is computed. Then, for each atom in each slave subunit, the symmetric rotation mapping the subunit to the master subunit is applied to the gradient. These are then added to the corresponding atom in the master subunit's gradient.
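The basic mapping (compute the gradient in every copy, rotate it back into the master frame, and sum) can be sketched for a single-layer C_{3} system about the z axis. The asymmetric energy here is a hypothetical set of harmonic restraints to targets that differ per copy, standing in for unaveraged density; the function names are illustrative only.

```python
import math

N = 3  # C_3 about the z axis


def rot_z(theta, v):
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)


def build_copies(master_atoms):
    """Copy k is the master rotated by 2*pi*k/N about z."""
    return [[rot_z(2 * math.pi * k / N, a) for a in master_atoms] for k in range(N)]


# Hypothetical, deliberately asymmetric targets: one list per copy.
TARGETS = [
    [(10.0, 1.0, 0.0), (11.0, 2.0, 1.0)],
    [(-6.0, 8.0, 0.5), (-7.0, 9.0, 1.0)],
    [(-5.0, -9.0, -0.5), (-4.0, -10.0, 1.5)],
]


def energy(master_atoms):
    e = 0.0
    for copy, targets in zip(build_copies(master_atoms), TARGETS):
        for a, t in zip(copy, targets):
            e += sum((ai - ti) ** 2 for ai, ti in zip(a, t))
    return e


def master_gradient(master_atoms):
    """Per-copy gradients rotated back into the master frame and summed."""
    grad = [[0.0, 0.0, 0.0] for _ in master_atoms]
    for k, (copy, targets) in enumerate(zip(build_copies(master_atoms), TARGETS)):
        for idx, (a, t) in enumerate(zip(copy, targets)):
            g = tuple(2.0 * (ai - ti) for ai, ti in zip(a, t))
            # Inverse symmetry rotation maps the slave gradient onto the master.
            g_master = rot_z(-2 * math.pi * k / N, g)
            for d in range(3):
                grad[idx][d] += g_master[d]
    return grad


atoms = [(9.0, 0.0, 0.0), (10.0, 1.5, 1.0)]
print(master_gradient(atoms))
```

Because each slave atom is the rotated master atom, applying the inverse rotation to its gradient is exactly the chain rule; this single-layer case does not yet include the corrections needed for multi-layer hierarchies.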

When the symmetry group is a single-layer hierarchy and the rigid-body orientation of the whole system is not allowed to move, this works as expected. However, when the symmetry hierarchy is multilayered, or the whole system is allowed to move as a rigid body, problems arise when minimizing along jumps within the symmetry hierarchy. This is easiest to see by considering the C_{2} symmetry shown in

At virtual residue V_{1}, we add the gradients of S_{2} rotated by the transformation mapping S_{2} onto S_{1}. Then, at virtual residue V_{1.1}, we subtract the rotated gradients and add the unrotated gradients.

This may be done within Rosetta by storing the unrotated derivatives for every atom in the symmetric complex. For backbone and sidechain torsions, the naive strategy of rotating each subunit's derivatives onto the master subunit may be used. Then, at each symmetric jump, the transformation mapping the parent virtual residue to the master's virtual residue at the same level in the hierarchy is applied. Since the lower levels in the hierarchy have already added their layer's rotated gradients, this can be handled by assigning a “correction gradient” to the virtual residues within the upper levels of the symmetry hierarchy. That is, each such virtual residue is assigned a “gradient” obtained by subtracting all the previously rotated gradients and adding all the newly rotated gradients of every atom in the subtree beneath it.

Finally, note that we use the term gradient loosely here. Rosetta's implementation uses the recurrence of Abe et al. to pass two components of the gradient up the fold tree, denoted F_{1} and F_{2}, which allows efficient conversion between Cartesian-space and torsion-space gradients. In this case, the correction factors associated with virtual residues are applied directly to these F_{1} and F_{2} vectors instead of to the gradients.
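The two-component accumulation can be sketched as follows, assuming the convention F_{1} = Σ x × g and F_{2} = Σ g over the atoms moved by a degree of freedom (Rosetta's internal sign conventions may differ). For a rotation about a unit axis u through a point b, dx/dθ = u × (x − b), which gives dE/dθ = u · (F_{1} − b × F_{2}). All coordinates below are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def f1_f2(atoms, grads):
    """Accumulate F1 = sum(x × g) and F2 = sum(g) over the moved atoms."""
    f1, f2 = (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)
    for x, g in zip(atoms, grads):
        f1 = tuple(p + q for p, q in zip(f1, cross(x, g)))
        f2 = tuple(p + q for p, q in zip(f2, g))
    return f1, f2


def rotation_derivative(f1, f2, point, axis):
    """dE/dθ for a rotation about unit `axis` through `point`."""
    return dot(axis, tuple(p - q for p, q in zip(f1, cross(point, f2))))


def rotate(x, point, axis, theta):
    """Rodrigues rotation of x about unit `axis` through `point`."""
    v = tuple(a - b for a, b in zip(x, point))
    c, s = math.cos(theta), math.sin(theta)
    kxv, kdv = cross(axis, v), dot(axis, v)
    return tuple(p + vi * c + ki * s + ai * kdv * (1 - c)
                 for p, vi, ki, ai in zip(point, v, kxv, axis))


# Usage: harmonic restraints on two atoms moved by one rotational dof.
atoms = [(1.0, 2.0, 0.5), (2.5, -1.0, 1.0)]
targets = [(0.0, 3.0, 1.0), (3.0, 0.0, 0.0)]
point, axis = (0.5, 0.0, 0.0), (0.6, 0.0, 0.8)


def energy(theta):
    moved = [rotate(x, point, axis, theta) for x in atoms]
    return sum(dot(d, d) for d in
               (tuple(a - t for a, t in zip(m, tg)) for m, tg in zip(moved, targets)))


grads = [tuple(2.0 * (a - t) for a, t in zip(x, tg)) for x, tg in zip(atoms, targets)]
f1, f2 = f1_f2(atoms, grads)
analytic = rotation_derivative(f1, f2, point, axis)
numeric = (energy(1e-6) - energy(-1e-6)) / 2e-6
print(analytic, numeric)
```

Because F_{1} and F_{2} are plain sums, contributions from different subtrees simply add, which is why a correction term can be added to them at a virtual residue in place of correcting per-atom gradients.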

Within Rosetta3, this is currently only implemented for the symmetry-aware electron density scoring function. The code for this may be found in

In this section, we provide a practical guide to modeling symmetric structures with Rosetta. We first describe the file format in which symmetry information is encoded. Then, we introduce four symmetry-enabled Rosetta applications (symmetric docking, fold-and-dock, comparative modeling, and fixed-backbone design) and describe how they may be configured to make use of the symmetry machinery. Together with this manuscript, we distribute a set of canonical test cases as Supporting Information.

Everything that Rosetta needs to know about the symmetry of the system is encoded in the symmetry definition file (SDF), which is provided as input to any Rosetta protocol run with symmetry. This file provides:

One of the key purposes of the SDF is to inform Rosetta how to evaluate the energy of a structure in a symmetric fashion. In the SDF for alcohol dehydrogenase in

In this example, the subunit that is connected to the virtual residue

A second key aspect of the SDF is to provide the coordinates of the reference frames – that is, the virtual residues – to set up the rigid body symmetry. There are two ways of specifying these coordinate frames:

Here the virtual residue named

This specifies that the first virtual residue is encoded by the triplets defined after the start keyword. A second virtual residue is generated by applying a twofold rotation around the Cartesian Z axis (

A third key aspect of the SDF is specifying what dofs in the system are allowed to move, what their initial values should be and how to perturb them. In the SDF for alcohol dehydrogenase, the line:

specifies that for the jump named

Generally, symmetry definition files will not be hand-crafted, but rather, will be created by a script. There are two such scripts included with Rosetta3. The first of these,

This script automatically creates symmetry definition files corresponding to the symmetry in a template protein structure. If the template is not symmetric (for example, if differing crystal contacts between subunits cause some asymmetry), it is “symmetrized” by the script. For these cases, simple heuristics are used to find a symmetric system near the target system. However, if the starting model is very asymmetric, the symmetrized structure may be very far from the input. This is generally undesirable and suggests that the system should be modeled asymmetrically.

The script provides at least limited support for most types of point, helical, and lattice symmetries. However, there are some caveats. The following symmetry types are currently unsupported by the script:

Tetrahedral, octahedral and icosahedral point symmetries are improperly generated.

Nonpolar helical symmetries (with D_{n} rather than C_{n} point symmetry about the helical axis) are not supported.

2D lattice (or wallpaper) symmetry is not created by the script.

3D lattice (or crystal) symmetries are available, but assume a fixed unit cell size. Systems produced in this manner allow rigid-body movement of a subunit in the asymmetric unit, but do not allow the cell dimensions to change during simulation.

The script runs in one of three modes, depending on the symmetry type: noncrystallographic (point) symmetries, crystallographic symmetry, and helical symmetry. The mode of the script is specified with the flag

If this flag is not given,

There are several options common to each mode:

input PDB file

the max Cα-Cα distance between two interacting chains

When the system is constructed, a master chain is first selected (how this is specified is mode-specific). The resulting SDF specifies a system where the only subunits that are explicitly modeled are those with some Cα within the specified interaction distance of the master subunit.
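The selection rule described above can be sketched as follows. This is an illustration of the rule only, not the script's code; the input dictionary mapping chain IDs to Cα coordinates is a hypothetical representation.

```python
import math


def interacting_chains(ca_by_chain, master, max_dist):
    """Chains with at least one Cα within `max_dist` of a Cα in `master`."""
    master_ca = ca_by_chain[master]
    keep = []
    for chain, cas in sorted(ca_by_chain.items()):
        if chain == master:
            continue
        # Any single Cα-Cα contact is enough to model the chain explicitly.
        if any(math.dist(p, q) <= max_dist for p in master_ca for q in cas):
            keep.append(chain)
    return keep


ca = {
    "A": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "B": [(10.0, 0.0, 0.0)],
    "C": [(40.0, 0.0, 0.0)],
}
print(interacting_chains(ca, "A", max_dist=8.0))
```

Chain B is retained (its Cα lies 6.2 Å from a Cα of A), while chain C is too far away to be modeled explicitly.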

For noncrystallographic symmetry mode (

the chain ID of the main chain

the chain IDs of one chain in each symmetric subcomplex

Use of the -_{2} and C_{38} symmetry (assuming A and B were adjacent chains). To generate the SDF for C_{2} symmetry in

As another example, for a D_{4} symmetric system with chains A-B-C-D in the upper ring and chains E-F-G-H in the lower ring, one would specify

Alternately, one could specify the interacting chains in reverse order, as ‘

allow rigid body minimization of complete system

This flag is important when the structure is scored against experimental data that depend on the rotation of the whole system, such as electron density or RDC data.

For crystallographic symmetry mode (

override the unit cell parameters in the PDB file with these values

override the spacegroup in the PDB file with these values

The resulting SDF defines a system in which a single subunit is placed in its “lattice context,” and only the symmetric copies that interact with the master subunit are explicitly represented. As a side note, the energy line in the SDF specifies the energy calculated by Rosetta as twice the per-subunit energy.

Finally, helical symmetry mode (

the chain ID of the main chain

the chain ID of the next chain along the fiber/helix

the chain ID of a chain in -a's point symmetry group

A helical twist can be forced by appending:

The same heuristics used to symmetrize a system are used to force a different helical twist. Thus, if this value is very different from the twist provided in the PDB, then the system may move dramatically.

When run, the SDF that recapitulates the symmetry in the input PDB is written to stdout. Several PDB files are written as well. Given the input file

the symmetrized version of the input file, showing the complete point symmetry group.

the same as above, but only showing chains that form an interface with chain A

the input PDB to Rosetta's symmetry modeling, the coordinates of the master subunit (typically a single chain in the symmetric complex).

The files

When the structures of symmetric protein assemblies are predicted _{n}_{n}_{2} symmetry in

An SDF for D_{2} symmetry can be generated with:

By default, the script encodes all subunits to be simulated. For larger complexes, such as a 38-membered ring, a subunit only interacts with its direct neighbors in the ring, and it is not necessary to simulate all subunits (see

This generates an SDF that encodes only 3 of the 38 subunits.

Symmetry options that control protocol behavior can also be defined in the SDF.

To generate SDFs for symmetries outside C_{n}_{n}

The symmetry machinery in Rosetta3 is built to take advantage of the object-oriented architecture of Rosetta's core. Polymorphism and inheritance allow symmetric versions of key components in Rosetta's scoring, kinematics, sidechain-optimization, and minimization machinery to be substituted for their nonsymmetric counterparts, so that symmetry can be used with minimal adjustment to the code. When adapting a scientific protocol to use symmetry, care must be taken that the symmetric versions of these classes are employed. For most protocols this is automatic when Rosetta is given a symmetry definition file, and Rosetta will prevent the system from becoming nonsymmetric. However, in protocols where the kinematic connectivity or the coordinates of the protein change, care must be taken to ensure that the symmetric complex is perturbed in a reasonable manner. Typically, changes to the conformation of a protein are controlled through higher-level objects called movers that interface with the lower-level core functions. There are symmetric versions of the most common movers, which substantially simplify the adaptation process.

First, instantiating symmetry at the beginning of a protocol involves checking for a symmetry definition file specified on the command line, followed by a call to a mover that initializes the symmetry information by reading the SDF and swapping symmetric versions of the base classes for energy evaluation and coordinate storage into the Pose object (the Pose object represents the complete state of the molecular system).
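The substitution pattern described above can be illustrated with a small Python analogue. Rosetta itself is written in C++ and its classes differ; every name below (`Pose`, `AsymmetricScorer`, `SymmetricScorer`, `make_scorer`, `protocol`) is hypothetical, and the energies are toy values for a C_{3} trimer.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Pose:
    """Stand-in for a molecular system: subunit and interface energies."""
    internal: List[float]               # internal energy of each subunit
    pair: Dict[Tuple[int, int], float]  # (i, j) with i < j -> interface energy


class AsymmetricScorer:
    def score(self, pose: Pose) -> float:
        # Every subunit and every interface is evaluated explicitly.
        return sum(pose.internal) + sum(pose.pair.values())


class SymmetricScorer(AsymmetricScorer):
    def __init__(self, master_weight: float, interface_weights: Dict[int, float]):
        self.master_weight = master_weight
        self.interface_weights = interface_weights  # slave index -> weight

    def score(self, pose: Pose) -> float:
        # Only subunit 0 (the master) and its interfaces are evaluated.
        e = self.master_weight * pose.internal[0]
        for j, w in self.interface_weights.items():
            e += w * pose.pair[(0, j)]
        return e


def make_scorer(symmetry_weights=None) -> AsymmetricScorer:
    # Analogue of "if a symmetry definition was given, swap in symmetric classes".
    if symmetry_weights is None:
        return AsymmetricScorer()
    return SymmetricScorer(*symmetry_weights)


def protocol(pose: Pose, scorer: AsymmetricScorer) -> float:
    # Protocol code sees only the base-class interface.
    return scorer.score(pose)


# C_3 trimer in which all three interfaces are equivalent (toy energies).
trimer = Pose(internal=[-8.0, -8.0, -8.0],
              pair={(0, 1): -2.0, (0, 2): -2.0, (1, 2): -2.0})
print(protocol(trimer, make_scorer()), protocol(trimer, make_scorer((3.0, {1: 3.0}))))
```

The protocol function is unchanged between the two runs; only the object created at setup time differs, which is the property that keeps the per-protocol changes small.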

A number of utility classes are available to get access to symmetry information and to make objects compatible with symmetry, including

Directly setting torsions or jumps in the master subunit (or using nonsymmetric movers that only do this) is fine: the symmetric machinery will maintain the symmetry of the overall system. For a great many protocols, the only changes necessary to enable symmetry are the two shown above.

Finally, a number of protocols have been ported to use symmetry if a symmetry definition file is provided. In addition to Rosetta's

The symmetric assembly protocol aims to predict the structure of a symmetrical protein assembly based on the structure of a single subunit _{n}_{n}

When atomic contacts have been established, the energy of the protein complex is optimized in a rigid-body Monte Carlo search performed using a low-resolution knowledge-based scoring function and a simplified representation of the protein. All dofs described in the SDF are perturbed during the Monte Carlo procedure. The low-resolution phase is followed by further refinement with Rosetta's high-resolution energy function, using an all-atom representation of the protein assembly. The energy is optimized using a Monte Carlo minimization procedure, which consists of several cycles of rigid-body moves followed by symmetric side-chain optimization and symmetric energy minimization.
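A generic Metropolis Monte Carlo loop over symmetric rigid-body dofs can be sketched as follows. This is not Rosetta's docking protocol: the two dofs (a translation and a spin angle), the toy energy, the step sizes, and the temperature are all assumptions. In the symmetric setting, a single set of such dofs moves every subunit at once.

```python
import math
import random


def toy_energy(dofs):
    """Hypothetical smooth energy over two symmetric dofs (translation, spin)."""
    x, spin = dofs
    return (x - 18.0) ** 2 / 10.0 - 3.0 * math.cos(math.radians(spin - 40.0))


def monte_carlo(start, n_steps=2000, kT=1.0, step=(1.0, 15.0), seed=0):
    rng = random.Random(seed)
    current, e_cur = list(start), toy_energy(start)
    best, e_best = list(current), e_cur
    for _ in range(n_steps):
        # Perturb every symmetric dof with a Gaussian step.
        trial = [v + rng.gauss(0.0, s) for v, s in zip(current, step)]
        e_trial = toy_energy(trial)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if e_trial <= e_cur or rng.random() < math.exp(-(e_trial - e_cur) / kT):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best


# Start from a separated, non-contacting configuration.
best, e_best = monte_carlo((50.0, 0.0))
print([round(v, 2) for v in best], round(e_best, 3))
```

Tracking the lowest-energy state separately from the current state lets the search accept occasional uphill moves without losing the best configuration found.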

To run the symmetric docking protocol, two pieces of input data are required: the structure of a protein subunit and a SDF. A preexisting symmetric protein complex can be refined using the docking protocol (for perturbation studies, for example), with the starting input subunit and SDF generated by the

The reference frames encoded by this script have their axes pointing towards the absolute origin (0,0,0) in Cartesian space, with the translational dof along the Cartesian x-axis. Thus, it is important that the axis connecting the anchor residues aligns with the Cartesian

The default

This line specifies that the two subunits will initially be placed at (50,0,0) and (-50,0,0). The 100 Å distance is typically large enough that the subunits start in a non-contacting configuration. The initial positioning can be changed by manually editing this line of the SDF.
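The placement described above can be reproduced with simplified geometry. This sketch assumes that the master jump translates the subunit along the x-axis and that the copies are generated by rotations about the Cartesian z axis, consistent with the reference-frame conventions described in this section; the function name is hypothetical.

```python
import math


def rot_z(theta, v):
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])


def place_cyclic(n, x_offset):
    """Reference point of each subunit in a C_n system.

    The master jump places its subunit x_offset along the x-axis; copy k is
    that point rotated by 2*pi*k/n about the z (symmetry) axis.
    """
    master = (x_offset, 0.0, 0.0)
    return [rot_z(2.0 * math.pi * k / n, master) for k in range(n)]


print([tuple(round(c, 6) for c in p) for p in place_cyclic(2, 50.0)])
# [(50.0, 0.0, 0.0), (-50.0, 0.0, 0.0)]
```

Changing the translation value in the SDF line changes only `x_offset`; the symmetric copy follows automatically, so the two subunits always stay 2 × `x_offset` apart.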

The last three terms in this line describe the orientation parameters of the jump:

In this particular case, the rotational dofs should be completely randomized. Normally, when a range is given, a random value is chosen within the range and a rotation by that angle is applied about the given axis. However, when all three rotations are given the range 0 to 360 degrees, Rosetta ensures that rotational space is uniformly sampled.
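Drawing three angles independently and uniformly does not sample rotations uniformly, which is why the full-range case needs special handling. One standard technique for uniform sampling is Shoemake's random unit quaternion, sketched below; this document does not state which method Rosetta uses, so this is an illustration of the idea only.

```python
import math
import random


def random_rotation(rng: random.Random):
    """3x3 rotation matrix drawn uniformly from SO(3) (Shoemake's method)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    # A uniformly distributed unit quaternion (w, x, y, z).
    w, x = a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2)
    y, z = b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


rng = random.Random(0)
for row in random_rotation(rng):
    print([round(v, 3) for v in row])
```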

In addition, the presence of multiple identical subunits presents some problems when calculating root-mean-square (rms) deviation values to a reference structure. For complexes with more than two subunits, it may be necessary to consider alternate chain orderings in order to find the lowest rms value. A symmetric rms value can be computed in Rosetta by adding the command-line flag -symmetry:symmetric_rmsd.
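The idea of scanning alternate chain orderings can be sketched as follows. This is not the implementation behind -symmetry:symmetric_rmsd: it assumes that model and reference are already in a common frame (no superposition is performed) and that all chains contain the same atoms in the same order.

```python
import math
from itertools import permutations


def rmsd(a, b):
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def symmetric_rmsd(model_chains, ref_chains):
    """Lowest rms deviation over all assignments of model chains to reference chains."""
    ref = [atom for chain in ref_chains for atom in chain]
    best = math.inf
    for order in permutations(range(len(model_chains))):
        model = [atom for k in order for atom in model_chains[k]]
        best = min(best, rmsd(model, ref))
    return best


chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
chain_b = [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
# The model lists its chains in the opposite order from the reference.
naive = rmsd(chain_b + chain_a, chain_a + chain_b)
print(naive, symmetric_rmsd([chain_b, chain_a], [chain_a, chain_b]))
```

Enumerating all permutations grows factorially with the number of chains; for large point-symmetric complexes, restricting the search to orderings consistent with the symmetry group keeps it tractable.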

A typical prediction case for symmetric docking is distributed in Supporting Information.

The fold-and-dock protocol simultaneously samples the internal degrees of freedom of a monomer, and rigid-body degrees of freedom between (symmetrically disposed) monomers. It is well suited to predicting the structures of intertwined symmetric assemblies for which the structure of the monomer is not stable in isolation, and hence not amenable to a two stage approach in which monomer predicted structures are first generated in isolation and then docked together _{n}_{n}

Fold-and-dock is a combination of the symmetric assembly protocol and the Rosetta ab initio structure prediction protocol. Like symmetric docking, the simulation starts with a randomized symmetric configuration of subunits with no atomic contacts between subunits. The protein subunits initially adopt an extended structure. What follows is a simulated annealing fragment assembly, of the same kind used for regular monomeric ab initio structure prediction, performed symmetrically. In addition, two types of rigid-body moves are attempted at random frequency: a

The models produced by

A typical prediction case for fold-and-dock is distributed in Supporting Information.

Another case where symmetric modeling is beneficial arises with comparative (or template-based) modeling of symmetric structures. When building a homology model of a symmetric multimer, if a template contains the same symmetry, it may be reasonable to use the symmetry of the template when building the threaded model of the target.

Rosetta's comparative modeling protocol performs threading of the target sequence onto the template backbone, followed by fragment-based rebuilding of gaps in the threaded model, and finally all-atom optimization of Rosetta's energy function. By running the symmetry definition script

A typical prediction case for comparative modeling is distributed in Supporting Information.

Fixed-backbone design provides a direct interface to Rosetta's sidechain optimization module, the packer. Through the design application, the energy of a protein subunit or complex can be minimized by optimizing the protein sequence. The symmetric version of the packer is invoked by specifying an SDF on the command line. The

A typical test case for fixed-backbone design is distributed in Supporting Information.

We have previously modeled symmetrical protein assemblies using an implementation in Rosetta2, described in

The object-oriented nature of Rosetta3 enables us to take full advantage of structural symmetry in protein modeling. Given the right symmetry definition, Rosetta3 can model all types of symmetry. In contrast, Rosetta2 included only a few hard-coded symmetry types, and exact energy gradients for more complex symmetries (such as helical symmetry) could not be calculated, as this required the types of corrections described in the energy minimization section. The inflexible nature of the Rosetta2 code base prevented the implementation of several features present in Rosetta3. A major benefit of Rosetta3 over Rosetta2 is that the lists of atom and residue neighbors used during energy calculation and energy minimization are restricted to only those pairs that are required to evaluate the energy of the whole system. With Rosetta2, the memory requirements become prohibitively large for big systems, and a substantial fraction of the running time is spent generating updated neighbor lists.

We have compared the running time of Rosetta3 with that of Rosetta2 for several prediction protocols and molecular systems. Such comparisons put the improvements in practical terms, but they cannot be used to isolate the effects of modifications to the symmetry machinery, because general improvements from Rosetta2 to Rosetta3, together with protocol-level developments, also affect running times. The comparison here uses standard parameters in Rosetta2 and Rosetta3. A symmetric docking run on a C_{2} system with 164 residues is 2.3 times faster in Rosetta3 than in Rosetta2. For more computationally intensive protocols and more complex symmetries, the improvement is larger. Running fold-and-dock on a 100-residue sequence with D_{2} symmetry (4 subunits with 400 residues in total), the run time of Rosetta3 is around 13 times shorter per model than that of Rosetta2. For a dimer with the same number of residues with C_{2} symmetry, the improvement is 26-fold. Rosetta2 scales poorly with the number of simulated subunits in the system (runs with several hundred residues are impractical due to both memory and speed issues), while Rosetta3 can model quite large protein assemblies without a dramatic increase in running time. We have determined the running time of the fold-and-dock protocol for a system with 100 residues with C_{n}

The run time of a symmetric modeling run depends on the size of the system (mostly the subunit size), the number of degrees of freedom in the rigid-body sampling, and the scientific protocol. As a reference, generating an all-atom model of a D_{2} symmetric 100-residue protein using fold-and-dock takes about 8 minutes on a contemporary desktop computer.

We present a general framework for the modeling and design of symmetric protein assemblies. The current implementation provides a set of ready-made scientific protocols for common tasks in structural biology: prediction of symmetric protein structures from sequence or from the structure of a subunit, comparative modeling of symmetric proteins or proteins in crystal lattices, and symmetric protein design. The list of symmetry-enabled protocols can easily be extended by small modifications to the Rosetta source code. Likewise, the framework can be used to model more exotic symmetry types not covered by the distributed scripts, without any changes to the source code. Thus, this manuscript describes the extension of the Rosetta methodology to the exciting universe of symmetric protein assemblies.

A complete reference guide to Rosetta3 symmetry definition files.

(DOC)

An archive containing example flags file and input files for running four different symmetry protocols in Rosetta3. The protocols include symmetric docking, symmetric comparative modeling, fold-and-dock, and symmetric design.

(BZ2)