The authors have declared that no competing interests exist.

Conceived and designed the experiments: TT NH ST. Performed the experiments: TT. Analyzed the data: TT NH. Contributed reagents/materials/analysis tools: TT. Wrote the paper: TT NH ST.

Co-translational folding (CTF) facilitates correct folding

Proteins are synthesized

While

Co-translational folding (CTF) has been suggested for

This so-called elongation attenuation can be realized by a few mechanisms. Most notably, for a given codon, the elongation rate is affected by its cognate tRNA binding kinetics, thus depending on the concentration of the cognate tRNA [

SufI in

A) Structure of SufI. N-, M-, and C-domains are depicted in blue, green, and red, respectively. Linkers that connect two domains are depicted in yellow. B) The codon-based elongation rate by Spencer et al. … CTF_{codon} simulations. Regions marked in red in B take a long elongation time. D) Three CTF schemes: CTF_{fast} (dashed line), CTF_{slow} (dotted line), and CTF_{codon} (solid line). E) A schematic view of the system including the wall-and-tunnel potential.

These experimental data can be complemented with theoretical and computational analyses to deepen our understanding of the CTF mechanisms. Previously, lattice Monte Carlo simulations [

Since the CTF becomes non-trivial primarily for relatively large and multi-domain proteins (SufI has three domains and is about 450 residues long (

Yet, to address the mechanisms of the CTF and, in particular, the impact of elongation attenuation by CGMD simulations, there are two major technical issues. First, we need to realize misfolding as well as correct folding in a well-balanced manner. Thus, the CG model needs to be calibrated so that the energy landscape is globally funneled on the one hand and modestly rugged on the other. There have been a considerable number of studies towards hybrid modeling that combines structure-based potentials for a globally funneled landscape with sequence-dependent terms for modestly rugged surfaces [

In this paper, we first describe the computational modeling of CGMD for the CTF. Then, we perform CTF simulations and, as a control, refolding simulations of SufI, and compare the results. Characteristics of misfolded structures are then analyzed. Next, folding networks for these simulations clarify the impacts of the CTF and of elongation attenuation on folding reaction mechanisms. Finally, we investigate the correlation between the degree of folding and the translation elongation time.

In the current CG modeling, each amino acid is represented by one bead located at the C_{α} position. For folding simulations by CGMD, the so-called perfect-funnel model, often called the Go model, has been widely used and has given many insightful lessons on folding dynamics [… V_{HP} to the Go model potential V_{Go}; the latter is responsible for the globally funnel-like shape of the landscape, while the former makes the landscape modestly rugged, leading to many metastable non-native traps. Concretely, the entire potential function of a protein is V_{Go} + V_{HP}. The Go potential was parameterized based on the atomic interactions at the native structure, called the AICG model developed by Li et al.

As is well-known, proteins … CTF_{fast}), 2) the uniformly slow translation scheme (a dotted line, CTF_{slow}), and 3) the non-uniform codon-based translation scheme (a solid line in Fig … CTF_{codon}) that depends on the cognate tRNA concentration. We note that, in our scheme, the … V_{tunnel} was introduced to mimic the ribosome steric effect that is realized by a combination of a wall and a tunnel (

For comparison, we also performed folding simulations of SufI in a refolding scheme, where a full-length polypeptide chain started folding from denatured conformations obtained by high-temperature simulations. No wall-and-tunnel potential V_{tunnel} was used in this scheme.

MD simulations were performed at 0.82 T_{F}. To estimate T_{F}, we ran folding simulations of 10^{8} time steps at many temperatures. The lowest temperature at which we observed unfolding was defined as the upper limit of the denaturation temperature, T_{F}.
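The temperature protocol just described (scan temperatures, take the lowest one at which unfolding appears as T_F, and run production simulations at 0.82 T_F) can be sketched as below. This is a minimal illustration, not CafeMol code: the cut-off `q_unfolded` used to decide that unfolding occurred is a hypothetical parameter, and the toy scan values are invented for illustration only (chosen to resemble the supplementary scan, where the drop appears at 440 and production runs use 360).

```python
from typing import Sequence


def estimate_tf(temperatures: Sequence[float], mean_q: Sequence[float],
                q_unfolded: float = 0.5) -> float:
    """Return the lowest scanned temperature whose average Q falls below q_unfolded.

    q_unfolded is a hypothetical cut-off for "unfolding observed"; the text
    does not specify how unfolding was detected.
    """
    for temp, q in sorted(zip(temperatures, mean_q)):
        if q < q_unfolded:
            return temp
    raise ValueError("no unfolding observed in the scanned temperature range")


def simulation_temperature(t_f: float, fraction: float = 0.82) -> float:
    """Temperature for the production runs, a fixed fraction of T_F."""
    return fraction * t_f


if __name__ == "__main__":
    # Toy scan in CafeMol temperature units (illustrative values only).
    temps = [340, 360, 380, 400, 420, 440, 460]
    qs = [0.91, 0.90, 0.88, 0.85, 0.80, 0.30, 0.10]
    t_f = estimate_tf(temps, qs)
    print(t_f, simulation_temperature(t_f))
```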

First we compare a representative folding trajectory via the codon-based co-translational folding (CTF_{codon}) scheme with that via the refolding scheme.

A) Time course of a refolding trajectory. B) That of the CTF_{codon} scheme. Some snapshots were drawn with the same color code as

In the refolding trajectory shown in

On the other hand, the CTF_{codon} trajectory in … ^{8} time steps is followed by the folding of the M-domain at ~1.2 × 10^{8} time steps. Subsequently, at ~1.7 × 10^{8} time steps, the protein folded to a near-native structure in which the C-domain is partly misfolded. Finally, at around 1.9 × 10^{8} time steps, it quickly transitioned into the native-like conformation.

More quantitatively, we repeated folding simulations of SufI 100 times in both the refolding and the CTF_{codon} schemes. In each trajectory, we judged whether the protein was folded by a set of native-ness scores, Q-scores, computed over the final 100 structures of the simulation (0 ≤ … CTF_{codon} resulted in 35 cases of correct folding. To clarify the statistical significance of the difference, we computed the histograms of Q-scores of the final structures in each scheme (

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
folded | 18 | 20 | 25 | 35 |

^{a} For the judgement of folding, the stringent criterion was used.
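Deciding whether a trajectory ended folded rests on Q-scores for the whole protein and for its parts. A minimal sketch follows, assuming that a native contact counts as formed when the current Cα distance is within 1.2 times its native value (a common convention in coarse-grained analyses, not stated in this section). The names `q_score`, `part_contacts`, `folded_generous`, and `folded_stringent` are hypothetical, and the per-part thresholds passed to `folded_stringent` are placeholders for the distribution-based thresholds described in the Methods.

```python
import math
from typing import Iterable, List, Optional, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int, float]  # (residue i, residue j, native distance)


def q_score(coords: Sequence[Coord], contacts: Sequence[Contact],
            factor: float = 1.2) -> float:
    """Fraction of native contacts formed (assumed rule: d <= factor * d_native)."""
    if not contacts:
        raise ValueError("no native contacts given")
    formed = sum(1 for i, j, d0 in contacts
                 if math.dist(coords[i], coords[j]) <= factor * d0)
    return formed / len(contacts)


def part_contacts(contacts: Iterable[Contact], part_a: Set[int],
                  part_b: Optional[Set[int]] = None) -> List[Contact]:
    """Native contacts inside one domain, or across the interface of two domains."""
    if part_b is None:
        return [c for c in contacts if c[0] in part_a and c[1] in part_a]
    return [c for c in contacts
            if (c[0] in part_a and c[1] in part_b)
            or (c[0] in part_b and c[1] in part_a)]


def folded_generous(q_total: float) -> bool:
    """Generous criterion from the text: Q_total above 0.95."""
    return q_total > 0.95


def folded_stringent(part_q: Sequence[float], thresholds: Sequence[float]) -> bool:
    """Hypothetical stringent check: every domain and interface Q exceeds its threshold."""
    if len(part_q) != len(thresholds):
        raise ValueError("one threshold per part is required")
    return all(q > th for q, th in zip(part_q, thresholds))
```

Checking the domain and interface scores separately matters because, as the Methods explain, interface contacts are few, so a structure with a wrong interface can still have Q_total close to one.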

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
Refolding | 1.0 | 0.556 | 0.0314 | 0.000174 |
CTF_{fast} | | 1.0 | 0.140 | 0.00822 |
CTF_{slow} | | | 1.0 | 0.556 |
CTF_{codon} | | | | 1.0 |

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
Refolding | 1.0 | 0.354 | 0.00838 | 0.000189 |
CTF_{fast} | | 1.0 | 0.106 | 0.00779 |
CTF_{slow} | | | 1.0 | 0.305 |
CTF_{codon} | | | | 1.0 |

We then investigate the effects of translational attenuation regions in the SufI sequence, which were studied in experiments [… CTF_{fast}).

In the same way as for the CTF_{codon} case, we repeated the CTF_{fast} simulations 100 times. Using the same criteria for the judgment of folding, i.e., Q-scores, we found only 20 cases of successful folding, far fewer than in the CTF_{codon} scheme. The statistical analysis of the distributions suggested that the difference is significant (p = 0.00822). In fact, the result of the CTF_{fast} scheme is statistically indistinguishable from that of the refolding scheme (p = 0.556). This is consistent with the experiment of Zhang et al.

Experimentally, lowering the temperature could rescue the low folding yield of the impaired folding scheme, which we now test in simulations. For this purpose, we performed folding simulations of SufI by a CTF scheme in which elongation is slow and uniform throughout (CTF_{slow}). Of 100 simulations, we found 25 successful foldings by the same criteria as above. The statistical test showed no significant difference between the CTF_{slow} and CTF_{codon} schemes (p = 0.556), while the comparison between the slow and fast CTF schemes gave a marginal p value (p = 0.140).
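The section does not state which statistical test produced the tabulated p values; the text indicates they come from comparing Q-score distributions rather than from folded counts alone. As a separate, hedged illustration of comparing folded counts out of 100 trajectories between two schemes, a two-sided Fisher's exact test on the 2×2 table of folded/unfolded counts can be written with the standard library. Its p values are not expected to reproduce the tables above, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are two folding schemes; columns are folded / not folded counts.
    """
    row1, n = a + b, a + b + c + d
    col1 = a + c

    def prob(x: int) -> float:
        # Hypergeometric probability of x folded trajectories in row 1.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Folded counts out of 100 trajectories, taken from the table above.
    print(fisher_exact_two_sided(18, 82, 35, 65))  # refolding vs CTF_codon
    print(fisher_exact_two_sided(25, 75, 35, 65))  # CTF_slow vs CTF_codon
```

A count-based test discards the shape of the Q-score distributions, which is one reason a distribution-based comparison can be more sensitive.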

To understand the CTF, comparing the translation time scale with the folding time scale is of central importance. To estimate the relevant folding time scales, we performed kinetic folding simulations for individual domains. The time required to reach structures with Q > 0.5 was computed for each domain (… t_{N−fold} was 1.4 × 10^{7} time steps, which is longer than that of the C-domain, t_{C−fold} = 3.6 × 10^{6} time steps. Interestingly, t_{N−fold} is longer than the time to complete translation in the CTF_{fast} scheme, t_{translation−fast} ~ 4.4 × 10^{6} time steps, but is comparable to that in the CTF_{slow} scheme, t_{translation−slow} ~ 1.3 × 10^{7} time steps. Importantly, when the time to complete translation of the N-domain is comparable to or longer than the average folding time of the N-domain, the success ratio of SufI is high.
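One way to obtain domain folding times, such as that of the N-domain, is to record for each trajectory the first time at which the domain reaches Q > 0.5, build the fraction of trajectories still unfolded, and fit its logarithm linearly in time (the supplementary material mentions a linear fit). The sketch below assumes single-exponential kinetics, so that ln S(t) = a − t/τ; the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def survival_fraction(first_passage: Sequence[Optional[float]], t: float) -> float:
    """Fraction of trajectories that have not yet reached Q > 0.5 by time t.
    None marks a trajectory that never folded within the simulation."""
    not_folded = sum(1 for tp in first_passage if tp is None or tp > t)
    return not_folded / len(first_passage)


def fit_folding_time(times: Sequence[float], survival: Sequence[float]) -> float:
    """Least-squares fit of ln S(t) = a - t / tau; returns tau.
    Points with S(t) = 0 are skipped because ln(0) is undefined."""
    pts = [(t, math.log(s)) for t, s in zip(times, survival) if s > 0.0]
    if len(pts) < 2:
        raise ValueError("need at least two points with non-zero survival")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    slope = sxy / sxx
    if slope >= 0.0:
        raise ValueError("survival is not decaying")
    return -1.0 / slope
```

Comparing the fitted τ of a domain with the cumulative number of steps needed to translate that domain gives the kind of time-scale comparison made in the paragraph above.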

To understand why the codon-based CTF can facilitate folding of SufI, we now look into misfolded structures. For each of the four folding schemes, we analyzed the probabilities of misfolding of individual domains at the ends of simulations (… CTF_{fast} scheme. The CTF_{codon} scheme showed the smallest misfolding probabilities for these domains. Of the four schemes, the rank order in misfolding of the N- and M-domains is well (anti-)correlated with the probability of successful folding of the full-length SufI. (

A) Fractions of misfolded domains at the end of simulations in four different schemes: refolding (black), CTF_{fast} (red), CTF_{slow} (green), and CTF_{codon} (blue). B) Representative final misfolded structures. i) Structure misfolded in the N-domain. ii) Misfolded in the M-domain. iii) (right) Misfolded in the C-domain; (left) native structure for comparison. See text for the explanation of the block arrows.

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
Refolding | 1.0 | 0.0994 | 0.261 | 0.0470 |
CTF_{fast} | | 1.0 | 1.0 | 0.794 |
CTF_{slow} | | | 1.0 | 0.556 |
CTF_{codon} | | | | 1.0 |

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
Refolding | 1.0 | 0.677 | 5.96E-6 | 5.22E-8 |
CTF_{fast} | | 1.0 | 0.000581 | 2.85E-6 |
CTF_{slow} | | | 1.0 | 0.261 |
CTF_{codon} | | | | 1.0 |

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
Refolding | 1.0 | 0.677 | 0.261 | 0.0691 |
CTF_{fast} | | 1.0 | 0.261 | 0.0131 |
CTF_{slow} | | | 1.0 | 0.677 |
CTF_{codon} | | | | 1.0 |

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
Refolding | 1.0 | 0.0265 | 0.0406 | 0.00218 |
CTF_{fast} | | 1.0 | 0.886 | 0.433 |
CTF_{slow} | | | 1.0 | 0.322 |
CTF_{codon} | | | | 1.0 |

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
Refolding | 1.0 | 0.509 | 4.89E-5 | 3.69E-7 |
CTF_{fast} | | 1.0 | 0.000594 | 9.98E-6 |
CTF_{slow} | | | 1.0 | 0.577 |
CTF_{codon} | | | | 1.0 |

scheme | Refolding | CTF_{fast} | CTF_{slow} | CTF_{codon} |
---|---|---|---|---|
Refolding | 1.0 | 0.434 | 0.575 | 0.360 |
CTF_{fast} | | 1.0 | 0.266 | 0.189 |
CTF_{slow} | | | 1.0 | 0.807 |
CTF_{codon} | | | | 1.0 |

We now show some representative misfolded structures (

Next, we investigated the ensemble of folding pathways for the CTF and the refolding schemes. To clarify folding pathways, we drew folding networks in which nodes represent discretized conformational states and links represent transitions between the states [

Conformational states were discretized by the native-ness scores (Q-scores) and the non-native contact scores (N-scores) (See … 5^{6} × 3^{6} ≈ 1.1 × 10^{7} states (nodes). To simplify the network, we removed any loops that leave a node and later return to the same node. All 100 trajectories were used to draw a network for each folding scheme.
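The discretization and loop removal can be sketched as follows. Four thresholds give five classes per Q-score, and three classes per N-score are assumed to come from two thresholds, so six parts give 5^6 × 3^6 = 11,390,625 possible nodes. Loops are erased chronologically: when a trajectory returns to a node already on the kept path, everything visited since that node is dropped. This is one reasonable reading of the loop-removal rule, not necessarily the authors' exact implementation; all names and thresholds here are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Hashable, List, Sequence, Tuple

N_PARTS = 6   # N, M, C domains and N-M, N-C, M-C interfaces
Q_LEVELS = 5  # four thresholds -> five classes per Q-score
N_LEVELS = 3  # assumed: two thresholds -> three classes per N-score


def classify(value: float, thresholds: Sequence[float]) -> int:
    """Class index of a score, given ascending thresholds."""
    return bisect_right(thresholds, value)


def node_of(q_scores: Sequence[float], n_scores: Sequence[float],
            q_thresholds: Sequence[Sequence[float]],
            n_thresholds: Sequence[Sequence[float]]) -> Tuple[int, ...]:
    """Discrete state: one class per part Q-score followed by one per N-score."""
    q = tuple(classify(v, th) for v, th in zip(q_scores, q_thresholds))
    n = tuple(classify(v, th) for v, th in zip(n_scores, n_thresholds))
    return q + n


def remove_loops(path: Sequence[Hashable]) -> List[Hashable]:
    """Erase every loop (a stretch that leaves a node and returns to it),
    processing the trajectory chronologically."""
    kept: List[Hashable] = []
    index: Dict[Hashable, int] = {}
    for node in path:
        if node in index:
            cut = index[node]
            for dropped in kept[cut + 1:]:
                del index[dropped]
            del kept[cut + 1:]
        else:
            index[node] = len(kept)
            kept.append(node)
    return kept
```

Erasing loops also removes repeated visits to the same node, so each simplified trajectory visits every node at most once.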

We depict folding networks of SufI for four different folding schemes (Figs … CTF_{codon} (… CTF_{codon} has 820 nodes). In the refolding scheme, the protein visited far more diverse conformational states, many of which are characterized by low Q-scores and high N-scores. Second, while the refolding scheme did not show any dominant pathway, the CTF_{codon} network has a clear folding route from the top of the figure to the bottom. As expected, the CTF forced SufI to fold vectorially from the N-terminus, which constrained the order of domain folding events. In contrast, in the refolding scheme the protein could start folding from any segment, resulting in diverse transitions. Thus, the CTF restricts the folding kinetics and reduces the observed conformational ensemble, consistent with earlier theoretical works [

The refolding network possesses 3284 nodes, while the codon-based CTF network has only 820 nodes. The size of a node represents its probability. The darkness of a node represents its native-ness; darker nodes are closer to the native state. Diamonds, triangles, and stars indicate that the N-, M-, and C-domains, respectively, are predominantly unfolded. When the predominantly unfolded domain cannot be uniquely determined, circles are used.

The folding network for the CTF_{fast} scheme (

On the other hand, the slow CTF scheme showed a folding network (… CTF_{codon} scheme. The number of nodes found in the slow CTF was 1096, slightly larger than that found in the CTF_{codon}, i.e., 820. We see a single, nearly identical folding route in these two schemes.

It is interesting to ask to what extent the translation rate is designed (optimized), via codon usage, to facilitate folding. To this end, we investigate the correlation, if any, between a putative translation rate and the degree of folding. For the former, we simply use the translation rate, in arbitrary units, predicted by an algorithm proposed by Spencer et al. … _{i} in a nascent chain of the length _{L=i} is the average _{100 trajectories} means the average over 100 trajectories in the slow CTF scheme. If Δ_{i} is high at _{i}, it would naturally correlate with the translation rate. Importantly, however, we did not bias the CTF by codon usage. Instead, we used the uniform, slow CTF scheme. Thus, Δ_{i} is not directly related to differences in the translation rate, but is a purely physicochemical quantity determined by the amino acid sequence. We note that Δ_{i} was smoothed by a 5-residue window average to reduce noise.
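The smoothing and the correlation analysis can be sketched as below, assuming a centered 5-residue moving average (truncated at the chain ends) and a Pearson correlation coefficient; the text says only that a window average and a correlation were used, and the names `window_average`, `pearson`, and `region_correlation` are hypothetical. The default range 200–350 mirrors the residue range used for the correlation between the Δ_i profile and the translation rate profile.

```python
import math
from typing import List, Sequence


def window_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Centered moving average; windows are truncated at the chain ends (assumed)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def region_correlation(delta: Sequence[float], rate_profile: Sequence[float],
                       first: int = 200, last: int = 350) -> float:
    """Correlation over residues first..last (1-based, inclusive); both
    profiles are 0-based lists indexed by residue number minus one."""
    return pearson(delta[first - 1:last], rate_profile[first - 1:last])
```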

A) The degree of folding acquisition Δ_{i} after averaging over a window of 5 residues. B) The inverse of the translation rate computed from the Spencer

The Δ_{i} profile shown in _{i} region around 280–310, which correlates well with a translational attenuation region, the 33–40 kDa region (residues 281–326, grey shaded in _{i} profile at ~245. More quantitatively, using residues 200–350, we computed the correlation between the Δ_{i} profile and the translation rate profile (

The highest peak of the Δ_{i} profile in

By comprehensively performing molecular simulations of co-translational folding (CTF) and refolding of SufI, we elucidated, from structural perspectives, how translational attenuation can facilitate correct folding. First, coarse-grained simulations showed that the codon-based CTF, CTF_{codon}, yields a higher probability of correct folding than refolding does. When the translational attenuation is removed, the CTF_{fast} simulations resulted in a success rate similar to that of the refolding scheme. When the elongation was uniformly slowed down, the CTF_{slow} simulations gave essentially the same results as those of CTF_{codon}. These results are all consistent with recent experiments. In addition, the simulations provided structural and mechanistic insights. Specifically for SufI, we found that the M-domain is the least stable and can fold only when it is supported by the pre-folded N-domain. Once a segment of the M-domain is entangled with either the N- or C-domain, escape from the trap is difficult. Combining molecular simulations with biochemical experiments thus provides a detailed mechanistic understanding of CTF.

A recent theoretical study suggested that, under certain situations, fast translation can coordinate folding to the native structure [

We note that the current CG modeling has some limitations. One of the major limitations concerns time scales. With CG modeling, one cannot easily estimate the absolute time scales of folding and translation. By using a low viscosity in Langevin dynamics and structure-based potentials, we sped up the folding kinetics by some orders of magnitude. Translation kinetic parameters in the normal and slowed phases are not accurately known. This makes quantitative comparison difficult. Another limitation is the balance between the structure-based potential and the sequence-dependent terms, which was determined empirically here. Accurate modeling of this balance is highly desirable in future work.

In this study, we studied folding of a three-domain protein SufI [

In the simulation, each residue is represented by one CG particle located at the Cα position. We used our in-house software CafeMol for all the simulations [

The potential energy function consists of the native-based AICG2+ potential (V_{Go}) and the non-local many-body hydrophobic interaction potential (V_{HP}). The total energy V_{total} for the refolding simulation is given as

The native-based potential V_{Go} is defined as [

The first term keeps virtual bonds between consecutive amino acids; the second and third terms represent statistical potentials for virtual bond angles and virtual dihedral angles [

For the hydrophobic interaction, we take the function developed in[_{HP} represents the buried-ness of the amino acid _{linear} and _{min} are constants and _{i} represents local density and is calculated by:
_{A(i)} is the number of heavy atoms that defines the amino acid _{max,A(i)} is the maximum coordination number for particle type _{HP} represents the degree of the contact between particle

We note that the hydrophobic interaction potential described above was first developed for a CG model with a different resolution from the current work. Thus, we needed to re-parameterize the function. We estimated the parameters _{min,A(i),A(j)}, _{max,A(i),A(j)}, _{max,A(i)}, and _{A(i)} for each amino acid type in the following way. Using Dunbrack’s culled PDB set [_{vdW,i} + _{vdW,j} + _{vdW,H2O}, where _{vdW,i} is the van der Waals radius of the atom _{max,A(i),A(j)} and we set _{min,A(i),A(j)} = _{max,A(i),A(j)} − 4.
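The contact criterion used in this re-parameterization can be illustrated as follows. This sketch covers only the two rules stated in the text: the heavy-atom contact cutoff, written here as r_vdW,i + r_vdW,j + r_vdW,H2O, and the offset of 4 between the two distance parameters. The Bondi radii, the 1.4 Å water radius, the Å unit of the offset, and all function names are assumptions, and the statistical step that turns counted contacts into the outer distance parameter is not reproduced because it is missing from the text.

```python
from typing import Dict

# Bondi van der Waals radii in Angstrom (assumed; the text does not list them).
VDW_RADIUS: Dict[str, float] = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
WATER_RADIUS = 1.4  # assumed water probe radius, Angstrom


def atoms_in_contact(element_i: str, element_j: str, distance: float) -> bool:
    """Heavy-atom contact: distance below r_vdW,i + r_vdW,j + r_vdW,H2O."""
    cutoff = VDW_RADIUS[element_i] + VDW_RADIUS[element_j] + WATER_RADIUS
    return distance < cutoff


def r_min_from_r_max(r_max: float, offset: float = 4.0) -> float:
    """Inner distance parameter, set a fixed offset (assumed Angstrom) below the outer one."""
    return r_max - offset
```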

In the total potential energy _{total}, _{Go}(_{HP}(_{Go}(_{nat}). We assumed that this value is a reasonable energy at the native structure and fixed this value at the native structure. We then express it as a linear combination of _{Go}(_{HP}(

To reproduce the steric effect of the ribosome exit tunnel and surface, we added a purely repulsive wall-and-tunnel potential V_{tunnel} defined as:
_{i} is the distance between the particle i and the wall-and-tunnel. The default parameters in CafeMol were used for _{ex} and _{T} = 15 Å, _{T} = 90 Å, respectively.
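The geometry of a purely repulsive wall-and-tunnel potential can be sketched as below. The functional form of the potential and the CafeMol defaults are not reproduced in this section, so this is a hypothetical illustration: it assumes that the two quoted values are the tunnel radius (15 Å) and tunnel length (90 Å), places the wall at z = 0 with the tunnel along the z axis, and uses a placeholder repulsion eps_ex (σ/d)^12 that is not taken from the text.

```python
import math
from typing import Iterable, Tuple

R_TUNNEL = 15.0  # Angstrom; assumed to be the tunnel radius quoted in the text
L_TUNNEL = 90.0  # Angstrom; assumed to be the tunnel length quoted in the text


def distance_to_ribosome(x: float, y: float, z: float,
                         r_t: float = R_TUNNEL, l_t: float = L_TUNNEL) -> float:
    """Distance from a bead to the wall-and-tunnel surface.

    Assumed geometry: the wall is the plane z = 0 with the ribosome body at
    z < 0, pierced by a cylindrical exit tunnel of radius r_t along the z axis
    for -l_t <= z <= 0. Returns 0.0 for a bead in the excluded region.
    """
    rho = math.hypot(x, y)  # distance from the tunnel axis
    if z >= 0.0:
        if rho >= r_t:
            return z  # outside the ribosome, above the flat wall
        return math.hypot(r_t - rho, z)  # above the mouth: nearest point is the rim
    if z >= -l_t and rho < r_t:
        return r_t - rho  # inside the tunnel: nearest point is the cylinder wall
    return 0.0  # inside the ribosome body (forbidden region)


def wall_tunnel_energy(coords: Iterable[Tuple[float, float, float]],
                       eps_ex: float = 1.0, sigma: float = 4.0) -> float:
    """Purely repulsive energy with a hypothetical eps_ex * (sigma / d)**12 form."""
    total = 0.0
    for x, y, z in coords:
        d = distance_to_ribosome(x, y, z)
        if d <= 0.0:
            return math.inf
        total += eps_ex * (sigma / d) ** 12
    return total
```

Because the repulsion depends only on the distance to the surface, the potential confines the nascent chain to the tunnel and the half-space outside the wall without adding any attraction.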

We note that all the interaction potentials here are temperature independent. Since the hydrophobic interaction is an effective interaction that itself depends on temperature, one could include its temperature dependence, as in Chan et al., for more accurate modeling [

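Molecular dynamics was simulated by the Langevin equation at constant temperature T,

$$m_i \frac{d^2 \mathbf{r}_i}{dt^2} = \mathbf{F}_i - \gamma_i m_i \frac{d\mathbf{r}_i}{dt} + \boldsymbol{\xi}_i(t),$$

where m_{i} is the mass of particle i, γ_{i} is a friction constant, and ξ_{i} is a random force. This random force satisfies ⟨ξ_{i}(t)⟩ = 0 and ⟨ξ_{i}(t) ξ_{j}(t′)⟩ = 2γ_{i} m_{i} k_{B}T δ(t − t′) δ_{i,j}. The stationary distribution generated by this Langevin equation is the Boltzmann distribution for a given temperature T. The force F_{i} = −∂V_{total}/∂r_{i} is derived from partial differentiation of the potential energy function. For numerical integration, we used the scheme in [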
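For readers who want to see how a Langevin equation of this form can be integrated, a minimal sketch is given below. It uses the BAOAB splitting, which is a substitute chosen for clarity and not the integration scheme cited in the text; the one-dimensional harmonic test system, the function names, and all parameter values are hypothetical. The O step applies friction and noise exactly, so the noise amplitude obeys the fluctuation-dissipation relation and the sampled ⟨x²⟩ approaches k_BT/k.

```python
import math
import random
from typing import Callable, List, Tuple

Force = Callable[[List[float]], List[float]]


def baoab_step(x: List[float], v: List[float], force: Force, mass: float,
               gamma: float, kT: float, dt: float,
               rng: random.Random) -> Tuple[List[float], List[float]]:
    """One BAOAB step for m x'' = F - gamma m x' + xi(t),
    with <xi(t) xi(t')> = 2 gamma m kT delta(t - t')."""
    f = force(x)
    v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # B: half kick
    x = [xi + 0.5 * dt * vi for xi, vi in zip(x, v)]          # A: half drift
    c1 = math.exp(-gamma * dt)
    c2 = math.sqrt((1.0 - c1 * c1) * kT / mass)
    v = [c1 * vi + c2 * rng.gauss(0.0, 1.0) for vi in v]      # O: friction + noise
    x = [xi + 0.5 * dt * vi for xi, vi in zip(x, v)]          # A: half drift
    f = force(x)
    v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]   # B: half kick
    return x, v


def harmonic_mean_square(n_steps: int = 100_000, burn_in: int = 1_000,
                         k: float = 1.0, mass: float = 1.0, gamma: float = 1.0,
                         kT: float = 1.0, dt: float = 0.1, seed: int = 1) -> float:
    """Sample <x^2> in a 1D harmonic well; Boltzmann statistics give kT / k."""
    rng = random.Random(seed)

    def force(pos: List[float]) -> List[float]:
        return [-k * p for p in pos]

    x, v = [0.0], [0.0]
    acc, count = 0.0, 0
    for step in range(n_steps):
        x, v = baoab_step(x, v, force, mass, gamma, kT, dt, rng)
        if step >= burn_in:
            acc += x[0] ** 2
            count += 1
    return acc / count
```

Checking a sampled average against its Boltzmann value, as `harmonic_mean_square` does, is a quick sanity test that an integrator generates the stationary distribution stated above.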

In simulations that mimic the CTF, we increased the chain length of the nascent polypeptide one residue at a time and used a wall-and-tunnel potential that represents the rough geometry of the ribosome exit tunnel (

In the scheme that mimics a codon-dependent CTF rate (CTF_{codon}), we used the translation elongation profile derived from Spencer’s algorithm [… ^{6} steps per residue. For the other residues, we used an elongation rate of 10^{4} steps per residue. This scheme was termed CTF_{codon} (See

To test the effect of synonymous substitutions that remove the translational attenuation, we set up a CTF scheme in which the elongation rate is fast and uniform: the protein is elongated at a rate of 10^{4} steps per residue. This scheme is termed CTF_{fast}.

To test the effect of lowering the temperature, we set up a CTF scheme in which the elongation rate is uniform and slow (CTF_{slow}). The elongation speed is 3 × 10^{5} steps per residue.
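The three elongation schedules can be encoded as steps spent per residue, as sketched below. The attenuated-residue rate of 10^6 steps is an assumed reading of a partly garbled sentence about the CTF_codon scheme, the set of attenuated residues must come from the Spencer et al. profile and is supplied by the caller, and all names are hypothetical.

```python
from itertools import accumulate
from typing import AbstractSet, List

FAST_STEPS = 10**4        # steps per residue: CTF_fast and non-attenuated codons
SLOW_STEPS = 3 * 10**5    # steps per residue: CTF_slow
ATTENUATED_STEPS = 10**6  # steps per residue in attenuated regions (assumed reading)


def steps_per_residue(n_residues: int, scheme: str,
                      attenuated: AbstractSet[int] = frozenset()) -> List[int]:
    """Time steps spent adding each residue (1-based residue numbers)."""
    if scheme == "fast":
        return [FAST_STEPS] * n_residues
    if scheme == "slow":
        return [SLOW_STEPS] * n_residues
    if scheme == "codon":
        return [ATTENUATED_STEPS if r in attenuated else FAST_STEPS
                for r in range(1, n_residues + 1)]
    raise ValueError(f"unknown scheme: {scheme}")


def length_reached_at(schedule: List[int]) -> List[int]:
    """Cumulative step count at which the nascent chain reaches length 1, 2, ..."""
    return list(accumulate(schedule))
```

Summing a schedule up to the last residue of a domain gives the translation time of that domain, which is the quantity compared with domain folding times in the Results.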

As a control, we also set up the refolding scheme. In this scheme, a wall-and-tunnel potential was not used and the full-length SufI was present from the beginning. The initial unfolded conformation was prepared by constant temperature simulation at a high temperature for 10^{7} time steps from the native state. This was sufficient to prepare a fully unfolded structure.

In all four schemes, we ran 100 trajectories, and each trajectory was simulated for 3 × 10^{8} time steps, including the time for translation in the CTF schemes. The comparison of the three elongation schemes is given in

To judge whether SufI is folded or not, we introduced multiple native-ness scores, i.e.,

In general, the widely used

The … Q_{total}, or for any part of the protein, both of which were used in this work. Q_{total} is convenient for quantifying overall native-ness by a single value. When Q_{total} is above 0.95, SufI takes the native state with high probability (we call this the generous criterion for the native state). During the analysis, however, we noticed that, for multi-domain proteins such as SufI, the completion of folding cannot easily be assessed by Q_{total} alone. For example, we found structures in which the individual domains are all correctly folded while some domain-domain interfaces are not. Since the number of native contacts at the domain interfaces is much smaller than that within the domains, these structures often have Q_{total} values close to one. (Even worse, these values can be within the thermal fluctuation range of Q_{total} at the true native state.) To distinguish these misfolded structures, we need to check

Q-scores for the N-, M-, and C-domains and for the N-M, N-C, and M-C domain-domain interfaces are classified by four thresholds. We located these thresholds at the local minima of the statistical weight distributions. Specifically, the thresholds of the N-domain’s

To draw a folding network, we used a physical network layout model called the spring-electrical model [

To discretize structural conformations, we classified six Q-scores and six N-scores of the protein parts. Here, the N-score represents the degree of formed non-native contacts and was defined as the number of formed non-native contacts relative to the maximal number of such contacts. Based on the thresholds, each conformation is assigned to one of 5^{6} × 3^{6} nodes, and a trajectory is represented by a polygonal line that transits from one node to another. For simplicity, we removed all loops, where a loop is a sequence of transitions that starts from and returns to the same node.
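Thresholds placed at local minima of a score distribution can be located as sketched below. This is an assumption-laden illustration: it treats each interior histogram bin that is lower than its left neighbour and not higher than its right neighbour as a minimum and returns that bin's center. With real data, the histogram would typically need smoothing first, and only the four most pronounced minima would be kept as Q-score thresholds. The names are hypothetical.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], n_bins: int,
              lo: float = 0.0, hi: float = 1.0) -> List[int]:
    """Counts of values in n_bins equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def minima_thresholds(counts: Sequence[int], lo: float = 0.0,
                      hi: float = 1.0) -> List[float]:
    """Bin centers of interior local minima (strictly below the left
    neighbour, not above the right one), used as class boundaries."""
    width = (hi - lo) / len(counts)
    return [lo + (k + 0.5) * width
            for k in range(1, len(counts) - 1)
            if counts[k] < counts[k - 1] and counts[k] <= counts[k + 1]]
```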

(DOCX)

The average is about 28 residues.

(TIF)

Starting from a denatured state, we performed folding simulations for 10^{8} time steps. Temperature is given in CafeMol units. A sudden drop in the average Q-score was found at temperature 440, which corresponds to T_{F}*. Folding simulations were conducted at 360, which corresponds to 0.82 T_{F}.

(TIF)

In each folding scheme, the last 100 snapshots (corresponding to 10^{5} time steps) are used.

(TIF)

(C) Linear fitting is used to obtain the folding times of individual domains. Blue, green, and red curves correspond to folding of the N-, M-, and C-domains.

(TIF)

The meanings of the symbols are identical to those in

(TIF)

The upper-right triangle shows the probability map of non-native contacts formed in the last 100 snapshots (corresponding to 10^{5} time steps) of representative trajectories. The lower triangle shows the native contact map obtained from the native structure. (i), (ii), and (iii) are three representative misfolded structures corresponding to the same symbols in

(TIF)