
The authors have declared that no competing interests exist.

Conceived and designed the experiments: PC KZ MMM ID JHC. Performed the experiments: KZ MMM PC. Analyzed the data: KZ MMM ID JHC PC. Contributed reagents/materials/analysis tools: KZ MMM PC. Wrote the paper: KZ MMM ID JHC PC.

Current address: Department of Mathematics, University of Massachusetts Boston, Boston, Massachusetts, United States of America

Current address: The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, United States of America

Chemical and enzymatic footprinting experiments, such as shape (selective 2′-hydroxyl acylation analyzed by primer extension), yield important information about RNA secondary structure. Indeed, since the

RNA is an important biomolecule, known to play both an

A secondary structure for a given RNA nucleotide sequence

RNA secondary structure can be predicted by Zuker and Stiegler's algorithm

A first step towards integrating chemical/enzymatic probing data was taken by Mathews et al.

Chemical/enzymatic probing data is probabilistic in nature, as exemplified in pars footprinting data

One issue with this approach is that it takes shape data into consideration only for base-stacked positions; i.e., a pseudo free energy term corresponding to shape data is applied at positions where a stacked base pair occurs, but not where nucleotides are unpaired. By ignoring shape data at unpaired positions, this approach can bias structure prediction toward forming base pairs even at positions that shape data suggest are flexible. Indeed, the expected distance between shape values and the base pairing probabilities computed by RNAstructure increases after the incorporation of the shape pseudo energy terms (see

Secondary structure prediction accuracy

| RNA | len | test | (A) | (B) | (C) | RNA | len | test | (A) | (B) | (C) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| asp-tRNA | 75 | sens. | 1.00 | 1.00 | 0.76 | phe-tRNA | 76 | sens. | 1.00 | 0.75 | 0.95 |
| | | ppv | 1.00 | 1.00 | 0.76 | | | ppv | 0.95 | 0.71 | 0.95 |
| | | ave ent. | 0.21 | 0.17 | 0.27 | | | ave ent. | 0.2 | 0.17 | 0.46 |
| | | str. div. | 19.53 | 17.17 | 22.60 | | | str. div. | 11.37 | 9.38 | 34.37 |
| | | edist. | 23.7 | 61.77 | 24.9 | | | edist. | 29.51 | 61.77 | 33.68 |
| HCV IRES | 95 | sens. | 0.96 | 0.96 | 0.96 | 5S rRNA | 120 | sens. | 0.94 | 0.94 | 0.26 |
| | | ppv | 1.00 | 1.00 | 1.00 | | | ppv | 0.82 | 0.82 | 0.22 |
| | | ave ent. | 0.05 | 0.06 | 0.27 | | | ave ent. | 0.30 | 0.17 | 0.27 |
| | | str. div. | 3.20 | 3.57 | 21.45 | | | str. div. | 46.93 | 20.70 | 32.90 |
| | | edist. | 31.36 | 52.48 | 36.53 | | | edist. | 42.57 | 54.01 | 46.41 |
| P546 | 155 | sens. | 0.95 | 0.96 | 0.43 | glycine | 162 | sens. | 0.92 | 0.92 | 0.70 |
| | | ppv | 0.96 | 0.98 | 0.44 | | | ppv | 0.84 | 0.84 | 0.61 |
| | | ave ent. | 0.18 | 0.12 | 0.38 | | | ave ent. | 0.11 | 0.05 | 0.30 |
| | | str. div. | 27.7 | 14.05 | 66.50 | | | str. div. | 15.14 | 5.13 | 44.16 |
| | | edist. | 41.36 | 131.77 | 56.11 | | | edist. | 53.90 | 115.55 | 60.29 |
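The sens. and ppv rows of the table above can be illustrated with a minimal sketch. It assumes the conventional base-pair definitions: sensitivity is the fraction of reference base pairs that are predicted, and positive predictive value (ppv) is the fraction of predicted base pairs that occur in the reference structure. The function names and the toy dot-bracket structures are hypothetical and are not taken from the RNAsc or RNAstructure code.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of base pairs (i, j), 1-indexed, of a pseudoknot-free
    dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def sensitivity_ppv(predicted, reference):
    """Assumed definitions: sens = TP / |reference|, ppv = TP / |predicted|,
    where TP counts base pairs present in both sets."""
    tp = len(predicted & reference)
    sens = tp / len(reference) if reference else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    return sens, ppv


# Toy example (not data from the paper): the prediction misses one
# of the four reference base pairs and adds no false pairs.
reference = pairs_from_dot_bracket("((((...))))")
predicted = pairs_from_dot_bracket("(((.....)))")
sens, ppv = sensitivity_ppv(predicted, reference)
print(f"sens = {sens:.2f}, ppv = {ppv:.2f}")
```

In this toy case three of the four reference pairs are recovered and no false pairs are predicted, so sensitivity falls below 1 while ppv stays at 1.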

Nonetheless, MFE dynamic programming methods that incorporate high throughput chemical/enzymatic footprinting data can yield important insights into the structure and function of RNA molecules, much faster than the labor-intensive X-ray diffraction methods.

The motivation for our work is to develop a method that incorporates chemical/enzymatic footprinting data in a

DNA oligonucleotides for the sequence and its reverse complement were purchased from MWG Operon; remaining reagents were obtained from Sigma-Aldrich. DNA oligonucleotides were annealed to create templates for T7 polymerase transcription, and the transcription products were purified by denaturing PAGE and eluted in 10 mM Tris-HCl (pH 7.5 at

Spontaneous cleavage pattern resulting from in-line probing of yeast asp-tRNA. Nucleotides with greater backbone flexibility have higher rates of cleavage and thus produce bands of greater intensity. Lanes for no reaction, T1 RNase (cleavage following only guanosines), and partial hydroxyl cleavage (-OH, cleavage after each base) are indicated. Due to the high resolution of the gel, double bands appear for nucleotides 2–9. These bands correspond to RNA molecules where the

Briefly stated, our algorithm, RNAsc (

In experiments reported by the Weeks Lab

Normalized (blue circles) and raw (red diamonds) shape values. Gray bars indicate missing shape values. The subplot shows the piecewise normalization map.

Let

In this section, we describe how to integrate Boltzmann weights into the computation of the partition function for secondary structures of a given RNA sequence. This allows us to compute the probability

To compute partition function

The minimum free energy (MFE) structure can be computed by a modification of McCaskill's algorithm

Pointwise entropy and Morgan-Higgs structural diversity

In this section, we show that on average, the ensemble of low energy secondary structures produced by our method yields a footprinting pattern that more closely resembles the pattern from input experimental shape data; in particular, we prove that the expected distance from (normalized) shape data for the ensemble of low energy structures (our algorithm) is strictly less than the expected distance from shape data for the Boltzmann ensemble of low energy structures (McCaskill's algorithm). First, we require some definitions. All secondary structures

Next, define the expected distance

Theorem.

Proof.

The above theorem can be generalized; however, we first require some notation. The weighted partition function

Theorem.

The proof of the theorem can be found in

Given RNAsc parameter

In trying to compute

By definition,

Since RNAstructure of Deigan et al.

Since the approach in

In this section we present the benchmarking results for our algorithm RNAsc, a novel algorithm that recalibrates probing data as probabilities of nucleotides being unpaired and integrates this information as ‘soft constraints’ into the computation of minimum free energy secondary structure (see

In order to directly characterize how well shape data reflects RNA secondary structure, we compared normalized

Distribution of shape discrepancies in yeast asp-tRNA

To assess whether an alternative experimental method might yield data that more accurately reflects the secondary structure, we performed in-line probing on the

Our analysis indicates that in-line probing and shape reactivity profiles are quite distinct from one another. See

Distribution of reactivities of data from in-line probing

The signal from in-line probing is significantly more diffuse than that from shape, and the error rate, as calculated above for shape, is significantly higher (

Integrating shape and in-line probing data into our new algorithm RNAsc also shows that shape has an edge over in-line probing. The structures predicted by RNAsc for yeast asp-tRNA using in-line probing and shape data are both identical to the crystal structure. However, one measure of the robustness of the data in the context of our secondary structure prediction algorithm RNAsc is the range of the scaling parameter

The plots show heat maps displaying ppv (

Heat maps illustrating differences between in-line probing

In a second analysis, we compared the pointwise entropy at each nucleotide using no data, shape data, and in-line probing data (see

Pointwise entropy of yeast asp-tRNA, computed from RNAsc using shape data (red squares), in-line probing (blue diamonds), and using no probing data (black circles). Average pointwise entropies: 0.210 (shape data), 0.267 (in-line probing), 0.269 (no data). As expected, by integrating either shape or in-line probing data into RNAsc, the variability (entropy) decreases; however, it appears that variability (entropy) is decreased more by shape than by in-line probing data – again, suggesting that shape data is more robust than in-line probing data when used with RNAsc.
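The pointwise entropy values reported above can be illustrated with a short sketch. Because the paper's own definition does not survive in this text, the code assumes the commonly used positional entropy: for each nucleotide i, the Shannon entropy of the distribution over its possible pairing partners j together with the event that i is unpaired. The natural logarithm, the function name, and the toy probabilities are assumptions made for illustration and are not output from RNAsc.

```python
import math


def pointwise_entropy(pair_probs, n):
    """Assumed definition: H(i) = -sum_x p(x) ln p(x), where x ranges over
    the pairing partners of position i plus the 'unpaired' event.

    pair_probs maps (i, j) with 1 <= i < j <= n to the base pairing
    probability of i and j; n is the sequence length.
    """
    partners = [dict() for _ in range(n + 1)]
    for (i, j), p in pair_probs.items():
        partners[i][j] = p
        partners[j][i] = p
    entropies = []
    for i in range(1, n + 1):
        # Probability that position i is unpaired (clamped against rounding).
        p_unpaired = max(0.0, 1.0 - sum(partners[i].values()))
        probs = list(partners[i].values()) + [p_unpaired]
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return entropies


# Toy example (not data from the paper): positions 1 and 4 pair with
# probability 0.5; positions 2 and 3 are always unpaired.
H = pointwise_entropy({(1, 4): 0.5}, n=4)
average_entropy = sum(H) / len(H)
```

Positions whose pairing state is certain contribute zero entropy, so averaging H over all positions gives a single summary number of the kind quoted as average pointwise entropy, under the stated assumptions.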

Using

As explained in Deigan et al.

We now show that the smaller structural variation in the RNAstructure ensemble appears to be an artifact of the magnitude of parameters

Suppose that position

Suppose now that position

These illustrative examples suggest that structural

Note that the average relative decrease in expected distance of the computed probabilities to shape data from RNAstructure to RNAsc is

The figure shows a plot of the expected distance

We believe RNAsc may be helpful in the long term in elucidating the nature of discrepancies between shape data and the native structure. As in any experimental protocol, there is a Gaussian error term; however, our data (not shown) indicate that shape discrepancy is positively correlated with high pointwise entropy. Indeed, it seems plausible that a region of the RNA molecule that fluctuates due to thermal motion, and thus has higher pointwise entropy, might be more variably accessible to the chemical probe NMIA, causing a greater shape discrepancy with the X-ray structure. The program RNAsc allows the user to identify such regions of high pointwise entropy and to examine the structural variability in those regions by sampling. It may be possible to confirm or refute our hypothesis concerning the non-Gaussian nature of shape discrepancy (“error”) by performing additional shape probing experiments at lower temperatures. RNAsc could therefore prove to be a valuable tool in this line of research.

Widespread accessibility of quantitative RNA structural mapping techniques and medium- to high-throughput quantification of the data have motivated the development of computational tools to predict structures from such information. The integration of experimental data as “constraints” in the thermodynamic algorithm when computing minimum free energy (MFE) structure can significantly improve the accuracy of RNA structure prediction. However, such methods are also dependent on the quality of the data used for the constraints

On the

Two recent approaches towards overcoming this error include the iterative ‘sample and select’ approach of Quarrier et al.

Supplementary information (PDF).

We would like to thank D.H. Mathews for discussions and for making available the source code of RNAstructure