ELH, DJB, EZZ, TLK, and JCS are employees of Vertex Pharmaceuticals Incorporated and may own stock or stock options in that company. AD, AMT, JS, and ECM are former employees of Vertex Pharmaceuticals Incorporated and may own or may have owned stock or options in that company. SDM, ID, AG, and GP are employees of Janssen Pharmaceuticals, Inc. (a subsidiary of Johnson & Johnson) and may own stock or stock options in that company.

Conceived and designed the experiments: SDM ID DJB AG AD EZZ AMT JS GP TLK JCS. Performed the experiments: ELH DJB AD EZZ AMT JS. Analyzed the data: ELH JCS SDM ID DJB AG AD EZZ AMT JS GP TLK. Contributed reagents/materials/analysis tools: ELH AD JCS AMT. Wrote the paper: ELH JCS TLK DJB.

For patients infected with hepatitis C virus (HCV), the combination of the direct-acting antiviral agent telaprevir, pegylated-interferon alfa (Peg-IFN), and ribavirin (RBV) significantly increases the chances of sustained virologic response (SVR) over treatment with Peg-IFN and RBV alone. If patients do not achieve SVR with telaprevir-based treatment, their viral population is often significantly enriched with telaprevir-resistant variants at the end of treatment. We sought to quantify the evolutionary dynamics of these post-treatment resistant variant populations. Previous estimates of these dynamics were limited by analyzing only population sequence data (20% sensitivity, qualitative resistance information) from 388 patients enrolled in Phase 3 clinical studies. Here we add clonal sequence analysis (5% sensitivity, quantitative) for a subset of these patients. We developed a computational model which integrates both the qualitative and quantitative sequence data, and which forms a framework for future analyses of drug resistance. The model was qualified by showing that deep-sequence data (1% sensitivity) from a subset of these patients are consistent with model predictions. When determining the median time for viral populations to revert to 20% resistance in these patients, the model predicts 8.3 (95% CI: 7.6, 8.4) months versus 10.7 (9.9, 12.8) months estimated using solely population sequence data for genotype 1a, and 1.0 (0.0, 1.4) months versus 0.9 (0.0, 2.7) months for genotype 1b. For each individual patient, the time to revert to 20% resistance predicted by the model was typically comparable to or faster than that estimated using solely population sequence data. Furthermore, the model predicts a median of 11.0 and 2.1 months after treatment failure for viral populations to revert to 99% wild-type in patients with HCV genotypes 1a or 1b, respectively. 
Our modeling approach provides a framework for accurate, quantitative projection of HCV resistance dynamics from a dataset consisting largely of qualitative information.

Hepatitis C virus (HCV) chronically infects approximately 170 million people worldwide. The goal of HCV treatment is viral eradication (sustained virologic response; SVR). Telaprevir directly inhibits viral replication by inhibiting the HCV protease, leading to high SVR rates when combined with pegylated-interferon alfa and ribavirin. Telaprevir-resistant variants may be detected in the subset of patients who do not achieve SVR with telaprevir. While the clinical impact of viral resistance is unknown, typically the telaprevir-sensitive virus re-emerges after the end of treatment due to competition between the telaprevir-sensitive and resistant variants. Previous estimates of these competition dynamics were obtained from population sequence data, which are qualitative and have a limited sensitivity of ∼20%. We sought to improve these estimates by combining these data with clonal sequence data, which are quantitative and have a sensitivity of ∼5%, and using quantitative modeling. The resulting model, which was verified with an independent data set, predicted that the median time for telaprevir-resistant variants to decline to less than 1% of the viral population was ≤1 year. Our modeling approach provides a framework for accurately projecting HCV resistance dynamics from a dataset consisting of largely qualitative information.

Hepatitis C is an inflammatory infection of the liver caused by the hepatitis C virus (HCV). HCV chronically infects approximately 170 million people worldwide

Telaprevir is a direct-acting antiviral that inhibits viral replication by binding to the active site of the HCV NS3-4a protease, an enzyme essential for viral replication

In clinical studies, telaprevir-resistant variants were identified in the majority of patients who did not achieve an SVR with telaprevir treatment

The analysis reported here builds upon the work of Sullivan et al.

In Sullivan

To illustrate the concept of resistance monitoring,

(A) Dynamics of the total viral load (Total, green), wild-type virus (WT, blue), and a telaprevir-resistant variant (Resistant, red) during and after telaprevir-based treatment. LOD is the limit of detection for quantification of the total viral load, and Seq. LOD is the viral load above which sequencing can be reliably performed (1000 IU/ml). The treatment phase is shown by the gray bar. (B) Corresponding percent resistance dynamics on a linear scale. Viral sequencing can be performed when the total viral load exceeds the sequencing assay LOD (solid red curve). The dashed lines at 20% and 5% show the limits of detection for population and clonal sequence data, respectively.

The logistic model given by

The fitted dynamics of resistant variant loss for each patient are shown for (A) genotype 1a and (B) genotype 1b.

To examine how well the model fit the data across all patients, histograms of the objective function values (φ) were generated (

(A): Histogram of the log_{10} objective function values (φ; see

(A): Histogram of the log_{10} objective function values (φ; see

To assess the predictive capability of the model, we compared model predictions to resistance quantified by massively parallel sequencing (deep sequencing; DS) on an Illumina platform (LOD: 1%) for a subset of samples included in our modeling analysis. These data provided a more quantitative measure of resistance, because resistance in these samples had previously been determined using only population sequencing. As the patients and time points chosen for quantification were selected and analyzed independently of this modeling work ^{4} equivalently sized datasets for the DS samples. None of these datasets had a smaller sum of squared errors than that observed in this analysis, indicating that our observations have a <10^{−4} probability of being generated by chance alone. Additionally, we assessed the null hypothesis that the difference between the actual and predicted % resistance values was equal to 0 (

Comparison of the model-predicted and actual (deep sequence) resistance frequency. (A) Predicted resistance frequency for samples with undetectable resistance by deep sequencing. (B) Predicted versus actual resistance frequency for samples with measurable resistance by deep sequencing. The p-value of the prediction is <10^{−4} as determined by Monte Carlo simulation. (C) The differences between the modeled and actual resistance frequency are depicted in the histogram (counts indicated above each bar), with a median % difference of −3.3 × 10^{−5} and a mean % difference of 1.3.

The model was used to predict population statistics for reversion of virus from the resistant to the WT, non-resistant state. Specifically, Kaplan-Meier analysis was used to calculate the median time to reversion for each HCV subtype. Previously, the estimated time it takes for a population of patients to revert from resistant to WT virus was calculated using only population sequence data
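The Kaplan-Meier median time to reversion can be illustrated with a minimal product-limit estimator. This is a sketch in Python, not the R survival-library code used in the analysis; the function names are hypothetical. Each patient contributes a time (months from treatment failure) and a flag that is True if reversion below the threshold was observed and False if the patient was censored at the last visit. The median is taken as the earliest reversion time at which the estimated probability of remaining resistant falls to 0.5 or below, which is one common convention.

```python
from typing import List, Optional, Sequence, Tuple


def km_curve(times: Sequence[float], reverted: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) estimate of the probability of remaining resistant.

    times[i]    -- months from treatment failure to reversion or to the last visit
    reverted[i] -- True if reversion was observed, False if censored
    Returns (time, S(time)) at each distinct time with at least one reversion.
    """
    data = sorted(zip(times, reverted))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Group patients sharing this time; those censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        at_risk -= events + censored
    return curve


def km_median(times: Sequence[float], reverted: Sequence[bool]) -> Optional[float]:
    """Earliest reversion time with S(t) <= 0.5; None if the median is not reached."""
    for t, s in km_curve(times, reverted):
        if s <= 0.5:
            return t
    return None
```

For example, with times [1, 2, 3, 4] and flags [True, False, True, True], the patient censored at month 2 leaves the risk set without lowering the curve, so S drops from 0.75 to 0.375 at month 3 and the median is 3. Censoring is why fewer censored patients, as in the model-based results, tighten the estimate.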

Results for patients with HCV subtypes 1a and 1b are shown in plots (A) and (B), respectively. Hash marks (∧) denote censored observations, indicating the time of the last visit for patients whose virus did not revert to <20% resistance. For clarity, these patients are explicitly denoted on the population sequence ("Pop. Seq.") and model-predicted time-to-20% resistance curves only.

| HCV subtype | Population sequence: time-to-20%, months (95% CI) | Censored/Total (n/N) | Model prediction: time-to-20%, months (95% CI) | Censored/Total (n/N) | Model prediction: time-to-1%, months (95% CI) | Censored/Total (n/N) |
|---|---|---|---|---|---|---|
| 1a | 10.7 (9.9, 12.8) | 114/255 | 8.3 (7.6, 8.4) | 75/255 | 11.0 (10.3, 11.4) | 75/255 |
| 1b | 0.9 (0.0, 2.7) | 9/105 | 1.0 (0.0, 1.4) | 5/105 | 2.1 (1.1, 2.6) | 5/105 |

The 95% CI assumes a single value for each patient and does not incorporate uncertainty of individual predictions (see

Notably, fewer patients are censored in the modeled results as compared with the population sequence results (

The X-axis represents the inferred time-to-loss of detectable resistance by population sequencing, i.e., the first visit at which the patient did not have detectable resistant variants. The Y-axis shows the time-to-20% resistance predicted by the model described here, in which the rate of loss is modeled continuously for each patient. The majority of the data points fall to the right of (below) the unity line, indicating that the model predicts more rapid times-to-20% than those estimated from population sequence data.

The predicted time-to-1% resistance was also determined as this value represents the measure of resistance theoretically obtainable by recent massively parallel sequencing approaches (e.g.

The predicted time-to-1% resistance was also used to assess the effect of a number of covariates on reversion times. The covariates assessed were: (1) baseline resistance status, (2) failure modality, (3) prior treatment status, and (4) the length of time PR treatment persisted after the time of treatment failure (

Previously, resistance monitoring of patients who did not achieve an SVR with telaprevir-based treatment in the Phase 3 studies ADVANCE, REALIZE, and ILLUMINATE showed that, after treatment failure in the absence of drug, resistant variants decline over time and are replaced by WT (drug-sensitive) virus

Of note, existing methodology for analysis of population sequence results allows estimation of resistance levels only at discrete time points, and can therefore describe resistance levels only in gross bins (

Because the model uses a continuous function, a patient's resistance level can be calculated at any time, including within the range spanned by a binned population sequence result (

In contrast to analysis of individual data points (as in Sullivan

Both population and clonal sequencing have limitations in terms of their sensitivity, namely 20% and 5% resistance, respectively, as they are commonly employed. Our model allows for estimation of resistance levels below the sensitivity of the two data types used to generate the models.

Overall, the results suggest that the model captures the resistance dynamics for the majority of patients quite well (^{−10}) had only population sequence data. We observed that the resistance dynamics for these patients were well fit by the population average parameters. One possible reason these patients' dynamics were well fit is the lack of specific information within the population sequence dataset, since the data are binned into ranges of 0–20%, 20–80%, and 80–100% resistance. As a result, many different curves are consistent with the observed population sequence results.

To validate the model, we compared the quantitative model predictions of % resistance against actual quantitative results generated for an independent test set (DS). Monte Carlo simulation analyses suggest that the model predictions of % resistance for this test set are accurate, and the null hypothesis of equivalence between the methods was not rejected. The consistency between the model predictions and the DS results (1% resistance sensitivity) suggests that the model can accurately predict the resistance dynamics between 100% and 1% resistance, even though neither of the sequence data types (population and clonal) used to train the model had a sensitivity below 5%.

The model cannot fit two modalities of viral evolution. First, the model cannot fit patients whose % resistance increases over time because the model's logistic expression decreases monotonically over time. For example, as in the rare case shown in

Second, the model does not accurately fit virus that appears to have a natural resistance level greater than 1% (e.g.,

To the best of our knowledge, population sequence data have not been explicitly used in any mechanistic modeling analyses thus far. Previously, clonal sequence data were used to construct a multi-variant HCV dynamic model that explained the dynamics of specific telaprevir-resistant variants before and after telaprevir treatment

We used Kaplan-Meier estimation to determine the expected time frames for given events (e.g., the time-to-20% reversion and time-to-1% reversion). This data-rich Kaplan-Meier analysis (

This analysis provides a novel framework for developing a quantitative understanding of resistant variant evolutionary dynamics. The model enabled prediction of the median time-to-1% resistance for HCV subtypes 1a and 1b (11.0 months and 2.1 months, respectively) even though many patients solely had population sequence data available, which can only be used to determine reversion to 20% resistance. These model predictions (median time-to-1% reversion) were comparable to the median time-to-20% reversion as determined by a previous analysis using only population sequence data

The dataset used for this analysis was previously reported by Sullivan et al.

Population sequence analysis was performed with a minimum of 11× coverage of the NS3 protease, and a median of 4 time points per patient as described by Sullivan et al.

For patients that did not achieve an SVR, resistance evolutionary dynamics after treatment were modeled as a competition between the WT virus and telaprevir-resistant variants (an example from a hypothetical patient is shown in _{max} is 100%. Solving _{0}

The percent resistance (i.e., quantifying the fraction of resistant variant) is simply:_{0}
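The exact expression is lost in this version of the text, so the following sketch assumes a standard logistic form consistent with the description above and with the statement that the model's logistic expression decreases monotonically over time. With R_max = 100% and an initial percent resistance R_0 at treatment failure, the assumed form is R(t) = R_max·R_0 / (R_0 + (R_max − R_0)·e^{kt}), which solves dR/dt = −kR(1 − R/R_max). The reversion rate k (per month, k > 0) and the parameter names are hypothetical.

```python
import math

R_MAX = 100.0  # maximum percent resistance (R_max = 100%, as stated in the text)


def percent_resistance(t: float, r0: float, k: float, r_max: float = R_MAX) -> float:
    """Percent resistant virus t months after treatment failure.

    ASSUMED logistic form R(t) = r_max*r0 / (r0 + (r_max - r0)*exp(k*t)),
    the solution of dR/dt = -k*R*(1 - R/r_max) with R(0) = r0.
    r0 and k are hypothetical names; k > 0 gives a monotonic decline toward 0%.
    """
    if not 0.0 < r0 <= r_max:
        raise ValueError("r0 must lie in (0, r_max]")
    return r_max * r0 / (r0 + (r_max - r0) * math.exp(k * t))
```

For example, `percent_resistance(0.0, 90.0, 0.5)` returns 90.0, and the curve falls toward 0% as t grows. With r0 = 100 the curve stays at 100%, which corresponds to virus that never reverts to WT.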

Qualitative and quantitative information on percent resistance (

The log_{10} values of the model parameters _{0}_{p}_{c}

For example, considering an expected range of 20–80% resistance with a model prediction of 85% resistance, the constraints in
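The way binned population-sequence results and quantitative clonal results can enter one objective function is sketched below. This is an illustrative assumption, not the published objective: a population-sequence observation is penalized only by the squared distance (in percentage points) from the prediction to the nearest edge of its observed bin, clonal observations contribute ordinary squared errors, and the two terms are weighted equally. Parameters are passed as log_{10} values, matching the text's statement that log_{10} parameter values were estimated; all function names are hypothetical, and the logistic form is the same assumed one used elsewhere in these sketches.

```python
import math
from typing import Callable, Sequence, Tuple


def logistic(t: float, r0: float, k: float, r_max: float = 100.0) -> float:
    """ASSUMED logistic decline of percent resistance (hypothetical form)."""
    return r_max * r0 / (r0 + (r_max - r0) * math.exp(k * t))


def bin_penalty(pred: float, lo: float, hi: float) -> float:
    """Squared distance (percentage points) from a prediction to the bin [lo, hi].

    Zero inside the bin, so any prediction consistent with the
    population-sequence result is not penalized.
    """
    if pred < lo:
        return (lo - pred) ** 2
    if pred > hi:
        return (pred - hi) ** 2
    return 0.0


def objective(
    log10_r0: float,
    log10_k: float,
    pop_obs: Sequence[Tuple[float, float, float]],  # (t, lo, hi) population-sequence bins
    clonal_obs: Sequence[Tuple[float, float]],      # (t, percent resistance) clonal data
    model: Callable[[float, float, float], float] = logistic,
) -> float:
    """Hypothetical objective: binned penalty plus clonal squared error, equally weighted."""
    r0, k = 10.0 ** log10_r0, 10.0 ** log10_k
    phi_pop = sum(bin_penalty(model(t, r0, k), lo, hi) for t, lo, hi in pop_obs)
    phi_clonal = sum((model(t, r0, k) - y) ** 2 for t, y in clonal_obs)
    return phi_pop + phi_clonal
```

In the example from the text, a prediction of 85% against an expected range of 20–80% gives `bin_penalty(85.0, 20.0, 80.0) == 25.0` under this assumption, i.e., a 5-point deviation squared. Such an objective could then be minimized with a constrained optimizer; the analysis itself used Octave's sqp.m.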

Data from patients that had both population and clonal sequence data were first fit using _{0}_{10}_{0}_{10}

Some exceptions were made to this strategy:

To maintain a conservative methodology, patients whose samples at the time of treatment failure had <20% resistant virus were assumed to have exactly 20% resistance at the time of failure.

If the sample collected at the final time point for a patient had 80–100% resistant virus as determined by population sequencing, virus in that patient was assumed never to revert to WT.

If the sample collected at the initial time point for a patient had ≥20% and <80% resistant virus as determined by population sequencing, only deviations of the log_{10}_{10}_{0}
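The first two conservative rules above can be expressed as a small preprocessing step applied to each patient before fitting. The sketch below is illustrative only: the bin constants, the `FitPlan` record, and `plan_fit` are hypothetical names, and the third rule is not encoded because its full statement is incomplete in this version of the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Population-sequence bins (percent resistant), as described in the text
BIN_LOW: Tuple[float, float] = (0.0, 20.0)
BIN_MID: Tuple[float, float] = (20.0, 80.0)
BIN_HIGH: Tuple[float, float] = (80.0, 100.0)


@dataclass
class FitPlan:
    r0_fixed: Optional[float]  # percent resistance imposed at treatment failure, if any
    never_reverts: bool        # True: no reversion to WT is modeled


def plan_fit(bins_by_visit: Sequence[Tuple[float, float]]) -> FitPlan:
    """Apply the two conservative rules to one patient's population-sequence bins.

    bins_by_visit[0] is the sample at treatment failure and bins_by_visit[-1]
    is the final sample. Names are illustrative, not from the original code.
    """
    if not bins_by_visit:
        raise ValueError("at least one population-sequence result is required")
    first, last = bins_by_visit[0], bins_by_visit[-1]
    # <20% resistant at failure -> assume exactly 20% at failure (conservative)
    r0_fixed = 20.0 if first == BIN_LOW else None
    # 80-100% resistant at the final visit -> assume the virus never reverts to WT
    never_reverts = last == BIN_HIGH
    return FitPlan(r0_fixed=r0_fixed, never_reverts=never_reverts)
```

Fixing R_0 at 20% rather than fitting it below 20% keeps the predicted reversion times from being artificially short, which is the sense in which the rule is conservative.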

Model predictions were compared to results of 52 samples generated by deep sequencing (DS) _{0}

For the Monte Carlo simulation, a sum of squared errors was calculated from the differences between the actual and model-predicted % resistance. The Monte Carlo simulations assumed a uniform distribution within the measured population sequence range, and 10,000 replicate datasets were sampled to generate a distribution of errors. The significance of the model prediction was determined by calculating the percentage of errors in this distribution that were less than or equal to the observed error.
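The Monte Carlo procedure described above can be sketched as follows. Each replicate replaces the model predictions with values drawn uniformly within each sample's population-sequence range, and the reported quantity is the fraction of replicates whose sum of squared errors against the deep-sequencing values is less than or equal to that of the model. Function and argument names are hypothetical, and the fixed seed is only for reproducibility of the sketch.

```python
import random
from typing import Sequence, Tuple


def sse(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared errors between paired % resistance values."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def monte_carlo_p(
    actual: Sequence[float],                     # deep-sequencing % resistance
    predicted: Sequence[float],                  # model-predicted % resistance
    pop_ranges: Sequence[Tuple[float, float]],   # population-sequence range per sample
    n_rep: int = 10_000,
    seed: int = 0,
) -> float:
    """Fraction of uniform-within-range replicates whose SSE is <= the model's SSE."""
    rng = random.Random(seed)
    observed = sse(actual, predicted)
    hits = 0
    for _ in range(n_rep):
        replicate = [rng.uniform(lo, hi) for lo, hi in pop_ranges]
        if sse(actual, replicate) <= observed:
            hits += 1
    return hits / n_rep
```

If no replicate matches or beats the model, the function returns 0.0, which should be reported as p < 1/n_rep; with 10,000 replicates this is the <10^{−4} bound quoted in the Results.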

To assess the null hypothesis that the modeled results were not different from the actual results, the difference between the actual and predicted result was determined for each sample, and the null hypothesis that the mean of this difference equaled 0 was tested. Because the resultant differences were not normally distributed, the absolute differences were first arcsine square root transformed, but the original sign of the difference was retained. Parametric (
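The signed transformation described above can be sketched as follows: each difference in % resistance is converted to a fraction, its absolute value is arcsine-square-root transformed, and the original sign is restored before testing whether the mean equals 0. Only the one-sample t statistic is computed here; the p-value would come from a t distribution with n − 1 degrees of freedom (for example via a statistics library), a step omitted to keep the sketch dependency-free. Function names are hypothetical.

```python
import math
import statistics
from typing import Sequence


def signed_asin_sqrt(diff_pct: float) -> float:
    """Arcsine-square-root transform of |difference| (percent -> fraction), sign retained."""
    if abs(diff_pct) > 100.0:
        raise ValueError("a difference in % resistance cannot exceed 100 points")
    magnitude = math.asin(math.sqrt(abs(diff_pct) / 100.0))
    return math.copysign(magnitude, diff_pct)


def one_sample_t(values: Sequence[float], mu0: float = 0.0) -> float:
    """t statistic for H0: mean(values) == mu0 (p-value lookup omitted)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return (statistics.mean(values) - mu0) / (statistics.stdev(values) / math.sqrt(n))
```

In use, the transformed differences would be `[signed_asin_sqrt(a - p) for a, p in zip(actual, predicted)]`, which are then passed to `one_sample_t`. Retaining the sign keeps over- and under-predictions from being pooled into a single positive error.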

Once parameters for all patient samples were estimated, the modeled time required for virus from samples of each patient to reach a specific percent resistance was calculated by setting the % resistance (_{x%}_{x%}
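Under the assumed logistic form R(t) = R_max·R_0 / (R_0 + (R_max − R_0)·e^{kt}) (an assumption, since the original expression is garbled here), setting R(t) = x and solving for t gives t_x = (1/k)·ln[R_0(R_max − x) / (x(R_max − R_0))]. The sketch below implements that inversion with hypothetical names, returning 0 when resistance already starts at or below x and infinity for the never-revert case (R_0 = R_max or k ≤ 0).

```python
import math


def time_to_percent(x: float, r0: float, k: float, r_max: float = 100.0) -> float:
    """Months until the ASSUMED logistic curve first reaches x% resistance.

    Inverts R(t) = r_max*r0 / (r0 + (r_max - r0)*exp(k*t)) for t.
    """
    if not 0.0 < x < r_max:
        raise ValueError("x must lie strictly between 0 and r_max")
    if r0 >= r_max or k <= 0.0:
        return math.inf  # the curve never declines: no reversion to WT
    if r0 <= x:
        return 0.0  # already at or below the threshold at treatment failure
    return math.log(r0 * (r_max - x) / (x * (r_max - r0))) / k
```

For hypothetical values R_0 = 90% and k = 0.5 per month, the time to 20% resistance is ln(36)/0.5, about 7.2 months, and the time to 1% is necessarily longer. These per-patient times, with censoring for patients whose virus never reverts, are the inputs to the Kaplan-Meier estimates.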

Population and clonal sequence data were queried using a custom Oracle database with Perl scripts. R (v. 2.15.0) was used to convert these numerical values into a format readable by Octave, which was in turn used to estimate model parameters using the optimization routine sqp.m. R was used to generate figures and calculate Kaplan-Meier statistics using the survival library. Kaplan-Meier statistics were confirmed by independent generation with JMP statistical software (SAS Institute, v. 8.0.1).

To ensure patient confidentiality, an anonymized dataset containing a summary of the raw data underlying these analyses has been created and is available upon request to

The authors gratefully acknowledge the contributions of Emily Martin in helping with design of experiments and interpretation of results.