Conceived and designed the experiments: PF. Performed the experiments: PF. Analyzed the data: PF. Contributed reagents/materials/analysis tools: PF. Wrote the paper: PF.
The author has read the journal's policy and has the following conflicts: Philippe Flandre has received travel grants or consulting fees from Abbott, Bristol-Myers Squibb, Gilead, Janssen-Tibotec and ViiV Healthcare. This does not alter the author's adherence to all the PLoS ONE policies on sharing data and materials.
In recent years the “noninferiority” trial has emerged as the new standard design for HIV drug development among antiretroviral patients, often with a primary endpoint based on the difference in success rates between the two treatment groups. Different statistical methods have been introduced to provide confidence intervals for that difference. The main objective is to investigate whether the choice of the statistical method changes the conclusion of the trials.
We present 11 trials published in 2010 that used a difference in proportions as the primary endpoint. In these trials, 5 different statistical methods were used to estimate the confidence interval. The five methods are described and applied to data from the 11 trials. The noninferiority of the new treatment is not demonstrated if the prespecified noninferiority margin is included in the confidence interval of the treatment difference.
Results indicated that confidence intervals can be quite different according to the method used. In many situations, however, conclusions of the trials are not altered because point estimates of the treatment difference were too far from the prespecified noninferiority margins. Nevertheless, in a few trials the use of different statistical methods led to different conclusions. In particular, the use of “exact” methods can be very confusing.
Statistical methods used to estimate confidence intervals in noninferiority trials have a strong impact on the conclusion of such trials.
The efficacy of antiretroviral therapy for treatment of HIV-1 infection has improved steadily since the advent of potent combination therapy in 1996.
Most of the earlier developments were made by designing and analysing superiority trials. However, the high levels of efficacy already achieved and the inherent difficulty of using triple-drug combinations make further improvement difficult. Studies of treatment-naïve patients indicate that addition of a fourth drug may provide only small incremental benefits.
The HIV noninferiority trial has emerged as the new standard design for HIV drug development among antiretroviral-naïve individuals.
In this work, we present some recent HIV noninferiority trials designed for treatment-naïve and treatment-experienced patients.
The objective of this work is to investigate the impact of the statistical analysis currently used on results of recent HIV trials. Criteria to select the HIV noninferiority studies were the following: results published or presented in 2010, inclusion of HIV-infected adult patients (>18 years), use of a primary endpoint based on a difference in proportions reflecting efficacy, and no use of a stratified analysis. In trials using a difference in proportions as primary endpoint, the proportion of response (number of patients with response out of the total number of patients) is provided in each arm. Such information is sufficient to compute the difference in proportions, confidence intervals and tests with any statistical method. It is therefore easy to recover enough information to reanalyze the data with a method other than the one used in the original publication.
For superiority trials, the full analysis set, i.e., the intention-to-treat (ITT) population, is recommended because it tends to avoid the over-optimistic estimates of efficacy resulting from a per protocol (PP) analysis, since non-compliers included in the full analysis set will generally diminish the estimated treatment effect.
Although a modified hypothesis testing framework exists, results of noninferiority trials are often reported using the confidence interval approach. Most methods, however, provide equivalently a test statistic and a corresponding confidence interval of the observed treatment difference. Let π_{1} and π_{2} represent the true proportions of success in patients receiving the new treatment and the reference treatment (control group), respectively. We are interested in the difference, π_{1}−π_{2} = Δ. The null hypothesis for the noninferiority test is H_{0}: Δ≤Δ_{L} versus the alternative hypothesis H_{1}: Δ>Δ_{L}, where Δ_{L} is the prespecified noninferiority margin
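The confidence interval approach described above can be sketched in a few lines. The following is a minimal illustration, not the author's code: it computes a Wald-type interval for π_{1}−π_{2} and declares noninferiority when the lower limit lies above Δ_{L}. The function names are hypothetical, and the counts 84/101 and 89/105 are assumptions chosen only because they reproduce the 83.2% vs. 84.8% success rates reported for the PROGRESS ITT analysis.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(x1, n1, x2, n2, level=0.95):
    """Two-sided Wald interval for p1 - p2 (new treatment minus reference)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Unpooled standard error of the difference between two proportions.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def is_noninferior(lower, margin):
    """Reject H0: delta <= margin when the lower limit lies above the margin."""
    return lower > margin


# Assumed counts (not given in the text): 84/101 vs. 89/105.
lower, upper = wald_ci(84, 101, 89, 105)
print(f"{lower:.1%} to {upper:.1%}; noninferior at -20%: {is_noninferior(lower, -0.20)}")
```

Under these assumed counts the interval should agree, to rounding, with the Wald interval of −11.6% to 8.4% listed for PROGRESS.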
Four other methods, however, were applied in the analyses of those recent HIV noninferiority trials: the Farrington and Manning (FM), Exact, Newcombe, and Miettinen and Nurminen (MN) methods
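As an illustration of how one of these alternatives differs from the Wald interval, the sketch below implements a hybrid score interval built from Wilson limits for each arm. It assumes that "Newcombe" refers to this hybrid score construction; the function names and the counts (84/101 vs. 89/105, chosen to match the PROGRESS ITT success rates) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_limits(x, n, z):
    """Wilson score limits for a single binomial proportion."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def newcombe_ci(x1, n1, x2, n2, level=0.95):
    """Hybrid score interval for p1 - p2 built from the two Wilson intervals."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_limits(x1, n1, z)
    l2, u2 = wilson_limits(x2, n2, z)
    diff = p1 - p2
    lower = diff - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return lower, upper


# Assumed counts (not given in the text): 84/101 vs. 89/105.
lower, upper = newcombe_ci(84, 101, 89, 105)
print(f"{lower:.1%} to {upper:.1%}")
```

Unlike the Wald interval, this interval is not symmetric around the observed difference, which is one reason the two methods can disagree near the margin.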
Eleven noninferiority trials were selected from criteria described above and
| Studies | Patients | Comparison | Sample size | Hypothesis | Margin | Power | 2-sided CI | Method |
|---|---|---|---|---|---|---|---|---|
| EASIER | trt-exp | RAL vs. ENF | 85 vs. 84 | p1 = p2 = 96% | 10% | 80% | 95% | Farrington and Manning |
| KALESOLO | trt-exp | LPV/r alone vs. HAART | 87 vs. 99 | p1 = p2 = 90% | 12% | 80% | 90% | Wald |
| NCT00162643 | naive | EFV vs. LPV/r | 95 vs. 94 | not available | 12% | not available | 95% | Wald |
| PROGRESS | naive | RAL/LPV/r vs. TDF/FTC/LPV/r | 101 vs. 105 | p1 = p2 = 75% | 20% | 90% | 95% | Exact CZ |
| MONOI | trt-exp | DRV/r alone vs. DRV/r-regimen | 112 vs. 113 | p1 = p2 = 90% | 10% | 80% | 90% | Wald |
| MONET | trt-exp | DRV/r alone vs. DRV/r-regimen | 123 vs. 123 | p1 = p2 = 90% | 12% | 80% | 95% | Wald |
| SPIRAL | trt-exp | RAL vs. PI/r | 139 vs. 134 | p1 = p2 = 85% | 12.5% | 80% | 95% | Newcombe |
| Switchmrk 1 and 2 | trt-exp | RAL vs. LPV/r | 172 vs. 174; 175 vs. 178 | p1 = p2 = 87.5% | 12% | 90% | 95% | Miettinen and Nurminen |
| ODIN | trt-exp | DRV/r qd vs. DRV/r bid | 294 vs. 296 | p1 = p2 = 70% | 12% | 90% | 95% | Wald |
| M06802 | trt-exp | LPV/r qd vs. LPV/r bid | 300 vs. 299 | p1 = p2 | 12% | >80% | 95% | Wald |
Hypotheses of success rates and power were either found in the original articles or provided by investigators on request. For one trial, however, information on success rates and power was missing. Hypothesized success rates varied from 70% to 96% and should be consistent with data from previous studies using both a similar treatment regimen and a similar patient population. In some cases, however, it is difficult to anticipate success or failure rates with a new combination therapy, or with a current combination in a new population of patients.
Most noninferiority margins were set at or around 12% (two studies had a 10% margin and one a 12.5% margin). The PROGRESS study used an unconventional 20% margin to investigate the efficacy of a new combination (lopinavir/r+raltegravir)
Another key point is the type I error (α significance level) or, equivalently, the level of the confidence interval (CI). A 1-sided α = 0.025 corresponds to a 2-sided 95% CI. The MONOI and KALESOLO studies used a 2-sided 90% CI. The 2-sided 95% CI is widely used, although a 2-sided 90% CI is deemed acceptable for the noninferiority hypothesis test
The MONET study excluded patients from the PP analysis mainly for violation of inclusion criteria, while the MONOI study excluded patients on the basis of major protocol violations, including violation of inclusion criteria and violation of the protocol post randomization
Results of the seven trials considering a non-ITT analysis in addition to the ITT analysis are displayed in
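The correspondence between the one-sided significance level and the two-sided confidence level can be made concrete with a short sketch. The helper names are hypothetical; the only inputs are the two one-sided levels discussed above (0.025 and 0.05).

```python
from statistics import NormalDist


def two_sided_level(alpha_one_sided):
    """Confidence level of the two-sided CI matching a one-sided test at alpha."""
    return 1 - 2 * alpha_one_sided


def critical_value(alpha_one_sided):
    """Standard normal quantile used for the matching two-sided interval."""
    return NormalDist().inv_cdf(1 - alpha_one_sided)


for alpha in (0.025, 0.05):
    level = two_sided_level(alpha)
    z = critical_value(alpha)
    print(f"one-sided alpha = {alpha}: {level:.0%} CI, z = {z:.3f}")
```

For intervals of the form estimate ± z × standard error, moving from a 95% to a 90% CI shrinks the half-width by the ratio of the two quantiles (about 1.645/1.960), so a trial analyzed with a 90% CI can demonstrate noninferiority where the same data with a 95% CI would not.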
The four methods, briefly described above, were then applied to data of the 11 trials (
Confidence intervals

| Studies | Analysis | Margin | Results | δ | Wald | Exact CZ | Newcombe | FM |
|---|---|---|---|---|---|---|---|---|
| EASIER | ITT | −10% | 98.8% vs. 98.8% | 0.01% | −3.3% to 3.3% | −5.6% to 5.7% | −5.3% to 5.4% | [missing] |
| EASIER | OT | −10% | 98.8% vs. 100% | −1.22% | −3.6% to 1.2% | −7.3% to 3.4% | −6.6% to 3.4% | [missing] |
| KALESOLO | ITT | −12% | 83.9% vs. 87.9% | −3.97% | [missing] | [missing] | [missing] | [missing] |
| KALESOLO | Switch included | −12% | 90.8% vs. 87.9% | 2.93% | [missing] | −7.0% to 12.3% | −4.8% to 10.5% | −5.4% to 11.3% |
| NCT00162643 | ITT | −12% | 70.5% vs. 53.2% | 17.33% | [missing] | 3.3% to 30.7% | 3.5% to 30.3% | 3.5% to 31.1% |
| NCT00162643 | OT | −12% | 85.9% vs. 61.7% | 24.2% | [missing] | 10.4% to 37.1% | 10.6% to 36.6% | 10.2% to 38.1% |
| PROGRESS | ITT | −20% | 83.2% vs. 84.8% | −1.6% | −11.6% to 8.4% | [missing] | −11.8% to 8.5% | −12.2% to 9.0% |
| MONOI | ITT | −10% | 87.5% vs. 92.0% | −4.54% | [missing] | [missing] | [missing] | [missing] |
| MONOI | PP | −10% | 94.1% vs. 99.0% | −4.9% | −9.1% to −0.8% | [missing] | [missing] | [missing] |
| MONET | ITT | −12% | 84.3% vs. 85.3% | −1.0% | [missing] | −10.1% to 8.3% | −9.9% to 7.9% | −10.1% to 8.1% |
| MONET | PP | −12% | 86.2% vs. 87.8% | −1.6% | [missing] | −10.4% to 7.4% | −10.2% to 6.9% | −10.4% to 7.1% |
| SPIRAL | ITT | −12.5% | 89.2% vs. 86.6% | 2.6% | −5.1% to 10.4% | −5.5% to 10.9% | [missing] | −5.6% to 10.9% |
| SPIRAL | OT | −12.5% | 96.9% vs. 95.1% | 1.8% | −3.9% to 7.5% | −3.7% to 7.6% | [missing] | −5.0% to 8.6% |
| Switchmrk 1 | ITT | −12% | 80.8% vs. 87.4% | −6.54% | [missing] | [missing] | [missing] | [missing] |
| Switchmrk 2 | ITT | −12% | 88.0% vs. 93.8% | −5.82% | −11.8% to 0.15% | [missing] | [missing] | [missing] |
| ODIN | ITT | −12% | 72.1% vs. 70.9% | 1.2% | [missing] | −6.2% to 8.6% | −6.1% to 8.4% | −6.1% to 8.5% |
| M06802 | ITT | −12% | 55.3% vs. 51.8% | 3.5% | [missing] | −4.6% to 11.6% | −4.5% to 11.4% | −4.4% to 11.4% |
| M06802 | Observed data | −12% | 76.0% vs. 72.2% | 3.8% | [missing] | −4.4% to 12.1% | −4.3% to 11.9% | −4.3% to 12.0% |
Values in italic in
The reason of a larger confidence interval for the ITT analysis compared with the PP analysis is given in
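Width of 95% confidence interval for δ = −5%

| p2 | p1 | N1 = N2 = 200 (ITT) | 190 (PP) | 180 (PP) | 170 (PP) | 160 (PP) | 150 (PP) |
|---|---|---|---|---|---|---|---|
| [missing] | [missing] | 0.103 | 0.105 | 0.108 | 0.111 | 0.115 | 0.119 |
| [missing] | [missing] | 0.129 | 0.133 | 0.136 | 0.140 | 0.145 | 0.149 |
| [missing] | [missing] | 0.149 | 0.152 | 0.157 | 0.161 | 0.166 | 0.172 |
| [missing] | [missing] | 0.163 | 0.168 | 0.172 | 0.177 | 0.183 | 0.189 |
| [missing] | [missing] | 0.175 | 0.179 | 0.184 | 0.190 | 0.195 | 0.202 |
| [missing] | [missing] | 0.183 | 0.188 | 0.193 | 0.199 | 0.205 | 0.212 |
| [missing] | [missing] | 0.190 | 0.194 | 0.200 | 0.206 | 0.212 | 0.219 |
| [missing] | [missing] | 0.194 | 0.199 | 0.204 | 0.210 | 0.216 | 0.223 |
| [missing] | [missing] | 0.196 | 0.201 | 0.206 | 0.212 | 0.219 | 0.226 |

| p2 | p1 | N1 = N2 = 100 (ITT) | 95 (PP) | 90 (PP) | 85 (PP) | 80 (PP) | 75 (PP) |
|---|---|---|---|---|---|---|---|
| [missing] | [missing] | 0.145 | 0.149 | 0.153 | 0.158 | 0.163 | 0.168 |
| [missing] | [missing] | 0.183 | 0.188 | 0.193 | 0.198 | 0.204 | 0.211 |
| [missing] | [missing] | 0.210 | 0.216 | 0.222 | 0.228 | 0.235 | 0.243 |
| [missing] | [missing] | 0.231 | 0.237 | 0.244 | 0.251 | 0.258 | 0.267 |
| [missing] | [missing] | 0.247 | 0.254 | 0.261 | 0.268 | 0.276 | 0.285 |
| [missing] | [missing] | 0.259 | 0.266 | 0.273 | 0.281 | 0.290 | 0.299 |
| [missing] | [missing] | 0.268 | 0.275 | 0.283 | 0.291 | 0.300 | 0.309 |
| [missing] | [missing] | 0.274 | 0.281 | 0.289 | 0.297 | 0.306 | 0.316 |
| [missing] | [missing] | 0.276 | 0.284 | 0.291 | 0.300 | 0.309 | 0.319 |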
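The widening of the interval as patients are excluded follows directly from the sample size term in the standard error. The sketch below assumes the tabulated widths are Wald widths and that the first row corresponds to p2 = 0.95 and p1 = 0.90; both are assumptions, since the row labels are not available here. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_width(p1, p2, n, level=0.95):
    """Full width of the two-sided Wald interval for p1 - p2 with n per arm."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)


# Assumed success rates for the first row: p2 = 0.95 and p1 = 0.90 (delta = -5%).
for n in (200, 190, 180, 170, 160, 150):
    print(n, round(wald_width(0.90, 0.95, n), 3))
```

Under these assumptions the loop gives 0.103 at 200 patients per arm and 0.119 at 150 per arm, matching the first row of widths, so excluding a quarter of the patients widens the interval by about 15% (a factor of √(200/150)).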
In general, the Wald method is known to be anti-conservative, i.e., it produces narrower confidence intervals than the other methods.
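Width of confidence intervals

| Studies | ITT: Wald | ITT: Exact CZ | ITT: Newcombe | ITT: FM | non-ITT: Wald | non-ITT: Exact CZ | non-ITT: Newcombe | non-ITT: FM |
|---|---|---|---|---|---|---|---|---|
| EASIER | [missing] | 0.112 | 0.106 | [missing] | [missing] | 0.107 | 0.100 | [missing] |
| KALESOLO | [missing] | [missing] | 0.171 | 0.175 | [missing] | [missing] | 0.153 | 0.167 |
| NCT00162643 | 0.273 | 0.274 | [missing] | [missing] | 0.262 | 0.268 | [missing] | [missing] |
| PROGRESS | [missing] | 0.208 | 0.203 | [missing] | n/a | n/a | n/a | n/a |
| MONOI | [missing] | [missing] | 0.136 | 0.137 | [missing] | [missing] | 0.106 | 0.104 |
| MONET | [missing] | [missing] | 0.178 | 0.181 | [missing] | [missing] | 0.172 | 0.175 |
| SPIRAL | [missing] | 0.163 | 0.158 | [missing] | [missing] | 0.113 | 0.111 | [missing] |
| Switchmrk 1 | [missing] | [missing] | 0.155 | 0.155 | n/a | n/a | n/a | n/a |
| Switchmrk 2 | [missing] | 0.125 | 0.127 | [missing] | n/a | n/a | n/a | n/a |
| ODIN | 0.146 | [missing] | [missing] | 0.146 | n/a | n/a | n/a | n/a |
| M06802 | 0.160 | [missing] | 0.159 | [missing] | 0.162 | [missing] | [missing] | 0.163 |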
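| Studies | Analysis | Exact CZ | Exact SS |
|---|---|---|---|
| EASIER | ITT | −5.6% to 5.7% | [missing] |
| SPIRAL | ITT | −5.5% to 10.9% | −9.3% to 14.6% |
| EASIER | OT | −7.3% to 3.4% | [missing] |
| SPIRAL | OT | −3.7% to 7.6% | −10.6% to 14.3% |
| [missing] | ITT | [missing] | [missing] |
| KALESOLO | Switch included | −7.0% to 12.3% | −11.5% to 17.2% |
| NCT00162643 | ITT | 3.3% to 30.7% | 2.7% to 30.8% |
| [missing] | ITT | [missing] | [missing] |
| NCT00162643 | OT | 10.4% to 37.1% | 8.4% to 38.6% |
| [missing] | ITT | [missing] | [missing] |
| [missing] | ITT | −12.0% to 8.8% | −15.1% to 12.2% |
| ODIN | ITT | −6.2% to 8.6% | −6.9% to 9.2% |
| [missing] | ITT | [missing] | [missing] |
| M06802 | ITT | −4.6% to 11.6% | −4.7% to 11.5% |
| [missing] | PP | [missing] | [missing] |
| M06802 | Observed data | −4.4% to 12.1% | −5.5% to 13.0% |
| MONET | ITT | −10.1% to 8.3% | [missing] |
| MONET | PP | −10.4% to 7.4% | [missing] |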
This work investigated the impact of the statistical methods used in the analysis of HIV noninferiority trials. An optimistic view is that, across the 18 datasets (trial/analysis population) analyzed by 4 different statistical methods, different conclusions were drawn on only 2 occasions. Note, however, that in some datasets the different methods produced very different confidence intervals. In those datasets the conclusions were unchanged only because of the point estimate of the treatment difference: an observed treatment difference far from the noninferiority margin will generally demonstrate noninferiority whatever the method used. In the two datasets with discordant conclusions, the observed treatment differences were −4.9% and −5.82%, roughly midway between 0 and the chosen noninferiority margin.
The MONOI study provides an interesting situation since the PP analysis concluded noninferiority while the ITT analysis was inconclusive. As discussed above, it is often accepted that the ITT analysis tends to dilute the treatment difference and may therefore lead to an erroneous conclusion of noninferiority for a drug that is truly inferior to the active control among compliers
The regulatory agencies provide guidelines covering the statistical principles for clinical trials
The choice of the noninferiority margin is a key point and should be based upon a combination of statistical reasoning and clinical judgement
Comparison between the two ‘exact’ methods is confusing. First, the difference between these two methods is larger than between any exact and any non-exact method. Second, the term ‘exact’ may be very confusing for clinicians who consider that an ‘exact’ method is definitive and that no improvement can be made. In general, exact methods are considered better or more appropriate than non-exact methods. But which exact method should be used? Chan and Zhang suggested their method because they pointed out that the SS method was overly conservative
Interestingly some authors have suggested that approximate is better than exact for interval estimation of binomial proportions
A limitation of the study is that we did not apply all the statistical methods that have been proposed to estimate confidence intervals for the difference between independent proportions. The four methods, however, were the methods used in HIV noninferiority trials published in 2010 and represent a broad range of methods. It can also be argued that each method used for the analysis was also used for sample size/power determination, so that only the planned method, which corresponds to a given sample size and power, should be used. In fact, the four methods provide very similar sample sizes. For example, with p_{1} = p_{2} = 0.90, α = 0.025 (one-sided), 1−β = 90%, and Δ_{L} = 0.10, the sample size per group is 189, 204, 200 and 201 with the Wald, FM, Newcombe and Exact CZ methods, respectively, and 441, 441, 445, and 447, respectively, with p_{1} = p_{2} = 0.70 (see also reference
In conclusion, the choice of the statistical method may lead to different confidence interval estimates, especially in trials with low or moderate sample sizes. The exact CZ, Newcombe and FM methods seem the most appropriate, although further investigation comparing at least those three methods in a clinical trial setting would help to determine the best method for different scenarios. The choice of method has little or no impact on determination of the sample size.
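The Wald figures quoted above can be checked with the usual normal-approximation formula for a noninferiority comparison of two proportions. This is a sketch under that assumption, not necessarily the exact calculation used for the paper; the function name is hypothetical and the margin is entered as a positive number, as in the text.

```python
from statistics import NormalDist


def wald_sample_size(p1, p2, margin, alpha=0.025, power=0.90):
    """Approximate patients per group for a one-sided noninferiority test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    # Distance between the assumed true difference and the margin -margin.
    distance = (p1 - p2) + margin
    return (z_alpha + z_beta) ** 2 * variance / distance ** 2


for p in (0.90, 0.70):
    print(p, round(wald_sample_size(p, p, 0.10), 1))
```

Rounded to the nearest integer, this gives 189 per group for p_{1} = p_{2} = 0.90 and 441 for p_{1} = p_{2} = 0.70, in line with the Wald values quoted above; rounding up instead would give 190 and 442.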