
GA declares that he has no competing interests. RF is also a director and shareholder of a company that provides electronic measurement services to health services researchers; notwithstanding this, he declares that he has no conflicts of interest. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Conceived and designed the experiments: RF GA. Performed the experiments: RF GA. Analyzed the data: RF GA. Wrote the first draft of the manuscript: RF. Commented on successive drafts: GA. Created the figures: GA. Wrote initial rocmic Stata module and help file: RF. Contributed to modifications of the updated code: GA.

Receiver Operator Characteristic (ROC) curves are being used to identify Minimally Important Change (MIC) thresholds on scales that measure a change in health status. In quasi-continuous patient reported outcome measures, such as those that measure changes in chronic diseases with variable clinical trajectories, sensitivity and specificity are often valued equally. Notwithstanding methodologists agreeing that these should be valued equally, different approaches have been taken to estimating MIC thresholds using ROC curves.

We aimed to compare the different approaches used with a new approach, exploring the extent to which the methods choose different thresholds, and considering the effect of differences on conclusions in responder analyses.

Using graphical methods, hypothetical data, and data from a large randomised controlled trial of manual therapy for low back pain, we compared two existing approaches with a new approach based on the sum of the squares of (1 − sensitivity) and (1 − specificity).

The thresholds chosen by different estimators can diverge. The cut-point selected by each estimator depends on the relationship between the cut-points in ROC space and the contours described by that estimator. In particular, asymmetry and the number of possible cut-points affect threshold selection.

Choice of MIC estimator is important. Different methods for choosing cut-points can lead to materially different MIC thresholds and thus affect results of responder analyses and trial conclusions. An estimator based on the smallest sum of squares of (1 − sensitivity) and (1 − specificity) is preferable when sensitivity and specificity are valued equally. Unlike other methods currently in use, the sum of squares method always and efficiently chooses the cut-point closest to the top-left corner of ROC space, regardless of the shape of the ROC curve.

Initially developed during World War II for use in interpreting radar signals, receiver operator characteristic (ROC) curves are commonly used in medical research to evaluate screening tests, to identify thresholds that facilitate decision-making about patients, and to quantify the responsiveness of quasi-continuous patient-reported outcome measures (PROMs).

MIC is usually defined as the smallest magnitude of change that can be considered important (at the level of the individual) and that, in the absence of troublesome side-effects and excessive costs, mandates a change in a patient's management.

Epidemiologists have taken different approaches to calculating the optimum MIC cut-point using ROC curves even though they agree that sensitivity and specificity should be valued equally.
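The three criteria compared in this paper can be written compactly. The forms below are a sketch inferred from the verbal description of each method and from the tabulated values reported in the results; the symbols Se(c), Sp(c) and c* (sensitivity, specificity and chosen cut-point) are notation introduced here for illustration.

```latex
\begin{align}
  \text{Farrar:}         \quad & c^{*} = \arg\min_{c}\, \bigl\lvert Se(c) - Sp(c) \bigr\rvert \\
  \text{EMGO:}           \quad & c^{*} = \arg\min_{c}\, \bigl[ (1 - Se(c)) + (1 - Sp(c)) \bigr] \\
  \text{Sum of squares:} \quad & c^{*} = \arg\min_{c}\, \bigl[ (1 - Se(c))^{2} + (1 - Sp(c))^{2} \bigr]
\end{align}
```

Minimising the sum of squares is equivalent to minimising its square root, which is the Euclidean distance from the point (1 − Sp, Se) to the top-left corner of ROC space.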

For each of these estimators, we first constructed diagrams showing contours of the quantities being minimised, to illustrate how the methods favour different areas of ROC space. Using these diagrams we illustrated how the three different methods can give rise to similar or different estimates of MIC in the situation where the number of possible cut-points is large,

UK BEAM was a multi-centre and multi-arm randomised controlled trial (RCT) (ISRCTN32683578) in which 1,334 participants with non-specific low back pain were randomised. The trial methods are described in detail elsewhere;

We dichotomised the health transition scale as suggested by Lauridsen

The trial protocol was approved by the Northern and Yorkshire multi-centre research ethics committee and 41 local research ethics committees. No additional ethics approval was required to reuse the anonymous trial data in this analysis.

The figure shows the contour diagrams we constructed for the Farrar estimator (Figure 1a), the EMGO estimator (Figure 1b), and the sums of squares estimator (Figure 1c), and contours resulting from all three estimators in the same space (Figure 1d).
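To make the difference between contour shapes concrete, the following minimal sketch evaluates the three quantities at two hypothetical points in ROC space. The function names and the points A and B are assumptions made for illustration; the criterion definitions are inferred from the descriptions in this paper and are not taken from the authors' Stata module.

```python
import math

def farrar(sens, spec):
    """Absolute gap between sensitivity and specificity."""
    return abs(sens - spec)

def emgo(sens, spec):
    """Sum of the two misclassification proportions."""
    return (1 - sens) + (1 - spec)

def sum_of_squares(sens, spec):
    """Euclidean distance to the top-left corner (square root of the sum of squares)."""
    return math.hypot(1 - sens, 1 - spec)

# Two hypothetical (sensitivity, specificity) pairs chosen to lie on the same EMGO contour
points = {"A": (1.0, 0.6), "B": (0.8, 0.8)}

for name, (sens, spec) in points.items():
    print(name,
          round(farrar(sens, spec), 3),
          round(emgo(sens, spec), 3),
          round(sum_of_squares(sens, spec), 3))
```

Both points have the same EMGO value, so that criterion cannot separate them, whereas point B is closer to the top-left corner and also has the smaller Farrar value.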

In

These contour plots show that the three methods are not equivalent. In particular they demonstrate that different points of the contours drawn by the EMGO method (

The figure shows three hypothetical examples of well-behaved ROC curves when the number of cut-points is large (

In real applications, ROC curves often consist of a finite number of possible cut-points, rather than being continuous, and are noisy, rather than well behaved. These issues add further complexities to the selection of MIC using the different approaches.
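As an illustration of why real ROC curves are a finite set of points, the sketch below builds sensitivity and specificity at each observed cut-point from a small set of change scores and a dichotomised anchor. The data are hypothetical, the function name `roc_points` is introduced here, and the convention that a participant counts as improved on the instrument when the change score is at or above the cut-point is an assumption.

```python
def roc_points(changes, improved):
    """Return (cut_point, sensitivity, specificity) for every observed change score.

    A participant is classed as a responder when change >= cut_point.
    """
    n_pos = sum(improved)
    n_neg = len(improved) - n_pos
    points = []
    for cut in sorted(set(changes)):
        true_pos = sum(1 for c, y in zip(changes, improved) if y and c >= cut)
        true_neg = sum(1 for c, y in zip(changes, improved) if not y and c < cut)
        points.append((cut, true_pos / n_pos, true_neg / n_neg))
    return points

# Hypothetical change scores and dichotomised health transition ratings
changes = [0, 1, 2, 2, 3, 4, 5, 5, 6, 8]
improved = [False, False, False, True, False, True, True, False, True, True]

for cut, sens, spec in roc_points(changes, improved):
    print(cut, sens, spec)
```

With only a handful of distinct change scores, the curve is a short sequence of steps, so the minimum of any criterion can only fall on one of these points.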

The figure shows ROC curves constructed using real trial data. Panel 3a shows the Roland Morris Disability Questionnaire changes in the Best Care arm of the UK BEAM trial, at one year. The three estimators choose different points (Farrar = red, EMGO = blue, and sums of squares = green). Panel 3b shows the modified von Korff disability changes across all four arms of the trial, at three months. Panel 3c shows modified von Korff disability data from the Best Care arm at three months. Panel 3d shows modified von Korff disability data from all four trial arms at one year.

Cut-point | Sensitivity (%) | Specificity (%) | Farrar (absolute value ×100%) | EMGO (absolute value ×100%) | Sum of squares (absolute value ×100%) |
−11 | 100 | 0 | 100 | 100 | 100 |
−10 | 100 | 0.560 | 99.44 | 99.44 | 99.44 |
−9 | 100 | 2.250 | 97.75 | 97.75 | 97.75 |
−7 | 100 | 3.930 | 96.07 | 96.07 | 96.07 |
−5 | 100 | 5.620 | 94.38 | 94.38 | 94.38 |
−4 | 100 | 8.990 | 91.01 | 91.01 | 91.01 |
−3 | 100 | 10.67 | 89.33 | 89.33 | 89.33 |
−2 | 100 | 13.48 | 86.52 | 86.52 | 86.52 |
−1 | 98.44 | 19.66 | 78.78 | 81.90 | 80.36 |
0 | 98.44 | 26.97 | 71.47 | 74.59 | 73.05 |
1 | 98.44 | 38.76 | 59.68 | 62.80 | 61.26 |
2 | 95.31 | 47.75 | 47.56 | 56.94 | 52.46 |
3 [a] | 87.50 | 58.99 | 28.51 | 53.51 | 42.87 |
4 [b] | 76.56 | 66.29 | 10.27 | 57.15 | 41.06 |
5 | 65.63 | 78.09 | 12.46 | 56.28 | 40.76 |
6 | 54.69 | 85.96 | 31.27 | 59.35 | 47.44 |
7 | 43.75 | 90.45 | 46.70 | 65.80 | 57.05 |
8 | 34.38 | 93.26 | 58.88 | 72.36 | 65.97 |
9 | 23.44 | 95.51 | 72.07 | 81.05 | 76.69 |
10 | 17.19 | 97.19 | 80.00 | 85.62 | 82.86 |
11 | 10.94 | 97.75 | 86.81 | 91.31 | 89.09 |
12 | 9.380 | 97.75 | 88.37 | 92.87 | 90.65 |
13 | 6.250 | 98.88 | 92.63 | 94.87 | 93.76 |
15 | 0 | 98.88 | 98.88 | 101.1 | 100.0 |

[a] Cut-point chosen by the EMGO method.

[b] Cut-point chosen by the Farrar method.
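The tabulated criterion values can be checked directly from the sensitivity and specificity columns. The sketch below does this for five rows copied from the table and reports which cut-point minimises each criterion; the formulas are inferred from the tabulated values (the sum of squares column matches the square root of the sum of squares), and the helper name `criteria` is introduced here for illustration.

```python
import math

# (cut-point, sensitivity %, specificity %) copied from the table above
rows = [
    (2, 95.31, 47.75),
    (3, 87.50, 58.99),
    (4, 76.56, 66.29),
    (5, 65.63, 78.09),
    (6, 54.69, 85.96),
]

def criteria(sens_pct, spec_pct):
    """Return the three criterion values, in percentage points, for one cut-point."""
    miss_sens = 100 - sens_pct
    miss_spec = 100 - spec_pct
    return {
        "Farrar": abs(sens_pct - spec_pct),
        "EMGO": miss_sens + miss_spec,
        "Sum of squares": math.hypot(miss_sens, miss_spec),
    }

# For each criterion, pick the cut-point with the smallest value
chosen = {}
for name in ("Farrar", "EMGO", "Sum of squares"):
    chosen[name] = min(rows, key=lambda r: criteria(r[1], r[2])[name])[0]

print(chosen)  # {'Farrar': 4, 'EMGO': 3, 'Sum of squares': 5}
```

On these rows the three criteria select three different cut-points, one point apart from each other.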

In

Finally,

The results demonstrate that the three estimators can lead to different cut-points and that, in practice, the differences can be several points in magnitude. This has implications for interpreting improvement in individual patients, and for interpreting the number or proportion of improved patients in clinical trial arms.
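The effect on a responder analysis can be illustrated with a minimal sketch. The change scores below are hypothetical (they are not UK BEAM data), the function name `responder_proportion` is introduced here, and the thresholds 3, 4 and 5 are used only because they are one point apart.

```python
def responder_proportion(changes, threshold):
    """Share of participants whose change score meets or exceeds the threshold."""
    return sum(1 for c in changes if c >= threshold) / len(changes)

# Hypothetical change scores for one trial arm
arm_changes = [-2, 0, 1, 3, 3, 4, 4, 5, 6, 9]

for threshold in (3, 4, 5):
    print(threshold, responder_proportion(arm_changes, threshold))
```

In this toy arm, moving the threshold by one point at a time changes the proportion classed as improved from 0.7 to 0.5 to 0.3.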

The Farrar method finds the point at which sensitivity and specificity are closest together. One problem with this approach is that, while ROC curves are monotonic in nature, the shape of the curve can be such that it crosses the 45 degree tangent line at a point where sensitivity and specificity are closest together, but with a combination of sensitivity and specificity that is less appealing (

Our sums of squares approach always selects the cut-point closest to the top-left corner (1,0) of ROC space. Furthermore, even in the situation where a ROC curve is approximately symmetrical and all three methods should choose the same point, the EMGO method will be more sensitive to noise than either the sum of squares or the Farrar method, and as such should be avoided. This can be seen from

Finally, we acknowledge that our proposed approach and its utility are obvious from simple Euclidean geometry and that, as such, this work should be unnecessary. Nevertheless, since non-optimal approaches are currently in widespread use, we considered this study necessary to provide clarification and to highlight the extent to which different approaches can affect results. We urge epidemiologists to consider cut-point selection more carefully when using ROC curves to make decisions about MIC thresholds. To help with this, we provide a Stata module that produces estimates based on all three of the approaches discussed, along with a help file; the module is held in the RePEc Statistical Software Components archive and may be installed at the Stata prompt by typing

Thanks are due to David Torgerson and Martin Underwood for allowing us to re-use the UK BEAM trial data in this work. Thanks are also due to the University of Warwick, Campus Kristiania, and the University of Cambridge for jointly covering the associated open access publication charges.