The authors have declared that no competing interests exist.

Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics, so the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, to control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts pose considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data, in the hope that this will encourage better statistical practices in parasitology.

Parasitologists aim to understand the factors that determine the outcome of infections (e.g., host and pathogen genetic effects), and how these factors change in response to a new intervention or other environmental variables. However, infections are complex and dynamic: multiple interacting factors shape parasite traits, and within-host environments vary over time and between different niches

Recent years have seen the increasing use of statistical tools, such as mixed effects models, which allow researchers to analyse pathogen dynamics within infections while controlling for issues of pseudo-replication arising from repeated measurements on the same host (i.e., time-series data

At the heart of most statistical tests is the “independence assumption”, which states that model residuals should not be correlated in time or space. Studies where individual subjects are measured on multiple occasions (repeated measures studies) contain potential sources of non-independence, which are not present when individuals are only measured once. In particular for time-series data, unmeasured factors can produce correlations in the data (temporal auto-correlation) over days, weeks, or months (i.e., data points adjacent to one another in a time series are more likely to be similar than those further apart). These correlations can be strong and, importantly, may lead to the appearance of spurious patterns in the data (

Estimates of temporal auto-correlation in key parasite and host traits observed during daily sampling of infections initiated with controlled

Simulated data sets with known levels of temporal auto-correlation between residuals (spanning the range observed in published data sets) were generated using R version 2.12.1 (The R Foundation for Statistical Computing;
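The effect of auto-correlated residuals on false positives can be illustrated with a small simulation. The sketch below is not the authors' R code: it is a minimal Python analogue that assumes residuals follow a first-order autoregressive, AR(1), process, and the function names, group sizes (5 hosts per group, 20 daily samples), and correlation values are all hypothetical choices made for illustration. Two groups are simulated with no true difference between them, and a naive two-sample test that treats every daily measurement as independent is applied.

```python
import math
import random


def ar1_series(n_points, rho, rng):
    """Simulate n_points residuals from a stationary AR(1) process (variance 1)."""
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    value = rng.gauss(0.0, 1.0)
    series = [value]
    for _ in range(n_points - 1):
        value = rho * value + rng.gauss(0.0, innovation_sd)
        series.append(value)
    return series


def simulate_group(n_hosts, n_days, rho, rng):
    """Pool daily measurements from n_hosts hosts; there is no treatment effect."""
    pooled = []
    for _ in range(n_hosts):
        pooled.extend(ar1_series(n_days, rho, rng))
    return pooled


def naive_z(group_a, group_b):
    """Two-sample z statistic that treats every data point as independent."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (len(group_a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (len(group_b) - 1)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


def false_positive_rate(rho, n_sims=1000, n_hosts=5, n_days=20, seed=1):
    """Fraction of null simulations declared 'significant' at the 5% level."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        group_a = simulate_group(n_hosts, n_days, rho, rng)
        group_b = simulate_group(n_hosts, n_days, rho, rng)
        if abs(naive_z(group_a, group_b)) > 1.96:  # normal approximation
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    # Hypothetical correlation strengths, chosen only for illustration.
    for rho in (0.0, 0.4, 0.8):
        rate = false_positive_rate(rho)
        print(f"rho = {rho:.1f}: false positive rate = {rate:.3f}")
```

With uncorrelated residuals the naive test should reject a true null hypothesis in roughly 5% of simulations, and the rejection rate should rise as the auto-correlation strengthens, because correlated measurements carry less independent information than their number suggests.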

There are straightforward solutions to the problems caused by temporal auto-correlation that are routinely used in other biological disciplines, but remain rarely implemented in parasitology. While previous reviews have highlighted the need to track infection dynamics (e.g.,

Bars show the result of a literature search for papers using mixed effects models to analyse time-course data sets in seven parasitology journals from January 2009 to August 2011. Of 76 papers examined, 19 explicitly controlled for temporal auto-correlation (blue), but no controls were mentioned in 56 (red).

There are various ways to deal with violations of the independence assumption caused by temporal auto-correlation. One approach is to make tests more conservative by reducing the

When analysing time-course data, researchers should apply the following approach: 1) fit grouping variables as random effects if required, 2) add a temporal auto-correlation structure to the model at the appropriate level within the random effects structure, 3) compare the models with and without the auto-correlation structure to test whether it improves the fit, 4) retain the auto-correlation structure if the fit is improved and exclude it if not, and 5) report the inclusion or exclusion of an auto-correlation structure in the methods or results section of the manuscript.
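The comparison in steps 2 to 4 can be sketched in a deliberately simplified form. The Python example below does not fit a mixed effects model. It assumes that residuals have already been extracted per host from a model fitted without an auto-correlation structure, and it assumes an AR(1) form for the correlation. It then compares an independent-residuals model against an AR(1) model with a likelihood ratio test on the conditional Gaussian likelihood. All function names and the simulated data are hypothetical. In a full analysis the comparison would be made between the complete mixed models, refitted with and without the correlation structure.

```python
import math
import random


def simulate_residuals(n_hosts, n_days, rho, seed):
    """Hypothetical demo data: one AR(1) residual series per host."""
    rng = random.Random(seed)
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    residuals_by_host = []
    for _ in range(n_hosts):
        value = rng.gauss(0.0, 1.0)
        series = [value]
        for _ in range(n_days - 1):
            value = rho * value + rng.gauss(0.0, innovation_sd)
            series.append(value)
        residuals_by_host.append(series)
    return residuals_by_host


def lagged_pairs(residuals_by_host):
    """Collect (previous, current) residual pairs within each host only."""
    pairs = []
    for series in residuals_by_host:
        pairs.extend(zip(series[:-1], series[1:]))
    return pairs


def conditional_loglik(pairs, rho):
    """Gaussian log-likelihood of each residual given the previous one."""
    n_pairs = len(pairs)
    rss = sum((cur - rho * prev) ** 2 for prev, cur in pairs)
    sigma2 = rss / n_pairs  # maximum-likelihood variance for this rho
    return -0.5 * n_pairs * (math.log(2.0 * math.pi * sigma2) + 1.0)


def compare_structures(residuals_by_host, critical_value=3.84):
    """Likelihood ratio test of AR(1) against independent residuals.

    critical_value is the 5% point of a chi-square distribution with one
    degree of freedom (one extra parameter, rho).
    """
    pairs = lagged_pairs(residuals_by_host)
    # Least-squares estimate of rho, which maximises the conditional likelihood.
    rho_hat = sum(p * c for p, c in pairs) / sum(p * p for p, _ in pairs)
    ll_independent = conditional_loglik(pairs, 0.0)
    ll_ar1 = conditional_loglik(pairs, rho_hat)
    lr_statistic = 2.0 * (ll_ar1 - ll_independent)
    return {
        "rho_hat": rho_hat,
        "lr_statistic": lr_statistic,
        "retain_ar1": lr_statistic > critical_value,
    }


if __name__ == "__main__":
    demo = simulate_residuals(n_hosts=6, n_days=15, rho=0.7, seed=42)
    print(compare_structures(demo))
```

Pairs are formed only within a host, never across hosts, which mirrors placing the correlation structure at the host level of the random effects structure. If `retain_ar1` is true, the auto-correlation structure would be kept (step 4), and either outcome would be reported (step 5).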

For example, in the R statistical software package (The R Foundation for Statistical Computing;

Advances in statistical methodology should provide important and useful tools for understanding infections and disease, just as advances in genetic, molecular, and immunological methods do. Investing in learning how to use tools such as mixed effects models effectively pays off by providing robust and novel insights into the roles of hosts and parasites in shaping patterns of disease. However, as with other methodological advances, the improvements to biological understanding they provide depend crucially on their being applied and interpreted correctly. Temporal correlation in time-course data can compromise statistical analyses by increasing the likelihood of false positives, yet this problem has been largely overlooked in parasitology. We strongly support the implementation of more sophisticated statistical analyses in which the assumptions underlying models are fulfilled, to safeguard against inaccurate or misleading results and to provide a solid foundation from which to progress understanding of disease.

We would like to thank Sarah Knowles, Ricardo Ramiro, Angus Buckling, and Sam Brown for discussion as well as Judi Allen, Claire Bourke, and Paula MacGregor for reading earlier versions of the manuscript.