
Conceived and designed the experiments: ACP RI SB ALJ. Performed the experiments: ACP. Analyzed the data: ACP ALJ. Contributed reagents/materials/analysis tools: ACP ALJ. Wrote the paper: ACP RI SB ALJ.

The authors have declared that no competing interests exist.

Stable isotope analysis is increasingly being utilised across broad areas of ecology and biology. Key to much of this work is the use of mixing models to estimate the proportion of sources contributing to a mixture such as in diet estimation.

By accurately reflecting natural variation and uncertainty to generate robust probability estimates of source proportions, the application of Bayesian methods to stable isotope mixing models promises to enable researchers to address an array of new questions, and approach current questions with greater insight and honesty.

We outline a framework that builds on recently published Bayesian isotopic mixing models and present a new open source R package, SIAR. The formulation in R will allow for continued and rapid development of this core model into an all-encompassing single analysis suite for stable isotope research.

Stable isotope approaches are an important ecological tool, enabling increasingly sophisticated questions to be addressed in a number of fields.

Numerous approaches to solving isotopic mixing models have been proposed.

Although these approaches have been successful, some recurring issues remain:

The task of dealing with the uncertainties inherent in all types of biological systems, particularly in ecological settings.

Working with underdetermined systems, where there are many more potential sources than isotopes.

Incorporating variability into the input parameters, such as the end members (consumers), sources and trophic enrichment factors (TEFs).

Dealing with external sources of variation not connected to isotopic uncertainty (such as physiological differences or unidentified minor dietary sources).

In general, some existing models can incorporate variability but are constrained by the number of sources, e.g. IsoError.

Bayesian inference offers a way to circumvent the limitations listed above: it incorporates many more sources of variability within the model, allows for multiple dietary sources, and generates potential dietary solutions as true probability distributions. We present a novel methodology for analysing mixing models, implemented in the software package SIAR (Stable Isotope Analysis in R).

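First, we outline the algebra for our system. We deal with a generic situation where data comprise $N$ consumers, each measured on $J$ isotopes, with $K$ potential sources. Let $X_{ij}$ be the observed value of isotope $j$ for consumer $i$; $S_{jk}$ the value of source $k$ on isotope $j$, normally distributed with mean $\mu_{jk}$ and variance $\omega_{jk}^{2}$; $c_{jk}$ the TEF for isotope $j$ on source $k$, normally distributed with mean $\lambda_{jk}$ and variance $\tau_{jk}^{2}$; $p_{k}$ the dietary proportion of source $k$, which is to be estimated by the model; $q_{jk}$ the concentration dependence of isotope $j$ in source $k$; and $\varepsilon_{ij}$ the residual error, describing additional inter-observation variation not captured by the model, with mean zero and variance $\sigma_{j}^{2}$, also to be estimated.

The model is formulated as follows:

$$X_{ij} = \frac{\sum_{k=1}^{K} p_{k} q_{jk} \left(S_{jk} + c_{jk}\right)}{\sum_{k=1}^{K} p_{k} q_{jk}} + \varepsilon_{ij}$$

$$S_{jk} \sim N(\mu_{jk}, \omega_{jk}^{2}), \qquad c_{jk} \sim N(\lambda_{jk}, \tau_{jk}^{2}), \qquad \varepsilon_{ij} \sim N(0, \sigma_{j}^{2})$$

Prior distributions are required for the residual variances $\sigma_{j}^{2}$ and for the proportions $p_{k}$, which are given a Dirichlet prior.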
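To make the structure of the model concrete, the following minimal Python/NumPy sketch forward-simulates consumer values from it. The function name and argument layout are our own (hypothetical), and drawing fresh source and TEF values for every consumer is an assumption chosen to match the independence of consumers in the likelihood; this is not SIAR's code.

```python
import numpy as np

def simulate_consumers(p, mu, omega, lam, tau, q, sigma, n, rng):
    """Draw n consumers from the mixing model (hypothetical helper).

    p:          (K,) dietary proportions summing to 1
    mu, omega:  (J, K) source means and standard deviations
    lam, tau:   (J, K) TEF means and standard deviations
    q:          (J, K) concentration dependencies
    sigma:      (J,) residual standard deviations
    Returns an (n, J) array of consumer isotope values.
    """
    J, K = mu.shape
    # Assumption: independent source and TEF draws for every consumer, shape (n, J, K)
    S = rng.normal(mu, omega, size=(n, J, K))
    c = rng.normal(lam, tau, size=(n, J, K))
    w = p * q                                          # (J, K) concentration-weighted proportions
    mix = ((S + c) * w).sum(axis=2) / w.sum(axis=1)    # (n, J) weighted average per isotope
    eps = rng.normal(0.0, sigma, size=(n, J))          # residual error term
    return mix + eps
```

With all variances set to zero and equal concentrations, every simulated consumer sits exactly at the proportion-weighted mean of the sources, which is a quick sanity check on the weighting.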

The Dirichlet distribution models the source proportions jointly, keeping each between zero and one and constraining them to sum to unity. SIAR allows users to specify prior information on the mean proportion of each dietary source (the means must sum to unity) and a standard deviation for the first of these proportions; these are used to generate the parameters of the Dirichlet prior.
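One way to turn prior means and a single standard deviation into Dirichlet parameters is moment matching. The sketch below assumes that approach (the source does not state SIAR's exact formula) and uses the identity $\mathrm{Var}(p_1) = m_1(1-m_1)/(\alpha_0+1)$, where $\alpha_0 = \sum_k \alpha_k$ and $m_k = \alpha_k/\alpha_0$; the function name is hypothetical.

```python
import numpy as np

def dirichlet_from_moments(means, sd_first):
    """Hypothetical helper: Dirichlet parameters from prior means and the SD of p_1.

    Moment matching: alpha_0 = m_1 (1 - m_1) / sd_first**2 - 1, alpha_k = m_k * alpha_0.
    """
    m = np.asarray(means, dtype=float)
    if not np.isclose(m.sum(), 1.0):
        raise ValueError("prior means must sum to 1")
    alpha0 = m[0] * (1.0 - m[0]) / sd_first**2 - 1.0
    if alpha0 <= 0:
        # The requested SD is larger than any Dirichlet with these means allows
        raise ValueError("sd_first is too large for a Dirichlet with these means")
    return m * alpha0

print(dirichlet_from_moments([0.5, 0.3, 0.2], 0.1))  # approximately [12, 7.2, 4.8]
```

Because the same $\alpha_0$ scales every component, a single standard deviation is enough to pin down the whole prior once the means are fixed.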

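Figure: The generated marginal distributions of a Dirichlet distribution for two of the dietary proportions.

The default SIAR model sets the prior mean of each proportion $p_{k}$ to $1/K$ with standard deviation $\sqrt{(K-1)/(K^{2}(K+1))}$, which corresponds to a Dirichlet distribution with all parameters equal to one, so that every combination of proportions is equally likely a priori.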
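The default prior moments can be checked numerically. This sketch assumes the default is a Dirichlet with all parameters equal to one and compares the closed-form mean and standard deviation of a single proportion with a Monte Carlo estimate; both function names are hypothetical.

```python
import numpy as np

def default_prior_moments(K):
    """Closed-form mean and SD of each p_k under an (assumed) Dirichlet(1, ..., 1) prior."""
    return 1.0 / K, np.sqrt((K - 1) / (K**2 * (K + 1)))

def simulated_prior_moments(K, n=200_000, seed=0):
    """Monte Carlo check: sample the prior and summarise the first proportion."""
    rng = np.random.default_rng(seed)
    draws = rng.dirichlet(np.ones(K), size=n)   # (n, K); each row sums to 1
    return draws[:, 0].mean(), draws[:, 0].std()

print(default_prior_moments(3))     # mean 1/3, SD sqrt(1/18), about 0.236
print(simulated_prior_moments(3))   # close to the closed-form values
```

Each marginal of this prior is a Beta(1, K - 1) distribution, so the prior becomes increasingly concentrated near zero for any single source as K grows.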

Model fitting is via Markov chain Monte Carlo (MCMC), which produces simulations of plausible values of the proportions $p_{k}$ (and of the residual variances) given the data.
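For readers unfamiliar with MCMC, here is a minimal random-walk Metropolis sketch for the equal-concentration version of the model. It is not SIAR's own sampler. Assumptions: proportions are parameterised by additive log-ratios (the Jacobian of that map is $\prod_k p_k$, so a Dirichlet($\alpha$) prior becomes $\sum_k \alpha_k \log p_k$ on that scale), and each residual SD $\sigma_j$ gets a Uniform(0, `sigma_max`) prior explored on the log scale. All names are hypothetical.

```python
import numpy as np

def alr_to_p(f):
    """Map K-1 additive log-ratios to K proportions (last source is the reference)."""
    z = np.append(f, 0.0)
    e = np.exp(z - z.max())
    return e / e.sum()

def log_lik(p, sigma, X, src_mean, src_var):
    """Normal log-likelihood with equal concentration dependencies."""
    mean = src_mean @ p                     # (J,) proportion-weighted means
    var = src_var @ p**2 + sigma**2         # (J,) squared-proportion-weighted variances + residual
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var)

def log_post(theta, K, X, src_mean, src_var, alpha, sigma_max):
    f, eta = theta[:K - 1], theta[K - 1:]   # log-ratios, log residual SDs
    sigma = np.exp(eta)
    if np.any(sigma >= sigma_max):
        return -np.inf                      # outside the uniform prior
    p = alr_to_p(f)
    # Dirichlet prior times Jacobian prod(p); uniform prior on sigma times Jacobian sigma
    return log_lik(p, sigma, X, src_mean, src_var) + np.sum(alpha * np.log(p)) + np.sum(eta)

def fit_mixing_model(X, mu, omega, lam, tau, alpha, n_iter=20_000,
                     step=0.2, sigma_max=10.0, seed=0):
    """Return an (n_iter, K) array of sampled proportions (discard burn-in yourself)."""
    rng = np.random.default_rng(seed)
    K, J = mu.shape[1], X.shape[1]
    src_mean, src_var = mu + lam, omega**2 + tau**2
    alpha = np.asarray(alpha, dtype=float)
    theta = np.zeros(K - 1 + J)             # start at equal proportions, sigma = 1
    lp = log_post(theta, K, X, src_mean, src_var, alpha, sigma_max)
    draws = np.empty((n_iter, K))
    for t in range(n_iter):
        prop = theta + step * rng.normal(size=theta.size)   # symmetric proposal
        lp_prop = log_post(prop, K, X, src_mean, src_var, alpha, sigma_max)
        if np.log(rng.uniform()) < lp_prop - lp:             # Metropolis acceptance
            theta, lp = prop, lp_prop
        draws[t] = alr_to_p(theta[:K - 1])
    return draws
```

Every draw lies on the simplex by construction, so posterior summaries of the proportions need no extra normalisation; the step size would need tuning for other data sets.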

First, we illustrate the model with a simulated example involving 2 isotopes measured on 10 organisms whose diets comprise 3 uncertain sources, A, B and C, which SIAR treats as normally distributed. Without loss of generality, we set the TEFs to zero and the concentration dependencies to be equal. Setting the trophic enrichment values to zero mean and zero standard deviation has no bearing on the performance of the model: adding variation here is mathematically identical to increasing variation on the sources, since the variances are combined additively in the likelihood function (shown here for equal concentration dependencies):

$$X_{ij} \sim N\left(\sum_{k=1}^{K} p_{k}\left(\mu_{jk} + \lambda_{jk}\right),\; \sum_{k=1}^{K} p_{k}^{2}\left(\omega_{jk}^{2} + \tau_{jk}^{2}\right) + \sigma_{j}^{2}\right)$$

Figure: (A) Consumer (open circles and crosses) and source (filled squares) isotope values from two data sets with different between-individual variability in the consumer isotope measurements.

Below we outline pseudo-code for generating the data sets used to test the coverage properties of SIAR. The user first specifies the number of data sets required for testing (we used 1000), the number of consumers in each data set (we used 10), and lower and upper limits on the number of sources (we used 3 to 5) and on the number of isotopes (we used 2 to 3). Key to the pseudo-code is the likelihood function, which applies when the concentration dependence parameters are set equal ($q_{j1} = q_{j2} = \dots = q_{jK}$).

Loop over data set number:

Generate a random set of proportions, $p_{k}$.

Generate source means ($\mu_{jk}$).

Generate source standard deviations ($\omega_{jk}$).

Similarly generate fractionation correction means ($\lambda_{jk}$) and standard deviations ($\tau_{jk}$).

Generate consumer means (as given by the mean of the likelihood function, eq 9) as a proportion-weighted sum of the source and correction means.

Generate residual standard deviations ($\sigma_{j}$).

Generate consumer standard deviations (from the variance term of the likelihood function, eq 9): sum the squared proportions times the sum of the source and correction variances, add the residual variance, and take the square root.

Generate consumer values ($X_{ij}$) from normal distributions with these means and standard deviations.

Run the SIAR model for 200,000 iterations.

Check whether the estimated 95% credible intervals for each proportion contain the original generated proportions.

Repeat for the next data set.
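The data-generation and coverage-checking steps above can be sketched in Python as follows. The fitting step is left out because SIAR runs in R; any routine returning an `(n_draws, K)` array of proportions can be plugged in. The uniform ranges for source, correction and residual parameters are illustrative assumptions, not the values used in the paper, and both function names are hypothetical.

```python
import numpy as np

def generate_dataset(rng, n_consumers=10, k_range=(3, 5), j_range=(2, 3)):
    """One simulated data set following the pseudo-code (parameter ranges are assumptions)."""
    K = rng.integers(k_range[0], k_range[1] + 1)   # number of sources
    J = rng.integers(j_range[0], j_range[1] + 1)   # number of isotopes
    p = rng.dirichlet(np.ones(K))                  # true proportions
    mu = rng.uniform(-20, 20, size=(J, K))         # source means (assumed range)
    omega = rng.uniform(0.5, 2.0, size=(J, K))     # source SDs (assumed range)
    lam = rng.uniform(0.0, 3.0, size=(J, K))       # correction means (assumed range)
    tau = rng.uniform(0.1, 1.0, size=(J, K))       # correction SDs (assumed range)
    sigma = rng.uniform(0.1, 1.0, size=J)          # residual SDs (assumed range)
    cons_mean = (mu + lam) @ p                     # proportion-weighted means
    cons_sd = np.sqrt((omega**2 + tau**2) @ p**2 + sigma**2)
    X = rng.normal(cons_mean, cons_sd, size=(n_consumers, J))
    return dict(X=X, p=p, mu=mu, omega=omega, lam=lam, tau=tau, sigma=sigma)

def interval_covers(draws, p_true, level=0.95):
    """True where each true proportion lies inside its central credible interval."""
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return (lo <= p_true) & (p_true <= hi)
```

Coverage is then the average of `interval_covers` over the 1000 data sets; a well-calibrated model should give values near 0.95.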

The values reported in

The model performs well for all of the scenarios considered. The figure shows that model predictions deteriorate as the number of sources increases; performance can be improved by increasing the number of isotopes used.

Second, we conduct a fuller examination of the model, picking a selection of ‘reasonable’ scenarios and testing how often the simulated true proportions lie inside the 95% credible intervals of the estimates. Clearly, it is impossible to examine all possible scenarios; the 3 we consider are:

Model as given, with normally distributed error terms $\varepsilon_{ij}$.

Model as given, with $t_{4}$-distributed error terms (Student's t-distribution with 4 degrees of freedom) and $t_{4}$-distributed sources and correction values. The $t_{4}$ distribution provides long-tailed errors, which may be more natural when source and TEF standard deviations are based on few observations (small $n_{obs}$).

Model as given, but where the two closest sources have been combined to produce a single source.

These more complex scenarios of the sensitivity analysis are easily re-created by adapting the above steps for the simple case: altering the distributions of the random variables, and averaging across sources when combining the two nearest sources. In each case, 1000 simulated data sets of 10 target organism values were produced for data with between 2 and 3 isotopes and between 3 and 5 sources. SIAR performs extremely well.
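The two adapted steps can be sketched as below. Both helpers are hypothetical. For the $t_4$ scenario we assume the draws are rescaled to keep the intended standard deviation (a $t_4$ variable has variance $4/(4-2) = 2$). For the combined-source scenario we assume "closest" means Euclidean distance between source mean vectors and, beyond the averaging of means the text describes, we also average the two standard deviations.

```python
import numpy as np

def t4_with_sd(rng, loc, sd, size):
    """Long-tailed draws with the given mean and SD (SD matching is an assumption)."""
    # Student's t with 4 d.f. has variance 2, so divide by sqrt(2) to get unit variance
    return loc + sd * rng.standard_t(4, size=size) / np.sqrt(2.0)

def merge_closest_sources(mu, omega):
    """Combine the two sources whose mean isotope vectors are closest.

    mu, omega: (J, K) source means and SDs. Returns (J, K-1) arrays with the
    merged source in the last column, plus the pair of merged indices.
    """
    J, K = mu.shape
    best, pair = np.inf, None
    for a in range(K):
        for b in range(a + 1, K):
            d = np.linalg.norm(mu[:, a] - mu[:, b])   # Euclidean distance in isotope space
            if d < best:
                best, pair = d, (a, b)
    a, b = pair
    keep = [k for k in range(K) if k not in pair]
    new_mu = np.column_stack([mu[:, keep], (mu[:, a] + mu[:, b]) / 2])
    new_omega = np.column_stack([omega[:, keep], (omega[:, a] + omega[:, b]) / 2])
    return new_mu, new_omega, pair
```

In the merged scenario, the natural target for the combined source's interval would be the sum of the two original true proportions.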

SIAR works exceptionally well for numerous data sets and appears robust to violations of its core assumptions.

The Bayesian approach naturally propagates sources of uncertainty into posterior probability distributions, so we can make statements about which solutions are more likely than others and use these estimates in downstream statistical models, for example relating the proportion of a particular source to another measured parameter of interest such as fitness. Ideally one would attach another Bayesian model to the SIAR output and use the full posterior distribution; however, such techniques are not currently widely available to ecologists. Instead, since the posterior contains information on which parameter values are more likely than others, a measure of central tendency (preferably the mode) could be passed into standard frequentist generalised linear models, particularly if the posterior distributions of interest are precise and not highly skewed. We caution users that the posterior dietary proportion estimates may be highly uncertain and that single summary values (such as the modes) should be used with care. There is also no reason to expect the modes of the marginal posterior distributions to sum to unity; this is not an issue if the full posterior distribution is used in downstream analyses.
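A small worked example shows why marginal modes need not sum to one. Purely for illustration, we assume a posterior for three proportions that happens to be Dirichlet($\alpha$). Each marginal is then Beta($\alpha_k$, $\alpha_0 - \alpha_k$) with mode $(\alpha_k - 1)/(\alpha_0 - 2)$, so the marginal modes sum to $(\alpha_0 - K)/(\alpha_0 - 2)$, whereas the joint mode $(\alpha_k - 1)/(\alpha_0 - K)$ does lie on the simplex.

```python
import numpy as np

def marginal_modes(alpha):
    """Modes of the Beta marginals of a Dirichlet(alpha); needs every alpha_k > 1."""
    alpha = np.asarray(alpha, dtype=float)
    if np.any(alpha <= 1):
        raise ValueError("interior marginal modes need every alpha_k > 1")
    return (alpha - 1) / (alpha.sum() - 2)

def joint_mode(alpha):
    """Mode of the full Dirichlet distribution, which sums to one."""
    alpha = np.asarray(alpha, dtype=float)
    return (alpha - 1) / (alpha.sum() - len(alpha))

alpha = [2.0, 3.0, 5.0]
print(marginal_modes(alpha).sum())   # 0.875
print(joint_mode(alpha).sum())       # 1, up to floating-point rounding
```

Here the marginal modes are 0.125, 0.25 and 0.5, summing to 0.875; passing them on as if they were a coherent diet would understate the total.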

Not surprisingly, there are caveats to consider before applying SIAR (several of which are common to all mixing models). Some of these are:

SIAR can produce precise estimates, but the underlying model may remain underdetermined, and the outputs therefore represent probable solutions.

SIAR (reasonably) assumes that the variability associated with sources and the uncertainty associated with TEFs are normally distributed. If the distributions are suspected to depart from this assumption, the likelihood function in SIAR can be changed, although this requires non-trivial recoding.

SIAR currently assumes that no isotopic routing occurs within the body of the consumer and that all isotopes are assimilated equally.

SIAR will always attempt to fit a model, even if the consumers lie outside the isotopic mixing polygon defined by the sources.
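Because a model will be fitted regardless, a quick pre-check may be useful in the two-isotope case: flag consumers that fall outside the convex polygon of TEF-corrected source means. This is a hypothetical helper, not part of SIAR, and it deliberately ignores source and TEF variability, so it is only a coarse screen.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_polygon(hull, x):
    """True if point x lies inside or on a convex polygon given in CCW order."""
    n = len(hull)
    for i in range(n):
        o, a = hull[i], hull[(i + 1) % n]
        if (a[0] - o[0]) * (x[1] - o[1]) - (a[1] - o[1]) * (x[0] - o[0]) < 0:
            return False   # x is to the right of this edge, so outside
    return True

def consumers_outside(source_means, tef_means, consumers):
    """Flag consumers outside the polygon of TEF-corrected source means (2 isotopes)."""
    corrected = [(s[0] + t[0], s[1] + t[1]) for s, t in zip(source_means, tef_means)]
    hull = convex_hull(corrected)
    return [not inside_polygon(hull, c) for c in consumers]

print(consumers_outside([(0, 0), (4, 0), (0, 4)], [(1, 1)] * 3, [(2, 2), (6, 6)]))
```

The example shifts each source by a TEF of (1, 1) and reports `[False, True]`: the first consumer lies inside the corrected polygon and the second outside it.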

Recent quantitative advances allow comparison of community structure based on isotope data alone.

In most instances it will be the causes or consequences of dietary differences that are of interest to the researcher. The Bayesian approach allows further development via the model output: for example, including the dietary proportions, with their uncertainty, in generalised linear models to relate diet to other explanatory variables, or including random effects as in MixSIR.

We are grateful to Carlos Martinez Del Rio, Don Phillips and Brian Fry for discussion and insight which greatly improved this paper.