The authors have declared that no competing interests exist.

‡ These authors also contributed equally to this work.

Earth System Models (ESMs) are excellent tools for quantifying many aspects of future climate dynamics but are too computationally expensive to produce large collections of scenarios for downstream users of ESM data. In particular, many researchers focused on the impacts of climate change require large collections of ESM runs to rigorously study the impacts to both human and natural systems of low-frequency high-importance events, such as multi-year droughts. Climate model emulators provide an effective mechanism for filling this gap, reproducing many aspects of ESMs rapidly but with lower precision. The

Two important topics to researchers in earth sciences, future climate dynamics, and joint human-Earth system modeling are the effects of extreme events and uncertainty in climate impacts [

Process-rich Earth System Models (ESMs) are capable of providing the high-resolution future climate scenarios for impacts studies, but are too computationally expensive to directly produce hundreds or thousands of realizations needed to understand the impacts of extreme events. Climate model emulators attempt to solve this problem by approximating the output a climate model would have produced had it been run repeatedly for a specified scenario [

In

Nevertheless, temperature is not the only important variable for quantifying the effects of future climate on human systems. Precipitation, for example, is of interest because of the serious impacts of extreme precipitation events, as the distribution of precipitation over space and time is a primary driver of extreme events like droughts and floods [

Recent research has shown that modeling temperature and precipitation extremes independently mischaracterizes drought hazards as the covariance between climate variables is missed [

The scientific details of generating new realizations of residuals from a training matrix of ESM residuals are described in the

When the training data features normally distributed residuals in every grid cell, the field generating process generates new realizations of time series that preserve three key statistical properties of the training residuals (arrow 3 in

Distribution of values in a grid cell over time and between realizations. In other words, residuals in the grid cell are normally distributed with the same mean and variance as the training data in that grid cell.

Correlation between values in different grid cells.

Time autocorrelation of spatially correlated patterns of grid cells.

Workflow: Extending fldgen v1.0 (arrow 3) to fldgen v2.0 (arrows 1-4) for use with joint temperature and precipitation fields.

For many ESMs, the temperature residuals in each grid cell are indeed approximately normally distributed. However, residuals for other variables, such as precipitation, in many grid cells may have a non-normal distribution (e.g.

The empirical CDF of precipitation residuals in a single grid cell (black) and the CDF of a normal distribution with the same mean and variance as the precipitation residuals (red).

For the core field generating method, each ESM variable must be able to accept residuals anywhere in (−∞, ∞) for addition to its mean field. This is already the case for temperature, so no transformation is needed. However, because precipitation values in an ESM cannot be negative, either the generated residuals must be constrained to avoid negative precipitation while preserving the ESM spatiotemporal and intervariable statistics, or the method must operate on a transformation of precipitation that can accept residuals in (−∞, ∞). The latter is more straightforward, so residuals are generated for log(precipitation) rather than precipitation. Transforming the generated full fields from log(precipitation) space back to precipitation space only requires taking the exponential. More generally, any function that is continuous, invertible, strictly increasing, and results in a transformed ESM variable whose residuals are supported on (−∞, ∞) will preserve the ESM spatiotemporal statistical properties as desired.
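The role of the log transformation can be illustrated with a minimal sketch in base R. The toy data, the constant stand-in for the mean field, and all variable names here are assumptions for illustration and are not part of the package. The point is only that unconstrained residuals added in log space always map back to positive precipitation.

```r
# Hypothetical toy data: strictly positive "precipitation" in one grid cell.
set.seed(1)
precip <- rgamma(200, shape = 2, rate = 1)

# Work in log space, where any real-valued residual is admissible.
log_precip <- log(precip)
mean_field <- mean(log_precip)          # stand-in for a fitted mean response
resid      <- log_precip - mean_field

# Stand-in for generated residuals: unconstrained Gaussian noise,
# deliberately wider than the training residuals.
new_resid <- rnorm(200, mean = 0, sd = 3 * sd(resid))

# Back-transform: exp() maps every real number to a positive value,
# so the generated precipitation can never be negative.
new_precip <- exp(mean_field + new_resid)

# The transformation is invertible: exp(log(x)) recovers the input.
round_trip <- exp(log_precip)
```

Had the same wide noise been added directly to `precip`, some generated values would have fallen below zero.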

Producing joint temperature and log(precipitation) residual realizations therefore requires the development of an algorithm to handle non-normality in transformed ESM training data. This algorithm extension for

Details of the steps shared with version 1.0 (denoted with

1. Select ESM runs for training the emulator.
2. Select and fit the mean response model relating local temperature (
3. Calculate the residuals by subtracting mean response from ESM output.
4. Map the distribution of (
5. Form a joint matrix of state residuals (spatially flattened, concatenated, normally distributed
6. Perform principal components analysis (PCA) on the joint _{i}(
7. Compute the discrete Fourier transform (DFT) [
8. Choose new phases of Φ_{i} randomly, uniformly on [0, 2
9. Compute the projection coefficients
10. Compute the generated, joint
11. Map the generated (
12. Add to the respective
13. Take the exponential of the generated log(

* Denotes a step shared with
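The PCA, DFT, and phase-randomization steps listed above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the package's implementation: the toy residual matrix, the helper `randomize_phases`, and the choice to draw independent phases on [0, 2π) for each principal component are all hypothetical.

```r
set.seed(42)
nt    <- 64   # number of time steps
ngrid <- 10   # length of the (joint) state vector

# Toy, spatially correlated residual matrix (rows = time, columns = grid cells).
R <- matrix(rnorm(nt * ngrid), nt, ngrid) %*% matrix(rnorm(ngrid^2), ngrid, ngrid)
R <- scale(R, center = TRUE, scale = FALSE)

# PCA via SVD: columns of s$v are spatial patterns,
# columns of coefs are the projection coefficient time series.
s     <- svd(R)
coefs <- s$u %*% diag(s$d)

# Hypothetical helper: keep each Fourier amplitude, replace the phase with a
# uniform random draw, and enforce conjugate symmetry so the result is real.
randomize_phases <- function(x) {
  n    <- length(x)
  X    <- fft(x)
  npos <- floor((n - 1) / 2)        # positive frequencies, excluding Nyquist
  if (npos >= 1) {
    k   <- 2:(npos + 1)
    phi <- runif(npos, 0, 2 * pi)
    X[k]         <- Mod(X[k]) * exp(1i * phi)
    X[n - k + 2] <- Conj(X[k])      # mirrored negative frequencies
  }
  Re(fft(X, inverse = TRUE)) / n    # the DC and Nyquist terms are left unchanged
}

# New coefficient series with the same power spectra, then back to grid space.
new_coefs <- apply(coefs, 2, randomize_phases)
new_R     <- new_coefs %*% t(s$v)
```

Because only the phases change, each coefficient series keeps its power spectrum and variance, which is what carries the time autocorrelation into the generated fields.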

The

We have developed a continuous, invertible, and strictly increasing transformation method that allows us to map between the native distributions of residuals for each grid cell and a normal distribution of residuals for each grid cell. This transformation removes the need for user expertise as to whether residuals in every grid cell follow a distribution sufficiently close to a normal distribution for the field generating method outlined in Paper LS1 to work.

For a given grid cell, the native distribution of residuals (temperature or precipitation) over time can be captured with an empirical cumulative distribution function (CDF), _{i}) = _{i}. These sampled quantile values are used to calculate the corresponding values sampled from the standard normal distribution,

Left: The empirical CDF of log(precipitation) residuals in the same grid cell as

Given a set of generated residuals that follow

This mapping is applied to input, native residuals to create normally distributed residuals for emulator training (arrow 2 in
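A minimal sketch of such a mapping for a single grid cell, written in base R, is given here. It assumes a simple rank-based estimate of the empirical CDF and linear interpolation for the inverse; the package may estimate and invert the CDF differently, and the names `x`, `z`, and `inv_map` are hypothetical.

```r
set.seed(7)
x <- rexp(500) - 1            # toy, strongly skewed "native" residuals
n <- length(x)

# Forward map (native -> normal): empirical quantile of each value,
# kept strictly inside (0, 1), then the standard normal quantile function.
q <- (rank(x) - 0.5) / n
z <- qnorm(q)

# Inverse map (normal -> native): interpolate the sampled (z, x) pairs.
# rule = 2 clamps values outside the sampled range to the extremes.
inv_map <- approxfun(sort(z), sort(x), rule = 2)

# Round trip on the training values.
x_back <- inv_map(z)

# Map freshly generated normal residuals back to the native distribution.
z_new <- rnorm(1000)
x_new <- inv_map(z_new)
```

Both directions are strictly increasing on the sampled range, so the ordering of values, and with it any rank-based statistic, is unchanged by the mapping.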

These generated residual time series are added to the mean fields for temperature and log(precipitation) to generate new full field time series. The transformation of the full field log(precipitation) time series values to precipitation values does not undo any of these statistical properties of the residual fields.

Extensive integrated, automatic testing is provided in the

The testing is automatically run when a Pull Request is opened in the

It follows mathematically from the construction of this method that the rank-correlation coefficient in the generated data matches that of the training ESM data for every pairwise grid cell comparison: temperature with temperature, precipitation with precipitation, and temperature with precipitation. We also performed an explicit check of this using
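The invariance underlying this property can be demonstrated with a short base R sketch. The toy residual columns and their names are assumptions for illustration; this is not the check performed in the package's test suite.

```r
set.seed(3)
n <- 300

# Toy residuals for three "grid cells": two temperature, one log(precipitation).
t1   <- rnorm(n)
t2   <- 0.5 * t1 + rnorm(n, sd = 0.9)
logp <- 0.6 * t1 + rnorm(n, sd = 0.8)
normal_space <- cbind(t1 = t1, t2 = t2, p = logp)

# Apply a strictly increasing transform to one column only
# (here exp(), taking log(precipitation) back to precipitation).
native_space <- cbind(t1 = t1, t2 = t2, p = exp(logp))

# Spearman rank correlation depends only on the ordering of values,
# so every pairwise coefficient is unchanged by the transform.
rho_normal <- cor(normal_space, method = "spearman")
rho_native <- cor(native_space, method = "spearman")

# Pearson correlation, by contrast, is generally not preserved.
pearson_shift <- cor(t1, logp) - cor(t1, exp(logp))
```

Comparing `rho_normal` and `rho_native` entry by entry covers the temperature-temperature and temperature-precipitation pairs in this toy setup.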

The extension of

Perhaps the most significant current limitation of

A future extension to handle more than two variables jointly would also be straightforward by design. Thus there is wide potential for reuse of

## install the package if needed
devtools::install_github('JGCRI/fldgen', ref = 'v2.0.0-rc.1')

As an open source R package,

The user must select an ESM to emulate and provide annual, spatially disaggregated temperature and precipitation NetCDF files to be used for training. The

## load the package
library('fldgen')

## specify the location of the training data
datadir <- 'training/data/directory'

## train the emulator
emulator <- trainTP(dat = datadir,
                    tvarname = 'tas', tlatvar = 'lat', tlonvar = 'lon',
                    tvarconvert_fcn = NULL,
                    pvarname = 'pr', platvar = 'lat', plonvar = 'lon',
                    pvarconvert_fcn = log)

Full details of the functions used internally by

The inputs to

dat: A single directory name, or a list of NetCDF files. If a directory name is given, all NetCDF files in the directory will be used. The pairing of temperature and precipitation NetCDF files in the directory relies on the CMIP5 file naming conventions. Other naming conventions are not currently supported.

tvarname: A string with the name of the temperature variable in the temperature NetCDF files.

tlatvar: A string with the name of the latitude coordinate variable in the temperature NetCDF files. Normally this is simply 'lat'.

tlonvar: A string with the name of the longitude coordinate variable in the temperature NetCDF files. Normally this is simply 'lon'.

tvarconvert_fcn: The function used to transform the temperature variable prior to training so that it has support on (−∞, ∞). Defaults to

pvarname: A string with the name of the precipitation variable in the precipitation NetCDF files.

platvar: A string with the name of the latitude coordinate variable in the precipitation NetCDF files. Normally this is simply 'lat'.

plonvar: A string with the name of the longitude coordinate variable in the precipitation NetCDF files. Normally this is simply 'lon'.

pvarconvert_fcn: The function used to transform the precipitation variable prior to training so that it has support on (−∞, ∞). Defaults to

A trained emulator can easily be stored as an R .rds object for later use. A trained emulator contains all information learned about the ESM, including components that are useful for downstream analysis, but not necessary for generating new fields of residuals:

From a trained

## Set RNG seed if reproducible results desired:
set.seed(11)

## Generate new residuals
residgrids <- generate.TP.resids(emulator, ngen = 5)

The function

From a trained

fullgrids <- generate.TP.fullgrids(emulator, residgrids, tgav = tgav,
                                   tvarunconvert_fcn = NULL,
                                   pvarunconvert_fcn = exp,
                                   reconstruction_function = pscl_apply)
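Conceptually, a full field is a mean field plus a residual field. The base R sketch below illustrates that idea with a per-grid-cell linear regression on global mean temperature as the mean response. It is an assumption-laden illustration and not the implementation behind `pscl_apply`: the toy data, the linear model, and the time-shuffled stand-in for generated residuals are all hypothetical.

```r
set.seed(5)
nt    <- 50
ngrid <- 4

# Hypothetical global mean temperature series and per-cell sensitivities.
tgav_toy <- seq(14, 16, length.out = nt)
true_w   <- c(0.8, 1.0, 1.2, 1.5)

# Toy local temperatures (rows = time, columns = grid cells).
temp <- outer(tgav_toy, true_w) + matrix(rnorm(nt * ngrid, sd = 0.2), nt, ngrid)

# Fit a linear mean response in every grid cell at once:
# temp[, j] ~ b_j + w_j * tgav_toy (lm accepts a matrix response).
fit        <- lm(temp ~ tgav_toy)
mean_field <- fitted(fit)

# Residuals are what remains after subtracting the mean response.
resid <- temp - mean_field

# Stand-in for generated residuals: the training residuals shuffled in time.
new_resid <- resid[sample(nt), ]

# Full field = mean field + residual field.
full <- mean_field + new_resid
```

For a log-transformed variable such as precipitation, the final step would additionally apply `exp()` to the sum.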

Strict requirements for input files are documented. An example workflow for training an emulator and generating new residuals (using the commands outlined above) is included in the package. Using the R packages

R (≥ 3.3.3).

Required dependencies include the R packages: assertthat (≥ 0.2.0), dplyr (≥ 0.7), tidyr (≥ 0.7.1), tibble (≥ 1.3.4), ggplot2 (≥ 2.2.1), scales (≥ 0.5.0), reshape2 (≥ 1.4.2), ncdf4 (≥ 1.16), rlang (≥ 0.1.2).

Optional dependencies include the R packages: testthat, gcammaptools (≥ 0.4), covr, knitr, rmarkdown.

Name: zenodo

Persistent identifier:

Licence: BSD 2-Clause

Publisher: Robert Link

Version published: v2.0.0 prerelease

Date published: archive of prerelease published 25/03/2019

Name: GitHub

Identifier:

Licence: BSD 2-Clause

Date published: prerelease published 25/03/2019