
UQLAB user manual
score (see Eq. (1.3)). This choice, however, tends to favor distributions with a larger number
of parameters, which more easily adapt to the data but possibly lead to overfitting. To avoid
overfitting, a penalty term on the number of model parameters can be introduced.
The Akaike Information Criterion (AIC; Akaike, 1974) selects the distribution which minimizes the quantity
AIC = 2k − 2 log(L), (1.10)
where k is the number of model parameters.
The Bayesian Information Criterion (BIC; Schwarz, 1978) penalizes the number of distribution parameters even more strongly, selecting the distribution which minimizes
BIC = log(n)k − 2 log(L), (1.11)
where n is the number of data points used to fit the distribution.
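The two criteria above can be sketched in a few lines. The snippet below fits several candidate families to synthetic data and evaluates Eqs. (1.10) and (1.11) for each; note that scipy.stats is used here as an illustrative stand-in, not the actual UQ[PY]LAB inference module, and the candidate set and data are assumptions made for the example.

```python
import numpy as np
from scipy import stats

# Synthetic data from a known family (Gamma), purely for illustration.
rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=1.5, size=1000)

# Candidate parametric families (scipy.stats stands in for the
# UQ[PY]LAB inference module -- an assumption, not its actual API).
candidates = {
    "gaussian": stats.norm,
    "gamma": stats.gamma,
    "lognormal": stats.lognorm,
}

n = len(data)
scores = {}
for name, family in candidates.items():
    params = family.fit(data)            # maximum-likelihood fit
    log_l = np.sum(family.logpdf(data, *params))
    k = len(params)                      # number of fitted parameters
    aic = 2 * k - 2 * log_l              # Eq. (1.10)
    bic = np.log(n) * k - 2 * log_l      # Eq. (1.11)
    scores[name] = (aic, bic)

best_aic = min(scores, key=lambda name: scores[name][0])
```

Since log(n) > 2 for any n > 7, BIC always applies a heavier penalty per parameter than AIC on realistic sample sizes, which is why it tends to favor more parsimonious families.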
Sometimes, maximizing the total likelihood (with or without penalization) produces a probability distribution with a substantial peak centered where most data points accumulate, and
too little probability mass elsewhere. An example of this behavior is shown in Figure 1. A set
of 1 000 data points is drawn from a Gaussian distribution truncated to the left (µ = 0, σ = 1,
support [1, +∞); true distribution and histogram of the data shown in the left panel). Five
different families are then fitted to the data: Gaussian, truncated Gaussian, Weibull, Gamma,
and Lognormal distributions. For the latter four, the truncation interval is specified as the
true one, [1, +∞). The Gamma distribution yields the lowest AIC, despite visually exhibiting
the largest deviation from the histogram of the data. In this case, one would intuitively want
to select the distribution which most closely follows the data histogram.
This intuition is formalized by the Kolmogorov-Smirnov distance (K-S) criterion. The criterion
selects the family whose cumulative distribution has the lowest maximum distance from the
empirical CDF of the data (1.6), that is, which minimizes
d_KS = max_{x ∈ R^M} |F_X(x) − H(x)|. (1.12)
In the example shown in Figure 1, right panel, the K-S criterion would select the Lognormal
distribution (cyan) among those illustrated in the figure.
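A minimal sketch of the K-S criterion of Eq. (1.12) for univariate data follows: each fitted CDF is compared to the empirical CDF at the data points, and the family with the smallest maximum deviation is kept. As before, scipy.stats stands in for the UQ[PY]LAB inference module, and the two candidate families and the synthetic data are assumptions for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.5, size=1000)

x = np.sort(data)
n = len(x)
# The empirical CDF H jumps at each data point; for a continuous fitted
# CDF the maximum deviation occurs at one of these jumps, so it suffices
# to evaluate H just below and at every sorted data point.
ecdf_hi = np.arange(1, n + 1) / n
ecdf_lo = np.arange(0, n) / n

d_ks = {}
for name, family in {"gaussian": stats.norm,
                     "lognormal": stats.lognorm}.items():
    params = family.fit(data)          # maximum-likelihood fit
    cdf = family.cdf(x, *params)       # fitted CDF F_X at the data points
    d_ks[name] = max(np.max(np.abs(cdf - ecdf_hi)),
                     np.max(np.abs(cdf - ecdf_lo)))

best = min(d_ks, key=d_ks.get)         # family minimizing Eq. (1.12)
```

Evaluating both one-sided gaps (`ecdf_lo`, `ecdf_hi`) is what makes this the exact maximum distance rather than an approximation on a grid.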
Currently, the K-S criterion is supported in UQ[PY]LAB for univariate distributions only.
1.3.3 Pair-copula selection for vine copulas
Selecting the vine copula that best fits a data set X, or its counterpart U defined in (1.5),
among all existing vines of dimension M, involves the following steps:
1. Selecting a vine structure (for C- and D-vines: selecting the order of the nodes);
2. Selecting the parametric family of each pair copula;
3. Fitting the pair copula parameters to U.
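Step 3 can be illustrated for a single Gaussian pair copula, whose parameter ρ admits a closed-form estimator by inverting Kendall's tau: ρ = sin(πτ/2). This is one standard moment-based estimator, sketched here under the assumption that it stands in for whatever fitting scheme UQ[PY]LAB actually applies; the simulated pseudo-observations are likewise illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Correlated standard-normal sample with true rho = 0.6, mapped to the
# unit hypercube to mimic the pseudo-observations U (cf. Eq. (1.5)).
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=2000)
u = stats.norm.cdf(z)

# Kendall's tau is rank-based, so it can be computed directly on U.
tau, _ = stats.kendalltau(u[:, 0], u[:, 1])

# Moment estimator for the Gaussian pair-copula parameter:
# rho = sin(pi * tau / 2).
rho_hat = np.sin(np.pi * tau / 2.0)
```

In practice, steps 1-3 are interleaved: the structure and the family of each pair copula are typically chosen greedily tree by tree, refitting parameters as pairs are added.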
UQ[PY]LAB-V1.0-114 - 6 -