UQLAB USER MANUAL
STATISTICAL INFERENCE
E. Torre, S. Marelli, B. Sudret
CHAIR OF RISK, SAFETY AND UNCERTAINTY QUANTIFICATION
STEFANO-FRANSCINI-PLATZ 5
CH-8093 ZÜRICH
Risk, Safety & Uncertainty Quantification
How to cite UQ[PY]LAB
C. Lataniotis, S. Marelli, B. Sudret, Uncertainty Quantification in the cloud with UQCloud, Proceedings of the 4th
International Conference on Uncertainty Quantification in Computational Sciences and Engineering (UNCECOMP
2021), Athens, Greece, June 27–30, 2021.
How to cite this manual
E. Torre, S. Marelli, B. Sudret, UQLAB user manual – Statistical inference, Report UQ[py]Lab-V1.0-114, Chair of
Risk, Safety and Uncertainty Quantification, ETH Zurich, Switzerland, 2024.
BibTeX entry
@TechReport{UQdoc_10_114,
author = {Torre, E. and Marelli, S. and Sudret, B.},
title = {{UQ[py]Lab user manual -- Statistical inference }},
institution = {Chair of Risk, Safety and Uncertainty Quantification, ETH Zurich,
Switzerland},
year = {2024},
note = {Report UQ[py]Lab - V1.0-114}
}
List of contributors:
Name Contribution
A. Hlobilová Translation from the UQLab manual
Document Data Sheet
Document Ref. UQ[PY]LAB-V1.0-114
Title: UQLAB user manual – Statistical inference
Authors: E. Torre, S. Marelli, B. Sudret
Chair of Risk, Safety and Uncertainty Quantification, ETH Zurich,
Switzerland
Date: 27/05/2024
Doc. Version Date Comments
V0.9 24/02/2023 Initial release
V1.0 27/05/2024 Minor update for UQ[PY]LAB release 1.0
Abstract
UQ[PY]LAB supports the statistical inference of probabilistic models by established inference
tools. In the context of UQ, inference allows one to fit different probabilistic models to input
data, thereby selecting the model that best describes them. The input joint PDF is represented
in UQ[PY]LAB by the marginal distributions of the input variables and their copula (see the
companion UQ[PY]LAB User Manual – the INPUT module). Consequently, UQ[PY]LAB offers
tools to infer both marginal distributions and copulas.
This user manual includes a review of the methods that are used to perform inference for
marginals and different classes of copulas. After introducing the theoretical aspects, an in-
depth example-driven user guide is provided to help users to build INPUT objects by inference.
Finally, a comprehensive reference list of the methods and functions available for inference
in the UQ[PY]LAB INPUT module is given at the end of the manual.
Keywords: Probabilistic Input Model, Marginals, Copula, Inference
Contents
1 Theory 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Distribution fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.1 Fitting by maximum likelihood . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 ML estimation in the copula formalism . . . . . . . . . . . . . . . . . . 2
1.2.3 ML estimation for vine copulas . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1 Non-parametric inference . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.2 Parametric inference and selection criteria . . . . . . . . . . . . . . . . 5
1.3.3 Pair-copula selection for vine copulas . . . . . . . . . . . . . . . . . . . 6
1.4 Separating random variables into mutually independent subgroups (block independence) . . . . . . . 8
2 Usage 9
2.1 Inference of marginals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1 Fully automated inference of marginals . . . . . . . . . . . . . . . . . 9
2.1.2 Specifying inference options for each marginal separately . . . . . . . 10
2.1.3 Fitting or fixing parameters for a given family . . . . . . . . . . . . . . 11
2.1.4 Non-parametric inference: kernel smoothing . . . . . . . . . . . . . . . 11
2.1.5 Fitting a custom marginal distribution . . . . . . . . . . . . . . . . . . 12
2.1.6 Goodness of fit of inferred marginals . . . . . . . . . . . . . . . . . . . 12
2.2 Inference of copulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Fully automated inference of copula . . . . . . . . . . . . . . . . . . . 13
2.2.2 Testing for block independence . . . . . . . . . . . . . . . . . . . . . . 13
2.2.3 Specification of inference data for the copula . . . . . . . . . . . . . . 14
2.2.4 Inference among a selected list of copula types . . . . . . . . . . . . . 14
2.2.5 Testing for pair independence . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.6 Inference of pair copulas . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.7 Inference of Gaussian copulas . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.8 Inference of vine copulas . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.9 Goodness of fit of inferred copulas . . . . . . . . . . . . . . . . . . . . 18
3 Reference List 19
3.1 Create an Input with inferred marginals and/or copula . . . . . . . . . . . . . 21
3.1.1 General inference options (valid for marginals and copula) . . . . . . . 21
3.1.2 Options for inference of marginal distributions . . . . . . . . . . . . . 22
3.1.3 Copula inference options . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Working with inferred inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Chapter 1
Theory
1.1 Introduction
In applied problems, the probability distribution of uncertain quantities is often not known
and has to be inferred from data. In some cases, expert knowledge or theoretical consid-
erations allow one to assume at least a parametric form of the distribution, and inference
is limited to estimating its parameters. Otherwise, the full shape of the distribution needs
to be assessed, using either parametric or non-parametric inference. Multivariate distribu-
tions pose the additional challenge that, beside the marginal distribution of each component,
their dependence structure, or copula, may also be unknown. This user manual describes
how statistical inference can be performed in UQ[PY]LAB to select, among a broad class of
probabilistic models of marginal and copula distributions, the model that best fits a given
dataset.
To avoid confusion, we stress the distinction between inference and fitting. Fitting is a proce-
dure by which the parameters of a given parametric probability distribution are determined
based on the data, so as to maximize the plausibility of that distribution. Inference is a more
general term that encompasses fitting several probability distributions to the data, and then
selecting the most plausible one. The measures of plausibility used for parameter fitting and
for model selection may differ, and will be introduced below.
This manual focuses on the inference process only, while an extensive description of the prob-
abilistic models supported by the UQ[PY]LAB INPUT module is available in the UQ[PY]LAB
User Manual – the INPUT module.
1.2 Distribution fitting
Statistical inference is a process that typically requires two main steps: fitting a number of
plausible probabilistic models to available data, and comparing them to select the best fitting
one. Here we describe the fitting problem.
1.2.1 Fitting by maximum likelihood
Consider $M$ uncertain, real-valued quantities grouped into a random vector $X = (X_1, \dots, X_M)$, for which a probability distribution $F_X(x; \theta)$ with density $f_X$ is assumed. The distribution has $k$ real parameters $\theta = (\theta_1, \dots, \theta_k)$, $k \geq 1$, which have to be determined based on a set of $n$ independent observations $\mathcal{X} = \{\hat{x}^{(1)}, \dots, \hat{x}^{(n)}\}$ of $X$.
Various ways exist to estimate the parameters $\theta$. By far the most common is maximum likelihood (ML) estimation. The ML estimator of $\theta$ based on the data set $\mathcal{X}$ is defined as

$\hat{\theta} = \arg\max_{\theta} \prod_{h=1}^{n} f_X(\hat{x}^{(h)}; \theta) \overset{\text{def}}{=} \arg\max_{\theta} \mathcal{L}(\theta).$ (1.1)
The rationale is to seek the parameters $\hat{\theta}$ that maximize the joint probability of the observations in $\mathcal{X}$, namely their likelihood. The term $\mathcal{L}(\theta)$ in Eq. (1.1) is a function of $\theta$ only, rather than of $x$, and is known as the likelihood function. Maximizing the likelihood in (1.1) is equivalent to solving

$\hat{\theta} = \arg\max_{\theta} \log(\mathcal{L}(\theta)) = \arg\max_{\theta} \sum_{h=1}^{n} \log(f_X(\hat{x}^{(h)}; \theta)).$ (1.2)

Working with the sum of logarithms is often easier, both analytically and numerically, and is therefore preferred: the values of $f_X$ are inherently small, and exponential forms are widespread among commonly used PDFs. The quantity

$\log(\mathcal{L}) = \sum_{h=1}^{n} \log(f_X(\hat{x}^{(h)}; \hat{\theta}))$ (1.3)

is the total log-likelihood of $f_X(\cdot; \hat{\theta})$ on the data $\mathcal{X}$. It is a relative measure of goodness of fit of $f_X$ to the data, and is often used in model selection to identify the distribution that best fits the data (see Section 1.3.2).
ML solutions to problem (1.2) are available analytically for some distributions. In the general case, various numerical algorithms exist to find approximate solutions. For instance, gradient methods start from an initial guess $\theta_0$ of $\theta$ and iteratively move towards the direction in which the sum of log-likelihoods in (1.3) increases.
Note that not all probability distributions admit a finite or even a unique ML solution. For instance, truncated univariate distributions (see the UQ[PY]LAB User Manual – the INPUT module) do not in the general case, even when their support is known. In UQ[PY]LAB, these cases are handled separately by imposing bounds on the parameter estimates. The bounds can be specified as specific values, or as functions of the inference data.
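To make the numerical procedure concrete, the following sketch (plain SciPy, not UQ[py]Lab code; the lognormal family and the synthetic data are assumptions for illustration) minimizes the negative log-likelihood of Eq. (1.2) with a derivative-free optimizer:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
# Synthetic data: X = exp(Z) with Z ~ N(0.5, 0.8^2), i.e. a lognormal sample
data = np.exp(0.5 + 0.8 * rng.standard_normal(1000))

def neg_log_likelihood(theta, x):
    """Negative total log-likelihood of a lognormal density, cf. Eq. (1.2)."""
    mu, sigma = theta
    if sigma <= 0.0:
        return np.inf  # keep the optimizer inside the admissible domain
    # Lognormal log-pdf: normal log-pdf of log(x), minus log(x) (Jacobian term)
    return -np.sum(norm.logpdf(np.log(x), mu, sigma) - np.log(x))

# Derivative-free search starting from an initial guess theta_0
res = minimize(neg_log_likelihood, x0=[0.0, 1.0], args=(data,),
               method="Nelder-Mead")
mu_hat, sigma_hat = res.x  # estimates of the true parameters (0.5, 0.8)
```

For this particular family the ML solution is in fact available in closed form (the sample mean and standard deviation of the log-data); the numerical route above is shown because it carries over to distributions without analytical ML estimators.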
1.2.2 ML estimation in the copula formalism
In UQ[PY]LAB, an $M$-variate distribution $F_X$ is generically specified in terms of a set of $M$ univariate distributions $F_{X_1}, \dots, F_{X_M}$, which describe the marginal behavior of $X_1, \dots, X_M$, and a copula $C_X$ that defines their mutual dependence:

$F_X(x) = C_X(F_{X_1}(x_1), F_{X_2}(x_2), \dots, F_{X_M}(x_M)).$ (1.4)

More details can be found in Section 1.3 of the UQ[PY]LAB User Manual – the INPUT module. In particular, as discussed in Section 1.3.2 of the same document, $C_X$ is also the joint distribution of the random vector

$U = (F_{X_1}(X_1), \dots, F_{X_M}(X_M)).$
Fitting $F_X$ to $\mathcal{X}$ thus involves the following steps:

1. Separately fit the parameters of $F_{X_i}$ on $\mathcal{X}_i = \{\hat{x}^{(1)}_i, \dots, \hat{x}^{(n)}_i\}$ to obtain the fitted marginal $\hat{F}_{X_i}$, $i = 1, \dots, M$;

2. Transform the data set $\mathcal{X}$ into the set of pseudo-observations $\mathcal{U} = \{\hat{u}^{(1)}, \dots, \hat{u}^{(n)}\}$, where

$\hat{u}^{(h)} := (\hat{F}_{X_1}(\hat{x}^{(h)}_1), \dots, \hat{F}_{X_M}(\hat{x}^{(h)}_M));$ (1.5)

3. Fit the parameters of $C_X$ on $\mathcal{U}$ to obtain the fitted copula $\hat{C}_X$.
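These steps can be sketched outside UQ[py]Lab with NumPy/SciPy; the two-column data set and the marginal families (lognormal, Gaussian) below are assumptions chosen for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Stand-in inference data X (n x 2) with dependent columns
z = rng.standard_normal((500, 2))
X = np.column_stack([np.exp(z[:, 0]), 2.0 + z[:, 0] + 0.5 * z[:, 1]])

# Step 1: fit a parametric marginal to each column (families assumed known)
shape, loc, scale = stats.lognorm.fit(X[:, 0], floc=0)
mu, sd = stats.norm.fit(X[:, 1])

# Step 2: map each observation through its fitted marginal CDF, Eq. (1.5)
U = np.column_stack([
    stats.lognorm.cdf(X[:, 0], shape, loc, scale),
    stats.norm.cdf(X[:, 1], mu, sd),
])
# U lies in (0, 1)^2 and is (approximately) a sample from the copula C_X,
# on which step 3 would fit the copula parameters
```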
1.2.3 ML estimation for vine copulas
A case that deserves a separate discussion is that of vine copulas (for more details, see the UQ[PY]LAB User Manual – the INPUT module, Section 1.4.4). From a theoretical standpoint, vine copulas are multivariate distributions (Joe, 2015). As such, their parameters can be estimated numerically by maximum likelihood. Nevertheless, vine copulas typically comprise a large number of parameters: an $M$-dimensional vine is obtained as the tensor product of $M(M-1)/2$ pair copulas, each typically entailing one or more parameters. Solving the likelihood maximization problem (1.2) in high dimension is impractical and time consuming.
The very structure of a vine copula, however, suggests a considerably faster way to find an approximate solution. Rather than globally optimizing the vector $\theta$ of all vine parameters, one can estimate the parameters of each pair copula separately, starting from the pair copulas in the first tree of the vine (the unconditional pair copulas). Specifically, if $\mathcal{U}$ is the set of pseudo-observations defined in (1.5), a sequential (rather than global) estimation approach for vine copula parameters encompasses the following steps:

1. For each unconditional pair copula $C_{ij}$, which couples $X_i$ and $X_j$, estimate its parameters $\theta_{ij}$ from $\mathcal{X}_{ij} = \{(\hat{x}^{(h)}_i, \hat{x}^{(h)}_j),\ h = 1, \dots, n\}$, thereby obtaining $\hat{C}_{ij}$;

2. For each conditional pair copula $C_{jl|i}$, which couples $X_j$ and $X_l$ conditioned on $X_i$, introduce the simplifying assumption that it depends on $U_i$ only through the argument $F_{j|i}$. Under this assumption:

(a) Derive the univariate conditional distributions $\hat{C}_{j|i}(u_j|u_i)$ and $\hat{C}_{l|i}(u_l|u_i)$, where

$\hat{C}_{j|i}(u_j|u_i) := \left. \frac{\partial \hat{C}_{ij}(u,v)}{\partial u} \right|_{(u,v)=(u_i,u_j)}.$

$\hat{C}_{j|i}$ is the CDF of $U_j$ conditioned on $U_i$;

(b) Obtain the set of conditional pseudo-observations $\mathcal{U}_{jl|i} = \{(\hat{u}^{(h)}_{j|i}, \hat{u}^{(h)}_{l|i}),\ h = 1, \dots, n\}$, where

$\hat{u}^{(h)}_{j|i} := \hat{C}_{j|i}(\hat{u}^{(h)}_j | \hat{u}^{(h)}_i);$

(c) Fit the parameters of $C_{jl|i}$ on $\mathcal{U}_{jl|i}$ to obtain the fitted copula $\hat{C}_{jl|i}$.

3. Iterate to fit the pair copulas in deeper trees of the vine, that is, with more conditioning variables.

This general procedure needs to be adapted to the specific set of pair copulas that populate the vine. Algorithms for ML estimation of C- and D-vines were originally proposed by Aas et al. (2009), and are implemented in UQ[PY]LAB.

Note that in the general case sequential fitting does not yield the global solution to problem (1.2). Nevertheless, it typically provides a sufficiently close approximation thereof, at a considerably reduced computational cost.
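For a Gaussian pair copula with parameter $\rho$, step 2(a) admits a closed form: the conditional CDF is $\Phi\big((\Phi^{-1}(u_j) - \rho\,\Phi^{-1}(u_i))/\sqrt{1-\rho^2}\big)$, the "h-function" of Aas et al. (2009). The sketch below (illustrative NumPy/SciPy, not UQ[py]Lab code) computes conditional pseudo-observations under this assumption:

```python
import numpy as np
from scipy import stats

def gaussian_h(u_j, u_i, rho):
    """Conditional CDF C_{j|i}(u_j | u_i) of a Gaussian pair copula
    with parameter rho (the 'h-function' of Aas et al., 2009)."""
    z_j = stats.norm.ppf(u_j)
    z_i = stats.norm.ppf(u_i)
    return stats.norm.cdf((z_j - rho * z_i) / np.sqrt(1.0 - rho**2))

rng = np.random.default_rng(7)
# Pseudo-observations of (U_i, U_j) from a Gaussian copula with rho = 0.6
rho = 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=1000)
u_i, u_j = stats.norm.cdf(z[:, 0]), stats.norm.cdf(z[:, 1])

# Step 2(b): conditional pseudo-observations, approximately uniform on (0, 1)
u_j_given_i = gaussian_h(u_j, u_i, rho)
```

The resulting sample `u_j_given_i` would then feed the fit of the deeper-tree pair copulas in step 2(c).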
1.3 Model selection
Inference of a probabilistic model from data typically has to deal with the problem that the data are not associated with a prescribed parametric probability distribution. In such circumstances, the question arises as to which distribution should be fitted to the data. In practice, this problem can be dealt with by various approaches, briefly discussed below.
1.3.1 Non-parametric inference
Non-parametric inference assigns to the data a parameter-free model, rather than fitting a parametric one. An example of a non-parametric model is the empirical CDF

$H_X(x) = \frac{1}{n} \sum_{h=1}^{n} \mathbf{1}_{\{\hat{x}^{(h)} \leq x\}},$ (1.6)

where $x' \leq x$ if and only if $x'_i \leq x_i$ for each $i = 1, \dots, M$. In words, $H_X$ has a jump of $+1/n$ at each observation in $\mathcal{X}$. The corresponding probability mass function is a normalized sum of Dirac delta functions, each centered at one observation:

$h_X(x) = \frac{1}{n} \sum_{h=1}^{n} \delta(x - \hat{x}^{(h)}).$ (1.7)
Another example of non-parametric inference is kernel density estimation (KDE), also referred to as kernel smoothing. KDE replaces the delta functions of the empirical PDF with a non-negative function (kernel) $k(\cdot)$. The expression for the univariate KDE, often used for inference of marginal distributions, reads

$\hat{f}_X(x) = \frac{1}{nw} \sum_{h=1}^{n} k\left(\frac{\|\hat{x}^{(h)} - x\|}{w}\right).$ (1.8)

The kernel is a decaying function of the distance between the point $x$ where the distribution is evaluated and the point $\hat{x} \in \mathcal{X}$ where the kernel is centered. The parameter $w$, called the bandwidth, dictates how fast $\hat{f}_X$ decays with $\|x - \hat{x}\|$. A common choice for the kernel is the standard Gaussian distribution. When the size $n$ of the data set is large enough (e.g., $n \geq 100$), the choice of the kernel function has a negligible effect on the final KDE estimate, provided that the kernel bandwidth $w$ is properly selected according to $n$. A common choice when using the Gaussian kernel is Silverman's rule (Silverman, 1996)

$w = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{1/5} \approx 1.06\,\hat{\sigma}\,n^{-1/5},$ (1.9)

where $\hat{\sigma}$ is the sample standard deviation of the data. This choice minimizes the mean integrated square error, but should be used with care: it may yield wildly inaccurate (for instance, strongly over-smoothed) estimates if the data come from a strongly skewed or multimodal distribution. Finally, KDE generalizes to the multivariate case (Simonoff, 1996), which is omitted here for simplicity.
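Equations (1.8)-(1.9) translate into a few lines of code; the following is a minimal NumPy sketch with a Gaussian kernel (for illustration only, not the UQ[py]Lab implementation):

```python
import numpy as np

def gaussian_kde_pdf(x_eval, data):
    """Univariate Gaussian KDE, Eq. (1.8), with Silverman bandwidth, Eq. (1.9)."""
    n = len(data)
    sigma_hat = np.std(data, ddof=1)
    w = (4.0 * sigma_hat**5 / (3.0 * n)) ** 0.2  # ~= 1.06 * sigma_hat * n**(-1/5)
    # One standard Gaussian kernel centered at each observation
    z = (x_eval[:, None] - data[None, :]) / w
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * w * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)
data = rng.standard_normal(1000)
grid = np.linspace(-4.0, 4.0, 201)
pdf_hat = gaussian_kde_pdf(grid, data)  # smooth estimate of the N(0, 1) PDF
```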
Non-parametric inference typically produces models that fit the training data X very well,
in the sense that their total log-likelihood (1.3) is high. However, these models exhibit poor
prediction power, because they concentrate all probability mass at (for the empirical CDF)
or around (for KDE) the observed points, thereby assigning negligible probability elsewhere
(overfitting).
1.3.2 Parametric inference and selection criteria
Parametric inference produces probability distributions that are less flexible than empirical
or kernel smoothing ones, as they have a fixed analytical expression that is described by just
a few parameters. The parameters of any such probability distribution are fitted to the data
as described in Section 1.2. The question then becomes which of the many known probability
distributions to fit to the data. At least in low dimension (M = 1 or 2), a plot of the data or
other considerations may help to discard a priori some families of probability distributions
and retain others. For instance, univariate data with a semi-finite support or with a multi-
modal histogram cannot have originated from a Gaussian distribution, while very skewed
data cannot come from a symmetric distribution.
Once a set of possible families of parametric distributions has been determined, the family
that best describes the data must be selected. Various selection criteria exist to this end. The
following applies to both marginal distributions and copulas, unless stated otherwise.
The maximum likelihood criterion selects the fitted distribution with the highest log-likelihood
score (see Eq. (1.3)). This choice, however, tends to favor distributions with a larger number
of parameters, which more easily adapt to the data but possibly lead to overfitting. To avoid
overfitting, a penalty term on the number of model parameters can be introduced.
The Akaike Information Criterion (AIC; Akaike, 1974) selects the distribution which minimizes the quantity

$\mathrm{AIC} = 2k - 2\log(\mathcal{L}),$ (1.10)

where $k$ is the number of model parameters.
An even stronger penalization by the number of distribution parameters is applied by the Bayesian Information Criterion (BIC; Schwartz, 1978), which minimizes

$\mathrm{BIC} = \log(n)\,k - 2\log(\mathcal{L}),$ (1.11)

where $n$ is the number of data points used to fit the distribution.
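Both criteria are simple functions of the fitted log-likelihood; a minimal sketch (illustrative Python, not the UQ[py]Lab API):

```python
import numpy as np
from scipy import stats

def aic(log_l, k):
    """Akaike information criterion, Eq. (1.10)."""
    return 2.0 * k - 2.0 * log_l

def bic(log_l, k, n):
    """Bayesian information criterion, Eq. (1.11)."""
    return np.log(n) * k - 2.0 * log_l

rng = np.random.default_rng(3)
data = rng.standard_normal(200)

# Log-likelihood of a fitted Gaussian (k = 2 parameters) on the data, Eq. (1.3)
mu, sd = stats.norm.fit(data)
log_l = stats.norm.logpdf(data, mu, sd).sum()

# For n >= 8, log(n) > 2, so BIC penalizes the parameter count harder than AIC
scores = {'AIC': aic(log_l, k=2), 'BIC': bic(log_l, k=2, n=len(data))}
```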
Sometimes, maximizing the total likelihood (with or without penalization) produces a prob-
ability distribution with a substantial peak centered where most data points accumulate, and
too little probability mass elsewhere. An example of this behavior is shown in Figure 1. A set
of 1 000 data points is drawn from a Gaussian distribution truncated to the left (µ = 0, σ = 1, support [1, +∞); true distribution and histogram of the data shown in the left panel). Five different families are then fitted to the data: Gaussian, truncated Gaussian, Weibull, Gamma, and Lognormal. For the latter four, the truncation interval is specified as the true one, [1, +∞). The Gamma distribution yields the lowest AIC, despite visually exhibiting the largest deviation from the histogram of the data. In this case, one would intuitively want to select the distribution which most closely follows the data histogram.
This intuition is formalized by the Kolmogorov-Smirnov (K-S) distance criterion. The criterion selects the family whose cumulative distribution has the lowest maximum distance from the empirical CDF of the data (1.6), that is, which minimizes

$d_{\mathrm{KS}} = \max_{x \in \mathbb{R}^M} |F_X(x) - H_X(x)|.$ (1.12)

In the example shown in Figure 1, right panel, the K-S criterion would select the Lognormal distribution (cyan) among those illustrated in the figure.

Currently, the K-S criterion is supported in UQ[PY]LAB for univariate distributions only.
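For a univariate fitted distribution, the distance (1.12) can be computed by checking the fitted CDF against the jumps of the empirical CDF; a minimal sketch (plain NumPy/SciPy, not UQ[py]Lab code):

```python
import numpy as np
from scipy import stats

def ks_distance(data, cdf):
    """Max distance between a fitted CDF and the empirical CDF, Eq. (1.12).

    For sorted data, the empirical CDF jumps from (h-1)/n to h/n at the
    h-th observation, so the supremum is attained at one of these steps.
    """
    x = np.sort(data)
    n = len(x)
    f = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - f)
    d_minus = np.max(f - np.arange(0, n) / n)
    return max(d_plus, d_minus)

rng = np.random.default_rng(4)
data = rng.standard_normal(500)
mu, sd = stats.norm.fit(data)
d_ks = ks_distance(data, lambda x: stats.norm.cdf(x, mu, sd))
```

The same statistic is returned by `scipy.stats.kstest(data, 'norm', args=(mu, sd))`, which additionally provides a p-value.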
1.3.3 Pair-copula selection for vine copulas
Selecting the vine copula that best fits a data set X , or its counterpart U defined in (1.5),
among all existing vines of dimension M, involves the following steps:
1. Selecting a vine structure (for C- and D-vines: selecting the order of the nodes);
2. Selecting the parametric family of each pair copula;
3. Fitting the pair copula parameters to U.
Figure 1: Example of inference on a dataset generated from a truncated Gaussian distribution. The left panel shows the normalized histogram of the data and the true PDF. The right panel shows different parametric distributions fitted to the data. The Lognormal distribution (cyan) would be selected by the Kolmogorov-Smirnov (K-S) criterion.
Steps 1-2 form the representation problem (for more details on vine copula representation in UQ[PY]LAB, please refer to the UQ[PY]LAB User Manual – the INPUT module). Concerning step 3, algorithms to compute the likelihood of C- and D-vines (Aas et al., 2009) and of R-vines (Joe, 2015) given a data set $\mathcal{U}$ exist, enabling parameter fitting based on maximum likelihood. In principle, the vine copula that best fits the data may be determined by iterating maximum likelihood fitting over all possible vine structures and all possible parametric families of the comprising pair copulas. In practice, however, this approach is computationally infeasible in even moderate dimension $M$, due to the large number of possible structures (Morales-Nápoles, 2011) and of pair copulas comprising the vine. Aas et al. (2009) therefore suggested a different, faster approach which first solves step 1 separately. The optimal vine structure is selected heuristically so as to capture first the pairs $(X_i, X_j)$ with the strongest dependence (which then fall in the upper trees of the vine). Kendall's tau (Stuart et al., 1999) is the measure of dependence of choice, defined by

$\tau_{ij} = P((X_i - \tilde{X}_i)(X_j - \tilde{X}_j) > 0) - P((X_i - \tilde{X}_i)(X_j - \tilde{X}_j) < 0),$ (1.13)

where $(\tilde{X}_i, \tilde{X}_j)$ is an independent copy of $(X_i, X_j)$. If the copula of $(X_i, X_j)$ is $C_{ij}$, then

$\tau_K(X_i, X_j) = 4 \int\!\!\int_{[0,1]^2} C_{ij}(u,v)\,\mathrm{d}C_{ij}(u,v) - 1.$ (1.14)
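In practice, τ is estimated from the sample by counting concordant and discordant pairs; the sketch below (illustrative only) compares a direct O(n²) count against SciPy's implementation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
z = rng.standard_normal((300, 2))
x = z[:, 0]
y = 0.8 * z[:, 0] + 0.6 * z[:, 1]   # positively dependent pair

# Direct estimate of Eq. (1.13): mean sign of concordance over all pairs
n = len(x)
s = np.sign((x[:, None] - x[None, :]) * (y[:, None] - y[None, :]))
tau_hat = s[np.triu_indices(n, k=1)].mean()

# SciPy's implementation gives the same value (continuous data, no ties)
tau_scipy, _ = stats.kendalltau(x, y)
```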
For a C-vine, ordering the variables $X_1, \dots, X_M$ in decreasing order of dependence corresponds to selecting the central node in tree $T_1$ as the variable $X_{i_1}$ which maximizes $\sum_{j \neq i_1} \tau_{i_1 j}$, then the central node of tree $T_2$ as the variable $X_{i_2}$ which maximizes $\sum_{j \notin \{i_1, i_2\}} \tau_{i_2 j}$, and so on. For a D-vine, this means ordering the variables $X_{i_1}, X_{i_2}, \dots, X_{i_M}$ in the first tree so as to maximize $\sum_{k=1}^{M-1} \tau_{i_k i_{k+1}}$, which can be solved as an open traveling salesman problem (OTSP; Applegate et al., 2006). UQ[PY]LAB solves the OTSP by looping through all paths if $M \leq 9$, and stochastically using a genetic algorithm otherwise. An algorithm to find the optimal structure for more general regular vines has been proposed by Dißmann et al. (2013). For a selected vine structure, the corresponding pair copulas are inferred following a sequential approach analogous to the one outlined in Section 1.2.3: each pair copula is selected among the allowed families separately, thereby avoiding the comparison of all combinations. Any of the selection criteria illustrated in Section 1.3.2 can be used.

Finally (and optionally), one can keep the parametric expression of the vine determined through the sequential approach, and perform a global likelihood maximization. Note that this step consists in the search for a maximum in a $k$-dimensional space, where $k$ is the (usually large) total number of parameters of the vine, and may therefore be extremely time consuming while improving the quality of the fit only marginally.
1.4 Separating random variables into mutually independent sub-
groups (block independence)
A useful preprocessing step for data from a multivariate sample consists in determining groups of random variables that exhibit inter-group independence. That is, for any pair $(X_1, X_2)$ of random variables such that $X_1$ and $X_2$ belong to different groups, $X_1$ and $X_2$ are mutually independent. Once these groups, or blocks, have been identified by means of an appropriate block independence test, their copula can be expressed as the tensor product of the copulas of the individual subgroups (see the UQ[PY]LAB User Manual – the INPUT module, Eq. (1.8)).
The copula of each subgroup can then be inferred from data as explained below. Of course, the block independence test can be skipped and an individual copula can be inferred from the full dataset for the entire input random vector. However, inferring lower-dimensional copulas for independent subgroups separately is typically computationally cheaper, and leads to more interpretable dependence structures.
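A possible sketch of such a preprocessing step (an illustrative heuristic, not the UQ[py]Lab block-independence test; the helper name is hypothetical): test each pair of variables for independence with Kendall's τ, then take the blocks as the connected components of the resulting dependence graph:

```python
import numpy as np
from scipy import stats

def independent_blocks(X, alpha=0.05):
    """Group the columns of X into blocks such that columns in different
    blocks pass a pairwise Kendall-tau independence test at level alpha.
    (Illustrative heuristic, not the UQ[py]Lab implementation.)"""
    M = X.shape[1]
    # Dependence graph: edge (i, j) if the independence test is rejected
    adj = [set() for _ in range(M)]
    for i in range(M):
        for j in range(i + 1, M):
            if stats.kendalltau(X[:, i], X[:, j]).pvalue < alpha:
                adj[i].add(j)
                adj[j].add(i)
    # Blocks = connected components of the dependence graph
    blocks, seen = [], set()
    for i in range(M):
        if i in seen:
            continue
        stack, comp = [i], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        blocks.append(sorted(comp))
    return blocks

rng = np.random.default_rng(6)
z = rng.standard_normal((400, 3))
# Columns 0 and 1 are dependent; column 2 is generated independently
X = np.column_stack([z[:, 0], z[:, 0] + 0.5 * z[:, 1], z[:, 2]])
```

Note that pairwise tests only detect pairwise dependence; a dedicated multivariate test, as used in UQ[py]Lab, is more rigorous.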
Chapter 2
Usage
2.1 Inference of marginals
The following sections describe how to infer marginal distributions on available data. All
examples refer to a univariate problem, which requires the user to specify a single marginal
iOpts['Marginals'][0]. Multivariate cases can be handled analogously by specifying the
additional marginals iOpts['Marginals'][1], iOpts['Marginals'][2], and so on, con-
sistently with the basic UQ[PY]LAB INPUT syntax (see UQ[PY]LAB User Manual – the INPUT
module). All marginals are treated separately.
All UQ[PY]LAB options to perform inference of marginal distributions on data are listed in
Section 3.1 (see Table 4 and subsequent ones).
2.1.1 Fully automated inference of marginals
The most extensive (least informed) form of statistical inference consists in finding the dis-
tribution family that best represents the data, and the parameters of such distribution that
best fit the data. Given a data set X, inferring the family of each marginal distribution across
all families supported in UQ[PY]LAB, and fitting its parameters, is possible with the code
iOpts = {
'Inference': {
'Data': X
},
'Copula': {
'Type': 'Independent'
}
}
InputHat = uq.createInput(iOpts)
Note that the matrix X must be of type list. The user can use X.tolist() to convert the
data elements of an array into a proper type. For each column i in the matrix X, UQ[PY]LAB
loops across all supported univariate distributions, fits them all to the data in that column,
and assigns the selected distribution to the i-th marginal of InputHat. In particular:
• The optimal parameters for each family are obtained by likelihood maximization, that is, by solving (1.1) or the equivalent problem (1.2);
• The distribution family that best fits the data is selected according to the default Akaike information criterion (AIC, see (1.10)). Different criteria can be specified through the key iOpts['Inference']['Criterion'] (see Table 1);
• Only supported parametric families are considered by default (no KDE). For a list of supported families and their properties, see the companion UQ[PY]LAB User Manual – the INPUT module, Appendix A;
• UQ[PY]LAB automatically discards those parametric families that are incompatible with the data (for instance, distributions with positive support for data with negative elements);
• In the example above, the copula is set to be the independence copula. If the copula is not specified, it is also inferred among all supported copula types (see Section 2.2).
The explicit, but in this case superfluous, code for automatic inference of the marginals on
data X with M columns would be to set all marginal types to 'auto':
iOpts['Marginals'] = [{'Type': 'auto'} for i in range(M)]
Note: For the inference of the marginals, UQ[PY]LAB uses the Matlab built-in function
fitdist. During the fitting process, this function might generate warnings. This
should not impact the quality of the resulting fit as long as the inferred distribu-
tion type did not generate any warnings.
2.1.2 Specifying inference options for each marginal separately
In the above examples, the (implicit) inference options specified in iOpts['Inference']
are identical for all marginals. Furthermore, the set of families considered for inference is
'auto' for all marginals. UQ[PY]LAB also allows one to specify each inference option
separately for each marginal. For instance,
iOpts['Marginals'][i]['Inference'] = {'Criterion': 'KS'}
overwrites the default inference criterion 'AIC' with 'KS' (Kolmogorov-Smirnov distance)
for marginal i.
Analogously, the data to be used for inference can be assigned to each marginal field sepa-
rately, as a one-dimensional array:
iOpts['Marginals'][i]['Inference']['Data'] = x_new
iOpts['Marginals'][i]['Inference'] accepts the same keys as iOpts['Inference'],
but the options specified in the former are specific to one marginal only, and have priority
over those specified in the latter. Note that the matrix x_new must be of type list. The user
can use x_new.tolist() to convert the data elements of an array into a proper type.
The distribution families to be tested can also be assigned explicitly for each marginal, as a
list of strings (one per distribution family):
iOpts['Marginals'][i]['Type'] = ['Lognormal', 'Exponential', 'Weibull']
All marginals for which a specific key is not assigned explicitly are assigned default values.
Using the three lines of code above, each j-th marginal, j ̸= i, will be inferred among all sup-
ported families (as with the implicit code iOpts['Marginals'][j]['Type'] = 'auto')
on the data X[:,j] specified in iOpts['Inference']['Data'] (if any), using the AIC cri-
terion (iOpts['Marginals'][j]['Inference']['Criterion'] = 'AIC').
Finally, inference works for all distributions, including user-defined ones (see Section 2.1.5)
and truncated distributions. For instance, to specify that marginal 1 must be inferred as a
truncated distribution in the interval [a, b], write
iOpts['Marginals'][1]['Bounds'] = [a, b]
Infinite values for the bounds are accepted, e.g., float('inf') for positive infinity.
Note: It is not currently possible in UQ[PY]LAB to infer the support bounds
of a truncated distribution. The only exception is the uniform distribution, the
parameters of which are also its bounds.
2.1.3 Fitting or fixing parameters for a given family
The simplest form of parametric inference consists in fitting the parameters of a given prob-
abilistic input model. The following code sets Marginals[0] as a Gaussian distribution with
parameters obtained by likelihood maximization:
iOpts['Marginals'][0]['Type'] = 'Gaussian'
Again, all marginals can be assigned different Type(s). Those whose Type is not specified are
assigned the default 'auto' (inference performed among all supported parametric families).
Following the same scheme, any marginal can be fully specified by assigning its Parameters
(or alternatively its Moments), as in
iOpts['Marginals'][0]['Parameters'] = [0, 1]
If so, no inference takes place for that marginal.
2.1.4 Non-parametric inference: kernel smoothing
Kernel smoothing (1.8) can be added to the list of marginal types to be considered for infer-
ence by
iOpts['Marginals'][i]['Type'] = ['ks', ...]
If kernel smoothing is the only distribution type desired, one can simply write
iOpts['Marginals'][i]['Type'] = 'ks'
In this case, inference reduces to setting the marginal’s parameters to the observations, used
as centers of the kernels. This can be done explicitly, thus avoiding inference, by setting
iOpts['Marginals'][i]['Parameters'] = x_i
where x_i are the observations from marginal i (see also the UQ[PY]LAB User Manual – the
INPUT module, Section 2.1.1.2).
The kernel type and bandwidth can be set explicitly via the ['Marginals'][i]['Options']
key. For instance, the code below uses a triangular kernel with a bandwidth equal to 0.1:
iOpts['Marginals'][i]['Options'] = {
'Kernel': 'Triangular',
'Bandwidth': 0.1
}
The default kernel type is Gaussian, and the default kernel bandwidth is set using Silverman’s
rule (1.9). For a list of supported values, see Table 6.
2.1.5 Fitting a custom marginal distribution
UQ[PY]LAB does not support custom marginals.
2.1.6 Goodness of fit of inferred marginals
Once inference is completed and the corresponding INPUT object InputHat has been gener-
ated, a summary of the inference results for each marginal i can be found under
InputHat['Marginals'][i]['GoF']. The latter is a dictionary that contains various good-
ness of fit measures for each distribution that has been fitted to the data. For instance, if the
exponential distribution was fitted to the data, then
InputHat['Marginals'][i]['GoF']['Exponential']
is a dictionary with the following keys:
LL: the distribution’s log-likelihood;
AIC: the distribution’s AIC;
BIC: the distribution’s BIC;
FittedMarginal: the distribution type and its parameters;
KSstat: the distribution’s K-S statistic;
KSpvalue: the p-value of the Kolmogorov-Smirnov test, which represents the approxi-
mate probability that the data come from the specified (in this case, exponential) dis-
tribution.
Note: InputHat['Marginals'][i]['GoF'] contains such information for each distri-
bution that has been fitted during the inference process, not only for the distri-
bution that was eventually selected as the best-fitting one.
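The criteria collected in GoF follow standard definitions. As a plain-Python illustration (not the UQ[PY]LAB internals), AIC and BIC penalize the log-likelihood LL of a k-parameter model fitted to n points, and the K-S statistic is the largest deviation between the empirical CDF and the fitted model CDF:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*LL (lower is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*LL (lower is better)."""
    return k * math.log(n) - 2 * loglik

def ks_statistic(x, cdf):
    """Kolmogorov-Smirnov statistic: max |F_n(x) - F(x)| over the sample,
    where F_n is the empirical CDF and F the fitted model CDF."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, xi in enumerate(xs):
        f = cdf(xi)
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    return d
```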
2.2 Inference of copulas
UQ[PY]LAB currently supports five types of copulas: the independent copula, pair copulas,
the multivariate Gaussian copula, and two classes of vine copulas, the C-vine and the D-vine.
More details on theoretical aspects and properties of these copulas can be found in
the companion UQ[PY]LAB User Manual – the INPUT module. All UQ[PY]LAB options to
perform copula inference are listed in Section 3.1 (see Table 7 and subsequent ones).
(See also: uq.inferCopula).
2.2.1 Fully automated inference of copula
Inference among all copulas (and marginals) supported in UQ[PY]LAB can be done by
iOpts['Inference'] = {
'Data': X
}
InputHat = uq.createInput(iOpts)
Note that the matrix X must be of type list (a list of lists). If the data are stored in a
NumPy array, X.tolist() converts them into the proper type. The above code is equivalent
to additionally specifying (before the uq.createInput command):
iOpts['Copula']['Type'] = 'auto'
Specifically:
• if X is a list with only one column, no copula exists (the problem is univariate);
• if X has 3 or more columns, a block independence test is first performed to identify
possible subgroups of mutually independent random variables (see Section 1.4);
a copula is then fitted to each subgroup.
Note: The block independence test can be skipped; in this case, a single subgroup
is identified, equivalent to the full input random vector.
For M = 2, the block independence test reduces to a pair-independence test.
• for each subgroup X_i:
– if X_i has 2 columns, a pair independence test is first performed. If the test is
passed, the two variables are assigned the independence pair copula; otherwise,
all other supported pair copula families are fitted to the corresponding data;
– if X_i has three or more columns, pair copulas are excluded and all other copula
types are fitted instead (for exactly three columns, the class of D-vines is also
excluded, as it is equivalent to the class of C-vines). For the default inference
options assigned to specific copula types, see the next sections;
• the best-fitting copula is retained.
2.2.2 Testing for block independence
By default, data with dimension 3 or larger are tested for block independence before para-
metric inference actually takes place. The test divides the data into mutually independent
blocks, and copula inference is performed on each block separately. The final copula is the
product of the individual copulas inferred for each block.
Two sets of random variables form two separate blocks if and only if each variable in one
block is independent of each variable in the other block. Thus, block independence consists
of a number of tests of pairwise independence. The code
iOpts['Inference']['BlockIndepTest'] = {
'Alpha': 0.05,
'Type': 'Kendall',
'Correction': 'none'
}
instructs each pair independence test to be performed using Kendall's tau statistic, at a
significance threshold α = 0.05, with no statistical correction for the total number of
pairwise tests performed. These are also the default test options. The list of supported
values is given in Table 3.
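For illustration, the pairwise test can be sketched in pure Python using Kendall's tau and the standard normal approximation to its null distribution (UQ[PY]LAB's actual implementation may differ in details such as tie handling):

```python
import math

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / (n choose 2)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            a = (x[i] - x[j]) * (y[i] - y[j])
            s += (a > 0) - (a < 0)  # +1 concordant, -1 discordant
    return 2 * s / (n * (n - 1))

def tau_pvalue(tau, n):
    """Two-sided p-value from the asymptotic normal approximation:
    var(tau) = 2(2n+5) / (9n(n-1)) under the independence hypothesis."""
    z = 3 * tau * math.sqrt(n * (n - 1)) / math.sqrt(2 * (2 * n + 5))
    return math.erfc(abs(z) / math.sqrt(2))
```

A pair is declared independent when the resulting p-value exceeds the chosen threshold α.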
To turn off the block independence test, simply type
iOpts['Inference']['BlockIndepTest']['Alpha'] = 0
Note also that the block independence test options can be provided as copula inference
options rather than as general options:
iOpts['Copula']['Inference'] = {
'BlockIndepTest':
{
'Alpha': 0.05
}
}
If specified in this way, the options under iOpts['Inference']['BlockIndepTest'] are
ignored.
2.2.3 Specification of inference data for the copula
The data for copula inference can be specified through either one of the following fields:
• iOpts['Inference']['Data']. In this case, the data are used for inference of both
the marginals and the copula. For the copula, they are transformed into pseudo-
observations in [0, 1]^M through (1.5);
• iOpts['Copula']['Inference']['Data'], equivalent to the former;
• iOpts['Copula']['Inference']['DataU'], if the data are pseudo-observations in
[0, 1]^M, obtained from data in the physical space through (1.5).
The latter two are mutually exclusive and cannot both be assigned.
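As an illustration of the pseudo-observation transform, a common rank-based definition is u = rank/(n + 1); the exact form of (1.5) is defined in the theory chapter and may differ, and ties are not handled in this sketch:

```python
def pseudo_observations(X):
    """Rank-based pseudo-observations in (0, 1): u = rank / (n + 1),
    computed column by column for an n-by-M list of lists X."""
    n = len(X)
    m = len(X[0])
    U = [[0.0] * m for _ in range(n)]
    for j in range(m):
        # indices of the rows, sorted by the j-th column
        order = sorted(range(n), key=lambda i: X[i][j])
        for rank, i in enumerate(order, start=1):
            U[i][j] = rank / (n + 1)
    return U

U = pseudo_observations([[1.0], [3.0], [2.0]])
```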
2.2.4 Inference among a selected list of copula types
Inference among a selected list of copula types can be performed by specifying a list of
copula types, such as
iOpts['Copula']['Type'] = ['DVine', 'CVine']
Each copula type may involve additional mandatory or optional subkeys of Inference, as
described in the next sections and summarized in Table 9-Table 10.
The code
iOpts['Copula']['Type'] = 'DVine'
instead limits inference to a specific copula type (here, D-vines).
2.2.5 Testing for pair independence
The independence copula C(u, v) = uv expresses statistical independence between two
random variables. One may therefore expect inference to assign two independent random
variables the independence copula. However, this is not necessarily the case: due to random-
ness in the data, inference based solely on (penalized) likelihood maximization criteria may
select a different parametric family. For 'Pair' copulas and vines ('CVine', 'DVine'), it
is possible to perform statistical testing to address this issue directly. This can be done by
specifying the key iOpts['Inference']['PairIndepTest'] or, equivalently,
iOpts['Copula']['Inference']['PairIndepTest'] (the former is ignored if the latter is
provided). For instance, the code
iOpts['Copula']['Inference'] = {
'PairIndepTest': {
'Type': 'Kendall',
'Alpha': 0.05
}
}
instructs UQ[PY]LAB to perform a classical independence test based on Kendall’s tau prior
to inferring the copula, setting the significance threshold to α = 0.05. If the resulting p-value
is larger than α, then the null hypothesis of independence is accepted and the independence
copula is selected. Otherwise, inference proceeds as usual. For vines, it is also possible
to correct the threshold by the total number of tests performed. For instance, Bonferroni
correction sets the effective significance threshold of a set of T independent tests performed
over the same data, each with threshold α, to α/T (Shaffer, 1995). It can be requested by
iOpts['Copula']['Inference']['PairIndepTest']['Correction'] = 'Bonferroni'
The option is ignored for 'Pair' copulas, for which a single test is performed.
For an extensive list of all the available options for tests of pair independence, see Table 2.
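The two corrections can be sketched in pure Python. The Benjamini-Hochberg procedure below is a standard realization of false-discovery-rate control; whether UQ[PY]LAB's 'fdr' option (Table 2) follows exactly this recipe is an assumption:

```python
def bonferroni_reject(pvalues, alpha):
    """Reject H0 for each of T tests whose p-value is below alpha / T."""
    t = len(pvalues)
    return [p < alpha / t for p in pvalues]

def fdr_reject(pvalues, alpha):
    """Benjamini-Hochberg procedure: reject the k smallest p-values, where k
    is the largest rank such that p_(k) <= k * alpha / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; FDR control typically retains more dependencies in high dimension.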
2.2.6 Inference of pair copulas
Pair copulas describe the dependence between two random variables. As such, their inference
requires bivariate data. In this section, the data X used for inference is an array with two
columns. The resulting INPUT object is bivariate, with two marginals and a two-dimensional
copula between them.
The following code infers the copula as a pair copula among all supported pair copula fam-
ilies (the marginals are also inferred, among all supported marginals, as described in Sec-
tion 2.1.1):
iOpts = {
    'Inference': {
        'Data': X
    },
    'Copula': {
        'Type': 'Pair'
    }
}
InputHat = uq.createInput(iOpts)
The different lines of code instruct UQ[PY]LAB that the input copula
1. is to be obtained by inference on data X,
2. as a pair copula,
3. selecting among all supported pair copula families. This can be made explicit by the
(redundant) declaration (to be made before the uq.createInput command):
iOpts['Copula']['Inference']['PCfamilies'] = 'auto'
4. trying any rotation of the pair copula density: 0, 90, 180 and 270 degrees (or only 0
and 90 degrees for families with a density that is symmetric with respect to the main
diagonal, that is, such that c(u, v) = c(v, u) for all u, v ∈ [0, 1]^2).
A selected list of pair copula families can be specified as a list of strings, one per family
name, such as
iOpts['Copula']['Inference']['PCfamilies'] = ['Gaussian', 'Frank', 'Clayton']
To infer the copula, UQ[PY]LAB transforms the data X into pseudo-observations U in the unit
hypercube [0, 1]^M through (1.5). This and alternative supported ways to specify inference
data for copulas are illustrated in Section 2.2.3.
It is possible to infer the copula on data Xnew that are not used for inference of the
marginals, by typing
iOpts['Copula']['Inference']['Data'] = Xnew
In this case, iOpts['Inference']['Data'], if present, is ignored for copula inference.
Similarly, pseudo-observations U can be directly provided for copula inference by
iOpts['Copula']['Inference']['DataU'] = U
X and U are n × 2 arrays: each column contains observations of one random variable.
The default selection criterion is the AIC. It can be changed to the BIC or ML criterion by
iOpts['Copula']['Inference']['Criterion'] = 'BIC' # or 'ML'
Note that this overwrites any criterion specified in iOpts['Inference']['Criterion'].
2.2.7 Inference of Gaussian copulas
The following code fits the parameters of a Gaussian copula:
iOpts['Copula']['Type'] = 'Gaussian'
The parameters of the Gaussian copula are obtained as Spearman's correlation coefficients
between all pairs of random variables.
2.2.8 Inference of vine copulas
The following code infers a C-vine copula with fixed structure 3 1 2 (that is, comprising
the pair copulas C_3,1, C_3,2 and C_1,2|3) on a 3-dimensional data set X. The pair copulas
are inferred among the Gaussian, Gumbel, and t families only:
iOpts['Copula'] = {
    'Type': 'CVine',
    'Inference': {
        'CVineStructure': [[3,1,2]],
        'PCfamilies': ['Gaussian', 'Gumbel', 't']
    }
}
The vine structure, in particular, determines which pair copulas are to be explicitly part of
the vine, and therefore need to be inferred. In this case, the vine comprises the copulas
C_3,1 between X_3 and X_1, C_3,2 between X_3 and X_2, and C_1,2|3 between X_1|3 and
X_2|3. C_3,1 is inferred from the data X[:,2] and X[:,0], and so on.
It is possible to retrieve this information before performing inference by using the function
uq.CopulaSummary (see also Section 3.8.5 of the UQ[PY]LAB User Manual – the INPUT mod-
ule):
uq.CopulaSummary('CVine', [3, 1, 2])
which returns the output
CVine, dimension 3, structure [3 1 2]. Pair copulas:
Index | Pair Copula
==================================
1 | C_3,1
2 | C_3,2
3 | C_1,2|3
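The mapping from a C-vine's root ordering to its pair copulas can be enumerated directly: in tree k the k-th root is paired with every later variable, conditioned on all earlier roots. The sketch below reproduces the listing above (an illustration only; uq.CopulaSummary is the supported way to obtain it):

```python
def cvine_pair_copulas(structure):
    """List the pair copula labels of a C-vine with the given root ordering.
    For [3, 1, 2] this yields C_3,1, C_3,2, C_1,2|3."""
    pairs = []
    for k in range(len(structure) - 1):
        cond = structure[:k]  # conditioning set: all earlier roots
        for j in range(k + 1, len(structure)):
            label = "C_%d,%d" % (structure[k], structure[j])
            if cond:
                label += "|" + ",".join(str(c) for c in cond)
            pairs.append(label)
    return pairs

print(cvine_pair_copulas([3, 1, 2]))
```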
A D-vine could be inferred analogously by specifying
iOpts['Copula'] = {
'Type': 'DVine',
'Inference': {
'DVineStructure': <list>
}
}
As in the 'Pair' copula case, vine copulas admit inference among a selected list of families
rather than all supported ones (see Table 9). For instance, one can specify
iOpts['Copula']['Inference']['PCfamilies'] = ['t', 'Frank', 'Gumbel']
Typically, the vine structure is also unknown and needs to be inferred (see Section 1.3.3).
This is done, for a C-vine, by specifying
iOpts['Copula']['Inference']['CVineStructure'] = 'auto'
and analogously for a D-vine. If the structure is not specified, it is set to 'auto' by default.
Finally, pair independence tests may be performed prior to solving the inference problem, as
illustrated in Section 2.2.5.
Once a vine copula myVine has been inferred from data, its graphical structure can be visu-
alized by the code
uq.display(myVine, show_vine=True)
Information on the constituent pair copulas can be printed on screen by
uq.CopulaSummary(myVine)
2.2.9 Goodness of fit of inferred copulas
Once inference is completed and the corresponding INPUT object InputHat has been gen-
erated, a summary of the results can be found in the key InputHat['Copula']['GoF'].
The latter is a dictionary that contains various goodness-of-fit measures for each copula that
has been fitted to the data. For instance, if the 'Gaussian' copula was fitted to the data,
then InputHat['Copula']['GoF']['Gaussian'] is a dictionary with the following keys:
LL: the copula’s total log-likelihood over the data used for inference;
AIC: the copula’s AIC;
BIC: the copula’s BIC;
Note: InputHat['Copula']['GoF'] contains such information for the best fit-
ting copula (the one whose parameters maximize the likelihood func-
tion over the inference data) of each type selected for inference in
iOpts['Copula']['Inference']['Types'], not only for the type that was
eventually selected as the overall best.
Chapter 3
Reference List
How to read the reference list
Python dictionaries play an important role throughout the UQ[PY]LAB syntax. They offer
a natural way to semantically group configuration options and output quantities. Due to
the complexity of the algorithms implemented, it is not uncommon to employ nested dictio-
naries to fine-tune the inputs and outputs. Throughout this reference guide, a table-based
description of the configuration dictionaries is adopted.
The simplest case is given when a value of a dictionary key is a simple value or a list:
Table X: Input
Name String A description of the field is put here
which corresponds to the following syntax:
Input = {
'Name' : 'My Input'
}
The columns, from left to right, correspond to the name, the data type and a brief description
of each key-value pair. At the beginning of each row a symbol is given to inform as to whether
the corresponding key is mandatory, optional, mutually exclusive, etc. The comprehensive
list of symbols is given in the following table:
Mandatory
Optional
Mandatory, mutually exclusive (only one of
the keys can be set)
Optional, mutually exclusive (one of them
can be set, if at least one of the group is set,
otherwise none is necessary)
When the value of one of the keys of a dictionary is a dictionary itself, a link to a table that
describes the structure of that nested dictionary is provided, as in the case of the Options
key in the following example:
Table X: Input
Name String Description
Options Table Y Description of the Options
dictionary
Table Y: Input['Options']
Key1 String Description of Key1
Key2 Double Description of Key2
In some cases, an option value gives the possibility to define further options related to that
value. The general syntax would be:
Input = {
'Option1' : 'VALUE1',
'VALUE1' : {
'Val1Opt1' : ... ,
'Val1Opt2' : ...
}
}
This is illustrated as follows:
Table X: Input
Option1 String Short description
'VALUE1' Description of 'VALUE1'
'VALUE2' Description of 'VALUE2'
VALUE1 Table Y Options for 'VALUE1'
VALUE2 Table Z Options for 'VALUE2'
Table Y: Input['VALUE1']
Val1Opt1 String Description
Val1Opt2 Float Description
Table Z: Input['VALUE2']
Val2Opt1 String Description
Val2Opt2 Float Description
3.1 Create an Input with inferred marginals and/or copula
Syntax
InputHat = uq.createInput(iOpts)
The dictionary variable iOpts contains the configuration information for the input object
whose marginals and/or copula are to be inferred from data.
Note: Inference and representation can be mixed: one may, for instance, declare some
marginals to be inferred while others are known, or infer the marginals but not
the copula, and so on. The inference options described in this chapter are applied
only to the distributions to be inferred, and ignored for the others.
3.1.1 General inference options (valid for marginals and copula)
The list of general inference options valid for both marginals and copulas is provided in
Table 1. Each of these options can also be provided for a specific marginal (resp. copula)
under iOpts['Marginals'][ii]['Inference'] (resp. iOpts['Copula']['Inference']).
The latter two (see Table 5 and Table 8) take priority and overwrite the former, in case
both are provided.
Table 1: iOpts['Inference'] (1)
Data            n-by-M list of lists      Inference data: n instances of dimension M
Criterion       String, default: 'AIC'    Selection criterion (Section 1.3.2):
                'ML'   Maximum likelihood
                'AIC'  Akaike information criterion
                'BIC'  Bayesian information criterion
                'KS'   Kolmogorov-Smirnov CDF distance. Valid only for
                       marginals (by default set to 'AIC' for copulas instead)
PairIndepTest   Dictionary                Specifications for the pair independence test
                                          (see Table 2). Concerns copula inference only
BlockIndepTest  Dictionary                Specifications for the block independence test
                                          (see Table 3). Concerns copula inference only
(1) Does not need to be specified if and only if both iOpts['Marginals']['Inference']['Data']
and either iOpts['Copula']['Inference']['Data'] or iOpts['Copula']['Inference']['DataU']
are provided instead.
Table 2: iOpts['Inference']['PairIndepTest']
Alpha        Scalar in [0, 1], default: 0.1   Statistical threshold for the test of pair
                                              independence. Set to 0 or [] to disable the test
Type         String, default: 'Kendall'       Test statistic. One of 'Kendall', 'Spearman',
                                              or 'Pearson'
Correction   String, default: 'auto'          Correction for the number of pairwise tests.
                                              Applies to inference of vine copulas only:
             'none'          No statistical correction
             'fdr'           False discovery rate correction
             'Bonferroni'    Bonferroni correction
             'auto' or ''    No correction if M ≤ 8, FDR correction otherwise
Table 3: iOpts['Inference']['BlockIndepTest']
Alpha        Scalar in [0, 1], default: 0.1   Statistical threshold for the test of block
                                              independence. Set to 0 or [] to disable the test
Type         String, default: 'Kendall'       Test statistic. One of 'Kendall', 'Spearman',
                                              or 'Pearson'
Correction   String, default: 'auto'          Correction for the number of pairwise tests:
             'none'          No statistical correction
             'fdr'           False discovery rate correction
             'Bonferroni'    Bonferroni correction
             'auto' or ''    No correction if M ≤ 8, FDR correction otherwise
3.1.2 Options for inference of marginal distributions
The detailed list of options available for inference of marginal distributions is reported in the
tables below. For a list of marginal distribution families currently supported in UQ[PY]LAB,
see the UQ[PY]LAB User Manual – the INPUT module, Appendix A.
Table 4: iOpts['Marginals'][ii]
Type         String or list of strings,   As a string, can be:
             default: 'auto'              'auto': marginal inferred among all parametric
                                          distributions;
                                          the name of any supported marginal distribution.
                                          As a list, contains the names of the marginal
                                          distributions among which to select the inferred
                                          one
Inference    Dictionary                   Inference options for marginal ii, see Table 5.
                                          If provided, overwrites the corresponding options
                                          specified in iOpts['Inference']
Bounds       List [a, b],                 Support of the truncated distribution
             −∞ ≤ a < b ≤ +∞              (Section 1.2.3 of the UQ[PY]LAB User Manual –
                                          the INPUT module)
Options      Dictionary                   Kernel smoothing options (see Table 6)
Table 5: iOpts['Marginals'][ii]['Inference'] (2)
Data         List n × 1                   Inference data for marginal ii
Criterion    String, default: 'AIC'       Selection criterion (see Section 1.3.2):
             'ML'   Maximum likelihood
             'AIC'  Akaike information criterion
             'BIC'  Bayesian information criterion
             'KS'   Kolmogorov-Smirnov CDF distance
Table 6: iOpts['Marginals'][ii]['Options']
Kernel       String, default: 'Gaussian'  Kernel shape. One of:
             'Gaussian' or 'Normal'
             'Triangle' or 'Triangular'
             'Box'
             'Epanechnikov'
Bandwidth    Float, default: set using    Kernel bandwidth
             Silverman's rule (1.9)
(2) Mandatory if iOpts['Inference']['Data'] is not provided.
3.1.3 Options for inference of copulas
The detailed list of options available for inference of copulas is reported in the tables below.
For a list of copula types currently supported in UQ[PY]LAB, see the UQ[PY]LAB User Manual
– the INPUT module, Section 1.4.
For a list of supported pair copula families, see in particular the UQ[PY]LAB User Manual –
the INPUT module, Table 1.
Table 7: iOpts['Copula']
Type         String or list of strings,   As a string, can be:
             default: 'auto'              'auto': copula inferred among all supported types;
                                          the name of any supported copula type.
                                          As a list, contains the copula types among which
                                          to perform inference
Inference    Dictionary                   Inference options, see Table 8
Table 8: iOpts['Copula']['Inference']
Data            List of lists, n × M      Data in the physical space to use for copula
                                          inference
DataU           List of lists, n × M      Data in the unit hypercube to use for copula
                                          inference
Criterion       String, default: 'AIC'    Copula selection criterion (see Section 1.3.2):
                'ML'   Maximum likelihood
                'AIC'  Akaike information criterion
                'BIC'  Bayesian information criterion
PairIndepTest   Dictionary                Specifications of the test for pair independence
                                          (see Table 2)
BlockIndepTest  Dictionary                Specifications of the test for block independence
                                          (see Table 3)
If iOpts['Copula']['Type'] includes any of 'Pair', 'CVine' or 'DVine', or is the string
'auto', the following optional key is supported:
Table 9: iOpts['Copula']['Inference'] (for pair or vine copulas)
PCfamilies   String or list of strings,   Name(s) of the pair copula families to choose
             default: 'auto'              from. If 'auto', try all supported parametric
                                          families
If iOpts['Copula']['Type'] includes 'CVine' or 'DVine', or is the string 'auto', the
following optional keys are also accepted:
Table 10: iOpts['Copula']['Inference'] (for C- and D-vines)
CVineStructure   String 'auto' or list    C-vine structure. If 'auto', it is inferred
                 (any permutation of      from the data (Section 1.3.2)
                 0:M-1), default: 'auto'
DVineStructure   String 'auto' or list    D-vine structure. If 'auto', it is inferred
                 (any permutation of      from the data (Section 1.3.2)
                 0:M-1), default: 'auto'
3.2 Working with inferred inputs
Using an input object whose marginals and/or copula were obtained by inference works
analogously to using any other input. Methods such as uq.getSample, uq.print,
uq.display, uq.enrichSobol, etc., work as usual (see the UQ[PY]LAB User Manual – the
INPUT module). The inference options specified for the individual marginals and/or for
the copula can be retrieved from the keys InputHat['Marginals'][ii]['Inference'] and
InputHat['Copula']['Inference']. Similarly, various goodness-of-fit measures for the
inferred distribution over the data used for inference are stored under
InputHat['Marginals'][ii]['GoF'] and InputHat['Copula']['GoF'].
References
Aas, K., Czado, C., Frigessi, A., and Bakken, H. (2009). Pair-copula constructions of multiple
dependence. Insurance: Mathematics and Economics, 44(2):182–198.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on
Automatic Control, 19(6):716–723.
Applegate, D. L., Bixby, R. E., Chvátal, V., and Cook, W. J. (2006). The Traveling Salesman
Problem: A Computational Study. Princeton University Press, New Jersey.
Dißmann, J., Brechmann, E. C., Czado, C., and Kurowicka, D. (2013). Selecting and estimat-
ing regular vine copulae and application to financial returns. Computational Statistics and
Data Analysis, 59:52–69.
Joe, H. (2015). Dependence modeling with copulas. CRC Press.
Morales-Nápoles, O. (2011). Counting vines. In Kurowicka, D. and Joe, H., editors, De-
pendence Modeling: Vine Copula Handbook, chapter 9, pages 189–218. World Scientific
Publishing Co.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2):461–464.
Shaffer, J. P. (1995). Multiple hypothesis testing. Annual Review of Psychology, 46:561–584.
Silverman, B. W. (1986). Density estimation for statistics and data analysis, volume 26 of
Monographs on Statistics and Applied Probability. Chapman & Hall.
Simonoff, J. S. (1996). Smoothing Methods in Statistics. Springer.
Stuart, A., Ord, K., and Arnold, S. (1999). Kendall's Advanced Theory of Statistics, Vol. 2A:
Classical Inference and the Linear Model. Arnold, 6th edition.