INPUT MODULE: INFERENCE OF MARGINALS AND COPULA¶

This example showcases how to infer the marginal distributions of a probabilistic input model from a given multivariate data set.

NOTE: The original UQLab example includes a few advanced features that UQ[py]Lab does not currently support, so the contents of this example differ slightly from the UQLab one. In particular, UQ[py]Lab does not currently support the specification of custom marginal distributions. Infinite values, e.g. in the bounds of a distribution, can be specified with float('inf') for positive infinity and -float('inf') for negative infinity.
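
As a minimal sketch of the infinity convention described in the note, the options dictionary below describes a standard Gaussian marginal truncated to the non-negative half-line. The name `halfLineOpts` and the chosen bounds are illustrative assumptions and are not used elsewhere in this example; the dictionary keys are the ones used throughout this example.

```python
# Illustrative only: a standard Gaussian truncated to [0, +inf).
# 'halfLineOpts' is a hypothetical name, not part of the original example.
halfLineOpts = {
    'Marginals': [
        {
            'Type': 'Gaussian',
            'Parameters': [0, 1],
            'Bounds': [0, float('inf')]  # upper bound: positive infinity
        }
    ]
}
```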

Package imports¶

In [1]:
from uqpylab import sessions
import numpy as np

Start a remote UQCloud session¶

In [2]:
# Start the session
mySession = sessions.cloud()
# (Optional) Get a convenient handle to the command line interface
uq = mySession.cli
# Reset the session
mySession.reset()
Processing .
.
 done!

 uqpylab.sessions :: INFO     :: This is UQ[py]Lab, version 1.00, running on https://uqcloud.ethz.ch. 
                                 UQ[py]Lab is free software, published under the open source BSD 3-clause license.
                                 To request special permissions, please contact:
                                  - Stefano Marelli (marelli@ibk.baug.ethz.ch).
                                 A new session (d7fefcbb73144df195d554f39f271a24) started.
 uqpylab.sessions :: INFO     :: Reset successful.

Set the random seed for reproducibility¶

In [3]:
uq.rng(100, 'twister');

Data generation¶

A hypothetical data set used for the inference is first generated using a reference (true) probabilistic input model. The input models inferred from this data set can later on be compared with the true one.

The true probabilistic input model consists of two independent random variables:

  • $X_1 \sim \mathcal{N}(0, 1)$, truncated in $[1, 10]$
  • $X_2 \sim \textrm{Beta}(6,4)$

Specify the marginals of these random variables:

In [4]:
iOptsTrue = {
    'Marginals': [
        {
            'Type': 'Gaussian',
            'Parameters': [0,1],
            'Bounds': [1,10]
        },
        {
            'Type': 'Beta',
            'Parameters': [6,4],
        }
    ]
}
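
For later comparison with the inferred models, it can help to know the moments of the true marginals. The sketch below computes them from the standard closed-form expressions for a truncated Gaussian and a Beta distribution, using only the Python standard library. The function names are hypothetical helpers and are not part of UQ[py]Lab.

```python
import math
from statistics import NormalDist

def truncated_gaussian_moments(mu, sigma, lower, upper):
    """Mean and standard deviation of N(mu, sigma^2) truncated to [lower, upper]."""
    std_normal = NormalDist()
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    mass = std_normal.cdf(b) - std_normal.cdf(a)  # probability kept by the truncation
    pa, pb = std_normal.pdf(a), std_normal.pdf(b)
    mean = mu + sigma * (pa - pb) / mass
    var = sigma ** 2 * (1 + (a * pa - b * pb) / mass - ((pa - pb) / mass) ** 2)
    return mean, math.sqrt(var)

def beta_moments(r, s):
    """Mean and standard deviation of a Beta(r, s) distribution on [0, 1]."""
    mean = r / (r + s)
    var = r * s / ((r + s) ** 2 * (r + s + 1))
    return mean, math.sqrt(var)

print(truncated_gaussian_moments(0, 1, 1, 10))  # true marginal of X1
print(beta_moments(6, 4))                       # true marginal of X2
```

For the true model above, this gives a mean of roughly 1.53 for $X_1$ and exactly 0.6 for $X_2$.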

Create an INPUT object based on the specified marginals. No copula is specified, so the two variables are treated as independent:

In [5]:
myInputTrue = uq.createInput(iOptsTrue)

Visualize the input model:

In [6]:
uq.display(myInputTrue);

Generate a sample set of size $1000$ from the input model:

In [7]:
X = uq.getSample(myInputTrue,1000)
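
To make the idea of sampling from a truncated marginal concrete, here is a minimal sketch of inverse-transform sampling for a truncated Gaussian in plain Python. It is an illustration only and does not claim to reproduce how `uq.getSample` works internally; the function name and the `seed` argument are assumptions.

```python
import random
from statistics import NormalDist

def sample_truncated_gaussian(n, mu, sigma, lower, upper, seed=100):
    """Draw n points from N(mu, sigma^2) truncated to [lower, upper]."""
    rng = random.Random(seed)
    dist = NormalDist(mu, sigma)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    eps = 2.0 ** -53
    samples = []
    for _ in range(n):
        # Uniform draw on [p_lo, p_hi], kept strictly inside (0, 1) for inv_cdf
        p = p_lo + rng.random() * (p_hi - p_lo)
        p = min(max(p, eps), 1.0 - eps)
        x = dist.inv_cdf(p)
        samples.append(min(max(x, lower), upper))  # guard against round-off
    return samples

x1 = sample_truncated_gaussian(1000, 0, 1, 1, 10)
print(min(x1), max(x1), sum(x1) / len(x1))
```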

Inference of marginals¶

The examples below infer a joint distribution on X using different inference options for the marginals. The copula is not inferred; it is assumed to be known (independence copula).

In [8]:
InputOpts = {
    "Copula" : {
        "Type": "Independent"
    }
}

Full inference¶

The marginals are inferred among all supported parametric distributions.

Assign the data set to the input options:

In [9]:
InputOpts["Inference"] = {
       "Data": X.tolist()
}

Create an INPUT object to infer the marginals:

In [10]:
InputHat1 = uq.createInput(InputOpts)
Processing .
.
 done!

Plot the inferred probabilistic input model:

In [11]:
uq.display(InputHat1);

Print out a report on the inferred input model:

In [12]:
uq.print(InputHat1)
==============================================================
Input object name: Input 2
Dimension(M): 2

Marginals:

Index | Name | Type   |  Parameters             | Moments              
------------------------------------------------------------------------
1     | X1   | Gumbel |  1.328e+00, 3.011e-01   | 1.501e+00, 3.862e-01
2     | X2   | Beta   |  6.362e+00, 4.173e+00   | 6.039e-01, 1.440e-01


Copula:

Type: Independent
Dimension: 2
Variables coupled: [1 2]
==============================================================

Full inference with Kolmogorov-Smirnov selection criterion¶

The default selection criterion for the marginal distribution family is the Akaike information criterion (AIC).

The Kolmogorov-Smirnov criterion, meanwhile, tends to be a better choice for data generated from bounded marginals (such as $X_1$ in this example).
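
The two criteria measure different things: AIC rewards a high likelihood while penalizing the number of parameters, whereas the KS distance measures the largest gap between the empirical CDF and the candidate CDF. The sketch below implements the textbook definitions of both for a Gaussian candidate. It is an assumption that these match the quantities computed internally by UQ[py]Lab, and the toy data are hypothetical values, not the sample `X`.

```python
import math
from statistics import NormalDist

def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood

def ks_distance(data, cdf):
    """Largest gap between the empirical CDF of the data and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n)
        for i, x in enumerate(xs)
    )

# Toy data (hypothetical values, not the sample X of this example)
data = [1.02, 1.10, 1.25, 1.31, 1.48, 1.66, 1.90, 2.35]

# Gaussian candidate fitted by maximum likelihood
mu = sum(data) / len(data)
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
fit = NormalDist(mu, sigma)
log_lik = sum(math.log(fit.pdf(x)) for x in data)

print("AIC:", aic(log_lik, n_params=2))
print("KS distance:", ks_distance(data, fit.cdf))
```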

Select the Kolmogorov-Smirnov (KS) selection criterion for the first input marginal:

In [13]:
InputOpts["Marginals"] = [{
    "Inference": {
        "Criterion": 'KS'
    }
}]

Create an INPUT object to infer the marginals:

In [14]:
InputHat2 = uq.createInput(InputOpts)

Print out a report on the inferred input model:

In [15]:
uq.print(InputHat2)
==============================================================
Input object name: Input 3
Dimension(M): 2

Marginals:

Index | Name | Type   |  Parameters             | Moments              
------------------------------------------------------------------------
1     | X1   | Gumbel |  1.328e+00, 3.011e-01   | 1.501e+00, 3.862e-01
2     | X2   | Beta   |  6.362e+00, 4.173e+00   | 6.039e-01, 1.440e-01


Copula:

Tensor product of 2 copulas between the random vectors
	X_1, X_2

Copula 1, of X_1:
Type: Independent
Dimension: 1
Variables coupled: 1
Copula 2, of X_2:
Type: Independent
Dimension: 1
Variables coupled: 2
==============================================================

Instead of specifying inference options for each marginal separately, it is also possible to assign values that apply to all marginals:

In [16]:
InputOpts["Inference"]["Criterion"] = 'KS'

Create an INPUT object and print out a report on the inferred input model:

In [17]:
InputHat2b = uq.createInput(InputOpts)
uq.print(InputHat2b)
==============================================================
Input object name: Input 4
Dimension(M): 2

Marginals:

Index | Name | Type   |  Parameters             | Moments              
------------------------------------------------------------------------
1     | X1   | Gumbel |  1.328e+00, 3.011e-01   | 1.501e+00, 3.862e-01
2     | X2   | Beta   |  6.362e+00, 4.173e+00   | 6.039e-01, 1.440e-01


Copula:

Tensor product of 2 copulas between the random vectors
	X_1, X_2

Copula 1, of X_1:
Type: Independent
Dimension: 1
Variables coupled: 1
Copula 2, of X_2:
Type: Independent
Dimension: 1
Variables coupled: 2
==============================================================

Full inference with truncated marginal¶

The above inference options produce a non-truncated marginal distribution.

When the data are known to lie in a bounded range and the bounds are known, the bounds can be specified as an inference option:

In [18]:
InputOpts["Marginals"][0]["Bounds"] = [1, 10]
InputHat3 = uq.createInput(InputOpts)
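
Specifying bounds means that each candidate family is restricted to the given interval and its density is renormalized so that it still integrates to one. The sketch below shows this renormalization for a Weibull density. The helper names and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def weibull_pdf(x, scale, shape):
    """Density of a Weibull distribution with the given scale and shape."""
    if x < 0:
        return 0.0
    return (shape / scale) * (x / scale) ** (shape - 1) * math.exp(-((x / scale) ** shape))

def weibull_cdf(x, scale, shape):
    """CDF of a Weibull distribution with the given scale and shape."""
    return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0

def truncate(pdf, cdf, lower, upper):
    """Return the density of the same family restricted to [lower, upper]."""
    mass = cdf(upper) - cdf(lower)  # probability of landing inside the bounds

    def truncated_pdf(x):
        return pdf(x) / mass if lower <= x <= upper else 0.0

    return truncated_pdf

# Hypothetical parameter values, chosen only for illustration
scale, shape = 1.1, 1.8
pdf = truncate(lambda x: weibull_pdf(x, scale, shape),
               lambda x: weibull_cdf(x, scale, shape), 1, 10)
print(pdf(0.5), pdf(1.5))  # zero outside the bounds, positive inside
```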

Print out a report on the inferred input model:

In [19]:
uq.print(InputHat3)
==============================================================
Input object name: Input 5
Dimension(M): 2

Marginals:

Index | Name | Type    |  Parameters             | Moments              
-------------------------------------------------------------------------
1     | X1   | Weibull |  1.105e+00, 1.765e+00   | 9.840e-01, 5.759e-01
2     | X2   | Beta    |  6.362e+00, 4.173e+00   | 6.039e-01, 1.440e-01


Copula:

Tensor product of 2 copulas between the random vectors
	X_1, X_2

Copula 1, of X_1:
Type: Independent
Dimension: 1
Variables coupled: 1
Copula 2, of X_2:
Type: Independent
Dimension: 1
Variables coupled: 2
==============================================================

Constrained set of marginal families¶

By default, inference of marginals is carried out among all supported marginals (if sensible: marginals with positive support, for instance, are discarded if the inference data contain negative values).

It is possible to manually set the list of parametric families to be considered for inference:

In [20]:
InputOpts["Marginals"][0]["Type"] = ["Gaussian", "Exponential", "Weibull"]
InputHat4 = uq.createInput(InputOpts)
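
Conceptually, inference over a list of families fits each candidate, scores it with the selection criterion, and keeps the best one, skipping families whose support cannot contain the data. The sketch below illustrates this for two families with closed-form maximum-likelihood fits, scored with AIC. The names `CANDIDATES` and `select_family` are hypothetical, and this is not the UQ[py]Lab implementation.

```python
import math

def gaussian_loglik(data):
    """Maximized Gaussian log-likelihood and number of fitted parameters."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2

def exponential_loglik(data):
    """Maximized exponential log-likelihood and number of fitted parameters."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data), 1

# family name -> (fit function, requires non-negative data?)
CANDIDATES = {
    "Gaussian": (gaussian_loglik, False),
    "Exponential": (exponential_loglik, True),
}

def select_family(data, families):
    """Return the family with the lowest AIC and the AIC of every family tried."""
    scores = {}
    for name in families:
        fit, positive_only = CANDIDATES[name]
        if positive_only and min(data) < 0:
            continue  # support cannot contain the data: discard the family
        log_lik, k = fit(data)
        scores[name] = 2 * k - 2 * log_lik
    return min(scores, key=scores.get), scores

print(select_family([0.1, 0.2, 0.3, 0.5, 1.0, 3.0], ["Gaussian", "Exponential"]))
```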

Print out a report on the inferred input model:

In [21]:
uq.print(InputHat4)
==============================================================
Input object name: Input 6
Dimension(M): 2

Marginals:

Index | Name | Type    |  Parameters             | Moments              
-------------------------------------------------------------------------
1     | X1   | Weibull |  1.105e+00, 1.765e+00   | 9.840e-01, 5.759e-01
2     | X2   | Beta    |  6.362e+00, 4.173e+00   | 6.039e-01, 1.440e-01


Copula:

Tensor product of 2 copulas between the random vectors
	X_1, X_2

Copula 1, of X_1:
Type: Independent
Dimension: 1
Variables coupled: 1
Copula 2, of X_2:
Type: Independent
Dimension: 1
Variables coupled: 2
==============================================================

Parameter fitting of a fixed marginal family¶

If a marginal type or family is fixed, the inference reduces to parameter fitting:

In [22]:
InputOpts["Marginals"][0]["Type"] = "Gaussian"
InputHat5 = uq.createInput(InputOpts)
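
With the family fixed, only the parameters remain to be estimated. Assuming the fit is a maximum-likelihood one, the sketch below writes out the log-likelihood of a (possibly truncated) Gaussian and maximizes it over a coarse grid. The grid search stands in for a proper numerical optimizer, and all names and data values are hypothetical.

```python
import math
from statistics import NormalDist

def truncated_gaussian_loglik(data, mu, sigma, lower, upper):
    """Log-likelihood of N(mu, sigma^2) truncated to [lower, upper]."""
    dist = NormalDist(mu, sigma)
    mass = dist.cdf(upper) - dist.cdf(lower)  # normalizing constant of the truncation
    return sum(math.log(dist.pdf(x)) for x in data) - len(data) * math.log(mass)

def fit_by_grid(data, mus, sigmas, lower=-float('inf'), upper=float('inf')):
    """Pick the (mu, sigma) pair on the grid with the highest log-likelihood."""
    candidates = [(m, s) for m in mus for s in sigmas]
    return max(
        candidates,
        key=lambda p: truncated_gaussian_loglik(data, p[0], p[1], lower, upper),
    )

# Toy data inside the bounds [1, 10] (hypothetical values)
data = [1.2, 1.5, 2.0]
print(fit_by_grid(data, [0.0, 0.5, 1.0], [0.5, 1.0, 1.5], lower=1, upper=10))
```

Note that the fitted parameters describe the underlying, untruncated Gaussian, which is why they can differ noticeably from the sample mean and standard deviation of truncated data.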

Print out a report on the inferred input model:

In [23]:
uq.print(InputHat5)
==============================================================
Input object name: Input 7
Dimension(M): 2

Marginals:

Index | Name | Type     |  Parameters             | Moments              
--------------------------------------------------------------------------
1     | X1   | Gaussian |  2.455e-01, 9.166e-01   | 2.455e-01, 9.166e-01
2     | X2   | Beta     |  6.362e+00, 4.173e+00   | 6.039e-01, 1.440e-01


Copula:

Tensor product of 2 copulas between the random vectors
	X_1, X_2

Copula 1, of X_1:
Type: Independent
Dimension: 1
Variables coupled: 1
Copula 2, of X_2:
Type: Independent
Dimension: 1
Variables coupled: 2
==============================================================

Inference of selected marginals¶

Inference and/or fitting can be limited to just some of the marginals, while the others are fully specified.

Below, the marginal distribution of $X_1$ is fully specified while that of $X_2$ is inferred:

In [24]:
InputOpts["Marginals"][0] = {
        "Type": "Gaussian",
        "Parameters": [0, 1],
        "Bounds": [1,10]
}
InputHat6 = uq.createInput(InputOpts)

Print out a report on the inferred input model:

In [25]:
uq.print(InputHat6)
==============================================================
Input object name: Input 8
Dimension(M): 2

Marginals:

Index | Name | Type     |  Parameters             | Moments              
--------------------------------------------------------------------------
1     | X1   | Gaussian |  0.000e+00, 1.000e+00   | 0.000e+00, 1.000e+00
2     | X2   | Beta     |  6.362e+00, 4.173e+00   | 6.039e-01, 1.440e-01


Copula:

Tensor product of 2 copulas between the random vectors
	X_1, X_2

Copula 1, of X_1:
Type: Independent
Dimension: 1
Variables coupled: 1
Copula 2, of X_2:
Type: Independent
Dimension: 1
Variables coupled: 2
==============================================================

Inference by kernel smoothing¶

Some data may not be suitably represented by any known parametric marginal distribution. If this is the case, a non-parametric fitting may produce better results.

In the example below, kernel smoothing (ks) is used to infer the second marginal, while the first marginal remains fully specified as above.

In [26]:
InputOpts["Marginals"].append({
  "Type" : "ks"  
})
InputHat7 = uq.createInput(InputOpts)
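
Kernel smoothing builds the density directly from the data by placing a small kernel on every data point and averaging. The sketch below is a minimal Gaussian kernel density estimate with a normal-reference rule-of-thumb bandwidth. The kernel and bandwidth choices are assumptions made for illustration and may differ from what UQ[py]Lab uses; the data values are hypothetical.

```python
import math

def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth: 1.06 * std * n^(-1/5)."""
    n = len(data)
    mean = sum(data) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * std * n ** (-0.2)

def kde_pdf(x, data, bandwidth):
    """Gaussian kernel density estimate evaluated at x."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return total / (n * bandwidth * math.sqrt(2 * math.pi))

# Toy data (hypothetical values, not the sample X of this example)
data = [0.4, 0.5, 0.6, 0.7, 0.8]
h = rule_of_thumb_bandwidth(data)
print(h, kde_pdf(0.6, data, h))
```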

Print out a report on the inferred input model:

In [27]:
uq.print(InputHat7)
==============================================================
Input object name: Input 9
Dimension(M): 2

Marginals:

Index | Name | Type     |  Parameters             | Moments              
--------------------------------------------------------------------------
1     | X1   | Gaussian |  0.000e+00, 1.000e+00   | 0.000e+00, 1.000e+00
2     | X2   | ks       |                         | 6.037e-01, 1.442e-01


Copula:

Type: Independent
Dimension: 2
Variables coupled: [1 2]
==============================================================

Specification of inference options for each marginal¶

As hinted above, all inference options for marginal distributions can be specified for each marginal separately, ensuring full flexibility.

In the example below, the marginal of $X_1$ is inferred among all supported parametric distributions, using the Bayesian information criterion (BIC), based on the first $500$ data points.

The marginal of $X_2$ is inferred as a Beta distribution, using the default inference criterion (AIC), based on all ($1000$) data points.

In [28]:
del InputOpts
InputOpts = {
    'Marginals': [
        {
            'Type': 'auto',
            'Inference': {
                'Criterion': 'BIC',
                'Data': X[0:500,0].tolist()  # first 500 points (end index is exclusive)
            }
        },
        {
            'Type': 'Beta',
            'Inference': {
                'Data': X[:,1].tolist()
            }
        }
    ],
    
}
InputHat8 = uq.createInput(InputOpts)
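
BIC differs from AIC only in how strongly it penalizes the number of parameters: the penalty per parameter is $\ln n$ instead of $2$, so it grows with the sample size $n$. The sketch below uses the textbook definitions; the function names are hypothetical, and the log-likelihood value is a placeholder rather than a result from this example.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_data):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_data) - 2 * log_likelihood

# Placeholder log-likelihood of a hypothetical two-parameter fit to 500 points
log_lik = -250.0
print("AIC:", aic(log_lik, 2))
print("BIC:", bic(log_lik, 2, 500))
```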

Print out a report on the inferred input model:

In [29]:
uq.print(InputHat8)
==============================================================
Input object name: Input 10
Dimension(M): 2

Marginals:

Index | Name | Type   |  Parameters             | Moments              
------------------------------------------------------------------------
1     | X1   | Gumbel |  1.333e+00, 2.974e-01   | 1.504e+00, 3.815e-01
2     | X2   | Beta   |  6.362e+00, 4.173e+00   | 6.039e-01, 1.440e-01


Copula:

Type: Independent
Dimension: 2
Variables coupled: [1 2]
==============================================================

Terminate the remote UQCloud session¶

In [30]:
mySession.quit()
 uqpylab.sessions :: INFO     :: Session d7fefcbb73144df195d554f39f271a24 terminated.
Out[30]:
True