KRIGING METAMODELING: HOUSING DATA SET¶

This example showcases how to perform Kriging metamodeling using existing data sets. A standard machine learning data set related to the housing prices in Boston is considered.

For more information, see: Harrison, D., Jr. and D. L. Rubinfeld (1978). Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81-102. DOI:10.1016/0095-0696(78)90006-2

Package imports¶

In [12]:
from uqpylab import sessions, display_util
import numpy as np
import matplotlib.pyplot as plt
import scipy.io

# Initialize common plotting parameters
display_util.load_plt_defaults()

Start a remote UQCloud session¶

In [13]:
# Start the session
mySession = sessions.cloud()
# (Optional) Get a convenient handle to the command line interface
uq = mySession.cli
# Reset the session
mySession.reset()
 uqpylab.sessions :: INFO     :: A new session (065f701dd17141ccb740601d73b3b419) started.
 uqpylab.sessions :: INFO     :: Reset successful.

Set random seed for reproducibility¶

In [14]:
uq.rng(0,'twister');
np.random.seed(1)
In [15]:
# Read the data set and store the content in matrices:
mat = scipy.io.loadmat('housing.mat')
X = mat['X']
Y = mat['Y']

# Get the size of the experimental design:
N, M = X.shape

Training and validation sets¶

Use $80\%$ of the data for training and the rest for validation:

In [16]:
N_train = int(np.floor(0.8*N))
N_val = N - N_train

Initialize the results matrices:

In [17]:
N_reps = 5
Y_val = np.zeros((N_val, N_reps))
Y_Krg = np.zeros((N_val, N_reps))
legend_text = []

Kriging metamodels¶

Several Kriging metamodels are created using different training sets of the same size (N_train) randomly sampled from the whole housing data set. The steps (repeated five times) are as follows:

  • Randomly split the available data into a traning and a validation sets
  • Define and create a Kriging metamodel based on the training set
  • Evaluate the Kriging metamodel at the validation set points

Select Kriging as the metamodeling tool:

In [18]:
MetaOpts = {
    'Type': 'MetaModel',
    'MetaType': 'Kriging'
}

Use a linear trend for the Kriging metamodel:

In [19]:
MetaOpts['Trend'] = {
    'Type': 'linear'
}

Use the CMAES optimization algorithm to estimate the hyperparameters:

In [20]:
MetaOpts['Optim'] = {
    'Method': 'CMAES'
}
In [21]:
legend_text = []
for iter in range(N_reps):
    
    # Randomly split the data into a training and a validation sets
    idx_train = np.random.choice(N, size=N_train, replace=False)
    idx_val = np.setdiff1d(np.arange(N), idx_train, assume_unique=True)
    X_train = X[idx_train,:]
    Y_train = Y[idx_train]
    X_val = X[idx_val,:]
    Y_val[:,[iter]] = Y[idx_val]

    # Assign the training data set as the experimental design
    MetaOpts['ExpDesign'] = {
        'X': X_train.tolist(),
        'Y': Y_train.tolist()
    }
    
    # Create the metamodel object
    myKriging = uq.createModel(MetaOpts)
    
    # Evaluate the Kriging metamodel at the validation set points
    Y_Krg[:,[iter]] = uq.evalModel(myKriging, X_val)
    
    # Create a text for the plot legend
    legend_text.append(f'Fold {iter+1}')

Validation¶

Compare the Kriging predictions at the validation set points against the true values over the N_reps repetitions:

In [22]:
plt.plot(Y_val, Y_Krg, 'o',alpha=0.7, markersize=3)
plt.plot([0, 60], [0, 60], 'k')
plt.axis([0, 60, 0, 60])
plt.grid(True)
plt.legend(legend_text, loc='lower right')
plt.xlabel('$\mathrm{Y_{true}}$')
plt.ylabel('$\mathrm{Y_{Krg}}$');
plt.show()

Terminate the remote UQCloud session¶

In [ ]:
mySession.quit()