Regression metrics#

Regression metrics are used to evaluate the performance of regression models, which are machine learning models that predict continuous numeric values rather than discrete classes.

As usual, denote the dataset \(\mathcal D = \{(\boldsymbol x_i, y_i)\}_{i=1}^n\), \(y_i \in \mathbb R\), and let \(\widehat y_i\) be the predictions of some regression model. Regression metrics measure how good these predictions are.

Mean Squared Error (MSE)#

MSE calculates the average squared difference between the predicted values and the actual target values.

\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

MSE gives more weight to large errors than to small ones because it squares the differences between predicted and actual values. Depending on the task, this property can be either an advantage or a drawback.
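Since MSE is just an average of squared residuals, it is a one-liner in NumPy. A minimal sketch (the arrays here are made up purely for illustration):

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # illustrative targets
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # illustrative predictions

# average of squared residuals
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375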

Advantages of MSE

  • MSE is a smooth function of the predictions, which makes it suitable for gradient-based optimization

  • Mathematical convenience: minimizing MSE corresponds to maximum likelihood estimation under Gaussian noise

Disadvantages of MSE

  • MSE is highly sensitive to outliers in the data

  • MSE is not scale-invariant and is expressed in squared units of the target, which hurts interpretability

  • The squaring operation in MSE places more emphasis on larger errors

Root Mean Squared Error (RMSE)#

RMSE is the square root of MSE:

\[ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]

It measures the typical magnitude of the prediction errors and is expressed in the same units as the target variable. Thus, RMSE is more interpretable than MSE.
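In code, RMSE is simply the square root of MSE; a minimal sketch with the same illustrative arrays as above (recent versions of scikit-learn also provide sklearn.metrics.root_mean_squared_error for this):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# take the square root to return to the units of y
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse)  # ~0.6124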

\(R^2\)-score#

To overcome some flaws of MSE, the coefficient of determination (or \(R^2\)-score) is used:

\[R^2 = 1 - \frac{\sum\limits_{i=1}^n(y_i - \widehat y_i)^2}{\sum\limits_{i=1}^n(y_i - \overline y)^2}, \quad \overline y = \frac 1n\sum\limits_{i=1}^n y_i.\]

The coefficient of determination shows the proportion of the target's variance explained by the model. \(R^2\)-score does not exceed \(1\) (the higher, the better), and it can be negative if the model performs worse than the constant prediction \(\overline y\).
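A direct implementation of this formula, checked against r2_score; the last line shows that a model worse than the constant mean predictor gets a negative score (arrays are illustrative):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# residual sum of squares vs. total sum of squares
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot)       # ~0.9486
print(r2_score(y_true, y_pred))  # same value

# predicting zeros everywhere is worse than predicting the mean
print(r2_score(y_true, np.zeros_like(y_true)))  # negative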

Mean Absolute Error (MAE)#

MAE calculates the average absolute difference between the predicted values and the actual target values.

It gives an indication of how far off the predictions are on average.

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \vert y_i - \hat{y}_i\vert. \]
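MAE is equally easy to compute by hand or with scikit-learn; a minimal sketch with illustrative arrays:

import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# average of absolute residuals
print(np.mean(np.abs(y_true - y_pred)))     # 0.5
print(mean_absolute_error(y_true, y_pred))  # 0.5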

Advantages of MAE

  • MAE is straightforward to understand and interpret

  • MAE is less sensitive to outliers compared to some other metrics like MSE or RMSE

  • MAE is expressed in the same units as the target variable, which makes it easy to interpret

Disadvantages of MAE

  • MAE is not differentiable at points where the residual is zero, which can cause issues in gradient-based optimization algorithms

  • While MAE is less sensitive to outliers than squared-error metrics, it is not immune to them: extreme outliers can still have a noticeable impact

MAPE#

The Mean Absolute Percentage Error (MAPE) is commonly used to measure the accuracy of forecasts, especially in time series and demand forecasting.

\[ \mathrm{MAPE} = \frac 1n\sum_{i=1}^{n} \frac{\vert y_i - \widehat{y}_i\vert}{\vert y_i \vert}. \]

This value is undefined if \(y_i = 0\); that's why another version, the symmetric mean absolute percentage error (sMAPE), is sometimes used instead:

\[ \mathrm{sMAPE} = \frac 2n\sum_{i=1}^{n} \frac{\vert y_i - \widehat{y}_i\vert}{\vert y_i \vert + \vert \widehat y_i\vert}. \]
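scikit-learn ships mean_absolute_percentage_error, but not sMAPE, so the latter has to be written by hand. A minimal sketch following the formula above (it assumes no pair \(y_i = \widehat y_i = 0\), which would zero the denominator):

import numpy as np

def smape(y_true, y_pred):
    # symmetric MAPE; assumes |y_i| + |y_hat_i| > 0 for all i
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 2 * np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

print(smape([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # ~0.5788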

Q. Does MAPE necessarily belong to the interval \([0, 1]\)?

Simulated example#

Take a line and add some random noise to it:

import numpy as np
import matplotlib.pyplot as plt

%config InlineBackend.figure_format = 'svg'

# 100 points on the line y = a*x + b, corrupted by Gaussian noise
xs = np.linspace(0, 1, num=100, endpoint=False)
a, b = -0.5, 1.7
y = a * xs + b + 0.5 * np.random.randn(100)

plt.plot(xs, a * xs + b, c="r", lw=2, label="Ground truth")
plt.scatter(xs, y, c="b", s=10, label="Data")
plt.legend()
plt.grid(ls=":");

Fit a linear regression model and check the metrics:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, mean_absolute_percentage_error, r2_score


lin_reg = LinearRegression()
lin_reg.fit(xs[:, None], y)  # sklearn expects a 2d feature matrix
y_hat = lin_reg.predict(xs[:, None])
print("Bias:", lin_reg.intercept_)
print("Slope:", lin_reg.coef_[0])

mae = mean_absolute_error(y, y_hat)
mse = mean_squared_error(y, y_hat)
rmse = np.sqrt(mse)
R2 = r2_score(y, y_hat)
mape = mean_absolute_percentage_error(y, y_hat)

print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")
print(f"R2-score: {R2:.4f}")
print(f"Mean absolute percentage error(MAPE): {mape:.4f}")
Bias: 1.7949467141964524
Slope: -0.8082378285639453
Mean Absolute Error (MAE): 0.3520
Mean Squared Error (MSE): 0.2036
Root Mean Squared Error (RMSE): 0.4512
R2-score: 0.2110
Mean absolute percentage error(MAPE): 0.6091

Now turn one of the points into an outlier:

M = 20
# shift one randomly chosen point far away from the line
y[np.random.randint(len(y))] += M
plt.plot(xs, a * xs + b, c="r", lw=2, label="Ground truth")
plt.scatter(xs, y, c="b", s=10, label="Data")
plt.legend()
plt.grid(ls=":");

Fit linear regression once again:

lin_reg = LinearRegression()
lin_reg.fit(xs[:, None], y)
y_hat = lin_reg.predict(xs[:, None])
print("Bias:", lin_reg.intercept_)
print("Slope:", lin_reg.coef_[0])
Bias: 1.917718991424175
Slope: -0.6522222270037893

Print metrics:

mae = mean_absolute_error(y, y_hat)
mse = mean_squared_error(y, y_hat)
rmse = np.sqrt(mse)
R2 = r2_score(y, y_hat)
mape = mean_absolute_percentage_error(y, y_hat)

print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")
print(f"R2-score: {R2:.4f}")
print(f"Mean absolute percentage error(MAPE): {mape:.4f}")
Mean Absolute Error (MAE): 0.5953
Mean Squared Error (MSE): 4.5377
Root Mean Squared Error (RMSE): 2.1302
R2-score: 0.0078
Mean absolute percentage error(MAPE): 0.7639
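Note how differently the metrics react to a single outlier: MSE grows by a factor of about twenty and RMSE almost five-fold, while MAE and MAPE increase only moderately, and the \(R^2\)-score collapses towards zero. This agrees with the discussion above: squared-error metrics are far more sensitive to outliers than absolute-error ones.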