Numeric optimization#
For computational reasons it can be impractical to calculate \(\widehat{\boldsymbol w}\) by the formula (19): for instance, when the number of features is large, inverting \(\boldsymbol X^\top \boldsymbol X\) becomes expensive and numerically unstable. In such cases a numerical optimization method can be used instead. The most common one is the gradient descent algorithm.
For ordinary linear regression the loss function is proportional to
\[
\mathcal L(\boldsymbol w) = \frac 12\Vert\boldsymbol {Xw} - \boldsymbol y \Vert_2^2,
\]
therefore, \(\nabla \mathcal L(\boldsymbol w) = \boldsymbol X^\top(\boldsymbol{Xw} - \boldsymbol y)\).
Gradient descent#
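Starting from some initial guess \(\boldsymbol w_0\), gradient descent repeatedly steps against the gradient,

\[
\boldsymbol w_{k+1} = \boldsymbol w_k - \eta \nabla \mathcal L(\boldsymbol w_k) = \boldsymbol w_k - \eta \boldsymbol X^\top(\boldsymbol{Xw}_k - \boldsymbol y),
\]

where \(\eta > 0\) is the learning rate, and stops once the gradient norm drops below a tolerance.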
Now let’s implement the gradient descent algorithm for this objective:
import numpy as np

def linear_regression_gd(X, y, learning_rate=0.01, tol=1e-3, max_iter=10000):
    # random starting point
    w = np.random.normal(size=X.shape[1])
    gradient = X.T.dot(X.dot(w) - y)
    for i in range(max_iter):
        # stop as soon as the gradient is small enough
        if np.linalg.norm(gradient) <= tol:
            return w
        # step against the gradient and recompute it
        w -= learning_rate * gradient
        gradient = X.T.dot(X.dot(w) - y)
    print("max_iter exceeded")
    return w
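A note on the step size: because the loss is quadratic, gradient descent with a constant step converges exactly when the learning rate is below \(2/\lambda_{\max}(\boldsymbol X^\top \boldsymbol X)\). A small hypothetical helper (the name safe_learning_rate is ours, not from the text) to pick a conservative value:

def safe_learning_rate(X):
    # lambda_max(X^T X) equals the squared spectral norm of X,
    # so 1 / lambda_max is a safely convergent step size
    return 1.0 / np.linalg.norm(X, 2) ** 2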
Try it on some synthetic data:
X = np.random.normal(size=(50, 20))
y = np.random.rand(50)
w = linear_regression_gd(X, y, tol=1e-4)
print("MSE:", np.mean((X.dot(w) - y)**2))
MSE: 0.17863408307966058
from sklearn.linear_model import LinearRegression

# fit_intercept=False to match our model, which has no bias term
LR = LinearRegression(fit_intercept=False)
LR.fit(X, y)
print("r-score:", LR.score(X, y))
print("MSE:", np.mean((LR.predict(X) - y) ** 2))
print("Difference from w:", np.linalg.norm(LR.coef_ - w))
r-score: -1.2670845255665677
MSE: 0.17863408305332582
Difference from w: 1.3493667742254247e-05
The negative r-score is expected here: the targets are pure noise, and with fit_intercept=False the model can do worse than the constant baseline \(\bar y\) used by the score. What matters is that the weights agree with the closed-form solution almost exactly.

Now apply it to the Boston dataset:
import pandas as pd

boston = pd.read_csv("../ISLP_datsets/Boston.csv").drop("Unnamed: 0", axis=1)
target = boston['medv']
train = boston.drop(['medv'], axis=1)
# prepend a column of ones so the first weight acts as an intercept
X_train = np.hstack([np.ones((len(train), 1)), train])
weights = linear_regression_gd(X_train, target.values, learning_rate=9e-9)
print("MSE:", np.mean((X_train.dot(weights) - target)**2))
max_iter exceeded
MSE: 54.281765917872875
Looks not so good… The culprit is feature scaling: the Boston columns live on very different scales, which makes \(\boldsymbol X^\top \boldsymbol X\) badly conditioned. The learning rate has to be as tiny as 9e-9 to keep the iterations from diverging, and after 10000 iterations we are still far from the least-squares optimum.
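One standard remedy is to rescale the features before running gradient descent. A minimal sketch, reusing train and target from above; the learning rate 1e-4 is an assumed value and may need tuning:

# standardize each feature to zero mean and unit variance;
# this makes X^T X much better conditioned, allowing a larger step size
X_scaled = np.hstack([
    np.ones((len(train), 1)),
    (train - train.mean()) / train.std(),
])
weights_scaled = linear_regression_gd(X_scaled, target.values, learning_rate=1e-4)
print("MSE:", np.mean((X_scaled.dot(weights_scaled) - target)**2))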
Newton’s method#
TODO

- Add more experiments for gradient descent along with visualizations
- Apply Newton’s method to linear regression (see the sketch below)
- Add regularization
- Compare the performance of GD and Newton’s method
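As a preview of the second item: the loss \(\mathcal L\) is quadratic, so its Hessian \(\nabla^2 \mathcal L(\boldsymbol w) = \boldsymbol X^\top \boldsymbol X\) is constant, and a single Newton step from any starting point lands exactly on the least-squares solution. A minimal sketch (the function name is ours, not from the text):

def linear_regression_newton(X, y):
    # Newton step from w = 0: the gradient there is -X^T y and the
    # Hessian is X^T X, so one step solves the normal equations
    # X^T X w = X^T y; np.linalg.solve avoids forming the inverse
    return np.linalg.solve(X.T.dot(X), X.T.dot(y))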