# Gradient descent
**Algorithm (Gradient descent)**
To solve the optimization problem
\[
\mathcal L(\boldsymbol w) \to \min\limits_{\boldsymbol w}
\]
do the following steps:
1. Initialize \(\boldsymbol w\) with some random values (e.g., drawn from \(\mathcal N(0, 1)\)).
2. Choose a tolerance \(\varepsilon > 0\) and a learning rate \(\eta > 0\).
3. While \(\Vert \nabla\mathcal L(\boldsymbol w) \Vert > \varepsilon\), do the gradient step
   \[ \boldsymbol w := \boldsymbol w - \eta\nabla\mathcal L(\boldsymbol w) \]
4. Return \(\boldsymbol w\).
Note

If the condition \(\Vert \nabla\mathcal L(\boldsymbol w) \Vert > \varepsilon\) holds for too long, the loop in step 3 is terminated after a maximum number of iterations `max_iter`.
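The algorithm above can be sketched in Python as follows; the function and parameter names (`gradient_descent`, `grad`, `w0`) are illustrative, and the default values of `eta`, `eps`, and `max_iter` are arbitrary choices for the example:

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, eps=1e-6, max_iter=10_000):
    """Minimize a loss L(w) given its gradient `grad`, starting from `w0`."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):          # safeguard against non-convergence
        g = grad(w)
        if np.linalg.norm(g) <= eps:   # stopping criterion: small gradient
            break
        w = w - eta * g                # gradient step
    return w

# Example: minimize L(w) = ||w - a||^2, with gradient 2(w - a);
# the minimum is attained at w = a
a = np.array([1.0, -2.0])
w_opt = gradient_descent(lambda w: 2 * (w - a), w0=np.zeros(2))
```

For this quadratic loss the iterates converge to `a` geometrically; for a general loss the learning rate \(\eta\) must be chosen carefully, since too large a value makes the iteration diverge.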