# Gradient descent
**Algorithm (Gradient descent)**
To solve the optimization problem
\[
\mathcal L(\boldsymbol w) \to \min\limits_{\boldsymbol w}
\]
do the following steps:
1. Initialize \(\boldsymbol w\) with some random values (e.g., drawn from \(\mathcal N(0, 1)\)).
2. Choose a tolerance \(\varepsilon > 0\) and a learning rate \(\eta > 0\).
3. While \(\Vert \nabla\mathcal L(\boldsymbol w) \Vert > \varepsilon\), do the gradient step
   \[ \boldsymbol w := \boldsymbol w - \eta\nabla\mathcal L(\boldsymbol w) \]
4. Return \(\boldsymbol w\).
Note

If the condition \(\Vert \nabla\mathcal L(\boldsymbol w) \Vert > \varepsilon\) holds for too long, the loop in step 3 is terminated after a maximum number of iterations `max_iter`.
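The algorithm above can be sketched in Python as follows; the function and parameter names (`gradient_descent`, `grad`, `w0`) are illustrative, and the default values of `eta`, `eps`, and `max_iter` are arbitrary choices for the example:

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, eps=1e-6, max_iter=10_000):
    """Minimize a loss L(w) given its gradient `grad`, starting from `w0`."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):          # safeguard against non-convergence
        g = grad(w)
        if np.linalg.norm(g) <= eps:   # stopping criterion: small gradient
            break
        w = w - eta * g                # gradient step
    return w

# Example: minimize L(w) = ||w - a||^2, with gradient 2(w - a);
# the minimum is attained at w = a
a = np.array([1.0, -2.0])
w_opt = gradient_descent(lambda w: 2 * (w - a), w0=np.zeros(2))
```

For this quadratic loss the iterates converge to `a` geometrically; for a general loss the learning rate \(\eta\) must be chosen carefully, since too large a value makes the iteration diverge.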