Gradient descent

Algorithm (Gradient descent)

To solve the optimization problem

\[ \mathcal L(\boldsymbol w) \to \min\limits_{\boldsymbol w} \]

do the following steps:

  1. initialize \(\boldsymbol w\) with some random values (e.g., drawn from \(\mathcal N(0, 1)\))

  2. choose tolerance \(\varepsilon > 0\) and learning rate \(\eta > 0\)

  3. while \(\Vert \nabla\mathcal L(\boldsymbol w) \Vert > \varepsilon\) do the gradient step

    \[ \boldsymbol w := \boldsymbol w - \eta\nabla\mathcal L(\boldsymbol w) \]
  4. return \(\boldsymbol w\)

Note

In practice, if the condition \(\Vert \nabla\mathcal L(\boldsymbol w) \Vert > \varepsilon\) holds for too long, the loop in step 3 is terminated after a maximum number of iterations max_iter.
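
Below is a minimal NumPy sketch of this procedure, including the max_iter safeguard from the note above. The function and parameter names (`gradient_descent`, `grad`, `w0`, `eta`, `eps`, `max_iter`) are illustrative choices, not fixed by the text.

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, eps=1e-6, max_iter=10_000):
    """Minimize a loss L(w) given its gradient.

    grad     : callable returning the gradient of L at w (assumed available)
    w0       : initial value of w
    eta      : learning rate
    eps      : tolerance on the gradient norm
    max_iter : safeguard on the number of iterations
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad(w)
        # stopping criterion: gradient norm is small enough
        if np.linalg.norm(g) <= eps:
            break
        # gradient step: w := w - eta * grad L(w)
        w = w - eta * g
    return w

# Usage on a toy problem: L(w) = ||w - a||^2, whose gradient is 2(w - a),
# so the minimum is attained at w = a.
a = np.array([1.0, -2.0])
w_star = gradient_descent(lambda w: 2 * (w - a), w0=np.random.randn(2))
print(w_star)  # close to [1.0, -2.0]
```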