
from IPython.core.magic import register_line_cell_magic
from IPython.display import Javascript, display



display(Javascript(
  """
deleteCell = function() {
      var me = document.currentScript;
      me.parentElement.parentElement.remove();
};
deleteOutput= function(){
      var me = document.currentScript;
      me.parentElement.remove();
};
deleteInput = function(){
      var me = document.currentScript;
      me.parentElement.parentElement.childNodes[1].remove();
};
"""))

@register_line_cell_magic
def deleteCell(*arg):
    display(Javascript('deleteCell();'))

@register_line_cell_magic
def deleteOutput(*arg):
    display(Javascript('deleteOutput();'))

@register_line_cell_magic
def deleteInput(*arg):
    display(Javascript('deleteInput();'))

%deleteInput
%deleteCell
# @title Installing libraries { display-mode: "both" }

%pip install jupyterquiz
%pip install matplotlib
%pip install myst_nb
%pip install plotly numpy ipywidgets

Newton’s method#

Introduction#

This chapter covers Newton’s method, starting with its definition and application in solving problems, then detailing the basic steps for finding roots. It later continues with an in-depth exploration of the method.

The problem#

Computers are fantastic at crunching numbers and executing algorithms, but analytical solutions to equations often involve symbolic manipulation and reasoning that can be complex. Some equations don’t have closed-form solutions, meaning their solutions can’t be expressed using a finite number of basic mathematical operations and functions.

For example, consider the equation \(x^5 - x + 1 = 0\). This equation doesn’t have a simple analytical solution using elementary functions. In such cases, numerical methods come to the rescue. Computers can iteratively approach the solution with numerical techniques, providing approximate answers.

In essence, computers excel at finding numerical solutions efficiently, but analytical solutions may be elusive for certain types of equations.
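For instance, a standard bracketing root finder locates the root of \(x^5 - x + 1 = 0\) in a couple of lines. Below is a minimal sketch, assuming SciPy is available in the environment:

from scipy.optimize import brentq

# f(x) = x^5 - x + 1 changes sign on [-2, 0]: f(-2) = -29 < 0 and f(0) = 1 > 0,
# so a bracketing solver can locate the root numerically.
f = lambda x: x**5 - x + 1
root = brentq(f, -2, 0)
print(root)     # approximately -1.1673
print(f(root))  # approximately 0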

Linear Approximations#

Newton’s method, also known as the Newton-Raphson method, is a numerical method for approximating the roots (zeros) of a function. There are several versions of the method; we will start with the most basic one – the version that iteratively uses the tangent line of the function to approximate a root.

This basic version of Newton’s method has a high convergence speed, especially when the initial guess is already a close approximation to the root of the function. However, this version can be sensitive to the choice of the initial guess, and it can also fail to converge if the initial approximation is close to a point where the derivative of the function vanishes.

Linear approximations are widely used in numerical methods for solving equations and for optimization. The method is applied in engineering, physics, economics, computer science, and other fields.

Here is a basic implementation and example usage of this method:

def newton_method(func, func_derivative, initial_guess, tol=1e-6, max_iter=100):

    x = initial_guess
    for iteration in range(max_iter):
        f_x = func(x)
        f_prime_x = func_derivative(x)

        # Avoid division by zero
        if abs(f_prime_x) < tol:
            raise ValueError("Derivative is close to zero. Newton's method may not converge.")

        # Newton's method update
        x = x - f_x / f_prime_x

        print(f"Current approximation root: {x}")

        # Check for convergence
        if abs(f_x) < tol:
            return x, iteration + 1

    raise ValueError("Newton's method did not converge within the maximum number of iterations.")

# Example usage:
def example_function(x):
    return x**2 - 4

def example_derivative(x):
    return 2 * x

initial_guess = 52
root, num_iter = newton_method(example_function, example_derivative, initial_guess)

print(f"Approximate root: {root}")
print(f"Number of iterations: {num_iter}")
Current approximation root: 26.03846153846154
Current approximation root: 13.096040222701967
Current approximation root: 6.700738029591109
Current approximation root: 3.648843599411131
Current approximation root: 2.372540661342378
Current approximation root: 2.0292485070150263
Current approximation root: 2.0002107861998297
Current approximation root: 2.0000000111065352
Current approximation root: 2.0
Approximate root: 2.0
Number of iterations: 9


One Dimensional Newton’s Method#

Newton’s method, also known as the tangent method, provides an efficient way to solve equations numerically. This method allows you to find approximate values of the roots of functions. Here’s how it works:

  1. Selecting an initial guess: Start by choosing an initial guess \(x_0\), which can be any number.

  2. Calculation of function and derivative: Calculate the value of the function at the point \(x_0\) as \(f(x_0)\) and the value of the derivative of the function at this point as \(f'(x_0)\).

  3. Drawing up the equation of a tangent line: The equation of a tangent line at the point \(x_0\) is represented as: \(y = f(x_0) + f'(x_0)(x - x_0)\).

  4. Finding the root of the tangent line equation: Find the value of \(x_1\) at which the tangent line crosses zero, i.e. \(y = 0\). This value \(x_1\) becomes the new approximation to the root of the function.

  5. Repeat the process: Continue to apply these steps iteratively, using the formula below at each iteration.

(49)#\[x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\]

Example#

To demonstrate this process, let’s use the function \(f(x)=x^{2}-2\), whose positive root is \(\sqrt{2}\approx 1.414214\).

  1. For an initial guess we are going to choose \(x_{0}=1.5\), since \(1\) is too small and \(2\) is too large.

  2. The value of the function at \(x_{0}\) equals \(f(x_{0})=f(1.5)=1.5^{2}-2=0.25\). The value of the derivative of the function at \(x_{0}\) equals \(f'(x_{0})=f'(1.5)=2 \cdot 1.5=3\).

  3. Drawing the equation of the tangent line: \(y=f(x_{0})+f'(x_{0})(x-x_{0})=0.25+3(x-1.5)=0.25+3x-4.5=3x-4.25\)

  4. The root of the tangent line equation is \(x=1.416667\), thus \(x_{1}=1.416667\)

  5. Iterating the process, we get
    Second iteration:
    \(x_{1}=1.416667\), \(f(x_{1})=0.006945\), \(f'(x_{1})=2.833334\), \(y=2.833334x-4.006945\),
    \(x_{2}=1.414216\)
    Third iteration:
    \(x_{2}=1.414216\), \(f(x_{2})=0.000006\), \(f'(x_{2})=2.828431\), \(y=2.828431x-4.000006\),
    \(x_{3}=1.414214\)
    Since the change between \(x_{2}\) and \(x_{3}\) is already below \(3 \cdot 10^{-6}\), we can stop the iteration and conclude that \(x \approx 1.414214\) (the true root is \(\sqrt{2}\approx 1.414214\)).
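These hand computations can be reproduced with a few lines of code (a minimal sketch of the same iteration):

# Reproduce the hand computation above: f(x) = x^2 - 2, starting from x0 = 1.5
x = 1.5
for n in range(3):
    x = x - (x**2 - 2) / (2 * x)  # x_{n+1} = x_n - f(x_n) / f'(x_n)
    print(f"x_{n + 1} = {x:.6f}")
# prints x_1 = 1.416667, x_2 = 1.414216, x_3 = 1.414214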

Newton’s method converges to the root of the function rapidly, providing an accurate approximation when the initial guess is sufficiently close to the root. This method is widely used in various fields to solve equations.

%deleteInput
from jupyterquiz import display_quiz

quiz = [{
    "title": "Quiz: One Dimensional Newton's Method",

            "question": "Consider the function $f(x) = x^2 - 4$ with an initial guess $x_0 = 4$. After one iteration of Newton's Method, what is the value of $x_1$?",
            "type": "numeric",
            "answers": [
                {"value": "2.5", "correct": True}
            ],
            "explanation": "Using Newton's Method: $x_1 = x_0 - f(x_0) / f'(x_0)$. Here, $f'(x) = 2x$. So, $x1 = 3 - (4^2 - 4) / (2 * 4) = 2.5$."
       }
    ]

display_quiz(quiz)

Pros and Cons#

Newton’s Method is a powerful tool for finding the roots of a function, though alongside its advantages it also has some disadvantages. We will consider the most notable ones.

Rapid Convergence#

Newton’s method has quadratic convergence near a simple root, which makes the root-finding process rapid.

Initial Guess Sensitivity#

Newton’s method may fail to converge if the initial guess is a poor estimate.

Example

For example, let’s consider the function \(f(x)=x^{3}-2x+2\) with \(0\) as the initial guess. Using formula (49), we get \(x_{1}=1\).

However, the second iteration gives the value \(0\) again, so in this case we are stuck in an endless loop.
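A few lines of code make the cycle visible (a minimal sketch):

# Newton's method on f(x) = x^3 - 2x + 2 starting from x0 = 0 never settles down
f = lambda x: x**3 - 2*x + 2
df = lambda x: 3*x**2 - 2

x = 0.0
for n in range(6):
    x = x - f(x) / df(x)       # Newton update
    print(f"x_{n + 1} = {x}")  # alternates 1.0, 0.0, 1.0, 0.0, ...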

Newton’s method continues to be one of the most important tools in numerical analysis and equation solving, though its application requires careful consideration of the pros and cons in the context of a specific task.


%deleteInput
from jupyterquiz import display_quiz

quiz = [{
        "question": "What does Newton's Method primarily use to find the root of a function?",
        "type": "multiple_choice",
        "answers": [
            {
                "answer": "First derivative of the function",
                "correct": True,
                "feedback": "For root finding, Newton's Method uses the function value and its first derivative: the slope of the tangent line determines the next approximation."
            },
            {
                "answer": "Second derivative of the function",
                "correct": False,
                "feedback": ""
            },
            {
                "answer": "Integral of the function",
                "correct": False,
                "feedback": ""
            },
            {
                "answer": "Both first and second derivatives of the function",
                "correct": False,
                "feedback": "Both derivatives appear in Newton's Method for optimization, but the root-finding version needs only the first derivative."
            }
        ]
    }]

display_quiz(quiz)

Newton’s method for optimization#

Generalization for \(d\) dimensions#

Given \(f:\mathbb{R}^d \rightarrow \mathbb{R}\), let’s use the second-order Taylor expansion to solve the optimization problem

(51)#\[\min_{x \in \mathbb{R}^d} f(\pmb{x})\]

\(f(\pmb{x}+\Delta \pmb{x})\approx f(\pmb{x}) + \langle \nabla f(\pmb{x}),\Delta \pmb{x} \rangle+\frac{1}{2}\langle \Delta \pmb{x}, B(\pmb{x})\Delta \pmb{x} \rangle\)

To find the minimum, let’s set the gradient of the above expansion with respect to \(\Delta \pmb{x}\) to zero:

\(\nabla f(\pmb{x}) + B(\pmb{x})\Delta \pmb{x} = 0\)

which gives us

\(\Delta \pmb{x} = -[B(\pmb{x})]^{-1}\nabla f(\pmb{x})\)

Let’s denote \(B_k=B(\pmb{x}_k), H_k=B_k^{-1}\).

This gives us the iterative Newton’s method step

(52)#\[\pmb{x}_{k+1}=\pmb{x}_k - H_k\nabla f(\pmb{x}_k)\]

where \(\nabla f(\pmb{x})\) is called the gradient and is the vector of first partial derivatives:

(53)#\[\begin{split}\nabla f(\pmb{x}) = \begin{pmatrix} \frac{\partial{f(\pmb{x})}}{\partial{x_{1}}} \\ \frac{\partial{f(\pmb{x})}}{\partial{x_{2}}} \\ \frac{\partial{f(\pmb{x})}}{\partial{x_{3}}} \\ ... \\ \end{pmatrix}\end{split}\]

and \(B(\pmb{x})\) is called the Hessian and is the matrix of second partial derivatives:

(54)#\[\begin{split}B(\pmb{x}) = \begin{pmatrix} \frac{\partial^2{f(\pmb{x})}}{\partial{x_1}\partial{x_1}} & \frac{\partial^2{f(\pmb{x})}}{\partial{x_1}\partial{x_2}} & \frac{\partial^2{f(\pmb{x})}}{\partial{x_1}\partial{x_3}} & ... \\ \frac{\partial^2{f(\pmb{x})}}{\partial{x_2}\partial{x_1}} & \frac{\partial^2{f(\pmb{x})}}{\partial{x_2}\partial{x_2}} & \frac{\partial^2{f(\pmb{x})}}{\partial{x_2}\partial{x_3}} & ... \\ \frac{\partial^2{f(\pmb{x})}}{\partial{x_3}\partial{x_1}} & \frac{\partial^2{f(\pmb{x})}}{\partial{x_3}\partial{x_2}} & \frac{\partial^2{f(\pmb{x})}}{\partial{x_3}\partial{x_3}} & ... \\ \end{pmatrix}\end{split}\]

Note

Newton’s method can converge to a saddle point instead of a local minimum.

%deleteInput
import numpy as np
import plotly.graph_objects as go
from myst_nb import glue

# Define the function and its Jacobian (partial derivatives)
def func(x, y):
    return x**2 - 2*y**2 + 2

def jac(x, y):
    return np.array([2 * x, -4 * y])

def hess():
    return np.array([[2, 0], [0, -4]]) # Hessian matrix for this example (indefinite, so the stationary point is a saddle)

# Newton's Method for Multivariate Optimization
def newtons_method(initial_guess, max_iterations=20, tolerance=1e-6, lr=0.01):
    x, y = initial_guess

    x_vals = [x]
    y_vals = [y]

    for i in range(max_iterations):
        gradient = jac(x, y)
        hessian_inv = np.linalg.inv(hess())

        update = lr * np.dot(hessian_inv, gradient)
        x -= update[0]
        y -= update[1]

        x_vals.append(x)
        y_vals.append(y)

        if np.linalg.norm(gradient) < tolerance:
            break

    return x_vals, y_vals

# Visualize the Optimization Path
initial_guess = np.array([2.0, 2.0])
x_opt, y_opt = newtons_method(initial_guess,lr=0.1)

# Create a surface plot of the function
x_vals = np.linspace(-3, 3, 100)
y_vals = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x_vals, y_vals)
Z = func(X, Y)

fig = go.Figure(
    data=[go.Surface(z=Z, x=X, y=Y, colorscale='Viridis', opacity=0.8)],
    layout=go.Layout(
        title="Start Title",
        updatemenus=[dict(
            type="buttons",
            buttons=[dict(label="Play",
                          method="animate",
                          args=[None])])]
    ),
frames=[go.Frame(
        data=[go.Scatter3d(
                    x=x_opt[:k+1],
                    y=y_opt[:k+1],
                    z=[func(x, y) for x, y in zip(x_opt[:k+1], y_opt[:k+1])],
                    mode='markers+lines', marker=dict(size=5, color='red')
                    )],
        traces= [0],
        name=f'frame{k}'
        )for k  in  range(len(x_opt)-1)]
)

# Add the surface plot
fig.add_trace(go.Surface(z=Z, x=X, y=Y, colorscale='Viridis', opacity=0.8))


# Set layout
fig.update_layout(scene=dict(zaxis=dict(range=[-2, 10])))
fig.update_layout(title="Saddle point")


fig.show()

Implementation#

Now let’s implement what we learned in the basic form.

import numpy as np
import plotly.graph_objects as go

# Define the function and its Jacobian (partial derivatives)
def func(x, y):
    return x**2 + 2*y**2 -2

def jac(x, y):
    return np.array([2 * x, 4 * y])

def hess():
    return np.array([[2, 0], [0, 4]]) # Hessian matrix for this example

# Newton's Method for Multivariate Optimization
def newtons_method(initial_guess, max_iterations=20, tolerance=1e-6, alpha=0.01):
    x, y = initial_guess

    x_vals = [x]
    y_vals = [y]

    for i in range(max_iterations):
        gradient = jac(x, y)
        hessian_inv = np.linalg.inv(hess())

        update = alpha * np.dot(hessian_inv, gradient)
        x -= update[0]
        y -= update[1]

        x_vals.append(x)
        y_vals.append(y)

        if np.linalg.norm(gradient) < tolerance:
            break

    return x_vals, y_vals

Note that the method returns the full lists of iterates. To get the final result, just print the last element. Here is an example usage:

initial_guess = np.array([2.0, 2.0])
x_opt, y_opt = newtons_method(initial_guess,alpha=0.5)
print(f"optimal x:{x_opt[-1]}")
print(f"optimal y:{y_opt[-1]}")
optimal x:1.9073486328125e-06
optimal y:1.9073486328125e-06

You can tweak the alpha parameter and max_iterations to test the convergence.

About \(\alpha\)

In the literature, if \(\alpha \in (0,1)\), this method is called the damped Newton’s method.
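For the quadratic function above, the full Newton step (\(\alpha = 1\)) lands on the minimizer in a single update, because the second-order Taylor model is exact for quadratics. A quick check, reusing the newtons_method and initial_guess defined above:

# Full (undamped) Newton step: one update takes (2, 2) straight to the minimum (0, 0)
x_full, y_full = newtons_method(initial_guess, alpha=1.0)
print(f"optimal x:{x_full[-1]}")
print(f"optimal y:{y_full[-1]}")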

%deleteInput
# Create a surface plot of the function
x_vals = np.linspace(-3, 3, 100)
y_vals = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x_vals, y_vals)
Z = func(X, Y)

fig = go.Figure(
    data=[go.Surface(z=Z, x=X, y=Y, colorscale='Viridis', opacity=0.8)],
    layout=go.Layout(
        title="Start Title",
        updatemenus=[dict(
            type="buttons",
            buttons=[dict(label="Play",
                          method="animate",
                          args=[None])])]
    ),
frames=[go.Frame(
        data=[go.Scatter3d(
                    x=x_opt[:k+1],
                    y=y_opt[:k+1],
                    z=[func(x, y) for x, y in zip(x_opt[:k+1], y_opt[:k+1])],
                    mode='markers+lines', marker=dict(size=5, color='red')
                    )],
        traces= [0],
        name=f'frame{k}'
        )for k  in  range(len(x_opt)-1)]
)

# Add the surface plot
fig.add_trace(go.Surface(z=Z, x=X, y=Y, colorscale='Viridis', opacity=0.8))

# Set layout
fig.update_layout(scene=dict(zaxis=dict(range=[-2, 10])))
fig.update_layout(title="Newton's Method in 3D")

fig.show()

Application in ML#

Newton’s Method is a very useful tool for the optimization process in Machine Learning: when updating the model parameters it takes into account how the loss surface bends, not just which way it slopes, unlike first-order methods such as Gradient Descent.

In other words, Newton’s Method looks at both the slope and the curvature – the first and second derivatives respectively. This can make optimization converge in far fewer iterations.

However, Newton’s Method can be trickier to use when a model is very complex, since the Hessian becomes expensive to compute and invert. Thus, we can say that Newton’s Method is quicker but riskier, while Gradient Descent is steadier but slower.

To solve the optimization problem in ML

\[ \mathcal L(\pmb{w}) \to \min\limits_{\pmb{w}}\]

do the following steps:

  1. initialize \(\pmb{w}\) with some random values (e.g., from \(\mathcal N(0, 1)\))

  2. choose tolerance \(\varepsilon > 0\) and learning rate \(\eta > 0\)

  3. while \(\Vert \nabla\mathcal L(\pmb{w}) \Vert > \varepsilon\), do the iteration step

\[ \pmb{w} := \pmb{w} - \eta(\nabla^2\mathcal L(\pmb{w}))^{-1} \nabla \mathcal L(\pmb{w})\]
  4. return \(\pmb{w}\) (a minimal sketch of this loop is given below)
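Below is a minimal sketch of this loop for a logistic-regression loss. The synthetic data and the helper names grad_L and hess_L are illustrative assumptions, not fixed by the recipe above.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # hypothetical design matrix
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def grad_L(w):
    # gradient of the mean logistic loss
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def hess_L(w):
    # Hessian of the mean logistic loss: X^T diag(p(1-p)) X / n
    p = sigmoid(X @ w)
    return (X * (p * (1 - p))[:, None]).T @ X / len(y)

w = rng.normal(size=3)           # 1. random initialization
eps, eta = 1e-8, 1.0             # 2. tolerance and learning rate
for _ in range(100):             # 3. iterate while the gradient norm exceeds eps
    g = grad_L(w)
    if np.linalg.norm(g) <= eps:
        break
    w = w - eta * np.linalg.solve(hess_L(w), g)

print("w =", w)                  # 4. return w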

Figure: Gradient descent (green) and Newton’s method (red) in comparison for minimizing a function (with small step sizes).

%deleteInput
from jupyterquiz import display_quiz

quiz = [{
        "question": "Which of the following is an advantage of Newton's Method?",
        "type": "multiple_choice",
        "answers": [
            {
                "answer": "Requires less memory compared to other methods",
                "correct": False,
                "feedback": "Incorrect, try again"
            },
            {
                "answer": "Typically converges faster than gradient descent for well-behaved functions",
                "correct": True,
                "feedback": "Newton's Method can converge more quickly for certain types of functions due to its use of both first and second derivatives."
            },
            {
                "answer": "Does not require calculation of second derivatives",
                "correct": False,
                "feedback": "Incorrect, try again"
            },
            {
                "answer": "Suitable for very large datasets",
                "correct": False,
                "feedback": "Incorrect, try again"
            }
        ]
    }]

display_quiz(quiz)