Root Finding and Fixed-Point Iteration

Root Finding: Where Optimization Meets Equations

Finding where $f(\mathbf{x}) = \mathbf{0}$ is closely related to optimization: the first-order optimality condition $\nabla f = \mathbf{0}$ is itself a root-finding problem. SciPy provides fsolve, root, and brentq for different scenarios. Fixed-point iteration $\mathbf{x}_{k+1} = g(\mathbf{x}_k)$ is a complementary framework that appears in iterative algorithms throughout scientific computing.

Definition:

Root-Finding Problem

Given $f : \mathbb{R}^n \to \mathbb{R}^n$, find $\mathbf{x}^\star$ such that:

$$f(\mathbf{x}^\star) = \mathbf{0}$$

For $n = 1$, this is finding the zeros of a scalar function. For $n > 1$, this is solving a system of nonlinear equations.

from scipy.optimize import fsolve, root, brentq

# Scalar: brentq (bracketing, guaranteed for continuous f)
x_star = brentq(f, a, b)  # f(a) and f(b) must have opposite signs

# System: fsolve (wraps MINPACK's modified Powell hybrid method)
x_star = fsolve(func, x0)

# System: root (multiple methods)
result = root(func, x0, method='hybr')

Definition:

Bisection Method

For a continuous $f : \mathbb{R} \to \mathbb{R}$ with $f(a) \cdot f(b) < 0$ (sign change), the bisection method repeatedly halves the interval:

  1. Compute $c = (a + b) / 2$
  2. If $f(a) \cdot f(c) < 0$, set $b = c$; else set $a = c$
  3. Repeat until $|b - a| < \epsilon$

Convergence is linear with rate $1/2$: the error bound halves each iteration. After $k$ iterations, $|x^\star - c_k| \le (b - a)/2^{k+1}$.
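
For reference, a minimal implementation of the loop above (the tolerance, iteration cap, and test function are illustrative choices, not part of the definition):

def bisect_sketch(f, a, b, tol=1e-12, max_iter=200):
    """Find a root of f in [a, b] by repeated halving; assumes f(a)*f(b) < 0 and a < b."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = 0.5 * (a + b)
        if f(a) * f(c) < 0:
            b = c          # root lies in [a, c]
        else:
            a = c          # root lies in [c, b]
        if b - a < tol:
            break
    return 0.5 * (a + b)

# Example: root of x^3 - 2 on [0, 2] is 2**(1/3) ~ 1.259921
print(bisect_sketch(lambda x: x**3 - 2, 0.0, 2.0))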

from scipy.optimize import brentq

# brentq combines bisection with inverse quadratic interpolation
x = brentq(f, a, b, xtol=1e-12)

brentq is guaranteed to converge for a continuous function on a bracketing interval where the sign changes, and is the most robust scalar root-finder in SciPy.

Bisection needs a bracket $[a, b]$ where $f$ changes sign. SciPy's bracketing solvers (brentq, bisect, ridder) all require you to supply such an interval; a simple way to find one is to scan a grid of points for a sign change, as sketched below.
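
A minimal bracket-scanning helper (find_bracket is a hypothetical name, not a SciPy function; the grid resolution is an arbitrary choice):

import numpy as np
from scipy.optimize import brentq

def find_bracket(f, lo, hi, n=100):
    """Scan [lo, hi] on a uniform grid and return the first subinterval with a sign change."""
    xs = np.linspace(lo, hi, n + 1)
    fs = np.array([f(x) for x in xs])
    for i in range(n):
        if fs[i] * fs[i + 1] < 0:
            return xs[i], xs[i + 1]
    raise ValueError("no sign change found on the grid")

f = lambda x: np.cos(x) - x        # single root: the Dottie number
a, b = find_bracket(f, -10, 10)
print(brentq(f, a, b))             # ~0.7390851332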

Definition:

Newton's Method for Root Finding

For solving $f(x) = 0$, Newton's method linearizes $f$ at $x_k$:

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$

For systems $\mathbf{f}(\mathbf{x}) = \mathbf{0}$:

$$\mathbf{x}_{k+1} = \mathbf{x}_k - \mathbf{J}(\mathbf{x}_k)^{-1} \mathbf{f}(\mathbf{x}_k)$$

where $\mathbf{J}$ is the Jacobian matrix $J_{ij} = \partial f_i / \partial x_j$.

Convergence is quadratic near the root (digits of accuracy double each iteration), but the method can diverge from poor starting points.

from scipy.optimize import newton, root

# Scalar with derivative
x = newton(f, x0, fprime=df)

# System: use root with method='hybr' (Powell's hybrid)
res = root(func, x0, jac=jacobian, method='hybr')

Definition:

Fixed-Point Iteration

A fixed point of $g : \mathbb{R}^n \to \mathbb{R}^n$ is a point $\mathbf{x}^\star$ where $g(\mathbf{x}^\star) = \mathbf{x}^\star$.

The iteration $\mathbf{x}_{k+1} = g(\mathbf{x}_k)$ converges to $\mathbf{x}^\star$ if $g$ is a contraction: $\|g(\mathbf{x}) - g(\mathbf{y})\| \le L\|\mathbf{x} - \mathbf{y}\|$ with $L < 1$.

import numpy as np
from scipy.optimize import fixed_point

# Solve x = cos(x)
x_star = fixed_point(np.cos, 0.5)
# x_star β‰ˆ 0.7390851332 (Dottie number)

Root finding $f(\mathbf{x}) = \mathbf{0}$ is equivalent to finding a fixed point of $g(\mathbf{x}) = \mathbf{x} - \alpha f(\mathbf{x})$ for suitable $\alpha$ (chosen so that $g$ is a contraction near the root).
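
A quick sketch of this reformulation, using $f(x) = x^3 - 2$ and a damping value $\alpha = 0.2$ picked by hand so that $g$ is a contraction near the root:

from scipy.optimize import fixed_point, brentq

# Root of f(x) = x^3 - 2 recast as the fixed point of g(x) = x - alpha*f(x)
f = lambda x: x**3 - 2
alpha = 0.2                        # small enough that |g'(x)| < 1 near the root
g = lambda x: x - alpha * f(x)

print(fixed_point(g, 1.0))         # ~1.259921 = 2**(1/3)
print(brentq(f, 0, 2))             # same root, for comparison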

Definition:

Anderson Acceleration (Anderson Mixing)

Anderson acceleration speeds up fixed-point iteration by combining the last $m$ iterates:

$$\mathbf{x}_{k+1} = (1-\beta)\sum_{i=0}^{m_k} \theta_i \mathbf{x}_{k-i} + \beta \sum_{i=0}^{m_k} \theta_i \, g(\mathbf{x}_{k-i})$$

where the weights $\theta_i$ are chosen to minimize the residual norm in a least-squares sense.

from scipy.optimize import root

# Anderson acceleration via root
def residual(x):
    return g(x) - x  # fixed point <=> root of residual

res = root(residual, x0, method='anderson')

Anderson acceleration is particularly effective for self-consistent field (SCF) iterations in electronic structure calculations and for ADMM in optimization.

Theorem: Banach Fixed-Point Theorem (Contraction Mapping)

If $g : D \to D$ is a contraction on a complete metric space $D$ with Lipschitz constant $L < 1$:

$$\|g(\mathbf{x}) - g(\mathbf{y})\| \le L \|\mathbf{x} - \mathbf{y}\| \quad \forall \mathbf{x}, \mathbf{y} \in D$$

then $g$ has a unique fixed point $\mathbf{x}^\star \in D$, and the iteration $\mathbf{x}_{k+1} = g(\mathbf{x}_k)$ converges to $\mathbf{x}^\star$ from any starting point $\mathbf{x}_0 \in D$ with:

$$\|\mathbf{x}_k - \mathbf{x}^\star\| \le \frac{L^k}{1-L}\|\mathbf{x}_1 - \mathbf{x}_0\|$$

A contraction shrinks distances. Repeated application of a shrinking map must converge to a unique fixed point β€” like repeatedly photocopying a photocopy, which converges to a single gray blob regardless of the original image.
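
A quick numerical check of the a priori bound (a sketch: here $g = \cos$ maps $[0, 1]$ into itself, and $L = \sin(1) \approx 0.84$ is a valid Lipschitz constant since $|g'(x)| = |\sin x| \le \sin 1$ on $[0, 1]$):

import numpy as np

g = np.cos
L = np.sin(1.0)                    # Lipschitz constant of cos on [0, 1]
x0 = 0.5
x_star = 0.7390851332151607        # Dottie number, the fixed point of cos

x = x0
for k in range(1, 11):
    x = g(x)                                   # x is now x_k
    bound = L**k / (1 - L) * abs(g(x0) - x0)   # Banach a priori bound
    print(f"k={k:2d}  error={abs(x - x_star):.2e}  bound={bound:.2e}")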

Theorem: Local Quadratic Convergence of Newton's Method

If $f$ is twice continuously differentiable, $f(x^\star) = 0$, $f'(x^\star) \neq 0$, and $x_0$ is sufficiently close to $x^\star$, then Newton's method converges quadratically:

$$|x_{k+1} - x^\star| \le C |x_k - x^\star|^2$$

where $C = |f''(x^\star)| / (2|f'(x^\star)|)$. The number of correct digits approximately doubles each iteration.

Newton's method replaces $f$ by its tangent line and finds where the tangent crosses zero. Near the root, the tangent is an excellent approximation, so each step roughly squares the error.
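
The digit-doubling behaviour is easy to observe numerically; the test function $f(x) = x^2 - 2$ (root $\sqrt{2}$) and the starting point are illustrative choices:

import numpy as np

f = lambda x: x**2 - 2
df = lambda x: 2 * x
x_star = np.sqrt(2)

x = 1.5                                        # starting point near the root
for k in range(6):
    print(f"iter {k}: error = {abs(x - x_star):.3e}")
    x = x - f(x) / df(x)                       # Newton step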

Example: Solving a Nonlinear System with fsolve

Find the intersection of $x^2 + y^2 = 4$ and $xy = 1$ using scipy.optimize.fsolve.
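
A possible setup for this example (the starting point is a guess; the system has four symmetric solutions, and which one is returned depends on it):

import numpy as np
from scipy.optimize import fsolve

def func(v):
    x, y = v
    return [x**2 + y**2 - 4,   # circle of radius 2
            x * y - 1]         # hyperbola xy = 1

x_star, info, ier, msg = fsolve(func, x0=[2.0, 0.5], full_output=True)
print(x_star, "converged:", ier == 1)   # ~[1.932, 0.518]
print("residual:", func(x_star))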

Example: Fixed-Point Iteration for Square Root

Compute $\sqrt{a}$ using the fixed-point iteration $x_{k+1} = \frac{1}{2}(x_k + a/x_k)$ (Babylonian method / Heron's method).
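
A direct implementation of the iteration (tolerance, starting guess, and function name are illustrative choices); note that this is exactly Newton's method applied to $f(x) = x^2 - a$:

import numpy as np

def babylonian_sqrt(a, x0=1.0, tol=1e-14, max_iter=100):
    """Compute sqrt(a) via the fixed-point iteration x <- (x + a/x) / 2."""
    x = x0
    for _ in range(max_iter):
        x_new = 0.5 * (x + a / x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(babylonian_sqrt(2.0), np.sqrt(2.0))   # both ~1.4142135623730951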

Root-Finding Methods Compared

Compare bisection, Newton's method, and secant method on scalar root-finding problems.
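
One possible setup for such a comparison using SciPy's scalar solvers (the test function is an arbitrary classic choice; iteration counts come from the RootResults objects returned with full_output=True):

from scipy.optimize import bisect, brentq, newton

f = lambda x: x**3 - 2 * x - 5          # root near 2.0946
df = lambda x: 3 * x**2 - 2

# Bracketing methods: need a sign change on [2, 3]
_, r_bisect = bisect(f, 2, 3, full_output=True)
_, r_brent = brentq(f, 2, 3, full_output=True)

# Newton (with derivative) and secant (no derivative): need a starting guess
_, r_newton = newton(f, x0=2.5, fprime=df, full_output=True)
_, r_secant = newton(f, x0=2.5, full_output=True)

for name, r in [("bisection", r_bisect), ("brentq", r_brent),
                ("newton", r_newton), ("secant", r_secant)]:
    print(f"{name:10s} root={r.root:.12f}  iterations={r.iterations}")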

Supplementary code (ch08/python/root_finding.py): fsolve, root, brentq, fixed_point, and Anderson acceleration examples.

Quick Check

Which root-finding method is guaranteed to converge for a continuous function with a sign change on [a, b]?

Newton's method

Secant method

Bisection (brentq)

Fixed-point iteration

Common Mistake: fsolve Silently Returns Non-Solution

Mistake:

Using fsolve(f, x0) and assuming the returned value is a root. By default, fsolve does not raise an error if it fails to converge.

Correction:

Always check the convergence flag:

from scipy.optimize import fsolve

x, info, ier, msg = fsolve(f, x0, full_output=True)
if ier != 1:
    print(f"WARNING: {msg}")

Or use root which returns a result.success boolean.

Key Takeaway

For scalar roots, use brentq; for systems, use root. brentq is guaranteed to converge for bracketed continuous functions. For systems, root(method='hybr') is the default workhorse. Always provide the Jacobian when available for faster convergence, and always check result.success.

Historical Note: Newton-Raphson Method

1660s

Isaac Newton described his method for root-finding in De analysi (1669), but only for polynomial equations. Joseph Raphson independently published the general form in 1690. Thomas Simpson generalized it to systems of equations in 1740. The method remains the foundation of most modern nonlinear solvers, including SciPy's fsolve (which uses a modified Powell hybrid method combining Newton steps with dogleg trust regions).

Fixed Point

A point $\mathbf{x}^\star$ where $g(\mathbf{x}^\star) = \mathbf{x}^\star$. Solving $f(\mathbf{x}) = \mathbf{0}$ is equivalent to finding a fixed point of $g(\mathbf{x}) = \mathbf{x} - f(\mathbf{x})$.

Contraction Mapping

A function $g$ with Lipschitz constant $L < 1$: $\|g(\mathbf{x}) - g(\mathbf{y})\| \le L\|\mathbf{x} - \mathbf{y}\|$. The Banach theorem guarantees convergence of the iteration $\mathbf{x}_{k+1} = g(\mathbf{x}_k)$.

Related: Fixed Point

Jacobian Matrix

The $n \times n$ matrix of partial derivatives $J_{ij} = \partial f_i / \partial x_j$ for a vector-valued function $\mathbf{f} : \mathbb{R}^n \to \mathbb{R}^n$.

Related: Hessian