Newton's method on \(C^2\) surface functions
This started as a question on my numerical analysis homework, but it quickly became a tangent project as I explored more of MATLAB's capabilities. The general setup is an open set \(U \subseteq \mathbb{R}^2\), a twice differentiable function \(f : U \to \mathbb{R}\), and an initial point \(x_0 \in U\). From there we can build the orbit \(\{x_k\}_{k \in \mathbb{N}}\) and graph how the iterates move across the surface.
If we can evaluate \(f\), \(\nabla f\), and \(\nabla^2 f\), then we can generate the Newton path, compare it against another update rule, and render the trajectory directly on the graph of the function. The Rosenbrock function is the natural test case because its valley geometry makes the behavior easy to see.
- Derived \(\nabla f\) and \(\nabla^2 f\) for the Rosenbrock example and implemented both directly in MATLAB.
- Treated the iterates as a sequence \(\{x_k\}_{k \in \mathbb{N}}\) and recorded the full orbit, not just the endpoint.
- Implemented the Newton step with a linear solve instead of explicitly forming \(\bigl(\nabla^2 f(x_k)\bigr)^{-1}\).
- Compared second-order Newton updates with fixed-step gradient descent on the same function and exported both trajectories as 3D models.
- Extended the plotting/export code so large Newton excursions still enlarge the plotted domain instead of clipping off the path.
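The third bullet deserves a quick illustration. The project code is MATLAB, where the step is written `-(H \ g)`; the sketch below makes the same point in Python/NumPy (variable names are my own), comparing a factorization-based solve against explicitly forming the inverse Hessian. Both give the same step, but the solve is the numerically preferred idiom.

```python
import numpy as np

# Hedged sketch: one Newton step computed two ways.
# H and g are the Hessian and gradient of the Rosenbrock function
# at the initial guess (-1.9, 2.2) from the table below.
H = np.array([[3454.0, 760.0],
              [760.0, 200.0]])
g = np.array([-1077.4, -282.0])

step_solve = -np.linalg.solve(H, g)  # linear solve: preferred
step_inv = -np.linalg.inv(H) @ g     # explicit inverse: slower, less stable

assert np.allclose(step_solve, step_inv)
print(step_solve)
```

In MATLAB the same contrast is `-(H \ g)` versus `-inv(H) * g`; the backslash operator factors `H` rather than inverting it.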
For the Rosenbrock test surface \[ f(x,y) = (1-x)^2 + 100(y-x^2)^2, \] the derivatives become \[ \nabla f(x,y) = \begin{bmatrix} 2(x-1) - 400x(y-x^2) \\ 200(y-x^2) \end{bmatrix}, \qquad \nabla^2 f(x,y) = \begin{bmatrix} 2 - 400y + 1200x^2 & -400x \\ -400x & 200 \end{bmatrix}. \]
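These formulas transcribe directly into code. The original implementation is MATLAB; here is an equivalent NumPy sketch (helper names are my own), with a central-difference check that the analytic gradient matches the formulas above.

```python
import numpy as np

def rosen(p):
    x, y = p
    return (1 - x)**2 + 100 * (y - x**2)**2

def rosen_grad(p):
    x, y = p
    return np.array([2*(x - 1) - 400*x*(y - x**2),
                     200*(y - x**2)])

def rosen_hess(p):
    x, y = p
    return np.array([[2 - 400*y + 1200*x**2, -400*x],
                     [-400*x, 200.0]])

# Sanity check: analytic gradient vs. central differences at the
# initial guess used below.
p = np.array([-1.9, 2.2])
h = 1e-6
fd = np.array([(rosen(p + h*e) - rosen(p - h*e)) / (2*h) for e in np.eye(2)])
assert np.allclose(rosen_grad(p), fd, atol=1e-3)
print(rosen_grad(p))
```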
| Method | Initial guess | Observed behavior |
|---|---|---|
| Newton | \(x_0 = (-1.9, 2.2)\) | converged to \((1,1)\) in 5 steps, but only after a big excursion through \((0.94056, -7.126)\). |
| Newton | \(x_0 = (0.5, 1.8)\) | converged to \((1,1)\) in 4 steps with a much shorter path. |
| Gradient descent, \(\eta = 10^{-3}\) | \(x_0 = (0.5, 1.8)\) | did not converge within 98 steps and stalled near \((1.1343, 1.2872)\). |
| Gradient descent, \(\eta = 10^{-3}\) | \(x_0 = (-1.9, 2.2)\) | did not converge within 98 steps and stalled near \((-1.3976, 1.9608)\). |
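The table can be reproduced with a small driver. This is again a Python/NumPy sketch of the MATLAB original (the iteration caps and tolerance here are my own choices), running both update rules and recording the whole orbit rather than just the endpoint.

```python
import numpy as np

def rosen_grad(p):
    x, y = p
    return np.array([2*(x - 1) - 400*x*(y - x**2), 200*(y - x**2)])

def rosen_hess(p):
    x, y = p
    return np.array([[2 - 400*y + 1200*x**2, -400*x], [-400*x, 200.0]])

def newton(p, tol=1e-8, max_iter=20):
    """Newton iterates, recording the full orbit {x_k}."""
    orbit = [np.asarray(p, dtype=float)]
    for _ in range(max_iter):
        g = rosen_grad(orbit[-1])
        if np.linalg.norm(g) < tol:
            break
        # Linear solve instead of forming the inverse Hessian.
        orbit.append(orbit[-1] - np.linalg.solve(rosen_hess(orbit[-1]), g))
    return orbit

def gradient_descent(p, eta=1e-3, steps=98):
    orbit = [np.asarray(p, dtype=float)]
    for _ in range(steps):
        orbit.append(orbit[-1] - eta * rosen_grad(orbit[-1]))
    return orbit

newton_orbit = newton([-1.9, 2.2])
gd_orbit = gradient_descent([0.5, 1.8])
print(newton_orbit[-1])  # lands at the minimizer (1, 1)
print(gd_orbit[-1])      # still away from (1, 1) after 98 steps
```

The Newton orbit from \((-1.9, 2.2)\) reproduces the large downward excursion before snapping to \((1,1)\), while the fixed-step descent stays bounded but slow.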
The main thing I liked here is that the surface plots help visualize the behavior of the algorithms in a way that really matches the theory. Newton's method is not “smooth” in the visual sense; it can jump hard because it is reading curvature information from the Hessian. Gradient descent is easier to follow step by step, but with \(\eta = 10^{-3}\) it stayed slow and local on the same surface. This made the project feel less like an isolated homework problem and more like a small numerical lab for comparing update rules on the same set \(U \subseteq \mathbb{R}^2\).
Figure: Newton iterates from \(x_0 = (-1.9, 2.2)\), including the large off-screen excursion before convergence.

Figure: Newton iterates from \(x_0 = (0.5, 1.8)\), showing the much shorter path to the minimizer.

Figure: Gradient descent from \(x_0 = (-1.9, 2.2)\), showing slow progress along the valley.

Figure: Gradient descent from \(x_0 = (0.5, 1.8)\) with the same step size, illustrating the non-converged trajectory from the homework.