## § The geometry of Lagrange multipliers

If we want to maximise a function $f(x)$ subject to the constraint $g(x) = c$,
we use the method of Lagrange multipliers. The idea is to consider a new
function $L(x, \lambda) = f(x) + \lambda (c - g(x))$. Now, if $x^\star$ is a local maximum
of $f$ on the constraint set, with multiplier $\lambda^\star$, then $(x^\star, \lambda^\star)$ satisfies the conditions:
1. $\frac{\partial L}{\partial x} = 0$: $f'(x^\star) - \lambda^\star g'(x^\star) = 0$.
2. $\frac{\partial L}{\partial \lambda} = 0$: $g(x^\star) = c$.
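
As a concrete check, here is a minimal sketch that solves both conditions numerically. The example problem (maximise $f(x, y) = x + y$ on the unit circle $x^2 + y^2 = 1$), the function names, and the choice of Newton's method are illustrative assumptions, not part of the text above.

```python
import numpy as np

C = 1.0  # constraint level: g(p) = C

def f_grad(p):
    # Gradient of the assumed objective f(x, y) = x + y.
    return np.array([1.0, 1.0])

def g(p):
    # Assumed constraint function g(x, y) = x^2 + y^2.
    return p[0] ** 2 + p[1] ** 2

def g_grad(p):
    return 2.0 * p

def lagrangian_grad(z):
    # z = (x, y, lam). Stacks dL/dx, dL/dy and dL/dlam for
    # L = f + lam * (C - g).
    p, lam = z[:2], z[2]
    return np.append(f_grad(p) - lam * g_grad(p), C - g(p))

def lagrangian_jac(z):
    # Jacobian of lagrangian_grad with respect to (x, y, lam).
    p, lam = z[:2], z[2]
    jac = np.zeros((3, 3))
    jac[:2, :2] = -2.0 * lam * np.eye(2)
    jac[:2, 2] = -g_grad(p)
    jac[2, :2] = -g_grad(p)
    return jac

def solve_stationary(z0, steps=20):
    # Plain Newton iteration on lagrangian_grad(z) = 0.
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - np.linalg.solve(lagrangian_jac(z), lagrangian_grad(z))
    return z

z_star = solve_stationary([1.0, 0.5, 1.0])
p_star, lam_star = z_star[:2], z_star[2]
print(z_star)  # approximately [0.7071 0.7071 0.7071]
```

At the computed point the gradient of $f$ equals `lam_star` times the gradient of $g$, and the point lies on the circle. Newton's method only finds a stationary point of $L$ near the starting guess; from a different start it may land on the constrained minimum at $(-1/\sqrt{2}, -1/\sqrt{2})$ instead.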

Equation (2) is sensible: we want our optimum to satisfy the constraint that
we originally imposed. What is Equation (1) trying to say?
Geometrically, it asks for $f'(x^\star)$ to be parallel to $g'(x^\star)$: one is a scalar multiple of the other.
Why is this a good ask?
Let us say that we are at a feasible point $x_0$, so $g(x_0) = c$.
We are interested in wiggling
$x_0 \xrightarrow{\text{wiggle}} x_0 + \vec\epsilon \equiv x_1$ such that:
- $x_1$ is still feasible: $g(x_1) = c = g(x_0)$.
- $x_1$ is an improvement: $f(x_1) > f(x_0)$.

To first order in $\vec\epsilon$, these two requirements become:
- For $g(x_1)$ to not change, we need $g'(x_0) \cdot \vec \epsilon = 0$.
- For $f(x_1)$ to be larger, we need $f'(x_0) \cdot \vec \epsilon > 0$.
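
One way to see where these two conditions come from is a first-order Taylor expansion around $x_0$. This is a sketch, assuming $f$ and $g$ are differentiable and $\vec\epsilon$ is small:

$$
\begin{aligned}
g(x_0 + \vec\epsilon) &\approx g(x_0) + g'(x_0) \cdot \vec\epsilon = c + g'(x_0) \cdot \vec\epsilon, \\
f(x_0 + \vec\epsilon) &\approx f(x_0) + f'(x_0) \cdot \vec\epsilon.
\end{aligned}
$$

So, to first order, $g$ stays at $c$ exactly when $g'(x_0) \cdot \vec\epsilon = 0$, and $f$ increases exactly when $f'(x_0) \cdot \vec\epsilon > 0$.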

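If $f'(x_0)$ and $g'(x_0)$ are parallel, then any $\vec \epsilon$ with $f'(x_0) \cdot \vec \epsilon > 0$
also has $g'(x_0) \cdot \vec \epsilon \neq 0$. So attempting to improve $f(x_0 + \vec \epsilon)$
will change $g(x_0 + \vec \epsilon)$, and thereby violate the constraint
$g(x_0 + \vec \epsilon) = c$. To first order, no feasible wiggle improves $f$, which is
exactly what we expect at a constrained optimum.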
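
The wiggle argument can be illustrated numerically. The sketch below assumes the same hypothetical example as before ($f(x, y) = x + y$ on the unit circle $x^2 + y^2 = 1$) and a helper, `tangent_component`, that is introduced here purely for illustration. It removes from $f'$ the part that points along $g'$; whatever remains is a candidate $\vec\epsilon$ that keeps $g$ fixed to first order.

```python
import numpy as np

def tangent_component(grad_f, grad_g):
    # Part of grad_f that is orthogonal to grad_g, i.e. a direction
    # that leaves g unchanged to first order.
    n = grad_g / np.linalg.norm(grad_g)
    return grad_f - np.dot(grad_f, n) * n

grad_f = np.array([1.0, 1.0])  # gradient of the assumed f(x, y) = x + y

# A feasible but non-optimal point on the circle: the gradients of f
# and g are not parallel, so an improving feasible wiggle exists.
p0 = np.array([1.0, 0.0])
grad_g0 = 2.0 * p0  # gradient of the assumed g(x, y) = x^2 + y^2
eps0 = tangent_component(grad_f, grad_g0)
print(np.dot(grad_g0, eps0))  # 0.0: g is unchanged to first order
print(np.dot(grad_f, eps0))   # 1.0: f increases to first order

# The constrained maximum: the gradients are parallel, so the
# feasible part of grad_f vanishes (up to rounding).
p_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
eps_star = tangent_component(grad_f, 2.0 * p_star)
print(np.allclose(eps_star, 0.0))  # True
```

When the gradients are parallel, nothing is left after removing the component along $g'$, so every direction that would increase $f$ necessarily changes $g$.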