§ The geometry of Lagrange multipliers

If we want to maximise a function $f(x)$ subject to the constraint $g(x) = c$, one uses the method of Lagrange multipliers. The idea is to consider a new function $L(x, \lambda) = f(x) + \lambda (c - g(x))$. Now, if $(x^\star, \lambda^\star)$ is a local extremum of $L$, then it satisfies the conditions:
  1. $\frac{\partial L}{\partial x} = 0$: $f'(x^\star) - \lambda g'(x^\star) = 0$.
  2. $\frac{\partial L}{\partial \lambda} = 0$: $g(x^\star) = c$.
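As a sanity check on conditions (1) and (2), here is a small numerical sketch in Python. The problem instance (maximise $f(x) = -(x_1 - 2)^2 - (x_2 - 2)^2$ subject to $x_1 + x_2 = 2$) and the candidate $(x^\star, \lambda^\star)$ are my own illustration, not from the text:

```python
import numpy as np

# Illustrative problem (my own choice): maximise
#   f(x) = -(x1 - 2)^2 - (x2 - 2)^2   subject to   g(x) = x1 + x2 = 2.
# By symmetry the optimum is x* = (1, 1), with multiplier lambda = 2.

def f_grad(x):
    return -2.0 * (x - 2.0)      # f'(x)

def g_grad(x):
    return np.array([1.0, 1.0])  # g'(x)

def g(x):
    return x[0] + x[1]

c = 2.0
x_star = np.array([1.0, 1.0])
lam = 2.0

# Condition (1): f'(x*) - lambda * g'(x*) = 0
stationarity = f_grad(x_star) - lam * g_grad(x_star)
# Condition (2): g(x*) = c
feasibility = g(x_star) - c

print(stationarity)  # -> [0. 0.]
print(feasibility)   # -> 0.0
```

Both conditions hold at the candidate point; with any other $\lambda$, condition (1) would fail.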
Equation (2) is sensible: we want our optimum to satisfy the constraint we originally imposed. What is Equation (1) trying to say? Geometrically, it asks that $f'(x^\star)$ be parallel to $g'(x^\star)$. Why is this a good ask? Let us say that we are at a feasible point $x_0$ (that is, $g(x_0) = c$). We are interested in wiggling $x_0 \xrightarrow{\text{wiggle}} x_0 + \vec\epsilon \equiv x_1$, such that:
  • $x_1$ is still feasible: $g(x_1) = c = g(x_0)$.
  • $x_1$ is an improvement: $f(x_1) > f(x_0)$.
  • If we want $g(x_1)$ to not change (to first order), then we need $g'(x_0) \cdot \vec\epsilon = 0$.
  • If we want $f(x_1)$ to be larger (to first order), we need $f'(x_0) \cdot \vec\epsilon > 0$.
If $f'(x_0)$ and $g'(x_0)$ are parallel, these two demands are incompatible: since $f'(x_0) = \lambda g'(x_0)$, any $\vec\epsilon$ with $f'(x_0) \cdot \vec\epsilon > 0$ also has $g'(x_0) \cdot \vec\epsilon \neq 0$. Attempting to improve $f(x_0 + \vec\epsilon)$ thus changes $g(x_0 + \vec\epsilon)$, and thereby violates the constraint $g(x_0 + \vec\epsilon) = c$. So at such a point, no wiggle improves $f$ while staying feasible, which is exactly what we want of a constrained optimum.
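The wiggle argument can also be checked numerically. A sketch (the gradient vectors below are my own illustration): the only first-order direction that keeps $g$ fixed while changing $f$ is the component of $f'(x_0)$ orthogonal to $g'(x_0)$, and this component vanishes exactly when the two gradients are parallel:

```python
import numpy as np

def tangential_component(f_grad, g_grad):
    """Component of f'(x0) orthogonal to g'(x0): the only first-order
    wiggle direction that keeps g(x0 + eps) = c while changing f."""
    g_hat = g_grad / np.linalg.norm(g_grad)
    return f_grad - (f_grad @ g_hat) * g_hat

# Not parallel: a feasible improving wiggle exists.
eps = tangential_component(np.array([1.0, 2.0]), np.array([1.0, 0.0]))
print(np.array([1.0, 2.0]) @ eps > 0)             # f'(x0) . eps > 0 -> True
print(np.isclose(np.array([1.0, 0.0]) @ eps, 0))  # g'(x0) . eps = 0 -> True

# Parallel (f' = 2 g'): the tangential component vanishes, so no wiggle
# improves f while remaining feasible.
eps = tangential_component(np.array([2.0, 4.0]), np.array([1.0, 2.0]))
print(np.allclose(eps, 0))                        # -> True
```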