
Introduction

This article is the first in a series I plan to write about physics for a mathematically trained audience. We’re going to start by talking about classical mechanics, the stuff that your first physics class was probably about if you’ve ever taken one. The formulation of classical physics usually presented in introductory physics classes is called Newtonian mechanics; it talks about things like masses and forces and Newton’s laws of motion. Newtonian mechanics is easy to teach and to work with without much machinery, but it has some features that can make it difficult to analyze mathematically. Physical systems and their interactions are described in terms of coordinates with velocity and force vectors all over the place, and it can be difficult to know how to deal with things like symmetries and constraints.

There are other, equivalent ways of describing classical mechanics, sometimes collectively called “analytical mechanics,” which are much easier to describe in a coordinate-free way. At the cost of a bit more abstraction, the analytical formulations have two big advantages: they make it easier to set up and solve some very complicated mechanics problems and, probably more importantly for our purposes, they make the relationship between classical mechanics and its generalizations most clear. The two most prominent such formulations are called Hamiltonian and Lagrangian mechanics, and they’re what we’re going to discuss in this article. They are, as we’ll see, two different ways of saying the same thing, but they highlight different enough aspects of the situation that they’re worth talking about separately.

This article assumes some mathematical background beyond what’s usually used to present these ideas in a physics class that covers them. In particular, the reader is expected to be familiar with the basics of the theory of smooth manifolds to the level of someone who’s finished a one-semester class on the subject. I will assume you remember a little bit about physics, but that you have never seen the Hamiltonian and Lagrangian frameworks discussed here.

I am very grateful to Jeff Hicks and Jake Levinson for their many helpful comments on earlier drafts of this article. Some of the examples and a couple ideas about the presentation are adapted from Gerald Folland’s Quantum Field Theory: A Tourist Guide for Mathematicians and Michael Spivak’s Physics for Mathematicians: Mechanics I, both of which I recommend.

Hamiltonian Mechanics

The Newtonian Setup

We’ll start by briefly describing, in coordinates, the sort of Newtonian mechanics problem we’re eventually going to be describing in a coordinate-free way. The prototypical example to keep in mind is that of a collection of $N$ particles moving in $\mathbb{R}^3$, where particle $i$ has mass $m_i$. We’ll write the position of particle $i$ as $\mathbf{q}_i$, with the boldface there to remind you that it’s an element of $\mathbb{R}^3$ and not an $\mathbb{R}$-valued coordinate on $\mathbb{R}^{3N}$. We’ll write $\mathbf{p}_i = m_i(d\mathbf{q}_i/dt)$ for the momentum of particle $i$.

In the Newtonian setup, we describe physics in terms of forces; we imagine that there is some vector $\mathbf{F}_i$ we can compute for each particle for all time which tells us how that particle is accelerating, or equivalently, how its momentum is changing. Specifically, the relationship is given by “Newton’s second law”: $$\mathbf{F}_i = \frac{d\mathbf{p}_i}{dt} = m_i\frac{d^2\mathbf{q}_i}{dt^2}.$$

In general one could imagine these forces depending on any data whatsoever about the physical system, but we’re going to be most interested in the case of conservative forces. This is the case where there is a function $V$ on $\mathbb{R}^{3N}$ called a potential for which the force on particle $i$ is given by $$\mathbf{F}_i = -\frac{\partial V}{\partial \mathbf{q}_i}.$$ So the force is given by the negative gradient of a function which depends only on positions, not on momenta. This condition is equivalent to saying that the integral of the force vector field around a closed loop — a quantity called the work done by the force while traveling around the loop — is always zero.
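To make the closed-loop condition concrete, here is a small numerical sketch (my addition, not from the original text; the quadratic potential and the unit-circle loop are illustrative choices): it approximates the work done by the force $\mathbf{F} = -\nabla V$ around a closed loop and checks that it vanishes.

```python
import numpy as np

# Illustrative potential V(x, y) = x^2 + 3y^2 on R^2; the force is F = -grad V.
def force(x, y):
    return np.array([-2.0 * x, -6.0 * y])

# Approximate the line integral of F . dq around the unit circle with the
# midpoint rule; for a conservative force this work should be zero.
t = np.linspace(0.0, 2.0 * np.pi, 20001)
x, y = np.cos(t), np.sin(t)
dx, dy = np.diff(x), np.diff(y)
xm, ym = (x[:-1] + x[1:]) / 2, (y[:-1] + y[1:]) / 2
Fx, Fy = force(xm, ym)
work = np.sum(Fx * dx + Fy * dy)
```

Any other closed loop and any other smooth $V$ would give the same answer up to discretization error.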

The name “conservative” comes from the fact that, suitably interpreted, this last condition is what we mean by saying that energy is conserved. There are many physical phenomena that are often modeled as nonconservative forces; friction is probably the most familiar example. But an overwhelming amount of physical evidence points toward the belief that the fundamental laws of physics do conserve energy, and that physical models of things like friction are merely “neglecting” the energy that leaks into forms like heat and sound that are more difficult to model. It is possible, but somewhat painful, to set up Hamiltonian mechanics in a way that allows for things like friction, but we’re going to focus on the conservative case in this article.

Throughout this short description we’ve already done things that make it difficult to keep track of what needs to be done with all these quantities when we change coordinates. The force on a particle is given by a gradient, and each momentum coordinate is “attached” to both a mass and a particular spatial coordinate. It will often be convenient to switch to a coordinate system that does not isolate each particle so neatly in its own triple of coordinates, or even one that mixes what we are now calling position and momentum coordinates. This all cries out for a description that describes the physical system in terms of points on a manifold, to which we can assign coordinates only once we know how all the mathematical objects involved are defined intrinsically.

Configuration Space and Phase Space

We’ll start our quest for a coordinate-free description of mechanics by fixing a smooth manifold $Q$ which we’ll call configuration space. You should think of a point in $Q$ as corresponding to the “position” of each component of a physical system at some fixed time. Some examples worth keeping in mind are:

  1. A particle moving in $\mathbb{R}^3$. In this case, $Q$ is just $\mathbb{R}^3$.
  2. $N$ particles moving in $\mathbb{R}^3$. We specify the configuration of this system by specifying the position of each particle, which we can do using a point in $\mathbb{R}^{3N}$.
  3. Two particles connected by a rigid rod of length $\ell$. We could describe the configuration of this system using a point in $\{(a,b)\in\mathbb{R}^3\times\mathbb{R}^3 : \|a-b\|=\ell\}$.
  4. A rigid body moving through space. We could describe its configuration using a point in $\mathbb{R}^3\times SO(3)$, specifying the location of the object’s center of mass and its orientation.

In particular, specifying a point $q\in Q$ gives you an instantaneous snapshot of the system, but it doesn’t tell you anything about how it’s changing. Even if you have a complete description of the physics, this doesn’t provide enough information to predict how the system will evolve in the future. (Imagine a ball rolling on a table; if you just know its position and not its velocity you don’t know where it’s about to move.)

So if we want to describe the state of a physical system in a way that allows us to do physics, the state needs to carry some additional information. Different formulations of analytical mechanics do this in different ways, and unfortunately the version used by the Hamiltonian formulation is one of the more opaque choices: the state of a physical system is given by specifying a point in the cotangent bundle of $Q$, which to match with physicists’ conventions we will call phase space. We’ll usually use the coordinates $(q,p)$ to refer to a point in phase space. (So $q$ is a point in $Q$ and $p$ is a cotangent vector at $q$.) When the system is in the state $(q,p)$, we’ll call $p$ the momentum.

The first time I encountered this setup I was confused by the fact that momentum is represented by a cotangent vector rather than a tangent vector — after all, the velocity of a particle is definitely a tangent vector, and momentum is supposed to be a multiple of it.

It will be easier to talk about this once we have the finished picture in front of us, but we can say a bit right now. While velocities should inarguably be tangent vectors — a velocity is literally the time derivative along the path that a particle is following — it’s actually not clear that this extends to momenta. When we use the word “momentum” we will mean something more general than “mass times velocity”; the two will coincide for Newtonian mechanics in rectangular coordinates but they can be different in general. For example, if we have a particle of mass $m$ moving in $\mathbb{R}^2$ and use polar coordinates, the momentum corresponding to the $\theta$ coordinate turns out to be the angular momentum $xp_y - yp_x$, which is not $m(d\theta/dt) = (xp_y - yp_x)/(x^2+y^2)$.
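The relation between $m\,d\theta/dt$ and the angular momentum quoted here can be verified symbolically. The following sketch (an illustration I’m adding, using sympy) checks it for an arbitrary path $(x(t), y(t))$, with the momenta defined as mass times velocity in rectangular coordinates.

```python
import sympy as sp

t = sp.symbols('t')
m = sp.symbols('m', positive=True)
x = sp.Function('x')(t)
y = sp.Function('y')(t)

# Momenta in rectangular coordinates: mass times velocity.
p_x = m * sp.diff(x, t)
p_y = m * sp.diff(y, t)

# The polar angle; atan(y/x) has the right derivative for this check.
theta = sp.atan(y / x)

lhs = sp.simplify(m * sp.diff(theta, t))
rhs = sp.simplify((x * p_y - y * p_x) / (x**2 + y**2))
```

Both sides simplify to $m(x\dot y - y\dot x)/(x^2+y^2)$, confirming the formula in the text.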

Of course I have not yet said what it means for one expression or another to be the “right” generalization of momentum to a given coordinate system, but the point is that the relationship between momentum coordinates and derivatives of the corresponding position coordinates depends on the physical meaning of those coordinates; it’s not something you can extract just by looking at configuration space. (Indeed, this is true even in rectangular coordinates: the relationship depends on the mass of the particle, which is a physical quantity.) We’ll return to this question later.

Symplectic Geometry

To properly describe Hamiltonian mechanics, we’ll need some basic facts about symplectic geometry, which we’ll briefly go over now in case they aren’t familiar.

A symplectic manifold is a smooth manifold $M$ together with a choice of a nondegenerate closed 2-form $\omega$ on $M$. (That is, the antisymmetric bilinear form that $\omega$ defines on each tangent space is nondegenerate, and $d\omega = 0$. We’ll see soon how each of these two conditions is relevant.) A diffeomorphism between two symplectic manifolds that preserves the symplectic form is called a symplectomorphism.

The main thing that will turn out to make Hamiltonian mechanics go is the fact that the cotangent bundle of a manifold naturally has the structure of a symplectic manifold. The cotangent bundle of any manifold comes with a canonical symplectic form which can be described pretty simply. We start by defining the tautological 1-form $\theta$ on $T^*Q$. Given a tangent vector $v$ at a point $(q,p)\in T^*Q$, we’ll write $\theta(v) = p(\pi_*(v))$, where $\pi: T^*Q \to Q$ is the projection map. Given a local coordinate system $q_1,\ldots,q_n$ on a chart on $Q$, we also get coordinates $p_1,\ldots,p_n$ on each cotangent space. I encourage you to check that in these coordinates $$\theta = \sum_{i=1}^n p_i\, dq_i.$$ We then define $\omega = d\theta$, so that in these same coordinates $$\omega = \sum_{i=1}^n dp_i \wedge dq_i,$$ which is clearly nondegenerate.

One very striking difference between Riemannian and symplectic geometry is that in a neighborhood of any point on any symplectic manifold (even if it’s not a cotangent bundle) there is a coordinate system $q_1,\ldots,q_n,p_1,\ldots,p_n$ for which $\omega = \sum_i dp_i \wedge dq_i$. This result is called “Darboux’s theorem” and the $q$’s and $p$’s are said to provide canonical coordinates. This means that, very unlike on a Riemannian manifold, a symplectic manifold has no local geometry, so there’s no symplectic analogue of anything like curvature.

Even though phase space will end up being the only symplectic manifold we’ll use to do physics, it’s actually cleaner to describe the required machinery in more generality, so for now $M$ will be an arbitrary symplectic manifold. We’ll return to the case of phase space soon.

Since $\omega$ puts a nondegenerate bilinear form on each tangent space, it gives an isomorphism between the tangent and cotangent spaces at each point of $M$, and therefore an isomorphism between vector fields and 1-forms. We will especially be interested in this isomorphism in the case where the 1-form is $df$ for some function $f$. In this case, we’ll write $X_f$ for the unique vector field for which $\omega(Y, X_f) = df(Y)$ for all $Y$. (There is an arbitrary sign choice to make here — I could have said that $\omega(X_f, Y) = df(Y)$. As always happens with such things, this decision seems to about evenly split the authors of books on this subject. Hamilton’s equations, discussed below, do have an arrangement of signs that everyone agrees on, and I’ve made choices in this section that are consistent with that.)

This vector field is sometimes called the symplectic gradient of $f$; if we had a Riemannian metric instead of $\omega$ here then this construction would of course give the usual gradient. It’s worth emphasizing, though, that while a Riemannian gradient of $f$ (usually) gives a direction in which $f$ is increasing, the symplectic gradient gives a direction in which $f$ is constant, since $X_f(f) = df(X_f) = \omega(X_f, X_f) = 0$.

Given any vector field $X$ at all on a smooth manifold, the existence and uniqueness of solutions of ODE’s lets us define a flow, that is, a one-parameter group of diffeomorphisms $\phi^t: M \to M$ for which $\left.\frac{d}{dt}\right|_{t=0} \phi^t(a) = X|_a$ for any point $a \in M$, where the notation $X|_a$ means the tangent vector we get by restricting $X$ to $a$. (In general the flow might only be defined for $t$ in some neighborhood of 0, but this will always be enough for our purposes.) The flow is used to construct the Lie derivative of a tensor field with respect to a vector field. In order to take any sort of derivative of a tensor field on a manifold it’s necessary to be able to compare values of the tensor field at different points, and the flow gives us a way to do this. We define $$\mathcal{L}_X(T) = \left.\frac{d}{dt}\right|_{t=0} (\phi^t)^*(T).$$

A vector field that arises as a symplectic gradient — that is, as $X_f$ for some $f$ — is called a Hamiltonian vector field and the corresponding flow is called a Hamiltonian flow. Note that since the definition of $X_f$ depends on $\omega$, in order for the Hamiltonian flow corresponding to a function to make sense, it’s necessary for $\omega$ to be preserved by the flow. Otherwise after running time forward using the flow our vector field won’t be $X_f$ for the same $f$ anymore!

So we’d like to characterize the $X$ for which $\mathcal{L}_X \omega = 0$. To do this we invoke Cartan’s magic formula, which says that $\mathcal{L}_X = \iota_X \circ d + d \circ \iota_X$. (Here $\iota_X$ is the interior product with $X$, which is the map from $d$-forms to $(d-1)$-forms defined by $\iota_X\alpha(Y_1,\ldots,Y_{d-1}) = \alpha(X, Y_1, \ldots, Y_{d-1})$.) This is where we use the fact that $\omega$ is closed: we see that $$\mathcal{L}_X\omega = \iota_X(d\omega) + d(\iota_X\omega) = d(\iota_X\omega).$$ If $X$ corresponds to $\alpha$ under the isomorphism between vector fields and 1-forms given by $\omega$, then $\iota_X\omega = -\alpha$ by definition, so we see that flowing along $X_\alpha$ preserves $\omega$ if and only if $\alpha$ is closed. In particular, since $X_f$ corresponds to $df$, all Hamiltonian flows preserve $\omega$.

It will be important for us to analyze how functions change along Hamiltonian flows; we will, in fact, basically be translating all the physical questions this framework can address into what values functions take along a Hamiltonian flow. That is, if $X_f$ is a Hamiltonian vector field, $a$ is a point in $M$, and $g$ is a function on $M$, we’d like to compute $dg/dt$ along the flow of $X_f$ through $a$. By definition, this is just $X_f(g)$, so by the definition of $X_g$, $$\frac{dg}{dt} = X_f(g) = dg(X_f) = \omega(X_f, X_g).$$ This fact will turn out to be important enough to warrant a definition: we’ll write $\{g,f\} = \omega(X_f, X_g)$ and call it the Poisson bracket of $g$ and $f$. As we just saw, the Poisson bracket measures how $g$ changes along the Hamiltonian flow corresponding to $f$. In particular, $\{g,f\} = 0$ if and only if $f$’s Hamiltonian flow preserves $g$. Note also that the Poisson bracket is antisymmetric (because $\omega$ is), which means that $f$’s Hamiltonian flow preserves $g$ if and only if $g$’s Hamiltonian flow preserves $f$. (The Poisson bracket in fact turns out to put a Lie algebra structure on $C^\infty(M)$ — that is, it also satisfies the Jacobi identity — but we won’t need this fact here.)

So, to summarize:

  • Phase space, being the cotangent bundle of configuration space, has a natural symplectic structure. In coordinates, the symplectic form is given by $\omega = \sum dp_i \wedge dq_i$.
  • On any symplectic manifold, we can associate to each function $f$ a vector field $X_f$, and vector fields arising in this way are called Hamiltonian vector fields. Flowing along a Hamiltonian vector field always preserves the symplectic form.
  • This construction lets us define the Poisson bracket $\{g,f\} = \omega(X_f, X_g)$, which measures both how $f$ changes when flowing along $X_g$ and how $g$ changes when flowing along $X_f$.
  • Since flowing along a vector field $X$ preserves $\omega$ if and only if the corresponding 1-form $\alpha$ is closed, we can reverse this entire process if $M$ is simply connected. In that case, $\alpha = df$ for some $f$, so $X = X_f$, and $f$ is uniquely determined up to adding a constant. So if $M$ is simply connected (or if not, then in an open neighborhood of any point), there is a one-to-one correspondence between vector fields whose flow preserves $\omega$ and smooth functions on $M$ modulo constants.

Phase Space and Hamiltonians

We’re now ready to see how this machinery can allow us to do physics. We fix a manifold $Q$ called configuration space, and we write $P = T^*Q$ for its cotangent bundle, which we’ll call phase space. The basic assumption of Hamiltonian mechanics is that the way we “run time forward” in our physical system is by following the Hamiltonian flow corresponding to a distinguished function $H$, which we’ll call the Hamiltonian. That is, if our system is in state $(q,p)$ at time $t_0$ and $\phi^t$ is the flow along $X_H$, then our system is in state $\phi^t(q,p)$ at time $t + t_0$.

Suppose we are using local coordinates $q_1,\ldots,q_n,p_1,\ldots,p_n$ in which the symplectic form can be written as $\omega = \sum_i dp_i \wedge dq_i$. Given two vector fields $$X = \sum_i (a_i \partial_{q_i} + b_i \partial_{p_i}), \qquad X' = \sum_i (a'_i \partial_{q_i} + b'_i \partial_{p_i}),$$ we get that $\omega(X, X') = \sum_i (b_i a'_i - a_i b'_i)$. I encourage the reader to verify that this means that for a function $f$, $$X_f = \sum_i \left( \frac{\partial f}{\partial p_i} \partial_{q_i} - \frac{\partial f}{\partial q_i} \partial_{p_i} \right),$$ and that the Poisson bracket is given by $$\{f,g\} = \sum_i \left( \frac{\partial f}{\partial q_i} \frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i} \frac{\partial g}{\partial q_i} \right).$$

If we’ve chosen a Hamiltonian $H$, then the value of a function $f$ evolves through time according to solutions of the differential equation $df/dt = \{f, H\}$. Plugging in $q_i$ and $p_i$ for $f$, we get Hamilton’s equations: $$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}.$$
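These coordinate formulas are easy to experiment with on a computer. Below is a small sympy sketch (my illustration, not part of the article) implementing the coordinate formula for the Poisson bracket in one degree of freedom and checking that $df/dt = \{f,H\}$ reproduces Hamilton’s equations for a sample Hamiltonian.

```python
import sympy as sp

q, p, m, k = sp.symbols('q p m k', positive=True)

def poisson(f, g):
    # Coordinate formula {f, g} = df/dq dg/dp - df/dp dg/dq (one degree of freedom).
    return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

# A sample Hamiltonian (the harmonic oscillator discussed later in the article).
H = p**2 / (2 * m) + k * q**2 / 2

dq_dt = poisson(q, H)  # should equal dH/dp = p/m
dp_dt = poisson(p, H)  # should equal -dH/dq = -k*q
```

The bracket is also antisymmetric, so $\{H,H\}=0$, reflecting the fact that the flow preserves $H$ itself.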

As we saw in the last section, Hamiltonian flows always preserve their corresponding function, so the Hamiltonian itself ought to measure some scalar quantity that doesn’t change as time moves forward. In classical physics there’s really only one such quantity to choose: the value of the Hamiltonian at a point in $P$ ought to be physically interpreted as the total energy of the system when it is in that state.

In particular, consider the case where $\{f, H\} = 0$. This happens exactly when $H$’s Hamiltonian flow preserves $f$, that is, $f$ is conserved by the laws of physics. But it is also equivalent to the claim that $f$’s Hamiltonian flow preserves $H$, that is, flowing along $X_f$ preserves $H$. This phenomenon gives us the Hamiltonian mechanics version of a result called Noether’s theorem: going between $X_f$ and $f$ gives us a one-to-one correspondence between Hamiltonian vector fields whose flow preserves $H$ (that is, vector fields whose flow preserves both $H$ and $\omega$) and scalar functions which are conserved by the laws of physics.

Let’s see how to recover Newtonian mechanics. In mechanics problems, energy is usually given as a sum of two terms, one representing kinetic energy, written $T$, and one representing potential energy, written $V$. In our Newtonian example from above, the kinetic energy is the usual $$T = \sum_i \frac12 m_i \left|\frac{d\mathbf{q}_i}{dt}\right|^2 = \sum_i \frac{|\mathbf{p}_i|^2}{2m_i},$$ and the potential energy is simply our potential function $V$. So our Hamiltonian all together is $$H(q,p) = T(p) + V(q) = \sum_i \frac{|\mathbf{p}_i|^2}{2m_i} + V(q),$$ and then, combining the three coordinates for each particle into a single vector, Hamilton’s equations give us $$\frac{d\mathbf{q}_i}{dt} = \frac{\partial H}{\partial \mathbf{p}_i} = \frac{\mathbf{p}_i}{m_i}, \qquad \frac{d\mathbf{p}_i}{dt} = -\frac{\partial H}{\partial \mathbf{q}_i} = -\frac{\partial V}{\partial \mathbf{q}_i}.$$
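Hamilton’s equations in this form are straightforward to integrate numerically. The sketch below (my illustration; the quartic potential, masses, and step sizes are arbitrary choices) uses the leapfrog scheme, which is popular here precisely because it respects the symplectic structure, and checks that the energy $H$ stays nearly constant along the flow.

```python
import numpy as np

# Integrate dq/dt = p/m, dp/dt = -dV/dq for an illustrative 1D potential
# V(q) = q^4 / 4, using the symplectic leapfrog (kick-drift-kick) scheme.
m = 2.0
grad_V = lambda q: q**3

def leapfrog(q, p, dt, steps):
    p = p - 0.5 * dt * grad_V(q)       # initial half kick
    for _ in range(steps - 1):
        q = q + dt * p / m             # drift
        p = p - dt * grad_V(q)         # full kick
    q = q + dt * p / m                 # final drift
    p = p - 0.5 * dt * grad_V(q)       # final half kick
    return q, p

H = lambda q, p: p**2 / (2 * m) + q**4 / 4
q0, p0 = 1.0, 0.0
q1, p1 = leapfrog(q0, p0, 1e-3, 20000)
energy_drift = abs(H(q1, p1) - H(q0, p0))
```

A non-symplectic scheme such as forward Euler would show a systematic energy drift here; leapfrog keeps the error bounded and small.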

Note that Hamilton’s first equation exactly tells you how to compute the velocity of a particle once you know its momentum, which does something to address the concern we had earlier. Importantly, we see that this relationship depends on the Hamiltonian; asking which velocity corresponds to a given momentum is meaningless until you’ve specified the laws of physics.

For our mechanical Hamiltonian, since the kinetic energy term is a homogeneous quadratic function of the momentum, we think of it as corresponding to a Riemannian metric on configuration space. In order to get agreement between the two ways of translating between velocity and momentum — using the inner product or going through Hamilton’s first equation — we need to include the masses of the particles in the metric, so that in our case for two tangent vectors $v, v'$ we have $\langle v, v'\rangle_T = \sum_i m_i \langle \mathbf{v}_i, \mathbf{v}'_i\rangle$, where $\langle\cdot,\cdot\rangle$ is the usual inner product on $\mathbb{R}^3$. This induces a metric on the cotangent space given by $$\langle p, p'\rangle_T = \sum_i \frac{\langle \mathbf{p}_i, \mathbf{p}'_i\rangle}{m_i},$$ so following this convention the Hamiltonian would be written $$H(q,p) = \frac12 \langle p, p\rangle_T + V(q).$$

Examples

The Harmonic Oscillator

First, let’s consider a harmonic oscillator. This is a physical system with one degree of freedom $q$ in which the potential energy has the form $\frac12 kq^2$ for some $k$. (The factor of $\frac12$ is of course purely for convenience.) This is a decent model for, for example, a mass attached to a light, frictionless spring.

If the mass of this particle is $m$, then our Hamiltonian is $H = p^2/2m + kq^2/2$, and Hamilton’s equations are $$\frac{dq}{dt} = \frac{p}{m}, \qquad \frac{dp}{dt} = -kq.$$ This is of course a very easy pair of differential equations to solve: you get, writing $\alpha = \sqrt{k/m}$, that $q = A\sin(\alpha(t - t_0))$ and $p = mA\alpha\cos(\alpha(t - t_0))$ for some $A$ and $t_0$.
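It’s easy to confirm this solution symbolically. The following sympy sketch (added for illustration) plugs $q = A\sin(\alpha(t-t_0))$ and the corresponding momentum $p = m\,dq/dt$ into Hamilton’s equations for the oscillator.

```python
import sympy as sp

t, t0, A, m, k = sp.symbols('t t_0 A m k', positive=True)
alpha = sp.sqrt(k / m)

q = A * sp.sin(alpha * (t - t0))
p = m * sp.diff(q, t)  # momentum is mass times velocity here

# Residuals of Hamilton's equations for H = p^2/(2m) + k q^2/2;
# both should simplify to zero.
eq1 = sp.simplify(sp.diff(q, t) - p / m)
eq2 = sp.simplify(sp.diff(p, t) + k * q)
```

The second residual vanishes exactly because $m\alpha^2 = k$, which is where the frequency $\alpha = \sqrt{k/m}$ comes from.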

So far this analysis is basically identical to what we would have gotten using regular Newtonian mechanics. Still, even though we just found a solution, we can get some practice with this machinery by performing a change of coordinates that makes the solution even easier. These solutions lie on the ellipse $kq^2 + p^2/m = kA^2$, which suggests that we ought to rescale $q$ and $p$ and switch to polar coordinates.

So let’s first try setting $r = \sqrt{kq^2 + p^2/m}$ and $\theta = \arctan(\sqrt{km}\, q/p)$; these are the polar coordinates corresponding to $\tilde p = p/\sqrt m$ and $\tilde q = \sqrt k\, q$. Sadly, this doesn’t quite do what we want: these aren’t canonical coordinates, that is, the symplectic form isn’t $dr \wedge d\theta$. Indeed, $$\omega = dp \wedge dq = \sqrt{\frac mk}\, d\tilde p \wedge d\tilde q = \sqrt{\frac mk}\, r\, dr \wedge d\theta.$$

It would be possible to work out the form of the Poisson bracket in these coordinates and see what equations we get, but it’s even easier to just find coordinates that are canonical and use those. We can do this by replacing $r$ with $s = \frac12\sqrt{m/k}\, r^2 = r^2/2\alpha$. We then have $\omega = ds \wedge d\theta$ and $H = \alpha s$, and so Hamilton’s equations are $$\frac{d\theta}{dt} = \frac{\partial H}{\partial s} = \alpha, \qquad \frac{ds}{dt} = -\frac{\partial H}{\partial \theta} = 0.$$

This analysis makes it obvious that $s$ is a conserved quantity — that’s literally what the second equation says. This is equivalent to saying that $\{s, H\} = 0$, which we could have checked in the original coordinates if we wanted. In this case this is all kind of silly, since $s$ is just a constant multiple of $H$; the next example will feature a less silly version of this phenomenon.
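The check in the original coordinates mentioned here takes only a few lines; the sympy sketch below (my addition) computes $\{s, H\}$ using the one-degree-of-freedom coordinate formula for the bracket.

```python
import sympy as sp

q, p, m, k = sp.symbols('q p m k', positive=True)

def poisson(f, g):
    # {f, g} = df/dq dg/dp - df/dp dg/dq in canonical coordinates (q, p).
    return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

H = p**2 / (2 * m) + k * q**2 / 2
r = sp.sqrt(k * q**2 + p**2 / m)
s = sp.Rational(1, 2) * sp.sqrt(m / k) * r**2

bracket = sp.simplify(poisson(s, H))  # conserved quantity: should be zero
```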

The Two-Body Problem

Consider two particles, with masses $m_1$ and $m_2$, moving under the influence of a conservative force that depends only on their relative positions, that is, on the difference $\mathbf q_1 - \mathbf q_2$. (You might imagine for example two celestial bodies moving under the influence of gravity.) So our configuration space is $\mathbb{R}^3 \times \mathbb{R}^3$, and our Hamiltonian is $$H = \frac{|\mathbf p_1|^2}{2m_1} + \frac{|\mathbf p_2|^2}{2m_2} + V(\mathbf q_1 - \mathbf q_2)$$ for some function $V$.

We can already see another case of Noether’s theorem here. The fact that $V$ depends only on $\mathbf q_1 - \mathbf q_2$ means that if we translate both particles by the same vector and leave their momenta fixed, $H$ is unchanged. For concreteness let’s consider translating in the positive $x$ direction; this corresponds to flowing along the vector field $\partial_{x_1} + \partial_{x_2}$ (writing $x_i$ for the $x$ component of $\mathbf q_i$). These translations also self-evidently preserve the symplectic form, and so our vector field must be Hamiltonian. And indeed, it’s $X_f$ for $f = (p_1)_x + (p_2)_x$, the $x$ component of the total momentum of the system. You could also check directly that the Poisson bracket $\{f, H\}$ is zero. So we see that a Hamiltonian that is preserved by translations in some direction corresponds to physics that preserve the component of total momentum in that direction.
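The direct bracket computation suggested here is easy to do with sympy. The sketch below (my illustration) works with one spatial dimension per particle to keep the code short, since each of the three components behaves identically, and leaves $V$ as an arbitrary function.

```python
import sympy as sp

q1, q2, p1, p2 = sp.symbols('q1 q2 p1 p2')
m1, m2 = sp.symbols('m1 m2', positive=True)
V = sp.Function('V')  # arbitrary potential depending on the relative position

def poisson(f, g):
    # Poisson bracket with two position/momentum pairs.
    pairs = [(q1, p1), (q2, p2)]
    return sum(sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)
               for q, p in pairs)

# One-dimensional stand-in for the two-body Hamiltonian.
H = p1**2 / (2 * m1) + p2**2 / (2 * m2) + V(q1 - q2)
f = p1 + p2  # total momentum

bracket = sp.simplify(poisson(f, H))  # should vanish for any V
```

The two $V'(q_1 - q_2)$ terms cancel with opposite signs, which is exactly the translation symmetry at work.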

There is a common change of coordinates that makes this system a bit easier to analyze: write $$\mathbf Q = \frac{m_1\mathbf q_1 + m_2\mathbf q_2}{m_1 + m_2}, \qquad \mathbf q = \mathbf q_1 - \mathbf q_2.$$ Now, given any diffeomorphism $f$ from a manifold $Q$ to itself, we can lift it to a diffeomorphism on the cotangent bundle by setting $f^\sharp(q,p) = (f(q), (f^{-1})^*(p))$. We call $f^\sharp$ the cotangent lift of $f$. It turns out that a diffeomorphism on a cotangent bundle has the form of a cotangent lift if and only if it preserves the canonical 1-form $\theta$. To compute $f^\sharp$ in coordinates, first note that $(f^{-1})^*(p)(v) = p((f^{-1})_*(v)) = p((f_*)^{-1}(v))$ by definition, so the matrix for $(f^{-1})^*$ is the transpose of the inverse of the Jacobian of $f$.

So in particular, the cotangent lift gives us a natural way to turn any diffeomorphism on configuration space into a symplectomorphism on phase space. One can check that doing this for our change of coordinates here gives us the momentum coordinates $$\mathbf P = \mathbf p_1 + \mathbf p_2, \qquad \mathbf p = \frac{m_2\mathbf p_1 - m_1\mathbf p_2}{m_1 + m_2},$$ and our Hamiltonian becomes $$H = \frac{|\mathbf P|^2}{2M} + \frac{|\mathbf p|^2}{2m} + V(\mathbf q),$$ where $M = m_1 + m_2$ and $m = m_1 m_2/(m_1 + m_2)$. (The reader is encouraged to verify these computations; it’s good practice!)
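The kinetic-energy part of that verification can be done symbolically. The following sketch (added for illustration; `mu` stands for the reduced mass the text calls $m$, and again one spatial dimension suffices) checks that the new momentum coordinates leave the kinetic term unchanged in the decoupled form.

```python
import sympy as sp

p1, p2, m1, m2 = sp.symbols('p1 p2 m1 m2', positive=True)

M = m1 + m2                 # total mass
mu = m1 * m2 / (m1 + m2)    # reduced mass (called m in the text)
P = p1 + p2
p = (m2 * p1 - m1 * p2) / (m1 + m2)

kinetic_old = p1**2 / (2 * m1) + p2**2 / (2 * m2)
kinetic_new = P**2 / (2 * M) + p**2 / (2 * mu)

difference = sp.simplify(kinetic_new - kinetic_old)  # should be zero
```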

The point of this change of coordinates was to “decouple” the two parts of the Hamiltonian. The coordinate $\mathbf Q$ is called the center of mass of the system; what we’ve shown is that our original system is equivalent to one with a free particle of mass $M$ moving with the center of mass and a particle of mass $m$ moving under the influence of the potential $V$.

Central Potentials

As one more example of the relationship between symmetries and conservation laws, let’s consider a particle moving in a potential that depends only on the distance of that particle from the origin. That is, $$H = \frac{|\mathbf p|^2}{2m} + V(|\mathbf q|).$$ This is a good model for a planet moving around the sun under the influence of Newtonian gravity; in this case we’ll have $V(r) = -GMm/r$, where $M$ is the mass of the sun and $G$ is the gravitational constant.

But no matter what $V$ is, the fact that it depends only on the length of $\mathbf q$ means that the physics is preserved by any rotation about the origin. It’s worth being precise about what we mean by this: rotation about the origin is a diffeomorphism on configuration space, and to extend it to a symplectomorphism on phase space we need to take its cotangent lift.

If $R_\theta$ is the rotation by $\theta$ around the $z$ axis, then $$\begin{aligned} (R_\theta)^\sharp(x, y, z, p_x, p_y, p_z) = (&\cos\theta\, x - \sin\theta\, y,\ \sin\theta\, x + \cos\theta\, y,\ z,\\ &\cos\theta\, p_x - \sin\theta\, p_y,\ \sin\theta\, p_x + \cos\theta\, p_y,\ p_z).\end{aligned}$$ (The transpose of the inverse of $R_\theta$ is just $R_\theta$ itself, since $R_\theta$ is orthogonal.) This is the map that has to preserve the Hamiltonian if our analysis is to go through, which means it’s important that $H$ depends only on $|\mathbf p|$ and $|\mathbf q|$.

To get the vector field whose flow produces this symmetry, we take the derivative of this with respect to $\theta$ at $\theta = 0$. We get $$X = -y\partial_x + x\partial_y - p_y\partial_{p_x} + p_x\partial_{p_y},$$ which is $X_{L_z}$ where $L_z = xp_y - yp_x$.

We call $L_z$ the angular momentum of our particle about the $z$ axis, and this analysis shows that any physics arising from a Hamiltonian which is symmetric under rotations about the $z$ axis conserves $L_z$.
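This conservation law can also be confirmed directly from the coordinate formula for the Poisson bracket. The sympy sketch below (my addition, with $V$ left as an arbitrary radial profile) checks that $\{L_z, H\} = 0$ for any central potential.

```python
import sympy as sp

x, y, z, px, py, pz = sp.symbols('x y z p_x p_y p_z')
m = sp.symbols('m', positive=True)
V = sp.Function('V')  # arbitrary radial potential profile

def poisson(f, g):
    # Poisson bracket over the three position/momentum pairs.
    pairs = [(x, px), (y, py), (z, pz)]
    return sum(sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)
               for q, p in pairs)

H = (px**2 + py**2 + pz**2) / (2 * m) + V(sp.sqrt(x**2 + y**2 + z**2))
Lz = x * py - y * px

bracket = sp.simplify(poisson(Lz, H))  # should vanish for any V
```

Both the kinetic terms and the $V'$ terms cancel in pairs, which is the rotational symmetry showing up in coordinates.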

A Note

It is easy to construct Hamiltonians which aren’t invariant under rotations or translations. Indeed, the one from the last example isn’t preserved by translations, and correspondingly we shouldn’t expect momentum to be conserved by, say, Newtonian gravity. Nonetheless, it’s believed by most physicists that the fundamental laws that the universe runs on, whatever they are, do have these two symmetries — the results of a physical experiment don’t depend on where you do it or which way you were facing — and that therefore conservation of linear and angular momentum hold in general.

If you are presented with a Hamiltonian that doesn’t have this symmetry, like the one in the last example, the assumption is that there’s some part of the physics that you’re neglecting, and that if you included it the symmetry would appear again. For example, if we imagine the last example to be about a planet moving around the sun, we are neglecting the influence of the planet’s gravity on the sun, and if we included it we would be in the situation from the previous example about the two-body problem.

There is another symmetry that classical physics obeys: it also shouldn’t matter when an experiment is performed. Under our formalism, time translation comes from flowing along the vector field given by $H$ itself, so this symmetry corresponds to the conservation of energy. This example is a bit different from the others, though, because the relationship is true by definition! This is an artifact of the way we set up the Hamiltonian formalism: it picks out time translation as “special,” as the flow that corresponds to the Hamiltonian, and specifying the Hamiltonian is the way we specify the laws of physics.

Like with momentum, it is possible to “break” the time translation symmetry (and therefore energy conservation) by using a Hamiltonian that depends explicitly on time. This is useful when the forces acting on the particles or the constraints of the physical system change over time. (An example of the latter that’s often trotted out in physics classes is a bead attached to a spinning circle of wire.) I’ve chosen not to consider the case of time-dependent Hamiltonians or Lagrangians in this article for simplicity, but the theory does continue to work just fine in that setting.

Lagrangian Mechanics

Recall that Hamilton’s equations are given by $$\frac{dq_i}{dt}=\frac{\partial H}{\partial p_i},\qquad\frac{dp_i}{dt}=-\frac{\partial H}{\partial q_i}.$$ As I mentioned briefly in that section, the first equation can give us a sort of answer to the question we had earlier about the relationship between momentum and velocity: it supplies, for every point $(q,p)\in T^*Q$, a tangent vector $v\in T_qQ$, and it tells us to interpret that tangent vector as a velocity. Provided that this procedure is invertible, which it will be in all of the cases we care about, we can think of it as giving us a “change of coordinates” from the cotangent bundle to the tangent bundle. This will turn out to give us another formulation of mechanics, called Lagrangian mechanics, which, while formally equivalent to everything we’ve done so far, sheds light on different aspects of mechanics than the Hamiltonian picture.
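Before changing coordinates, it may help to see Hamilton's equations doing their job numerically. The sketch below is my own illustrative example, with arbitrary values for $m$, $k$, and the step size: it integrates the equations for the one-dimensional harmonic oscillator $H=p^2/2m+kq^2/2$ with a semi-implicit Euler scheme and checks that $H$ itself stays essentially constant along the flow.

```python
# Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq for the
# one-dimensional harmonic oscillator H = p^2/(2m) + k q^2/2.
m, k = 1.0, 4.0          # illustrative values
dt, steps = 0.001, 10_000

q, p = 1.0, 0.0          # initial conditions
energy0 = p**2/(2*m) + k*q**2/2

for _ in range(steps):
    # Semi-implicit (symplectic) Euler: update p using -dH/dq, then q using dH/dp.
    p -= dt * k * q      # dp/dt = -dH/dq = -k q
    q += dt * p / m      # dq/dt =  dH/dp = p/m

energy = p**2/(2*m) + k*q**2/2
print(abs(energy - energy0))  # small: the flow (approximately) preserves H
```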

The Legendre Transform

We’ll take this assignment of tangent vectors to points in phase space as our starting point. If the tangent vector $v$ comes from $(q,p)$ in this way, our goal will be to rewrite Hamilton’s equations in a way that depends on $q$ and $v$ rather than on $q$ and $p$. Put another way, the Hamiltonian gave us a way to turn paths in the cotangent bundle into paths in the tangent bundle, and we’d like to see what restrictions Hamilton’s equations impose directly on these new paths.

Hamilton’s first equation was, in some sense, “used up already” in the definition of $v$. By switching coordinates to $v$ and interpreting $v$ as a velocity, this equation tells us simply that $v=dq/dt$, that is, at every point $(q,v)=\gamma(t)$ along our path, we should have $v=\gamma'(t)$. This means that we might as well just talk about paths in configuration space rather than its tangent bundle; we can identify such a path with its lift to the tangent bundle and do away with one of our two equations.

So it remains to translate the second equation into something to do with $v$. Note that the $q$ coordinate has very little to do with our goal here: we’re trying to turn a statement about the cotangent bundle into a statement about the tangent bundle, and all of the action is happening in the fibers of these two bundles. So it will be cleaner to ignore $q$ for now and examine how our coordinate change procedure behaves on a general vector space.

Suppose we have a smooth function $H$ on a vector space $U$. ($U$ will end up being the cotangent space at a point of configuration space.) Then for each point $p\in U$, $dH$ gives a linear map from the tangent space $T_pU$ to $\mathbb R$. But since $U$ is a vector space, there is a canonical identification of each of its tangent spaces with $U$ itself, so we can in fact think of $dH$ as giving us a way of assigning, to each $p\in U$, a linear map from $U$ to $\mathbb R$, that is, an element of $U^*$.

So $H$ gives us a map $W_H:U\to U^*$. (It’s important to emphasize that $W_H$ has no reason to be linear, so this is not any sort of inner product.) Geometrically, $W_H$ takes $p$ to the element of $U^*$ corresponding to the linear part of the best affine approximation to $H$ at $p$.

Suppose now that $W_H$ is invertible. (This will happen, for example, if $H$ is a constant plus a nondegenerate quadratic form in $p$, as is the case for our mechanical Hamiltonians from the last section.) Then it turns out that $W_H^{-1}$ arises in the same way as $W_H$: there is a function $L$ so that $W_L=W_H^{-1}$. We call $L$ the Legendre transform of $H$.

In fact, $L$ can be computed explicitly: one can show that when $W_H$ is invertible, $W_L=W_H^{-1}$ if and only if $L(W_H(p))+H(p)=\langle W_H(p),p\rangle$ up to an additive constant, where $\langle\cdot,\cdot\rangle$ is the pairing between $U^*$ and $U$. (Usually we let the constant be zero.) In particular, this makes it clear that the Legendre transform is an involution: $H$ is also the Legendre transform of $L$.

To see all this, first note that by definition, for any $p\in U$, we have $\langle W_H(p),p'\rangle=dH_p(p')$, where on the right hand side we think of $p'$ as living in the tangent space at $p$. So, taking the derivative of both sides and pairing with an arbitrary $p'$, we get that they are equal if and only if $$\langle(DW_H)_p(p'),W_L(W_H(p))\rangle=\langle(DW_H)_p(p'),p\rangle,$$ where $(DW_H)_p$ is the total derivative of $W_H$ at $p$. Since $W_H$ is invertible, the left side of this pairing can be anything, so this is true if and only if $W_L(W_H(p))=p$. The reader is encouraged to fill in the missing steps of this argument; it’s a good exercise in following all the relevant definitions.

So what did this giant mess of symbols get us? To any function $H$ on $U$ we’ve associated a new function $L$ on $U^*$ so that their “coordinate-change functions” are inverses of each other. In coordinates $p_1,\ldots,p_n$ on $U$ and $v_1,\ldots,v_n$ on $U^*$, this means that $$v_i=\frac{\partial H}{\partial p_i},\qquad p_i=\frac{\partial L}{\partial v_i},$$ and $$L=\langle v,p\rangle-H.$$ In the case we’re interested in, when we’re doing this in every fiber of the cotangent bundle and $H$ is the Hamiltonian of some physical system, we call $L$ the Lagrangian of that same system.

What does this look like for the mechanical Hamiltonians we were working with before? There we had $$H=T+V=\sum_i\frac{|\mathbf p_i|^2}{2m_i}+V(q).$$ So we get $\mathbf v_i=\partial H/\partial\mathbf p_i=\mathbf p_i/m_i$ and $\langle v,p\rangle=\sum_i\frac{|\mathbf p_i|^2}{m_i}=2T$. This means the Lagrangian turns out to be $L=2T-(T+V)=T-V$.
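This fiberwise computation is mechanical enough to hand to a computer algebra system. The following sympy sketch (the variable names are mine) carries out the recipe for a single particle in one dimension: form $v=\partial H/\partial p$, invert it to get $p$ in terms of $v$, and compute $L=\langle v,p\rangle-H$.

```python
import sympy as sp

q, p, v = sp.symbols('q p v')
m = sp.symbols('m', positive=True)
V = sp.Function('V')

# Mechanical Hamiltonian for one particle in one dimension.
H = p**2/(2*m) + V(q)

# The "change of coordinates" v = dH/dp from momenta to velocities...
v_of_p = sp.diff(H, p)                      # p/m
# ...inverted to express the momentum in terms of the velocity.
p_of_v = sp.solve(sp.Eq(v, v_of_p), p)[0]   # m*v

# The Legendre transform: L = <v, p> - H, rewritten in terms of (q, v).
L = sp.expand(v*p_of_v - H.subs(p, p_of_v))
print(L)  # m*v**2/2 - V(q), i.e. T - V
```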

It’s worth stressing again that despite the fact that we’ve arrived at an expression for $L$ that looks very similar to $H$, there is an additional important difference between the two aside from the fact that the sign on $V$ has flipped: $H$ is a function on the cotangent bundle and $L$ is a function on the tangent bundle! The relationship between momenta and velocities — that is, between $p$ and $v$ — depends entirely on the physics being modeled, so unless you’ve picked a Hamiltonian or a Lagrangian this relationship remains unspecified. Only when this relationship has been established does it even make sense to write something like $L+H=2T$; if we were being more careful we would actually write something like $L(q,W_H(p))+H(q,p)=2T(p)$.

Recall that Hamilton’s first equation now just tells us that the tangent vector we pick at every point of our path should be the time derivative of the path at that point, so we are just left with translating the second into a statement about Lagrangians. That equation was $$\frac{dp_i}{dt}=-\frac{\partial H}{\partial q_i}.$$ Now, since we performed our Legendre transform just in the fibers of the cotangent and tangent bundles, nothing interesting happened to derivatives with respect to $q$ coordinates, so the fact that $L=\langle v,p\rangle-H$ means that $\partial L/\partial q_i=-\partial H/\partial q_i$. This, combined with the fact that $p_i=\partial L/\partial v_i$, gives us Lagrange’s equation: $$\frac{d}{dt}\left(\frac{\partial L}{\partial v_i}\right)=\frac{\partial L}{\partial q_i}.$$
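As a sanity check, here is Lagrange's equation computed symbolically for the harmonic oscillator Lagrangian $L=\frac12mv^2-\frac12kq^2$ (an illustrative example of mine, not one worked above); the left-hand side that results is exactly Newton's familiar $m\ddot q+kq$.

```python
import sympy as sp

t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')(t)
v = sp.diff(q, t)

# Harmonic oscillator Lagrangian L = T - V.
L = m*v**2/2 - k*q**2/2

# Lagrange's equation: d/dt (dL/dv) - dL/dq = 0.
lagrange_lhs = sp.diff(sp.diff(L, v), t) - sp.diff(L, q)

# The result is m q'' + k q, i.e. the equation of motion m q'' = -k q.
print(sp.simplify(lagrange_lhs))
```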

It is more common for authors to write $\dot q_i$ where I’ve written $v_i$ here, using the usual physicists’ convention of dots to indicate time derivatives. The reason I didn’t do this was to avoid a common confusion: when you write expressions like $\partial L/\partial\dot q_i$ it’s tempting to assume that one is supposed to compute $\dot q_i$ from $q_i$ or something. But the Lagrangian is a function on the tangent bundle, not just on configuration space, and $\partial L/\partial v_i$ is just a derivative with respect to one of the coordinates on the tangent bundle. Given a smooth path in configuration space there is a natural way to lift it and obtain a path in the tangent bundle, and the physical assumption we are making is that the paths that happen physically are exactly the ones whose lifts satisfy Lagrange’s equation.

Note that we end up with half as many equations as before — we have one for every position coordinate, rather than one for every position or momentum coordinate. But because the Lagrangian depends on both $q$ and $v$ and we fix $v_i$ to be the time derivative of $q_i$, these end up being second-order differential equations rather than first-order like Hamilton’s equations, so we still need the same amount of information in our initial conditions to solve them as before.

An Example

Let’s look at a concrete example of an at least somewhat nontrivial mechanics problem and see how to solve it using the Lagrangian formalism. This problem would still be feasible to tackle using the techniques from a Newtonian mechanics class, but the Lagrangian approach makes it quite straightforward.

Consider a two-dimensional world with a mass hanging from a very light spring. The end of the spring without the mass is fixed in place and the other end is free to swing around. We’ll pick coordinates $r,\theta$ for our configuration space, where $r$ is the current length of the spring and $\theta$ is the angle the spring makes with the vertical. We’ll write the corresponding time derivatives as $v_r$ and $v_\theta$. These coordinates have the nice property that the corresponding coordinate directions are always perpendicular, so the speed of the particle is $\sqrt{v_r^2+(rv_\theta)^2}$. Therefore, the kinetic energy is simply $T=\frac12mv_r^2+\frac12mr^2v_\theta^2$.

There are two contributions to the potential energy: gravity and the restoring force from the spring. Springs are well modeled by potentials of the form $V_{\mathrm{spring}}=\frac12k(r-\ell)^2$ for some constant $k$, where $\ell$ is the “natural length” of the spring. We encountered a potential of this form when we discussed the harmonic oscillator. Gravity (at least in cases like this where we can neglect the varying distance from the center of the earth) produces a constant acceleration in all falling bodies, and I encourage you to check that this is the same as asserting that $V_{\mathrm{gravity}}=mgh$ where $g$ is that constant acceleration and $h$ is the height of a particle above an arbitrary reference height.

Putting this all together, we get $$L=\frac12m(v_r^2+r^2v_\theta^2)-\frac12k(r-\ell)^2+mgr\cos\theta.$$ We can plug this into Lagrange’s equation and get, simplifying a bit and using physicists’ dot notation for derivatives, $$m\ddot r=mr\dot\theta^2+mg\cos\theta-k(r-\ell)$$ and $$mr\ddot\theta+2m\dot r\dot\theta=-mg\sin\theta.$$
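These two equations of motion can be handed straight to a numerical integrator. The sketch below (the parameter values and initial conditions are arbitrary choices of mine) solves them with scipy's `solve_ivp` and checks that the energy $T+V$ is conserved along the resulting trajectory, as the general theory says it must be.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values (not from the text).
m, k, ell, g = 1.0, 20.0, 1.0, 9.8

def rhs(t, state):
    # state = (r, theta, rdot, thetadot); the two equations of motion,
    # solved for the second derivatives:
    #   r''     = r theta'^2 + g cos(theta) - (k/m)(r - ell)
    #   theta'' = (-g sin(theta) - 2 r' theta') / r
    r, th, rdot, thdot = state
    return [rdot, thdot,
            r*thdot**2 + g*np.cos(th) - (k/m)*(r - ell),
            (-g*np.sin(th) - 2*rdot*thdot)/r]

def energy(state):
    r, th, rdot, thdot = state
    T = 0.5*m*(rdot**2 + (r*thdot)**2)
    V = 0.5*k*(r - ell)**2 - m*g*r*np.cos(th)
    return T + V

y0 = [1.2, 0.3, 0.0, 0.0]
sol = solve_ivp(rhs, (0.0, 5.0), y0, rtol=1e-10, atol=1e-10)

# T + V is conserved along the trajectory, up to integration error.
print(abs(energy(sol.y[:, -1]) - energy(y0)))
```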

The Calculus of Variations

There is another, very different way to obtain Lagrange’s equation, involving a technique called the calculus of variations. The calculus of variations is a sort of infinite-dimensional calculus performed on spaces of functions rather than finite-dimensional vector spaces. It provides tools for doing things like finding local minima or maxima of some functional on such a space of functions. I’ll sketch the part of this story that produces Lagrangian mechanics.

Suppose we are given an arbitrary function $L:\mathbb R\times TQ\to\mathbb R$, where we think of the first $\mathbb R$ as representing a time coordinate. Given any path $\gamma:[0,1]\to Q$, we consider the following quantity, called the action: $$S(\gamma)=\int_0^1L(t,\gamma(t),\gamma'(t))\,dt.$$ (Here again $t\mapsto(\gamma(t),\gamma'(t))$ is the natural lift of $\gamma$ to a path in the tangent bundle of $Q$; we are abusing notation slightly by writing the components the way we do here.) We want to find, out of all paths $\gamma$ with a fixed starting and ending point, the ones that locally minimize or maximize $S$.

We’ll answer this by considering smooth homotopies $h:(-\epsilon,\epsilon)\times[0,1]\to Q$ for which $h(0,t)=\gamma(t)$ and which leave the endpoints fixed. As usual, we’ll write $h_u(t)=h(u,t)$. If $\gamma$ minimizes the action, then it ought to in particular be a local minimum within any such homotopy. That is, $0$ should be a critical point of the function $u\mapsto S(h_u)$.

For ease of notation from here on out, I’ll write $\bar\gamma(t)=(t,\gamma(t),\gamma'(t))$ for the argument to $L$, so $S(\gamma)=\int_0^1L(\bar\gamma(t))\,dt$.

So we need $$\left.\frac{d}{du}\right|_{u=0}\int_0^1L\left(t,h(u,t),\frac{\partial h}{\partial t}(u,t)\right)dt=0,$$ and we can turn this into a condition which doesn’t mention $h$ with a bit of computation. We can pull the derivative inside the integral sign and use the chain rule to get that the left side is $$\int_0^1\sum_{i=1}^n\left[\frac{\partial h_i}{\partial u}(0,t)\frac{\partial L}{\partial q_i}(\bar\gamma(t))+\frac{\partial^2h_i}{\partial u\,\partial t}(0,t)\frac{\partial L}{\partial v_i}(\bar\gamma(t))\right]dt$$ for any coordinate system $(q_1,\ldots,q_n,v_1,\ldots,v_n)$ on the tangent bundle; here $h_i$ is the $q_i$ component of $h$. Then, using integration by parts, we can transform the second term a bit more, turning it into $$\sum_{i=1}^n\left[\left.\left(\frac{\partial h_i}{\partial u}(0,t)\frac{\partial L}{\partial v_i}(\bar\gamma(t))\right)\right|_{t=0}^1-\int_0^1\frac{\partial h_i}{\partial u}(0,t)\frac{d}{dt}\left(\frac{\partial L}{\partial v_i}(\bar\gamma(t))\right)dt\right].$$ The fact that the endpoints are fixed means that the boundary term on the left is zero: at $t=0$ and $t=1$, $h_u(t)$ doesn’t vary as I change $u$, so $\partial h/\partial u=0$. So we’re left with just the second term, which we combine with the first term from above to get that we want $$\int_0^1\sum_{i=1}^n\frac{\partial h_i}{\partial u}(0,t)\left[\frac{\partial L}{\partial q_i}(\bar\gamma(t))-\frac{d}{dt}\left(\frac{\partial L}{\partial v_i}(\bar\gamma(t))\right)\right]dt=0.$$

Finally, we note that, since $\partial h/\partial u$ could be any vector at all at each $t$, the only way that integral is zero for every $h$ is if, for each $i$, the part in square brackets is identically zero as a function of $t$. But that is exactly the same as saying that $\gamma$ must satisfy Lagrange’s equation!
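The variational characterization can also be tested numerically: discretize the action, and the derivative of $S$ along any endpoint-fixing perturbation should vanish at a true solution but not at an arbitrary path. Here is a rough numpy sketch for the harmonic oscillator (with $m=k=1$, an illustrative choice of mine, so that $q(t)=\sin t$ solves Lagrange's equation).

```python
import numpy as np

# Discretized action for the harmonic oscillator with m = k = 1,
# so L = v^2/2 - q^2/2 and q(t) = sin(t) is a true solution.
n = 2000
t = np.linspace(0.0, 1.0, n)
dt = t[1] - t[0]

def action(q):
    v = np.diff(q)/dt                   # forward-difference velocities
    return np.sum(0.5*v**2 - 0.5*q[:-1]**2)*dt

bump = np.sin(np.pi*t)                  # perturbation vanishing at the endpoints

def dS(q, eps=1e-5):
    # Directional derivative of the action along the bump, endpoints fixed.
    return (action(q + eps*bump) - action(q - eps*bump))/(2*eps)

q_solution = np.sin(t)                  # satisfies Lagrange's equation
q_other = t*(1.0 - t)                   # an arbitrary path, not a solution

# The solution is (nearly) stationary; the arbitrary path is not.
print(abs(dS(q_solution)), abs(dS(q_other)))
```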

So we get a description of the laws of physics that looks much more “global” than the one from symplectic geometry above: a physical system gets from one point in configuration space to another by following a path that is a critical point of the action functional $S$. The original description told us how the position and momentum (or velocity) evolve through time if we know them at one moment, whereas this new description puts a restriction on the entire path at once. We’ll have more to say about this in the next section.

A Little Philosophy

The first time I encountered Noether’s theorem, it gave me a particular feeling about physics that I’ll do my best to explain here. There is a sense in which, for example, the fact that the laws of physics are invariant under rotation is “obvious” and the fact that they conserve angular momentum isn’t. After all, an expression like $L_z=xp_y-yp_x$ looks a little arbitrary. Why should it be a fundamental law of nature that that’s conserved rather than, say, $xp_y+yp_x$?

Now that we’ve built up the machinery that connects rotation and angular momentum, it’s very tempting to say that we’ve “reduced” the fact that angular momentum is conserved to “merely” the fact that physics is rotationally symmetric. This is, of course, not true: we have reduced it to the fact that physics is rotationally symmetric and the fact that physics can be described using Hamiltonian mechanics. (There is of course a Lagrangian version of Noether’s theorem too, and in fact this is the form in which it was originally stated.) Indeed, trying to prove Noether’s theorem from Hamiltonian mechanics gets things a bit backwards: the fact that something like Noether’s theorem is true is baked into the fact that all the flows we consider are along Hamiltonian vector fields.

This is emphatically a physical and not a mathematical assertion — it is certainly possible to imagine a universe where physics doesn’t work this way, but this does not seem to be the universe we live in. We can’t hope to prove that conservation of angular momentum is “the only way it could logically have been” without making some assumptions about the nature of the laws of physics. What we can do, and what in some sense is the main goal of all of physics, is to try to find a way to state those laws with as few moving parts as possible, and I think that at least is something that this whole discussion manages to do. In either math or physics, any time you can take an arbitrary-looking algebraic expression and show that it falls out naturally from less arbitrary-looking mathematical objects you’ve made some progress toward making the world more intelligible.

This is partly why I took the particular path through the material that I did. It’s actually much more common in mechanics texts to do the Lagrangian version first and derive the Hamiltonian picture from it using a Legendre transform. I always had some trouble making this approach fit in my head without seeming like magic. Until you’ve seen how it fits into the story it’s unclear why the Lagrangian has anything to do with anything, and especially unclear why one would want to minimize the action. Even if, as I keep saying, it’s necessary to make up some physical assumption at some point to make this all go, I feel like the assumptions made here feel at least a little less made up; the reader is of course free to disagree.

As I sort of said when discussing it above, it isn’t quite as “magical” as it might first appear — the claim is not that the universe searches over all possible paths for the global minimum of the action, but rather that the one that occurs is a critical point of the action under any perturbation. Sometimes a path with this property is called “stationary.” So the rule is not quite as global as it might be — it is global in the sense of pertaining to the whole path at once rather than a particular point along it, but not in the sense of pertaining to all possible paths, even those far away from the one under consideration.

Still, it was very striking to people in the early 19th century that Newtonian mechanics could be recast in such a strange way, and I think this reaction makes a lot of sense. The fact that the action-principle version is mathematically equivalent to the more local description shouldn’t detract from this; even in pure mathematics one can often learn a great deal from expressing the same fact in two different ways. A lot of this work happened around the end of the 18th century, and, in keeping with the Enlightenment-era times, many authors took the action principle as evidence of the benevolent guiding hand of Nature making sure the particles don’t waste too much of their precious action on frivolous non-stationary paths or something.

This is, of course, not the way most modern physicists talk, but there is a different sense in which we now understand the story told in this article to be offering us a hint about something beyond Newtonian mechanics. Basically all the theories of modern physics, including quantum mechanics, quantum field theory, general relativity, and more speculative extensions to them like string theory, are most naturally expressed not in terms of forces and equal and opposite reactions but in terms of Lagrangians and Hamiltonians.