This article is also available as a PDF.

Introduction

This article is part of a series on physics for mathematicians, and the second in a sub-series on quantum field theory. It’s a direct sequel to the first quantum field theory article, so if you plan to read this article I recommend being familiar with the content of that one first.

In the previous article, we discussed in detail how to build a quantum version of a Klein–Gordon field theory, and how the result can be interpreted as a theory of noninteracting massive scalar particles. In that story, we were able to build a Hilbert space, find an operator to serve as the Hamiltonian, and explicitly write down a basis of eigenstates of the Hamiltonian, essentially “solving” the entire theory.

In this article, we’re going to turn to the topic of interacting quantum field theories, which are the ones that are meant to describe the way that the elementary particles of the universe actually behave. Specifically, we’ll examine scattering processes, which are processes in which some number of particles collide with each other, producing some new collection of particles which are then measured by some experimental apparatus.

As we will see, it is far from clear how to describe states with even one particle in an interacting QFT, so the bulk of this article will be spent on the question of how to define the states involved in a scattering process. We will then close with a formula that expresses the probability amplitudes resulting from a scattering process in terms of an object we’ll call the “time-ordered \(n\)-point function” of a quantum field operator. The advantage of this formula is that this latter quantity is something we can explicitly compute, although the actual computation will have to wait until the next article in this series.

One big conceptual difficulty in making the transition to interacting theories comes from the fact that free theories are essentially the only ones that admit exact solutions of the type we were able to construct last time. As we’ll see momentarily, this means we’ll have to take a less direct approach. There is a lot to be said about how a particle description can still emerge in this new context and how to describe the way particles interact, but it won’t come in the form of an explicit description of the eigenstates of the Hamiltonian.

In addition, it is at this point that the mathematical difficulties inherent to quantum field theory start to show up in a truly unavoidable way. The problem is in fact quite severe: the mathematical objects that we’re going to be manipulating here do not actually exist, or at least no one has come up with a way to construct them.

Because of this, I think it’s best to regard the computations we’ll be doing not as statements that could be turned into formal proofs but as descriptions of how a hypothetical working theory of quantum fields ought to behave. Even though we can’t build the objects we want, it is often clear how they would behave if they did exist, and we can often get good quantitative predictions out at the end. But, in the absence of a formal construction, you can’t take these computations too literally.

Some mathematical physicists working in this area have developed an axiomatic approach, where the properties we might want a quantum field theory to have are codified in a formal list of axioms, and the physical results we want can (sometimes, partially) be shown to follow from the axioms. The problem then expresses itself in the fact that no one can prove that an object satisfying the axioms actually exists.

I have chosen not to take such a formal approach here for a couple reasons. First, and most importantly, on one’s first pass through the material I think it’s best to aim to get a feel for the physical content of the theory, and it’s easy to lose sight of “the point” when you’re spending a lot of time manipulating inequalities. Second, writing the article in this way would make it much longer and less interesting, and frankly I don’t feel like I can improve on the existing books on axiomatic quantum field theory.

My approach instead is going to be closer to that of the physicists — I’ll treat distributions like they’re functions, not worry too much about convergence, and occasionally rely on physical rather than mathematical arguments for why some result ought to hold. My hope is that even readers who are interested in the rigorous approach might still find it useful to see this rough map of the trail before traversing it more carefully themselves.

In the many years I’ve been preparing to work on this series, I’ve read a lot of quantum field theory books. I relied heavily on three of them for this article in particular. They are Anthony Duncan’s The Conceptual Framework of Quantum Field Theory, Huzihiro Araki’s Mathematical Theory of Quantum Fields, and the book Quantum Field Theory Lectures of Sidney Coleman, which was compiled after Coleman’s death from lecture notes and videos of his courses.

The Coleman book is probably overall the best QFT textbook I’ve come across among the ones written by and for physicists, and Duncan is a great source if you would like a somewhat more formal and rigorous treatment than the one I’ve given here. Araki’s book takes a very rigorous, axiomatic approach to the entire subject, with lots of integrals and inequalities and epsilons. It is somewhat challenging to get through (and to get used to his notation) but I found that the parts I spent time with clarified a lot of things for me.

I am grateful to Harry Altman, Mithuna Yoganathan, and Jordan Watkins for looking over earlier versions of this article.

Notation and Conventions

Our conventions are mostly the same as in the previous article in this series; I’ll repeat the relevant ones here for ease of reference.

We will use the physicists’ “bra-ket” notation for elements of a Hilbert space. In this convention, a “ket” like \(|\psi\rangle\) denotes an element of the Hilbert space, and \(\langle\psi'|\psi\rangle\) is used for the inner product of \(\psi'\) and \(\psi\). A “bra” like \(\langle\psi|\) can be thought of as an element of the dual space.

For an operator \(A\), you will sometimes see expressions like \(\langle\psi'|A|\psi\rangle\). This can be thought of as either the inner product of \(\psi'\) with \(A\psi\) or as the inner product of \(A^\dagger\psi'\) with \(\psi\), where we use the physicists’ convention of writing \(A^\dagger\) for the adjoint. In particular, if \(A\) is an observable, then \(\langle\psi|A|\psi\rangle\) is the expected value of \(A\) in the state \(|\psi\rangle\). It will be useful to remember that the complex conjugate of \(\langle\psi'|A|\psi\rangle\) can be written \(\langle\psi|A^\dagger|\psi'\rangle\), and that the dual of the ket \(A|\psi\rangle\) is the bra \(\langle\psi|A^\dagger\).

Unlike in the previous article where we switched between the Schrödinger and Heisenberg pictures, from now on we are working purely within the Heisenberg picture, where observables depend on time and states don’t. This means you should think of a state as specifying the entire history of the system in question, not a snapshot of it in time.

We will always use units where \(c=\hbar=1\), and our inner product on spacetime follows the “mostly minus” convention, where \[(t,x,y,z)\cdot(t',x',y',z')=tt'-xx'-yy'-zz'.\] In particular, for any \(x\in\mathbb{R}^4\), we’ll write \(x^2=x\cdot x\). We will extend this notation to partial derivative operators, in particular writing \[\partial^2 = \partial_t^2 - \partial_x^2 - \partial_y^2 - \partial_z^2.\]

We will reserve italic letters for scalars, operators, and points in \(\mathbb{R}^4\), and follow the convention that vectors in \(\mathbb{R}^3\) are denoted by boldface letters like \(\mathbf{x}\). Squared norms of vectors in \(\mathbb{R}^3\) will be denoted \(|\mathbf{x}|^2\).

We will follow the convention, more common for physicists than mathematicians, of writing integrals over spacetime as \(\int d^4x\) and integrals over space as \(\int d^3\mathbf{x}\).

The symmetries of spacetime are given by elements of the Poincaré group, which is the connected 10-dimensional Lie group generated by spacetime translations, spatial rotations, and boosts. The subgroup of the Poincaré group consisting of those elements that fix the origin is called the Lorentz group. It is also connected, it has dimension 6, and it is generated by spatial rotations and boosts. The Lorentz group can also be described as the group of linear automorphisms of \(\mathbb{R}^4\) which (i) preserve the inner product, (ii) are orientation-preserving, and (iii) preserve the forward time direction; the Poincaré group can then be described as a semidirect product of the Lorentz group with \(\mathbb{R}^4\).

In the last article, we had occasion to talk about both a classical field \(\phi\) and its operator-valued counterpart, which we called \(\widehat\phi\). In this piece, the classical field plays essentially no role, so for ease of reading we will drop the hat from the field operators going forward.

Interacting Quantum Field Theories

In the last article, we focused on a free scalar field theory. The theory we examined arises from the Lagrangian \[L(\phi,\partial_t\phi) = \frac12\int d^3\mathbf{x} \left[(\partial\phi)^2 - m^2\phi^2\right],\] which gives rise to classical equations of motion described by the Klein–Gordon equation, \((\partial^2+m^2)\phi=0\). After a decent amount of work, we were able to build a quantum version of this theory, and we saw that it looked like a theory of free relativistic particles of mass \(m\). We were able to give a nice, explicit description of the Hilbert space this theory takes place in — as the Fock space \(\mathcal{F}(\mathcal{L}^2(\mathbb{R}^3))\) — and give a correspondingly explicit description of how the Hamiltonian acts on this Hilbert space, along with the generators of spatial translations, rotations, and boosts.
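As a quick reminder of where the Klein–Gordon equation comes from: varying the action with respect to \(\phi\), the \(\frac12(\partial\phi)^2\) term contributes \(-\partial^2\phi\) to the variation (after an integration by parts) and the \(-\frac12 m^2\phi^2\) term contributes \(-m^2\phi\), so setting the variation to zero gives \[(\partial^2+m^2)\phi=0.\]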

This theory is, of course, not a great description of the universe, since nothing physically interesting happens in it. It is, as we said when we constructed it, a free quantum field theory, which means that every state is a superposition of \(n\)-particle states in which each particle moves in the same direction forever without interacting with anything. The theories which actually have interesting physics are called interacting theories.

“Interacting” in this context just means “not free,” and, conceptually at least, we arrive at an interacting quantum field theory by following the same process we went through in the last article except starting from a different Lagrangian. Like the one that gave rise to our free theory, all of the Lagrangians that we’ll consider will have the form of an integral over space, that is, we will have something like \[L(\phi,\partial_t\phi)=\int d^3\mathbf{x}\ \mathcal{L}(\phi(\mathbf{x}), \partial_t\phi(\mathbf{x})),\] where \(\mathcal{L}\) is a scalar-valued function called a Lagrangian density. (There are deep reasons, having to do with relativistic causality, for restricting attention to Lagrangians that come from a density, but we’re going to postpone any further discussion of this for now.)

It’s common to arrive at a Lagrangian density for an interacting theory by starting from one for a free theory and adding a small “perturbation” term to it. For example, a very common example in textbooks starts from the Klein–Gordon field, whose Lagrangian density is \(\frac12((\partial\phi)^2 - m^2\phi^2)\), and adds on a small term proportional to \(\phi^4\), giving \[\mathcal{L}=\frac12((\partial\phi)^2 - m^2\phi^2)-\frac{\lambda}{4!}\phi^4.\] (This example is called the “\(\phi^4\) theory.” The \(4!\) is there for later convenience, and the minus sign is to make sure the energy is still bounded below. While it might be useful for concreteness to keep this Lagrangian in the back of your mind while reading the rest of this piece, nothing we’re about to do will actually depend on any details about its form.)
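Though nothing in what follows depends on it, it might help to see what the new term does classically. Running the same variation as before, the \(-\frac{\lambda}{4!}\phi^4\) term contributes \(-\frac{\lambda}{3!}\phi^3\), so the classical equation of motion becomes \[(\partial^2+m^2)\phi = -\frac{\lambda}{3!}\phi^3.\] The equation is no longer linear in \(\phi\), which is one way to see why the interacting theory is so much harder to solve.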

It’s certainly not the case that these are the only Lagrangians one could possibly write down, but when studying Lagrangians like this, the hope is that our knowledge of the free theory will help us to get at least approximate results for the interacting theory when \(\lambda\) is sufficiently small. In practice, these are often perturbative results, meaning that the relevant quantities are expressed in terms of a power series in \(\lambda\) which we can, with enough computational effort, compute to any desired order in \(\lambda\). While it’s certainly not true that every quantitative result in quantum field theory is perturbative, I think it’s fair to say that the best-understood and most precise ones are, and that is where our focus will be for most of this series.

In particular, it will basically never be possible in an interacting quantum field theory to prove any results nearly as nice as the ones we got for the free theory. We will see later in this article that there will often be one- and multi-particle states in an interacting theory but, unlike in the free theory, we won’t be able to get a simple closed-form expression for them in terms of field operators, or even to prove that states with the relevant properties exist.

Informal Axioms of Relativistic QFT

Given this, a natural question arises: which features of the free theory carry over to this more general setting? As discussed in the introduction, we are not taking a completely rigorous axiomatic approach to this story, but it is still useful to discuss our foundational assumptions on a less formal level. We will assume:

  • Our states live in a separable Hilbert space \(\mathcal{H}\). There is an action of the Poincaré group — that is, the group of spacetime translations, rotations, and boosts — on \(\mathcal{H}\) which describes how transformations of spacetime affect our states. (Recall throughout this discussion that we are using the “Heisenberg picture” of states and observables, where observables depend on time and states don’t.)
  • There is a collection of field operators on \(\mathcal{H}\) indexed by points in spacetime. Conceptually, you should think of these as observables in the sense of ordinary quantum mechanics, corresponding to measuring the value of the field in question at the specified point in spacetime.

    • In our example so far, there’s been just one field, which we called \(\phi(x)\). In general, there might be several.
    • The single field in our Klein–Gordon example was a scalar field. In the perspective we’re introducing right now, the scalar-ness of the field amounts to a rule for how the field operators interact with the Poincaré group action: if \(U(g)\) is the action of a spacetime transformation \(g\), we have \(U(g)\phi(x)U(g)^{-1}=\phi(g.x)\). In future installments we’ll consider vector- and spinor-valued fields, for which the corresponding formula is somewhat more complicated.
  • Any two field operators \(\phi_i(x)\) and \(\phi_j(y)\) commute if \(x\) and \(y\) are spacelike separated. (When some \(\phi\) satisfies \([\phi(x),\phi(y)]=0\) whenever \(x-y\) is spacelike, we call \(\phi\) a local field operator. This is the generalization of one of the equal-time canonical commutation rules we imposed in the Klein–Gordon example. We’ll discuss the other commutation rule in a moment. This is the version of the condition for boson fields, which are all we’ll be studying for a while; for fermion fields, the commutation is replaced by anticommutation.)
  • We have a Lagrangian density which is a function of the fields and their derivatives. From our initial Lagrangian, we can extract a Hamiltonian by performing the usual procedure of finding the conjugate momentum variables and performing a Legendre transform. The Hamiltonian will have the form \(\widehat H=\int d^3\mathbf{x}\ \widehat{\mathcal{H}}(\phi(\mathbf{x}), \partial_t\phi(\mathbf{x}))\), and we call \(\widehat{\mathcal{H}}\) a Hamiltonian density. As always in quantum mechanics, the time translation part of the action of the Poincaré group is given by \(e^{-it\widehat H}\).
  • Again as in ordinary quantum mechanics, just as the Hamiltonian (that is, energy) is the generator of time translation, we identify the components of (spatial) momentum with the generators of space translations. Just as we did in the Klein–Gordon example, it’s convenient to combine all four of these operators into a four-vector \(P=(\widehat H,P_x,P_y,P_z)\). This, combined with the previous bullet point, means that the element of the Poincaré group corresponding to translation by \(a\in\mathbb{R}^4\) is given by \(e^{-iP\cdot a}\).
  • We will assume there is a single simultaneous eigenstate \(|\Omega\rangle\) of all four components of \(P\) with eigenvalue 0, and we’ll call this state the vacuum. This condition is equivalent to requiring that \(|\Omega\rangle\) is preserved by all spacetime translations; we will further assume that it is also preserved by the rest of the Poincaré group. (You might recall that when we discussed the free theory in the last article we called the vacuum \(|0\rangle\). In this series we’ll follow the common convention in physics of reserving the notation \(|0\rangle\) for vacua of free quantum field theories and use \(|\Omega\rangle\) for interacting theories.)
  • Also as in the Klein–Gordon example, the fact that space and time translations all commute means that all the components of \(P\) are simultaneously diagonalizable, so we can talk about eigenvalues of \(P\) as a whole. Because \(P\) is the observable corresponding to the total energy-momentum of the state, its eigenvalues ought to all be relativistically valid energy-momentum vectors. We will therefore assume that the spectrum of \(P\) is entirely contained in the (closed) forward light cone, that is, every eigenvalue of \(P\) is timelike or lightlike and has a nonnegative time component.

Understanding the Axioms

It’s worth taking some time to reflect on what picture of quantum field theory is suggested by these axioms. With all the discussion of the mathematical difficulties involved in building the theory, it’s easy to lose sight of the fact that, on a conceptual level, a quantum field theory is just a quantum mechanical system with infinitely many degrees of freedom, and that is a very useful mental picture to hold onto. States are still represented by vectors in a Hilbert space, observables are still self-adjoint operators on that Hilbert space, and time translation is still governed by a Hamiltonian. The conceptual leap from quantum mechanics to quantum field theory is in this sense much smaller than the leap from classical to quantum mechanics.

It can even be useful to imagine it as a limit of a sequence of ordinary quantum-mechanical systems, each of which has a large but finite number of degrees of freedom. This is in fact the approach taken in lattice quantum field theory, in which the points of spacetime are replaced by points of a finite grid. Lattice QFT has the advantage of being possible, if very cumbersome, to put on a computer. This has turned out to be a good way to analyze theories like quantum chromodynamics, the theory of the strong force, which are not amenable to the sort of perturbative computations we’re building toward here.

Those mathematical difficulties still exist, of course. It is possible, at the expense of a lot more hand-wringing about functional analysis, to be a lot more careful than we are going to be here. The axioms in this list are, in fact, a very stripped-down version of what are known as the Wightman axioms. Probably the biggest difference between the Wightman axioms and our list involves an issue that we spent a bit of time on in the previous article: in the more formal version, fields are represented by operator-valued distributions rather than simply operators indexed by points in spacetime. Indeed, we saw that even in the free theory an object like \(\phi(x)\) has a delta-function-like singularity near \(x\), and so doesn’t actually represent a well-defined operator on the Hilbert space.

(A very attentive reader might notice a potential problem here: our example interacting Lagrangian above contained a term proportional to \(\phi^4\), which, if we are to be fastidious about treating \(\phi\) like a distribution, involves multiplying distributions, which is complete nonsense. This is indeed a big problem, and is a major contributor to the somewhat famous fact that the integrals that arise in QFT computations have a tendency to diverge. We will deal with this problem much later in this series, when we take up the theory of renormalization.)

In this article, we are going to brush issues like this under the rug. Our goal is not to prove rigorous theorems about interacting quantum field theories from a precise list of axioms. It is possible to prove the result we’re aiming for in this piece rigorously from the Wightman axioms, but in my opinion doing this on one’s first exposure to the theory obscures more than it helps. Our attitude is instead going to be closer to the one you might take when reading a physics book: what follows should be thought of as a plausibility argument, or as an informal case for the way a quantum field theory ought to behave, rather than as a proof. (The utility of proving everything formally from the Wightman axioms is also lessened by the fact that no one has been able to rigorously construct a quantum field theory that both obeys the Wightman axioms and also models anything physically realistic.)

Finally, perusing this list and comparing it to what we did last time might also lead to a question about the canonical commutation relations, which in our discussion of free fields had the form \([\phi(t,\mathbf{x}),\phi(t,\mathbf{y})]=0\) and \([\phi(t,\mathbf{x}),\pi(t,\mathbf{y})]=i\delta(\mathbf{x}-\mathbf{y})\). The first of these appears above as our locality assumption, but we never mentioned the second.

It is possible in this framework to define the conjugate momentum variable \(\pi(x)\) — in fact, you need to in order to get an expression for the Hamiltonian out of the Lagrangian — so one could easily add on the second commutation relation as an additional assumption. The simple answer to why we have not done this here is that we won’t need it. In addition, it will often be convenient to (for example) rescale field operators like \(\phi\), which would slightly change the form of this commutation relation, and therefore having it in our list of axioms would be inconvenient. It is absent from the formal Wightman axioms as well for similar reasons.

Finding Particle States

The list of properties above that we are postulating for an arbitrary quantum field theory covers most of what we were able to establish for the free field theory in the previous article in this series. There is one missing piece, though: where did the particle states go? The claim is that we’re in the process of describing the framework that, among other things, gives rise to the Standard Model of particle physics, and if this program is going to be successful it must be possible to identify some states in the Hilbert space that correspond to individual particles.

The One-Particle Subspace

In order to do this, it will be helpful to note a couple features of the particle states that we were able to find in the free theory. Recall that, in the free theory, we were able to identify the Hilbert space \(\mathcal{H}\) with a Fock space, which was an orthogonal direct sum of “\(n\)-particle” subspaces. Each of these subspaces is separately preserved by the Hamiltonian and the momentum operators (and therefore by all spacetime translations), and the \(n\)-particle subspace is generated by states of the form \(a^\dagger(\mathbf{p}_1)\cdots a^\dagger(\mathbf{p}_n)|0\rangle\) for arbitrary momentum 3-vectors \(\textbf{p}_i\).

These states give a complete set of eigenstates of the four-vector-valued operator \(P=(\widehat H,P_x,P_y,P_z)\); the eigenvalue of such a state is \(p_1+\cdots+p_n\), where we write \(p=(\omega_{\mathbf{p}},\mathbf{p})\) and \(\omega_{\mathbf{p}}=\sqrt{|\mathbf{p}|^2+m^2}\). Altogether, this means that the spectrum of \(P\) looks like this:

Here the \(x\)-axis stands in for the three spatial components of \(p\), and the \(y\)-axis for the single time/energy component. The point on the bottom, indicating an eigenvalue of \(0\), corresponds to the vacuum. The red hyperbola partway up the graph — which is the hyperbola \(p^2=m^2\) — corresponds to the one-particle subspace, and the region on the top corresponds to all the \(n\)-particle subspaces for \(n\ge 2\). Note, in particular, that we do not just get a stack of disconnected hyperbolas in the spectrum for the multi-particle states. This is because any \(p\) with \(p^2\ge (2m)^2\) can be written as a sum of \(p_i\)’s with \(p_i^2=m^2\), not just the ones whose norms are integer multiples of \(m\).
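To see why, note that it is enough to produce the decomposition in the frame where the spatial part of \(p\) vanishes and then boost back. If \(p=(E,\mathbf{0})\) with \(E\ge 2m\), pick any \(\mathbf{k}\) with \(|\mathbf{k}|^2 = E^2/4 - m^2 \ge 0\); then \[p = \left(\tfrac{E}{2},\,\mathbf{k}\right) + \left(\tfrac{E}{2},\,-\mathbf{k}\right),\] and each summand \(p_i\) satisfies \(p_i^2 = E^2/4 - |\mathbf{k}|^2 = m^2\). This is why the multi-particle part of the spectrum is a solid region rather than a stack of hyperbolas.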

The existence of an action of the Poincaré group on the Hilbert space, along with the fact that the operators appearing in \(P\) are the generators of the spacetime translation portion of this action, imply that the spectrum of \(P\) is preserved by rotations and boosts. That is, if \(B\) is a boost or rotation and \(p\) is an eigenvalue of \(P\), then \(Bp\) is also an eigenvalue of \(P\). (It’s worth trying to prove this to yourself if it’s not obvious.) Therefore, all of the information we’d want to extract from the spectra of the four components of \(P\) can actually be extracted just from the single operator \(P^2=\widehat H^2-P_x^2-P_y^2-P_z^2\). This operator has an isolated eigenvalue at \(0\) corresponding to the vacuum, an isolated eigenvalue at \(m^2\) corresponding to all the one-particle states, and a continuous spectrum starting at \((2m)^2\).
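In case the invariance claim above isn’t obvious, here is a sketch. The semidirect product structure of the Poincaré group means that conjugating a translation by a Lorentz transformation gives another translation, and since the \(U\)’s compose the way the group elements do, \(U(B)e^{-iP\cdot a}U(B)^{-1} = e^{-iP\cdot (Ba)}\). So if \(P|\psi\rangle = p|\psi\rangle\), then for every \(a\), \[e^{-iP\cdot a}\,U(B)|\psi\rangle = U(B)\,e^{-iP\cdot(B^{-1}a)}|\psi\rangle = e^{-ip\cdot(B^{-1}a)}\,U(B)|\psi\rangle = e^{-i(Bp)\cdot a}\,U(B)|\psi\rangle,\] where the last equality uses the Lorentz-invariance of the inner product. In other words, \(U(B)|\psi\rangle\) is an eigenstate of \(P\) with eigenvalue \(Bp\).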

Now imagine we have an interacting quantum field theory whose Lagrangian is a small perturbation of the Klein–Gordon Lagrangian. (For concreteness, you can imagine the \(\phi^4\) theory described in the previous section with a very small value of \(\lambda\).) If the perturbation is small enough, you might imagine that it would have a correspondingly small effect on the spectrum of \(P^2\), and in particular that there might still be an isolated eigenvalue of \(P^2\) sitting between \(0\) and the continuous portion of the spectrum. (There’s no reason to suppose this eigenvalue will still be \(m^2\), though, and no reason to suppose the continuous portion still starts at \((2m)^2\).)

When we talk about a “one-particle subspace” in an interacting quantum field theory, this will be what we mean: an eigenspace of \(P^2\) with an isolated eigenvalue. If the eigenvalue is \(\mu^2\), then we’ll say \(\mu\) is the mass of the particle. (By the assumption in the previous section about the spectrum of \(P\) lying in the forward light cone, the eigenvalue will always be nonnegative.) Any state in this subspace will be called a “one-particle state.” We can take the eigenspace and split it up further according to the eigenvalues of the four-vector-valued operator \(P\); then, if \(|\psi\rangle\) is a one-particle state with \(P|\psi\rangle=p|\psi\rangle\), then \(|\psi\rangle\) is the state of a single particle with energy-momentum \(p\). We will of course then have \(p^2=\mu^2\), and our assumptions about the action of the Poincaré group imply that it will act on the energy-momentum of the state in the expected way.

We will, for now, also assume that our one-particle states are scalar particles. For our purposes, this means that a one-particle state is determined just by its momentum (justifying the notation \(|p\rangle\) with no extra adornments) and that, for any boost or rotation \(B\) in the Lorentz group, \(B.|p\rangle\) is a scalar multiple of \(|Bp\rangle\). Like the corresponding restriction to scalar field operators mentioned earlier, we’ll examine this assumption when we come around to discussing spin in a future installment in this series.

Some Caveats

Our picture of particle states in the free theory was quite straightforward, and the preceding discussion implies that many of its features might be replicated in an interacting theory. It’s therefore important that we spend some time on what is not the same.

First, we are not claiming that we can somehow prove that \(P^2\) will have an isolated eigenvalue in the right place. We are saying merely that, if it does have such an eigenvalue, then we have states in the Hilbert space that have the properties we expect from one-particle states. Indeed, as I’ve said before and will say again, essentially no interacting quantum field theories have exact solutions of the type we were able to produce for the Klein–Gordon field. A lot of the structure we’ll use to get quantitative results from interacting quantum field theories will have to just be postulated in this way; this is true for the properties listed in the previous section as well as the one under discussion now. In particular, there won’t be a nice formula for creation and annihilation operators in terms of the field operators (although in the final section of this article we’ll get part of the way there).

In fact, there are some interacting quantum field theories where it’s believed that there aren’t one-particle states in the interacting theory corresponding to the ones from the free theory. The theory of the strong force, where there are (conjecturally) no one-quark or one-gluon states in the interacting theory even though there are such states if you turn the interaction off, is probably the most famous example of this.

The second point worth emphasizing is that even if we have the conditions in place for this description of the one-particle state space to exist, there is no reason to expect the multi-particle portion of the state space to resemble the situation for the free theory nearly as closely. Indeed, if each of the \(n\)-particle state spaces were still generated by eigenstates of \(P\), that would mean they were invariant under spacetime translations, which completely removes the possibility of interaction: any state that started off with some collection of particles with some specified momenta would have to remain that way forever, since that’s what it means to be preserved by time translations. The description of this portion of the state space — and, indeed, the reason it is even appropriate to call these states “multi-particle” states at all — is going to be much subtler, and we’ll discuss it more a bit later on in the article.

Third, the requirement that the eigenvalue of \(P^2\) corresponding to our particle is separated from \(0\) is only appropriate for massive particles. There are theories with massless particles — photons are the most famous example — and handling them in this framework is a lot more delicate. We will, for now, avoid this problem by assuming that all our particles are massive, but we might or might not come back to this question in a future installment.

Finally, a one-particle state that is an eigenstate for \(P\) necessarily corresponds to a stable particle, because the state is preserved by spacetime translations and so in particular persists for all time. It is possible, but more complicated, to handle unstable particles in this framework, and again we will not worry about this possibility for now.

Scattering and the LSZ Formula

With the place of one-particle states in the theory established (or, at least, as established as it’s ever going to be) we now turn to the question of multi-particle states. As mentioned in the last section, these states represent a much larger conceptual break with the free theory: if particles are meant to interact with each other in interesting ways, multi-particle states shouldn’t be eigenstates of \(P\).

Because our goal is partly to describe these particle interactions, though, we’re going to need some way to describe states with more than one particle. The specific situation we’ll aim to analyze in detail is called scattering. In a scattering experiment, some number of particles (usually two) start out very far away from each other with some specified momenta and collide at the origin around time \(t=0\). After a while, another collection of particles can be found having emerged from this mess, again very far away from each other. The goal is to compute the probability amplitude for a particular set of particles with a particular set of momenta to be the ones that result from the scattering.

Our ultimate goal in this section will be something called the LSZ formula (named after the paper by Lehmann, Symanzik, and Zimmermann where it was first written down) which relates these scattering amplitudes to certain expectation values of products of field operators in the vacuum state. The big advantage of this is that the latter quantities are what it will be possible to compute, at least approximately; as we will see in the next article in this series, it is at this point that the famous “Feynman diagrams” show up.

Finally, I want to make a note about mathematical precision. As mentioned earlier, while it’s possible to prove most of what follows rigorously from the Wightman axioms, we’re not going to do that here. Our approach is instead modeled on the one you’ll find in physics texts (in particular, I took a lot from Chapters 13 and 14 of Sidney Coleman’s book) because I think it is much easier to understand the physical content of the argument when you aren’t busy proving bounds. It’s perhaps worth noting that all of the more rigorous treatments will use a more complicated definition of all the objects we’re about to discuss.

If you are interested in a more rigorous version, I recommend Chapter 5 of Araki’s book or Chapter 9 of Duncan’s book and the references he cites there. There is also a recent paper by John Collins called “A new approach to the LSZ reduction formula” which stays closer to the “standard” LSZ story than either of the books I just mentioned while also being much more careful than I am about to be with the analytic details.

Interpolating Fields

In the free theory, the field operators and the particle states were very closely related to each other: the creation operators \(a^\dagger(\mathbf{p})\), which make one-particle states when you apply them to the vacuum, were expressible in terms of the field operators \(\phi(x)\), and vice versa. (If we had a free field theory with more than one type of field operator in it, exactly the same logic would produce more than one type of creation operator, and therefore more than one type of particle state, but we’ll stick to the case of one particle type for simplicity.) We therefore have a very tight relationship between the field operators and the particle states.

In an interacting theory, the situation is not as nice. We no longer have a formula for creation operators in terms of the fields appearing in the Lagrangian. Unfortunately, the only objects we actually know anything about are these field operators and the vacuum state, so if we are expecting to build particle states that we can actually use to perform a computation, these are the tools we’ll have to use. We can turn to the free theory for some inspiration. In the free theory, we have \[\phi(x)|0\rangle = \int \frac{d^3\mathbf{p}}{(2\pi)^{3/2}\sqrt{2\omega_{\mathbf{p}}}} [a(\mathbf{p})e^{-ip\cdot x} + a^\dagger(\mathbf{p})e^{ip\cdot x}]|0\rangle,\] which lives entirely within the one-particle subspace of the Hilbert space, since each \(a^\dagger(\mathbf{p})|0\rangle\) is a one-particle state and each \(a(\mathbf{p})|0\rangle\) is zero.

This might inspire us to look for particle states in an interacting theory by applying a field operator to the interacting theory’s vacuum state \(|\Omega\rangle\), which leads to the following definition. Let’s suppose we have some field operator \(A(x)\) and a one-particle state with energy-momentum \(p\), which we’ll label \(|p\rangle\). We will say that \(A\) is an interpolating field for this state if applying \(A(x)\) to the vacuum results in a state with nonzero overlap with \(|p\rangle\), that is, if \[\langle p|A(x)|\Omega\rangle\ne 0.\]

In the free theory, our argument that \(\phi(x)|0\rangle\) was a pure one-particle state relied on the expansion of \(\phi(x)\) in terms of creation and annihilation operators, so there’s no reason to suppose this will still happen in an interacting theory. In general, even if \(A\) is an interpolating field for some particle, you should imagine that \(A(x)|\Omega\rangle\) is an unmanageably complicated superposition of one- and multi-particle states.

You definitely shouldn’t imagine that there is some algorithm for inspecting a one-particle state and producing an interpolating field operator. Our results will instead all have the following form: given a field operator \(A\), if there happens to be a one-particle state \(|p\rangle\) for which \(A\) is an interpolating field, then we can deduce that some quantity pertaining to these particles can be computed by some expression involving \(A\) and the vacuum state.

Furthermore, the relationship between fields and particles is very far from being one-to-one; a given particle state will in general be interpolated by many different fields, and it’s entirely possible for a given field to not interpolate for any particles at all. (This is the expected situation for the quark and gluon fields in the theory of the strong force, for example.) There is also no need for \(A\) to be one of the “elementary” fields appearing in the Lagrangian; we may, for example, want to use a polynomial in those fields instead.

If a given particle can be interpolated by one of the fields appearing in the Lagrangian, then it’s called an elementary particle. If it instead can only be interpolated by some more complicated polynomial in those fields, it’s a composite particle. In the Standard Model, the composite particles include things like the proton along with (assuming one could somehow prove they were actually stable) more complicated “bound states” like atoms and molecules.

Our presentation won’t really be rigorous enough to see in detail where this assumption gets used, but we will also require that \(A\) be a local field operator in the sense discussed above, that is, \([A(x),A(y)]=0\) when \(x-y\) is spacelike. This will happen if, for instance, \(A(x)\) is a polynomial in the (also local) fields appearing in the Lagrangian all evaluated at \(x\), which is a typical situation.

Some Conventions

Going forward it will be somewhat helpful to insist on another couple of properties of \(A\) and \(|p\rangle\); while they are not part of the definition of interpolating fields, they are part of the hypotheses of the LSZ formula, and now is as good a time as any to introduce them.

First, there is a question about how to normalize the one-particle states \(|p\rangle\). Near the end of the previous article, we defined a version of the creation and annihilation operators that played more nicely with Lorentz transformations by setting \[\alpha(p)=(2\pi)^{3/2}\sqrt{2\omega_{\mathbf{p}}}a(\mathbf{p}).\] If we use this operator to create one-particle states in the free theory and denote those states by \(|p\rangle\), then for any Lorentz transformation \(B\) we have \(B.|p\rangle = |Bp\rangle\), with no extra \(\omega\)’s hanging around.

This does come at the cost of slightly more complicated inner product relations: we have \[\langle p'|p\rangle = (2\pi)^3(2\omega_{\mathbf{p}})\delta(\mathbf{p}-\mathbf{p}'),\] but the tradeoff is more than worth it. We will choose to normalize the one-particle states in our interacting theory in the same way.

Next, pick any point \(x\in\mathbb{R}^4\), and let \(T(x)\) be the unitary operator that translates states by \(x\). Because of the translation-invariance of the vacuum state, we have that \[\langle \Omega | A(x) | \Omega \rangle = \langle \Omega | T(x) A(0) T(x)^{-1} | \Omega \rangle = \langle \Omega | A(0) | \Omega\rangle,\] that is, it’s a constant that doesn’t depend on \(x\). We are going to ask for this constant to be \(0\), which can be accomplished after possibly subtracting a constant from \(A\). (This doesn’t change anything about which particles \(A\) is an interpolating field for.)

Finally, I encourage you to verify that the quantity \[\langle p | A(0) | \Omega\rangle\] is a Lorentz-invariant function of \(p\). (The argument uses our postulates about how the Lorentz group acts on both \(|p\rangle\) and \(A\).) This means that it can depend only on \(p^2\) and the sign of the time component of \(p\). But for all our one-particle states, \(p^2=\mu^2\), and the time component is always positive, so once again this quantity is in fact a constant. This constant will be given the name \(Z^{1/2}\). (This slightly weird convention originates from the relationship between particle states and poles in the two-point function, which we’ll explore in the next article in this series.) Note that this implies that \[\langle p | A(x) | \Omega \rangle = Z^{1/2} e^{ip\cdot x}\] for all \(x\).
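If you want a hint for that verification: for a Lorentz transformation \(B\), the scalar-field assumption gives \(U(B)A(0)U(B)^{-1} = A(B.0) = A(0)\), the vacuum satisfies \(U(B)^{-1}|\Omega\rangle = |\Omega\rangle\), and our normalization of the one-particle states gives \(U(B)^{-1}|p\rangle = |B^{-1}p\rangle\). Putting these together, \[\langle p|A(0)|\Omega\rangle = \langle p|U(B)A(0)U(B)^{-1}|\Omega\rangle = \langle B^{-1}p|A(0)|\Omega\rangle,\] so the quantity is unchanged when \(p\) is replaced by \(B^{-1}p\), which is what Lorentz-invariance means here.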

A Replacement for the Creation Operator

The point of the interpolating field concept for our present purposes is that it will help us to create the states that appear in a scattering experiment, and which will give rise (after some computation) to the promised LSZ formula.

In the free theory, again, we built multi-particle states by repeatedly applying creation operators to the vacuum. Though we never wrote this formula explicitly, it is a straightforward but tedious computation to show that the Lorentz-invariant version of the free creation operator is related to the field operator via the following formula, which holds for any choice of \(t\): \[\begin{aligned} \alpha^\dagger(p) &= i\int d^3\mathbf{x} \left[-i\omega_{\mathbf{p}}\phi(x)-\partial_t\phi(x)\right] e^{-ip\cdot x}\\ &= i\int d^3\mathbf{x} \left[ \phi(x)(\partial_t e^{-ip\cdot x}) - e^{-ip\cdot x}(\partial_t\phi(x)) \right].\end{aligned}\] It’s easy to get confused about what’s being asserted here. Note that the expression in square brackets on the right refers to the four-vector \(x\), even though the integral is only over the spatial components \(\mathbf{x}\), so, after performing the integral, \(t\) is still a free variable on the right-hand side. The assertion is that the result is actually a constant function of \(t\), and that it’s equal to \(\alpha^\dagger(p)\) for all choices of \(t\). And, as usual when both \(p\)’s and \(\mathbf{p}\)’s are around, the time component of \(p\) is always assumed to be \(\omega_{\mathbf{p}}\).
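In case you would like a quick, physicist-level argument for the \(t\)-independence: both the free field \(\phi\) and the function \(e^{-ip\cdot x}\) (whose time component of \(p\) is \(\omega_{\mathbf{p}}\)) satisfy the Klein–Gordon equation, so acting on either one we may trade \(\partial_t^2\) for \(\nabla^2 - m^2\), where \(\nabla^2 = \partial_x^2+\partial_y^2+\partial_z^2\). Differentiating under the integral sign, \[\begin{aligned} \partial_t\, i\int d^3\mathbf{x} \left[ \phi(x)(\partial_t e^{-ip\cdot x}) - e^{-ip\cdot x}(\partial_t\phi(x)) \right] &= i\int d^3\mathbf{x} \left[ \phi(x)(\partial_t^2 e^{-ip\cdot x}) - e^{-ip\cdot x}(\partial_t^2\phi(x)) \right]\\ &= i\int d^3\mathbf{x} \left[ \phi(x)(\nabla^2 e^{-ip\cdot x}) - e^{-ip\cdot x}(\nabla^2\phi(x)) \right] = 0,\end{aligned}\] where the \(m^2\) terms cancel and the remaining integral vanishes after integrating by parts twice (ignoring, as usual, any worries about behavior at spatial infinity). Note that this argument leans on \(\phi\) satisfying the free equation of motion, which is exactly what fails in an interacting theory; this is why the analogous operator we are about to build will genuinely depend on \(t\).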

This formula will be our inspiration for a sort of cheap replacement creation operator in the interacting theory. Just copying this expression unchanged will not work. There is no reason for the right side of that expression to be time-independent anymore. More importantly, as we said earlier, \(A(x)|\Omega\rangle\) is no longer just a one-particle state. Finally, in the free theory, this creation operator creates a state whose wavefunction looks like a plane wave \(e^{-ip\cdot x}\), which fills up all of spacetime, whereas for our scattering problem we are interested in keeping our particles far apart from each other at times far away from \(t=0\).

This last problem, about localizing the particles, is relatively straightforward to solve. Suppose we have a quantum field theory — either free or interacting — and a collection of one-particle states \(|p\rangle\) as above which span the one-particle subspace. Given some arbitrary function \(F(\mathbf{p})\) of momentum, let’s define \[f(x) = \int \frac{d^3\mathbf{p}}{(2\pi)^3(2\omega_{\mathbf{p}})} F(\mathbf{p}) e^{-ip\cdot x}\] and \[|f\rangle = \int \frac{d^3\mathbf{p}}{(2\pi)^3(2\omega_{\mathbf{p}})} F(\mathbf{p}) |p\rangle,\] where \(\omega_{\mathbf{p}} = \sqrt{|\mathbf{p}|^2+\mu^2}\). Our assumptions imply that every one-particle state is of this form for some \(F\).

Inspired by how particle states work in ordinary quantum mechanics, it is useful to think of \(F\) as serving the role of the momentum-space wavefunction of the particle state. In particular, it is straightforward to check that, for any \(k\) with \(k^2 = \mu^2\), we have \(\langle k|f\rangle = F(\mathbf{k})\). This intuition then also suggests that \(f\) serves the role of the position-space wavefunction of the state. (It might be interesting for the reader to check that \(f(x)\) is the Fourier transform of the function \(\delta(p^2-\mu^2)\theta(p_t)F(\mathbf{p})\). Here \(\theta\) is the Heaviside function.)
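Both checks are quick. For the first, using the inner product convention above, \[\langle k|f\rangle = \int \frac{d^3\mathbf{p}}{(2\pi)^3(2\omega_{\mathbf{p}})} F(\mathbf{p})\,\langle k|p\rangle = \int \frac{d^3\mathbf{p}}{(2\pi)^3(2\omega_{\mathbf{p}})} F(\mathbf{p})\,(2\pi)^3(2\omega_{\mathbf{p}})\delta(\mathbf{p}-\mathbf{k}) = F(\mathbf{k}).\] For the second, the key identity is \(\delta(p^2-\mu^2)\theta(p_t) = \delta(p_t-\omega_{\mathbf{p}})/(2\omega_{\mathbf{p}})\), which converts the integral over \(\mathbb{R}^4\) into an integral over the mass shell: up to your preferred \(2\pi\) conventions for the Fourier transform, \[\int d^4p\ \delta(p^2-\mu^2)\theta(p_t)F(\mathbf{p})e^{-ip\cdot x} = \int \frac{d^3\mathbf{p}}{2\omega_{\mathbf{p}}} F(\mathbf{p})e^{-ip\cdot x} = (2\pi)^3 f(x).\]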

When physicists describe this situation, they say that we’ve restricted our one-particle state to a wave packet given by \(f\). Note that \(f\) is a solution of the Klein–Gordon equation \((\partial^2 + \mu^2)f=0\), where \(\mu\) is the mass of the particle. Given any \(f\) with this property, its inverse Fourier transform will be supported on the hyperboloid \(p^2=\mu^2\), which has two connected components. It is not hard to show that the \(f\)’s that arise as wave packets in the way we’ve described here are precisely the ones whose inverse Fourier transforms are supported just on the sheet of the hyperboloid with positive time coordinate. Such functions \(f\) are called positive-energy solutions to the Klein–Gordon equation.

I encourage the sufficiently bold reader to check that, in the free theory, if we write \[\begin{aligned} \alpha^\dagger_{f,\phi}(t) &= \int \frac{d^3\mathbf{p}}{(2\pi)^3(2\omega_{\mathbf{p}})} F(\mathbf{p}) \alpha^\dagger(p)\\ &= i\int d^3\mathbf{x} \left[ \phi(x)(\partial_t f(x)) - f(x)(\partial_t\phi(x)) \right],\end{aligned}\] then \(\alpha^\dagger_{f,\phi}(t)|0\rangle = |f\rangle\), again independently of \(t\). (Note that this is just the second expression above for \(\alpha^\dagger\) with \(f(x)\) swapped in for \(e^{-ip\cdot x}\).) This perhaps adds some physical credibility to the idea that our state should be thought of as localized to \(f\): we are only “disturbing” the vacuum at spacetime points where \(f\) or \(\partial_tf\) is large.
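If you would like a hint: the equality of the two lines comes from substituting the definition of \(f\) into the second line and exchanging the order of integration, at which point the formula for \(\alpha^\dagger(p)\) above appears under the \(d^3\mathbf{p}\) integral. The first line then immediately gives \[\alpha^\dagger_{f,\phi}(t)|0\rangle = \int \frac{d^3\mathbf{p}}{(2\pi)^3(2\omega_{\mathbf{p}})} F(\mathbf{p})\,\alpha^\dagger(p)|0\rangle = \int \frac{d^3\mathbf{p}}{(2\pi)^3(2\omega_{\mathbf{p}})} F(\mathbf{p})\,|p\rangle = |f\rangle.\]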

It therefore perhaps makes sense to investigate how an operator defined like this behaves in an interacting theory. So, if \(A\) is an interpolating field for our one-particle states of mass \(\mu\), let’s define \[\alpha^\dagger_f(t) = i\int d^3\mathbf{x} \left[ A(x)(\partial_t f(x)) - f(x)(\partial_t A(x)) \right],\] and see what happens when we apply it to the vacuum.

The normalization assumptions we made about \(A\) earlier imply that \[\langle \Omega | \alpha^\dagger_f(t) | \Omega \rangle = 0,\] \[\langle p | \alpha^\dagger_f(t) | \Omega \rangle = Z^{1/2} \langle p | f \rangle,\] and \[\langle \Omega | \alpha^\dagger_f(t) | p \rangle = 0.\] (Work these out for yourself!) That is, at least as far as inner products with \(|\Omega\rangle\) and \(|p\rangle\) are concerned, \(\alpha^\dagger_f(t)\) does create something proportional to \(|f\rangle\) from the vacuum, even in the interacting theory.
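Here is a hint for the middle identity, which is the least trivial of the three: using \(\langle p|A(x)|\Omega\rangle = Z^{1/2}e^{ip\cdot x}\) from the previous section and the mode expansion of \(f\), \[\begin{aligned} \langle p|\alpha^\dagger_f(t)|\Omega\rangle &= iZ^{1/2}\int d^3\mathbf{x} \left[ e^{ip\cdot x}(\partial_t f(x)) - f(x)(\partial_t e^{ip\cdot x}) \right]\\ &= iZ^{1/2}\int \frac{d^3\mathbf{k}}{(2\pi)^3(2\omega_{\mathbf{k}})} F(\mathbf{k})\,(-i\omega_{\mathbf{k}}-i\omega_{\mathbf{p}})\,e^{i(\omega_{\mathbf{p}}-\omega_{\mathbf{k}})t}\,(2\pi)^3\delta(\mathbf{p}-\mathbf{k}) = Z^{1/2}F(\mathbf{p}) = Z^{1/2}\langle p|f\rangle,\end{aligned}\] where the spatial integral produced the delta function, which in turn kills the \(t\)-dependent phase.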

This is promising! But now suppose we have some state, say \(|\psi\rangle\), which is orthogonal to the vacuum and the one-particle subspace. We can split up \(|\psi\rangle\) into eigenstates of \(P\), writing \[|\psi\rangle = \int d^4p\ \sigma(p) |\psi_p\rangle,\] where \(\sigma\) is some density function and each \(|\psi_p\rangle\) is an eigenstate of \(P\) with eigenvalue \(p\). Our assumptions about the spectrum of \(P\) imply that we can assume \(\sigma(p)=0\) unless \(p^2>0\) and \(p^2\ne\mu^2\). (In fact, since the eigenvalue \(\mu^2\) of \(P^2\) was assumed to be isolated, we even know that \(|p^2-\mu^2|\) is bounded away from zero.) If we write \(E_p\) for the time/energy component of the four-vector \(p\) and \(\omega_{\mathbf{p}}=\sqrt{|\mathbf{p}|^2+\mu^2}\), then a more or less straightforward computation which I encourage you to work out shows that \[\langle \psi | \alpha^\dagger_f(t) | \Omega \rangle = \int d^4p\ \overline{\sigma(p)} \frac{\omega_{\mathbf{p}} + E_p}{2\omega_{\mathbf{p}}} F(\mathbf{p}) \langle \psi_p | A(0) | \Omega \rangle e^{i(E_p-\omega_{\mathbf{p}})t}.\]
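If you want to work this out, it is the same manipulation as in the hint above. The same translation-covariance argument that gave \(\langle p|A(x)|\Omega\rangle = Z^{1/2}e^{ip\cdot x}\) gives \(\langle\psi_p|A(x)|\Omega\rangle = e^{ip\cdot x}\langle\psi_p|A(0)|\Omega\rangle\), except that the time component of \(p\) is now \(E_p\) rather than \(\omega_{\mathbf{p}}\). The spatial integral still produces \((2\pi)^3\delta(\mathbf{p}-\mathbf{k})\), but the phases no longer cancel, leaving \[\langle\psi_p|\alpha^\dagger_f(t)|\Omega\rangle = \frac{\omega_{\mathbf{p}}+E_p}{2\omega_{\mathbf{p}}}\,F(\mathbf{p})\,\langle\psi_p|A(0)|\Omega\rangle\,e^{i(E_p-\omega_{\mathbf{p}})t},\] and integrating this against \(\overline{\sigma(p)}\,d^4p\) gives the formula above.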

Despite our desire for \(\alpha^\dagger_f(t)|\Omega\rangle\) to just be the one-particle state \(Z^{1/2}|f\rangle\), this overlap is under no obligation to be zero. But the exponential factor at the end of this expression can get us something almost as good. On the support of \(\sigma\), \(|E_p-\omega_{\mathbf{p}}|\) is bounded away from zero. This means that, for very large positive or negative \(t\), the exponential factor \(e^{i(E_p-\omega_{\mathbf{p}})t}\) causes the integrand to oscillate arbitrarily rapidly, which means the integral gets arbitrarily small. (This is essentially the content of the Riemann–Lebesgue lemma.)
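To connect this to the usual statement of that lemma, do the energy integral first: substituting \(s = E_p - \omega_{\mathbf{p}}\) in the integral over the time component of \(p\), the expression takes the schematic form \[\int d^3\mathbf{p}\int ds\ h(s,\mathbf{p})\,e^{ist},\] where \(h\) collects all the non-oscillating factors and, by our assumption on the support of \(\sigma\), vanishes for \(s\) near \(0\). For each fixed \(\mathbf{p}\) the inner integral goes to \(0\) as \(t\to\pm\infty\) by the Riemann–Lebesgue lemma, assuming \(h\) is integrable in \(s\), which is exactly the kind of analytic hypothesis we are cheerfully not checking.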

So, even though we couldn’t make the overlap zero independently of \(t\), we can conclude that \[\lim_{t\to\pm\infty} \langle \psi | \alpha^\dagger_f(t) | \Omega \rangle = 0,\] and a nearly identical computation will similarly show that \[\lim_{t\to\pm\infty} \langle \Omega | \alpha^\dagger_f(t) | \psi \rangle = 0.\] While \(\alpha^\dagger_f(t)|\Omega\rangle\) isn’t quite equal to \(|f\rangle\) independently of \(t\), like we had in the free theory, this behavior as \(t\to\pm\infty\) will turn out to be enough.

If we take complex conjugates of the equations we’ve established so far, we get corresponding facts about the “annihilation operator” \(\alpha_f(t)\). Specifically, \[\langle \Omega | \alpha_f(t) | \Omega \rangle = \langle p | \alpha_f(t) | \Omega \rangle = \lim_{t\to\pm\infty} \langle \psi | \alpha_f(t) | \Omega \rangle = 0,\] and so, in the limit as \(t\to\pm\infty\), \(\alpha_f(t)\) annihilates the vacuum as we might expect.

So where does this leave us in our hunt for a creation operator? We’ve taken the inner product of \(\alpha^\dagger_f(t)|\Omega\rangle\) with a collection of states that collectively form a basis for the Hilbert space, and seen that with respect to those inner products it behaves just like \(Z^{1/2}|f\rangle\) in the limit \(t\to\pm\infty\). We can therefore conclude that if \(|\psi\rangle\) is any state, then \[\lim_{t\to\pm\infty}\langle\psi | \alpha^\dagger_f(t) | \Omega\rangle = Z^{1/2} \langle \psi | f \rangle.\] In functional analysis language, this means that \(\alpha^\dagger_f(t)|\Omega\rangle\) converges to \(Z^{1/2} |f\rangle\) in the weak topology.

Analytic details aside, the facts we’ve gotten so far can be turned into a decent mental picture of what happens when we apply \(\alpha^\dagger_f(t)\) to the vacuum. We’re only “directly” messing with the state at time \(t\), and we’re doing this by applying a bunch of shifted copies of \(A\) and \(\partial_t A\). The resulting state is orthogonal to the vacuum, and its projection onto the one-particle subspace is always exactly \(Z^{1/2} |f\rangle\) independently of \(t\).

The state as a whole does depend on \(t\), but in such a way that, if you focus on the overlap with any particular multiparticle state, everything but \(Z^{1/2}|f\rangle\) “washes away” after we wait for a sufficiently long time. Focusing on the overlap with a particular state is a bit like focusing your attention on a bounded region of spacetime; saying the limit is just a weak limit rather than a strong limit is like saying that the rate at which the multiparticle contributions fade away can depend on which region you are looking at. (Because of the low level of rigor we’re operating at in this discussion, we will now completely stop worrying about which topology our limits converge in.)

In and Out States

Our ultimate goal is to build scattering states, that is, states that can serve as descriptions of either the beginning or end of a scattering experiment. Such a state will almost always have more than one particle in it, and it is in fact for this reason that we went to all the work of building a particle-creation operator rather than just working with the states \(|f\rangle\) on their own.

The recipe is, in fact, not especially complicated given the work we’ve already done. Suppose \(F_1\) and \(F_2\) are two different functions of momentum with non-overlapping supports. We can then build a “two-particle state” by setting \[|f_1,f_2\rangle^{\mathrm{in}} = Z^{-1/2} \lim_{t\to-\infty}\alpha^\dagger_{f_2}(t)|f_1\rangle.\] We can similarly create states with any number of particles by doing the same thing recursively: \[|f_1,\ldots,f_{n-1},f_n\rangle^{\mathrm{in}} = Z^{-1/2} \lim_{t\to-\infty}\alpha^\dagger_{f_n}(t)|f_1,\ldots,f_{n-1}\rangle^{\mathrm{in}}\] as long as the supports in momentum space of the \(F_i\)’s are all disjoint.

In the more rigorous version of this story, it is possible to argue (at least insofar as anything we’re doing here refers to any actual mathematical objects at all) that these limits exist and that the result is independent of the order in which the operators are applied, but doing so involves a long series of (in my opinion) unenlightening inequalities, and this article is certainly not the place for it. We’ll have to content ourselves with an argument that it is physically plausible that such a state should exist and that it is a suitable description of the beginning of a scattering experiment.

The argument goes like this. If we look at the wave packets \(f_1\) and \(f_2\), the assumption that \(F_1\) and \(F_2\) have nonoverlapping support tells us that the regions where \(|f_1|\) and \(|f_2|\) are large should be moving away from the origin in different directions as \(t\) goes to \(\pm\infty\). This means that, for very large negative \(t\), the region of spacetime on which \(\alpha^\dagger_{f_2}(t)\) is acting looks a lot like the vacuum, and so it is plausible that it should affect that region in much the same way that it would affect the vacuum, that is, it should produce something that looks locally like the one-particle state \(|f_2\rangle\).

(When working from the Wightman axioms, this fuzzy physical intuition about states “looking locally like the vacuum” can be formalized in terms of a clustering principle. Schematically, if \(A_1\) is an operator defined in terms of an integral of field operators evaluated in some region \(D_1\) in spacetime, and similarly for \(A_2\) and \(D_2\), then a clustering principle is a result that says that \(|\langle \Omega|A_1A_2|\Omega\rangle - \langle \Omega|A_1|\Omega\rangle\langle \Omega|A_2|\Omega\rangle|\) decays exponentially as \(D_1\) and \(D_2\) get further apart in a spacelike direction. If you’re interested in learning about this in detail, I recommend the books by Araki and Duncan mentioned in the introduction. Section 5.4 of Araki’s book also discusses a precise sense in which the states we’re constructing here “look like” several particles moving with constant velocity as \(t\to-\infty\).)

Now, as we look at different time slices of the resulting state away from \(t\), this resemblance to \(|f_2\rangle\) should hold up as long as \(f_1\) and \(f_2\) still have very little overlap on that time slice. This is perfectly fine if we move further into the past, which will make the particles move even further away from each other, but not if we move toward the future, where the particles have the potential to move close to each other. Once this happens, and for all times thereafter, we are no longer justified in assuming that our state looks like \(|f_2\rangle\).

Our picture of the state \(|f_1,f_2\rangle^{\mathrm{in}}\) should therefore be that it describes a situation in which two particles, with wave packets \(f_1\) and \(f_2\), start out separated from each other in the distant past, potentially collide with each other somewhere around \(t=0\), resulting in who knows what after much more time has passed. This is the meaning of the label “in” on the state: we are specifying which particles come in to the scattering experiment, not which ones come out.

These are, unsurprisingly, called in states. We could instead have taken all the limits in this discussion to be limits as \(t\) goes to positive \(\infty\). Then everything would be the same, except that we would be specifying which particles come out at the end of the experiment. The states built in this way are called out states and denoted \(|f_1,\ldots,f_n\rangle^{\mathrm{out}}\). Note that if there is only one particle, then — since we are assuming our particle is stable — there is no potential for any interaction, and therefore no need to distinguish between in and out states. That is, we have \(|f\rangle^{\mathrm{in}} = |f\rangle^{\mathrm{out}} = |f\rangle\). (This is the reason we included the factor of \(Z^{-1/2}\) to cancel the \(Z^{1/2}\) we would get from looking at \(\alpha_f^\dagger(t)|\Omega\rangle\).)

Now, consider a Fock space \(\mathcal{H}^{\mathrm{in}}\), of the type we constructed in the last article, for particles of mass \(\mu\). (Note that this is the mass of the particle, not the “mass” parameter appearing in the Lagrangian!) To each in state \(|f_1,\ldots,f_n\rangle^{\mathrm{in}}\), we can associate a corresponding vector in \(\mathcal{H}^{\mathrm{in}}\) by taking \(|f_1,\ldots,f_n\rangle = \alpha_{f_1}^\dagger\cdots\alpha_{f_n}^\dagger|\Omega\rangle\), where \(\alpha_{f_i}^\dagger\) now refers to the (time-independent) Fock space creation operator.

In the rigorous version of this construction, one now shows that the in states have the same inner product structure as the corresponding states in \(\mathcal{H}^{\mathrm{in}}\). This means that the construction outlined in this section gives an injective, inner-product-preserving linear map \(\Omega^-:\mathcal{H}^{\mathrm{in}}\to\mathcal{H}\). In a similar way, we get another such map \(\Omega^+:\mathcal{H}^{\mathrm{out}}\to\mathcal{H}\). (One might object that we’ve only defined these maps in the cases where the states are built out of wave packets that don’t overlap in momentum space, but such states are actually dense in the Fock space, so this is not a problem.) The spaces \(\mathcal{H}^{\mathrm{in}}\) and \(\mathcal{H}^{\mathrm{out}}\) are collectively called the asymptotic Fock spaces of the theory. The operator we’ve been calling \(\alpha_f^\dagger(t)\) should then be thought of as approaching \(Z^{1/2}\) times the creation operator on \(\mathcal{H}^{\mathrm{in}}\) as \(t\to-\infty\), and similarly for \(\mathcal{H}^{\mathrm{out}}\) and the \(t\to\infty\) limit.
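
In symbols, using the notation of the last two paragraphs, the map \(\Omega^-\) is determined by \[\Omega^-\big(|f_1,\ldots,f_n\rangle\big) = |f_1,\ldots,f_n\rangle^{\mathrm{in}},\] and the statement that it preserves inner products says that \(\langle \Omega^-u, \Omega^-v\rangle_{\mathcal{H}} = \langle u,v\rangle_{\mathcal{H}^{\mathrm{in}}}\) for all \(u,v\in\mathcal{H}^{\mathrm{in}}\); the map \(\Omega^+\) is characterized in the same way with out states in place of in states.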

It is common to assume that these maps from \(\mathcal{H}^{\mathrm{in}}\) and \(\mathcal{H}^{\mathrm{out}}\) to \(\mathcal{H}\) are also surjective, that is, that every state evolves into a superposition of in or out states if we run time to \(\pm\infty\). This property is called asymptotic completeness, and it is the main reason that it makes sense to refer to the states in \(\mathcal{H}\) that are orthogonal to the vacuum and one-particle states as “multi-particle states.”

The assumption of asymptotic completeness becomes more physically plausible when we remember the rather expansive definition of “particle” we adopted earlier — any stable state corresponding to an isolated eigenvalue of \(P^2\) counts, whether or not it’s interpolated by one of the “elementary” fields appearing in the Lagrangian. We will follow the physicists in adopting this assumption going forward.

Proving the LSZ Formula

Scattering experiments involve taking some number of particles, allowing them to collide with each other, and computing the probability amplitude for this to result in some other collection of particles after the collision has ended. In the language we’ve just finished developing, this can be expressed as the overlap between an in state and an out state, that is, a quantity of the form \[{}^{\mathrm{out}}\langle g_1,\ldots, g_m|f_1,\ldots,f_n\rangle^{\mathrm{in}}.\] These are called scattering amplitudes, and they are what we are aiming to compute.

Scattering amplitudes can alternatively be described using the asymptotic Fock spaces \(\mathcal{H}^{\mathrm{in}}\) and \(\mathcal{H}^{\mathrm{out}}\). Under the assumption of asymptotic completeness, we get a unitary map \(S = (\Omega^+)^{-1}\Omega^-:\mathcal{H}^{\mathrm{in}}\to\mathcal{H}^{\mathrm{out}}\). Physicists call this map the S-matrix, and you can think of scattering amplitudes as being like matrix entries of it.
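
Concretely, unwinding the definitions of \(\Omega^+\) and \(\Omega^-\), this means that \[{}^{\mathrm{out}}\langle g_1,\ldots, g_m|f_1,\ldots,f_n\rangle^{\mathrm{in}} = \langle g_1,\ldots,g_m|\,S\,|f_1,\ldots,f_n\rangle,\] where the states on the right-hand side are the asymptotic Fock space states of the previous section and the inner product is the Fock space one. All of the information about the interaction that is visible in a scattering experiment is packaged into \(S\).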

The LSZ formula expresses scattering amplitudes in terms of a quantity which we’ll call the time-ordered \(n\)-point function of our field operator \(A\). Given points \(x_1,\ldots,x_n\) in spacetime, we define \[G^{(n)}_A(x_1,\ldots,x_n)=\langle \Omega|T[A(x_1)\cdots A(x_n)]|\Omega\rangle.\] The \(T\) in front of the product is called a time-ordering symbol; it means to order all of the factors appearing in the product by their \(t\) coordinates, with earlier times appearing on the right and later times on the left. (This may seem like it breaks Lorentz-invariance, but actually it doesn’t, because \(A(x)\) commutes with \(A(y)\) whenever \(x\) and \(y\) are spacelike separated!)
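
For example, with two factors, writing \(\theta\) for the Heaviside step function and \(x^0\), \(y^0\) for the time coordinates of \(x\) and \(y\), the definition reads \[T[A(x)A(y)] = \theta(x^0-y^0)\,A(x)A(y) + \theta(y^0-x^0)\,A(y)A(x).\] (When \(x^0 = y^0\) the two points are either equal or spacelike separated, so the ambiguity in the ordering is harmless.)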

The LSZ formula then states that the scattering amplitude \({}^{\mathrm{out}}\langle g_1,\ldots, g_m|f_1,\ldots,f_n\rangle^{\mathrm{in}}\) is equal to \[\frac{i^{n+m}}{Z^{(n+m)/2}} \int d^4x_1\cdots d^4x_{n+m} \prod_{i=1}^n f_i(x_i) \prod_{j=1}^{m}\overline{g_{j}(x_{n+j})} \prod_{k=1}^{n+m}(\partial_{x_k}^2+\mu^2) G^{(n+m)}_A(x_1,\ldots,x_{n+m}).\] We will eventually be able to get this into a somewhat more readable form, which will enable us to get a better sense of what it means, but unfortunately this is the form in which it’s easiest to prove, so we’re stuck with it for the moment.
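
To make the indexing concrete, here is what the formula says in the case \(n=m=2\), the simplest nontrivial collision: \[{}^{\mathrm{out}}\langle g_1,g_2|f_1,f_2\rangle^{\mathrm{in}} = \frac{i^{4}}{Z^{2}} \int d^4x_1\cdots d^4x_{4}\ f_1(x_1)\,f_2(x_2)\,\overline{g_{1}(x_{3})}\,\overline{g_{2}(x_{4})} \prod_{k=1}^{4}(\partial_{x_k}^2+\mu^2)\, G^{(4)}_A(x_1,x_2,x_3,x_4).\] The incoming wave packets appear unconjugated and the outgoing ones conjugated, and there is one copy of the Klein–Gordon operator and one factor of \(i/Z^{1/2}\) for each particle, incoming or outgoing.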

Our argument will be inductive. Specifically, we’ll show that, for any points \(x_1,\ldots,x_k\) and any wave packets \(f_1,\ldots,f_n,g_1,\ldots,g_m\) satisfying our assumption about not overlapping in momentum space, we have \[\begin{aligned} &{}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)] | f_1,f_2,\ldots,f_n\rangle^{\mathrm{in}}\\ &\hspace{3em}= \frac{i}{Z^{1/2}} \int d^4x\ f_1(x) (\partial_x^2+\mu^2)\ {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)A(x)] | f_2,\ldots,f_n\rangle^{\mathrm{in}} \\ &\hspace{3em}= \frac{i}{Z^{1/2}} \int d^4x\ \overline{g_1(x)} (\partial_x^2+\mu^2)\ {}^\mathrm{out}\langle g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)A(x)] | f_1,f_2,\ldots,f_n\rangle^{\mathrm{in}}.\end{aligned}\] By applying this repeatedly, using the first equality for each incoming particle and the second for each outgoing particle, we’ll get our result.
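
It’s worth noting where the induction terminates: once every \(f_i\) and every \(g_j\) has been stripped off in this way, the in and out states that remain are both just the vacuum \(|\Omega\rangle\), so the innermost matrix element is \[\langle \Omega|T[A(x_1)\cdots A(x_{n+m})]|\Omega\rangle = G^{(n+m)}_A(x_1,\ldots,x_{n+m}),\] and the accumulated wave packets, Klein–Gordon operators, and factors of \(i/Z^{1/2}\) are exactly the ones appearing in the LSZ formula.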

We’ll only do the first equality, since the second one is very similar. We can write the left side of the equation as \[Z^{-1/2} \lim_{t\to-\infty} {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)] \alpha_{f_1}^\dagger(t) | f_2,\ldots,f_n\rangle^{\mathrm{in}}.\] To save on space, we’ll need to introduce a bit of physicists’ notation. For any two functions \(p\) and \(q\), define \(p\overset{\leftrightarrow}{\partial_t}q = p(\partial_t q) - q(\partial_tp)\). Then the result of plugging the definition of \(\alpha_{f_1}^\dagger\) into this expression can be written \[-\frac{i}{Z^{1/2}} \lim_{t\to-\infty} \int d^3\mathbf{x}\ f_1(x) \overset{\leftrightarrow}{\partial_t}\ {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)]A(x) | f_2,\ldots,f_n\rangle^{\mathrm{in}}.\]
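
One property of this notation is worth recording now, since it does most of the work in the last step of the proof: the first-derivative cross terms cancel when we differentiate, so for any two functions \(p\) and \(q\) we have \[\partial_t\big(p\overset{\leftrightarrow}{\partial_t}q\big) = p\,(\partial_t^2 q) - q\,(\partial_t^2 p).\]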

Now, because the \(t\) coordinate of \(x\) is going to \(-\infty\), the \(A(x)\) factor is actually already in the position it would be placed in by the time-ordering. We can therefore pull it into the time-ordered product without changing anything: \[-\frac{i}{Z^{1/2}} \lim_{t\to-\infty} \int d^3\mathbf{x}\ f_1(x) \overset{\leftrightarrow}{\partial_t}\ {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)A(x)] | f_2,\ldots,f_n\rangle^{\mathrm{in}}.\] The purpose of this seemingly irrelevant move is that it enables our next trick. First, note that if we were to replace this new version of the limit by one in which \(t\) goes to positive \(\infty\), then the time ordering would place the \(A(x)\) on the left, giving \[\begin{aligned} &-\frac{i}{Z^{1/2}} \lim_{t\to\infty} \int d^3\mathbf{x}\ f_1(x) \overset{\leftrightarrow}{\partial_t}\ {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | A(x) T[A(x_1)\cdots A(x_k)] | f_2,\ldots,f_n\rangle^{\mathrm{in}}\\ &\hspace{3em}= Z^{-1/2} \lim_{t\to\infty} {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | \alpha_{f_1}^\dagger(t) T[A(x_1)\cdots A(x_k)] | f_2,\ldots,f_n\rangle^{\mathrm{in}}.\end{aligned}\]

Now, the bra expression \({}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | \alpha_{f_1}^\dagger(t)\) is the dual of the ket \(\alpha_{f_1}(t)|g_1,g_2,\ldots,g_m\rangle^{\mathrm{out}}\), and in the limit \(t\to\infty\), \(\alpha_{f_1}(t)\) becomes an “out” annihilation operator. So, because we assumed that \(f_1\) has no overlap with the \(g_j\)’s in momentum space, this whole expression is zero. (If this fact about Fock space annihilation operators isn’t clear, it’s worth convincing yourself of it now.)
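
(In case it helps, here is that computation written out schematically. In Fock space, the commutator \([\alpha_f, \alpha_g^\dagger]\) is a scalar proportional to the overlap of the momentum-space wave packets of \(f\) and \(g\), so commuting \(\alpha_{f_1}\) all the way to the right gives \[\alpha_{f_1}\,\alpha_{g_1}^\dagger\cdots\alpha_{g_m}^\dagger|\Omega\rangle = \sum_{j=1}^m [\alpha_{f_1},\alpha_{g_j}^\dagger]\ \alpha_{g_1}^\dagger\cdots\widehat{\alpha_{g_j}^\dagger}\cdots\alpha_{g_m}^\dagger|\Omega\rangle = 0,\] where the hat means that the \(j\)th factor is omitted. Each commutator vanishes because the momentum-space wave packets of \(f_1\) and \(g_j\) have disjoint supports, and the term in which \(\alpha_{f_1}\) reaches the vacuum vanishes because \(\alpha_{f_1}|\Omega\rangle = 0\).)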

This means that, in our original limit with \(t\to-\infty\), we are free to subtract the limit with \(t\to\infty\) without affecting the result. Since applying \(\left[\lim_{t\to-\infty} - \lim_{t\to\infty}\right]\) to a (sufficiently nice) function of \(t\) is the same as integrating minus its \(t\)-derivative over the whole real line, this gives us \[\begin{aligned} &-\frac{i}{Z^{1/2}} \left[\lim_{t\to-\infty} - \lim_{t\to\infty}\right] \int d^3\mathbf{x}\ f_1(x) \overset{\leftrightarrow}{\partial_t}\ {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)A(x)] | f_2,\ldots,f_n\rangle^{\mathrm{in}}\\ &\hspace{3em}= \frac{i}{Z^{1/2}}\int_{-\infty}^\infty dt\ \partial_t \left\{ \int d^3\mathbf{x}\ f_1(x) \overset{\leftrightarrow}{\partial_t}\ {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)A(x)] | f_2,\ldots,f_n\rangle^{\mathrm{in}} \right\}.\end{aligned}\]

It is a nice exercise to show that this is equal to \[\frac{i}{Z^{1/2}}\int d^4x\ f_1(x) (\partial_x^2+\mu^2)\ {}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)A(x)] | f_2,\ldots,f_n\rangle^{\mathrm{in}}.\] I encourage you to work this out yourself. The argument uses the fact that \(f_1\) is a solution of the Klein–Gordon equation along with some integration by parts. (In the rigorous version of this story, one can show that the boundary term in the integration by parts is zero; feel free to just assume this.)
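
If you want to check your answer, the computation runs roughly as follows. Write \(M(x)\) as shorthand for the matrix element \({}^\mathrm{out}\langle g_1,g_2,\ldots,g_m | T[A(x_1)\cdots A(x_k)A(x)] | f_2,\ldots,f_n\rangle^{\mathrm{in}}\). Using the identity for \(\partial_t\big(p\overset{\leftrightarrow}{\partial_t}q\big)\) recorded above, \[\int_{-\infty}^\infty dt\ \partial_t \int d^3\mathbf{x}\ f_1\overset{\leftrightarrow}{\partial_t}M = \int d^4x\ \big(f_1\,\partial_t^2 M - M\,\partial_t^2 f_1\big) = \int d^4x\ \big(f_1\,\partial_t^2 M - M\,(\nabla^2 - \mu^2) f_1\big) = \int d^4x\ f_1\,(\partial_x^2+\mu^2) M,\] where the second equality uses the Klein–Gordon equation for \(f_1\), and the third integrates by parts twice in \(\mathbf{x}\) (discarding the boundary terms as promised) and uses \(\partial_x^2 = \partial_t^2 - \nabla^2\).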

This is exactly the expression we were looking for, so this completes the proof.

Understanding the LSZ Formula

Let’s examine the formula that LSZ gives us for the scattering amplitude in a bit more detail. For ease of reference, it was \[\frac{i^{n+m}}{Z^{(n+m)/2}} \int d^4x_1\cdots d^4x_{n+m} \prod_{i=1}^n f_i(x_i) \prod_{j=1}^{m}\overline{g_{j}(x_{n+j})} \prod_{k=1}^{n+m}(\partial_{x_k}^2+\mu^2) G^{(n+m)}_A(x_1,\ldots,x_{n+m}).\]

In the discussion that preceded the proof of the formula, we made a big deal out of the fact that the particles were confined to these non-overlapping wave packets, so that we could assume they were far apart from each other at times far away from \(t=0\). Despite this, it’s common to state the LSZ formula in the limit where these wave packets approach plane waves, replacing the \(f_i\) and \(g_i\) labels on the in and out states with momentum labels. If we do this, replacing \(f_i(x)\) and \(g_i(x)\) with \(e^{-ip_i\cdot x}\), we get \[\begin{aligned} &{}^{\mathrm{out}}\langle p_{n+1},\ldots, p_{n+m}|p_1,\ldots,p_n\rangle^{\mathrm{in}}\\ &\hspace{2em}=\frac{i^{n+m}}{Z^{(n+m)/2}} \int d^4x_1\cdots d^4x_{n+m}\ e^{-i(\sum_{i=1}^n p_i\cdot x_i - \sum_{j=n+1}^{n+m} p_j\cdot x_j)} \prod_{k=1}^{n+m}(\partial_{x_k}^2+\mu^2) G^{(n+m)}_A(x_1,\ldots,x_{n+m}).\end{aligned}\] (If you find the violation of our non-overlapping wave packet assumption troubling, it might be helpful either to think of this version of the formula as just indicating the thing you need to integrate the wave packets against to get the “real” formula, or to think of it as an approximation to what happens if our original functions of momentum are very sharply peaked around a particular value.)

This allows for a nicer way to think about what the LSZ formula is saying. Let’s define the momentum-space time-ordered \(n\)-point function to be the Fourier transform of \(G^{(n)}_A\), that is, \[\widetilde{G}^{(n)}_A(p_1,\ldots,p_n) = \int d^4 x_1\cdots d^4x_n\ e^{-i\sum_{i=1}^n p_i\cdot x_i} G^{(n)}_A(x_1,\ldots,x_n).\] Then, pleasingly, the Fourier transform turns each derivative operator in the LSZ formulas into \(i\) times a multiplication by \(p\). So we get \[\begin{aligned} &{}^{\mathrm{out}}\langle p_{n+1},\ldots, p_{n+m}|p_1,\ldots,p_n\rangle^{\mathrm{in}}\\ &\hspace{3em}= \frac{(-i)^{n+m}}{Z^{(n+m)/2}} \prod_{k=1}^{n+m}(p_k^2-\mu^2) \widetilde{G}^{(n+m)}_A(p_1,\ldots,p_n,-p_{n+1},\ldots,-p_{n+m}).\end{aligned}\]
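
If you want to check the signs here: integrating by parts moves each Klein–Gordon operator onto the corresponding exponential, and \[(\partial_x^2+\mu^2)\,e^{-ip\cdot x} = (\mu^2 - p^2)\,e^{-ip\cdot x} = -(p^2-\mu^2)\,e^{-ip\cdot x},\] which is where the factors of \(p_k^2-\mu^2\) come from and why the \(i^{n+m}\) out front has turned into \((-i)^{n+m}\).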

This version of the formula, in my opinion, makes it quite a bit easier to see what’s going on. Note first that \(\widetilde{G}^{(n+m)}_A\) is defined on the entirety of \((\mathbb{R}^4)^{n+m}\), that is, for any choice of \(p\)’s. Despite this, we are only ever evaluating this expression “on the mass shell,” that is, when each \(p_k^2=\mu^2\). This, of course, means that each factor \((p_k^2-\mu^2)\) appearing in our formula is zero! Since the result is supposed to be finite and, in general, nonzero, we conclude that \(\widetilde{G}^{(n+m)}_A\) must have a simple pole in each of the variables \(p_k^2-\mu^2\) as it approaches zero; the right-hand side should be read as a limit as the momenta approach the mass shell, and it is the residue at this pole, together with the explicit prefactor, that gives the scattering amplitude we’re interested in. The behavior of \(\widetilde{G}^{(n+m)}_A\) on the rest of its domain is irrelevant to the scattering amplitude.
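
Equivalently, rearranging the formula above, the claim is that near the mass shell we should expect, schematically, \[\widetilde{G}^{(n+m)}_A(p_1,\ldots,p_n,-p_{n+1},\ldots,-p_{n+m}) \approx \left[\prod_{k=1}^{n+m}\frac{iZ^{1/2}}{p_k^2-\mu^2}\right] {}^{\mathrm{out}}\langle p_{n+1},\ldots, p_{n+m}|p_1,\ldots,p_n\rangle^{\mathrm{in}} + (\text{terms less singular on the mass shell}).\]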

The fact that the \((n+m)\)-point function carries much more information than the scattering amplitude perhaps makes more sense when we remember that it depends on our choice of interpolating field \(A\), and this choice is far from unique. In fact, different choices of \(A\) can absolutely change the behavior of \(\widetilde{G}^{(n+m)}_A\) off the mass shell, but the LSZ formula implies that the behavior at the on-mass-shell pole will always be the same.

Finally, it’s worth emphasizing once again that the benefit of having done all this work is that the \(n\)-point functions are actually computable, at least approximately, and so the LSZ formula provides the bridge between the quantities we care about and the computations we are able to perform. It is in the computation of the \(n\)-point functions that the famous Feynman diagrams show up. We’ll take up that part of the story in the next article in this series.