This article is also available as a PDF.

Introduction

This article is part of a series on physics for mathematicians, and the start of what I imagine will be a three- or four-article sequence on quantum field theory. This topic is, in some sense, the main reason I started the broader physics series in the first place. Learning quantum field theory is a project I’ve been engaged in off and on for years, and a lot of the rest of the series was written, at least in part, to serve as sort of prerequisites for it. It took a very long time for me to get my own understanding of quantum field theory up to a point where I felt like I could write about it for this series, and I’m excited to get started.

Quantum field theory has a reputation for being very difficult to learn, and based on my experience I would say that reputation is deserved. Even when compared to ordinary quantum mechanics, the relationship between the formalism and the physical world can be pretty opaque; when I first started learning the topic I would constantly be asking questions like “how are we supposed to interpret the points in this Hilbert space” and “which state is supposed to be the one-particle state,” and I found it very difficult to get clear answers. More than anything else, what I want to do with this series is to try to explain the answers to questions like these in a way that I would have found helpful to me when I was starting out.

As we will mention many, many times in this series, quantum field theory presents a lot of mathematical difficulties, to the point that no one has yet managed to construct a quantum field theory that is both completely rigorous and physically realistic. (One of the Millennium Problems amounts to finding a completely rigorous treatment that accounts for just one feature of the Standard Model.) In this series, I have not been too concerned with filling in all the formal details involved in building mathematically rigorous models of physics. There are other sources that do that much better than I could, and also I think that when you are learning a physical concept for the first time it’s better to get a sense of how the model is “supposed to work” before worrying about how to prove everything. But quantum field theory presents the additional difficulty that, for the most part, this rigorous model doesn’t exist at all; the physicists’ plausibility arguments and cavalier attitude toward divergent quantities are actually all we have.

There is a lot to say about how far we can get rigorously and what the precise nature of the difficulty is, but in attempting to understand this material myself I’ve concluded that it’s not worth diving into these questions until one understands, on an informal level, what kind of object we are actually attempting to construct. To that end, we’ll start by going through an example in which most of the mathematical issues won’t get in our way, and so we will be able to focus on understanding on a more intuitive level what all the objects are and what their physical significance is supposed to be.

The prerequisites for understanding this piece are necessarily broader than those of some of the earlier articles in this series. You should have a good understanding of ordinary, nonrelativistic quantum mechanics, as well as special relativity. It might also be helpful, especially for the later articles in the quantum field theory sequence, to have some understanding of how to describe classical field theories in terms of a Lagrangian. This is done to some extent in the first half of the article in this series on classical electromagnetism, but not quite in the form that would be most useful for this piece. I may make a supplement on this at some point, but also I will do my best to explain what from that group of ideas is necessary in each moment to follow what is going on.

I’ve gotten in the habit of ending the introductory section of each of these articles with a list of books I found useful when learning the topic at hand. Because of how much time I’ve spent reading quantum field theory books, I think it would be best to break that off into a separate “bibliography” article rather than list them all here. For now, I will mention that a lot of the story we are going to tell in this article is explained very well in Chapter 5 of Gerald Folland’s book Quantum Field Theory: A Tourist Guide for Mathematicians. That chapter is a great place to look if you would like a more careful and rigorous treatment than the one you’re about to read here.

I am grateful to Harry Altman, Grant Sanderson, Jordan Watkins, and Mithuna Yoganathan for looking over earlier versions of this article.

Notation and Conventions

Throughout this article, we’ll use a calligraphic \(\mathcal{H}\) to represent the Hilbert space that the states of a quantum theory live in. We’ll use the physicists’ convention \(\langle \psi'|\psi\rangle\) to denote the inner product on \(\mathcal{H}\), and we’ll also often use the “ket” notation \(|\psi\rangle\) to refer to individual elements of \(\mathcal{H}\). Whenever there is an observable that we want to discuss in both its classical and quantum versions, we’ll use a hat to denote the quantum version. For example \(H\) will be the classical Hamiltonian while \(\widehat H\) will be the quantum Hamiltonian. We’ll follow the physicists’ convention of writing \(O^\dagger\) for the adjoint of an operator \(O\).

We’ll have occasion to think about quantum states and operators in both the Schrödinger picture, where the states depend on time and the operators don’t, and the Heisenberg picture, where it’s the other way around. As a reminder, in the Schrödinger picture, states evolve in time according to \(|\psi(t)\rangle = e^{-it\widehat H}|\psi(0)\rangle\); in the Heisenberg picture, operators evolve in time according to \(O(t) = e^{it\widehat H} O(0) e^{-it\widehat H}\).

The theory we will be considering in our running example is relativistic, which means we need to set some conventions for the mathematical objects arising from special relativity. We’ll use the symbol \(\cdot\) for both the Euclidean inner product on \(\mathbb{R}^3\) and the Lorentzian inner product on \(\mathbb{R}^4\). We’ll use the “mostly minus” convention for the inner product on \(\mathbb{R}^4\), where \[(t,x,y,z)\cdot(t',x',y',z')=tt'-xx'-yy'-zz'.\] For \(\mathbf{x}\in\mathbb{R}^3\) we will write \(|\mathbf{x}|^2=\mathbf{x}\cdot\mathbf{x}\), but for \(x\in\mathbb{R}^4\), we’ll write \(x^2=x\cdot x\), where the lack of absolute value signs serves as a reminder that this quantity can be negative.

We will follow the common convention of using boldface letters for vectors in \(\mathbb{R}^3\) and normal italic letters for scalars or for vectors in \(\mathbb{R}^4\); in particular, we will sometimes write \(x=(t,\mathbf{x})\) for \(x\in\mathbb{R}^4\), \(t\in\mathbb{R}\), and \(\mathbf{x}\in\mathbb{R}^3\). We’ll use the convention, somewhat more common among physicists than mathematicians, of writing \(\int d^3\mathbf{x} f(\mathbf{x})\) for integrals over \(\mathbb{R}^3\) and \(\int d^4x f(x)\) for integrals over \(\mathbb{R}^4\).

The conventions we’ve already described entail choosing units where \(c=\hbar=1\); we’ll do this throughout.

What is a Quantum Field Theory, Vaguely

Before starting on our central example, it’s worth briefly discussing, on a high level, what it is that we’re trying to do when we set out to build a quantum field theory. The discussion in this section is going to be vague and qualitative; we will make each of these points much more precise in the context of our example.

The transition from quantum mechanics to quantum field theory requires a much smaller conceptual leap than the transition from classical to quantum mechanics. At least conceptually — whether or not this program can be carried out rigorously is a separate question — most of the basic structure carries over more or less unchanged: the states of a system are represented by nonzero vectors in a separable Hilbert space \(\mathcal{H}\) (up to a scalar multiple) and observables are represented by self-adjoint operators on \(\mathcal{H}\). Just as in quantum mechanics, we can describe the laws of physics either with the “canonical” framework, where we pick a Hamiltonian and use it to construct time translation operators, or with the path integral framework, where we instead start from a Lagrangian.

The thing that makes it a quantum field theory is that the physical system we are attempting to quantize is, well, a field theory. In ordinary quantum mechanics, we usually look at systems with finitely many degrees of freedom, like the three coordinates of a particle. (The Hilbert space that we construct when we quantize the theory might still be infinite-dimensional, but the classical theory that we’re quantizing has a finite-dimensional state space.) In a field theory, the degrees of freedom are indexed by a continuous variable, even in the classical setting.

There are a couple of reasons one might want to do this. The first is quite straightforward: there are a lot of classical physical phenomena, most prominently electromagnetism, which are best described in terms of fields, and it seems sensible to want a quantum version of them. The second is a bit subtler, but also quite important. Special relativity famously forbids any situation where a signal can travel faster than light. The quantum version of this restriction ought then to imply that measurements performed at two points \(x\) and \(y\) in spacetime ought not to interfere with each other if no slower-than-light signal could get from \(x\) to \(y\), that is, if the vector \(x-y\) is spacelike. In order to formalize this restriction, we need some way to say which observables — in the form of self-adjoint operators on the Hilbert space — “take place at” a particular point \(x\). This alone ought to encourage us to try to cast as many physical situations as possible in terms of fields; after all, a field is precisely composed of an association between points in spacetime and observable quantities.

Just as each position coordinate in ordinary quantum mechanics gives rise to a different observable, we will need an observable for the value of the field at every point in spacetime. In the example we are about to consider, we will be looking at a real scalar field, which in the classical setting amounts to a real-valued function \(\phi\) on \(\mathbb{R}^3\) which changes over time. At a given moment in time, specifying the state amounts to specifying the value of \(\phi\) at each point \(\mathbf{x}\) in space (as well as its time derivative, because the equation of motion will be given by a second-order differential equation). Therefore, in the quantum version of this theory, we will want an observable \(\widehat\phi(\mathbf{x})\) for each point \(\mathbf{x}\in\mathbb{R}^3\), which will correspond to the question “what is the value of the field \(\phi\) at the point \(\mathbf{x}\) right now?”

In ordinary quantum mechanics, when the states of the classical system correspond to points in some \(\mathbb{R}^N\), it’s common to use \(\mathcal{L}^2(\mathbb{R}^N)\) as the Hilbert space for the quantum version of the system. It’s natural to imagine doing something similar for a quantum field theory, where rather than assigning a complex number to each point in \(\mathbb{R}^N\), we would assign a complex number to each field configuration, that is, to each function \(\phi:\mathbb{R}^3\to\mathbb{R}\). This object is sometimes called a wavefunctional; our Hilbert space might then consist of all wavefunctionals which are \(\mathcal{L}^2\) with respect to some suitably chosen measure on the space of field configurations \(\phi\).

This can be a decent model to hold onto if you would like a more concrete mental picture of what the states in a quantum field theory represent, but it’s not the standard approach. This is for a couple of reasons. First, while it might seem like the most direct way to build a state space for a quantum field, actually constructing the space of wavefunctionals rigorously is very analytically tricky, and, like all approaches to formalizing quantum field theory, to my knowledge it can only be made to work in the case of a free field. Second, even if it could be constructed (which, after all, physicists are happy to assume whether or not mathematicians agree), it isn’t an especially useful perspective for actually computing anything physical. The states that correspond to particles colliding with each other — that is, the states that correspond to experiments we can actually run — are not usefully described in terms of assigning amplitudes to field configurations. Additionally, all the field theories we’re going to look at will respect the symmetries of special relativity, and it is difficult to describe the action of Lorentz boosts in terms of wavefunctionals as well.

All this means that the wavefunctional approach is basically never used by physicists, and we will go along with them in not emphasizing it here. If you’d like a wavefunctional-oriented exposition from a nonrigorous perspective, there is a paper by Roman Jackiw called “Analysis on infinite-dimensional manifolds: Schrödinger representation for quantized fields,” and it is also covered in Chapter 10 of the book Quantum Field Theory of Point Particles and Strings by Brian Hatfield.

Instead, especially in the interacting theories that we will consider after the example presented in this article, we will treat the Hilbert space much more abstractly. We will care about the existence of certain operators on it, their commutation relations, and their relationship to a few special states in the Hilbert space, but we will never care how any individual state might be represented as a wavefunctional. If we can manage to construct a Hilbert space and some operators on it with the right properties, we will consider ourselves to have successfully built ourselves a quantum field theory, even if it is difficult to interpret the states directly in terms of field configurations.

The Klein–Gordon Field

As we said in the introduction, most of this article will involve going through one example in detail. The example we’ll use is also the one that a lot of textbooks choose to start with: that of a free real massive scalar field (the meaning of all of these adjectives should be apparent soon). As the name “free” suggests, nothing all that physically interesting happens in the theory we’re about to construct.

Nevertheless, this construction is nice to go through for two reasons. First, unlike for basically any physically realistic quantum field theory, in the free case it’s possible to actually construct the Hilbert space and all the relevant operators rigorously and solve all the resulting equations analytically, which should make it easier to understand what role all the pieces play. Second, because we won’t be able to solve anything analytically in the so-called “interacting” case, a lot of our analysis there will come from viewing it as a small perturbation of the free case, where we expand the relevant quantities as a power series in some parameter \(\lambda\) in which \(\lambda=0\) corresponds to the free theory. It will therefore be very useful to understand the free theory even if you’re only interested in the interacting theory, since it will form the center of this power series expansion.

We’ll start by saying a bit about the classical field theory we’ll be quantizing. We’ll say that a function \(\phi:\mathbb{R}^4\to\mathbb{R}\) is a Klein–Gordon field if it satisfies the Klein–Gordon equation: \[(\partial^2+m^2)\phi=0,\] where \(\partial^2=\partial_t^2-\partial_x^2-\partial_y^2-\partial_z^2\). The fact that \(\phi\) takes values in \(\mathbb{R}\) is what makes it a real scalar field, and the \(m^2\phi\) term is what makes it massive — we will see once we have finished the whole derivation that \(m\) will be the mass of a particle.

It’s important to remember at the outset that we’re thinking of \(\phi\) as a classical field. It is not in any sense a wavefunction — nothing quantum has happened yet! It might help instead to imagine it as a simplified version of the (classical) electromagnetic field, where the field takes scalar rather than vector values and where the differential equation has an extra \(m^2\phi\) term on the end. A good mental picture should be that there is a real number associated with every point in space and these numbers change as we run time forward. Because the Klein–Gordon equation is a second-order differential equation, the entire history of the field is determined if we specify \(\phi\) and \(\partial_t\phi\) on a “time slice” in \(\mathbb{R}^4\), for example the set of points for which \(t=0\).

One nice feature of the Klein–Gordon equation is that it’s Lorentz-invariant: if \(\phi\) is a solution, then so is any translation, rotation, or Lorentz boost of \(\phi\). Unfortunately, because we are about to quantize this theory, we’re going to have to describe the physics in terms of a Hamiltonian, which basically requires us to break the Lorentz symmetry, since the entire machinery of Hamiltonian mechanics requires us to choose a forward time direction in which to evolve our states. (Equivalently, we could say the symmetry is broken by the fact that the Hamiltonian represents energy, which is the time component of energy-momentum.) For now, this is unavoidable; we’ll see how the Lorentz symmetry reemerges at the end.

What would it mean to quantize this theory? Suppose we had a classical system with finitely many degrees of freedom, say \(x_1,\ldots,x_d\), whose dynamics were governed by a Lagrangian \(L(x_1,\ldots,x_d,\dot{x}_1,\ldots,\dot{x}_d)\). We can construct a Hamiltonian \(H(x_1,\ldots,x_d,p_1,\ldots,p_d)\), which is a function of both the original \(x_j\)’s and their conjugate momentum variables \(p_j=\partial L/\partial\dot{x}_j\); the formula for the Hamiltonian is \(H=\sum_{j=1}^dp_j\dot{x}_j-L\). To quantize this system, we want to find some Hilbert space \(\mathcal{H}\) and some self-adjoint operators \(X_1,\ldots,X_d,P_1,\ldots,P_d\) for which (after setting \(\hbar=1\)) \([X_j,P_k]=i\delta_{jk}\) and \([X_j,X_k]=[P_j,P_k]=0\); these are the canonical commutation relations. We then write the Hamiltonian in terms of these \(X_j\)’s and \(P_j\)’s, look at the Schrödinger equation, and so on.

Rather than finitely many degrees of freedom, the state of a Klein–Gordon field at any instant in time is given by specifying the value of \(\phi\) at every point in space. That is, we have a whole continuum of degrees of freedom, and they’re indexed by points in space rather than by the numbers \(1,\ldots,d\). But we can still try to carry out something like the program from the previous paragraph and see where we end up.

Our first job will be to write our dynamical law in terms of a Lagrangian. By analogy with the discrete situation, the Lagrangian should be a function of both the values and time derivatives of \(\phi\) at a fixed point in time — that is, it should eat two real-valued functions on \(\mathbb{R}^3\) and spit out a number. The Lagrangian that produces the Klein–Gordon equation turns out to be \[\begin{aligned} L(\phi,\partial_t\phi) &= \frac12\int d^3\mathbf{x} \left[(\partial\phi)^2 - m^2\phi^2\right]\\ &= \frac12\int d^3\mathbf{x} \left[(\partial_t\phi(\mathbf{x}))^2 - |\nabla\phi(\mathbf{x})|^2 - m^2\phi(\mathbf{x})^2\right]\end{aligned}\] (The proof that this is the right Lagrangian is probably out of scope for this particular article, although it may end up appearing in a later version of it. The final result is more important than the derivation here, so for now we will skip this and a few other steps.)

The next step is to write down the conjugate momentum variables to the \(\phi(\mathbf{x})\)’s, which turn out to be \(\pi(\mathbf{x})=\partial_t\phi(\mathbf{x})\). This will let us write our Hamiltonian: \[\begin{aligned} H &= \left(\int d^3\mathbf{x}\ \pi(\mathbf{x})\partial_t\phi(\mathbf{x})\right) - L\\ &= \frac12\int d^3\mathbf{x} \left[\pi(\mathbf{x})^2 + |\nabla\phi(\mathbf{x})|^2 + m^2\phi(\mathbf{x})^2\right]\end{aligned}\] It’s worthwhile to clear up a common point of confusion about \(\pi\) right away. The variable \(\pi(\mathbf{x})\) is like a “momentum” in the sense that it arises as a conjugate momentum variable to \(\phi(\mathbf{x})\) in this process of going from Lagrangian to Hamiltonian mechanics. It is certainly not a momentum in the usual physical sense of the conserved quantity corresponding to spatial translations; that quantity will appear later, but it doesn’t have much to do with \(\pi\).
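For readers who would like to see the step from the Lagrangian to the Hamiltonian spelled out, here is a sketch of the computation. It assumes only the Lagrangian above and treats the functional derivative formally, in keeping with the level of rigor in the rest of this section. The conjugate momentum comes from varying \(L\) with respect to \(\partial_t\phi(\mathbf{x})\): \[\pi(\mathbf{x})=\frac{\delta L}{\delta(\partial_t\phi(\mathbf{x}))}=\partial_t\phi(\mathbf{x}).\] Substituting \(\partial_t\phi=\pi\) into the definition of \(H\) then gives \[\begin{aligned} H &= \int d^3\mathbf{x}\ \pi(\mathbf{x})^2 - \frac12\int d^3\mathbf{x}\left[\pi(\mathbf{x})^2 - |\nabla\phi(\mathbf{x})|^2 - m^2\phi(\mathbf{x})^2\right]\\ &= \frac12\int d^3\mathbf{x}\left[\pi(\mathbf{x})^2 + |\nabla\phi(\mathbf{x})|^2 + m^2\phi(\mathbf{x})^2\right].\end{aligned}\] The only thing that happens is that the sign of the gradient and mass terms flips, exactly as the sign of the potential energy flips when passing from \(L\) to \(H\) for a particle.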

We can now state our goal very explicitly. We would like to find a Hilbert space \(\mathcal{H}\) with self-adjoint operators \(\widehat\phi(\mathbf{x})\) and \(\widehat\pi(\mathbf{x})\) for each \(\mathbf{x}\in\mathbb{R}^3\). (We will, for the moment, work in the Schrödinger picture, where the states depend on time and the operators don’t.) Everywhere we had a sum over the indices on the \(x\)’s and \(p\)’s in the discrete case we now have an integral over \(\mathbf{x}\), so the canonical commutation relations become \[[\widehat\phi(\mathbf{x}),\widehat\pi(\mathbf{y})] = i\delta(\mathbf{x}-\mathbf{y}),\qquad [\widehat\phi(\mathbf{x}),\widehat\phi(\mathbf{y})] = [\widehat\pi(\mathbf{x}),\widehat\pi(\mathbf{y})] = 0.\] Finally, we want to take the above expression for the Hamiltonian and translate it into an operator on our Hilbert space. If we manage to do all that, we will have a complete quantum version of our field theory.

In the discrete case, where we start with \(d\) degrees of freedom, it’s relatively straightforward to construct the right Hilbert space and operators: we can take \(\mathcal{H}=\mathcal{L}^2(\mathbb{R}^d)\), on which \(X_j\) is multiplication by \(x_j\) and \(P_j=-i(\partial/\partial x_j)\). Indeed, there is a result called the Stone–von Neumann Theorem which implies that, under an appropriate set of hypotheses, any representation of the canonical commutation relations on a Hilbert space is isomorphic to this one.
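To make this representation concrete, here is a minimal numerical sketch for a single degree of freedom (\(d=1\)). It applies \(X\) (multiplication by \(x\)) and \(P=-i\,d/dx\) to a Gaussian test function and compares \([X,P]\psi\) with \(i\psi\). The names `psi`, `derivative`, `X`, `P`, and `commutator_XP` are illustrative choices of mine, and the derivative is replaced by a finite difference, so the relation holds only up to discretization error.

```python
import math

def psi(x):
    """A sample smooth, rapidly decaying test function (a Gaussian)."""
    return math.exp(-0.5 * x * x)

def derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f'(x) (an approximation)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def X(f):
    """Position operator: multiplication by x."""
    return lambda x: x * f(x)

def P(f):
    """Momentum operator -i d/dx, in units where hbar = 1."""
    return lambda x: -1j * derivative(f, x)

def commutator_XP(f):
    """Return the function [X, P] f = X(P f) - P(X f)."""
    return lambda x: X(P(f))(x) - P(X(f))(x)

if __name__ == "__main__":
    # Each row shows x, ([X, P] psi)(x), and i * psi(x); the last two
    # should agree up to the finite-difference error.
    for x in (-1.0, 0.3, 2.0):
        print(x, commutator_XP(psi)(x), 1j * psi(x))
```

The product rule is doing all the work here: \(P(X\psi)\) picks up an extra \(-i\psi\) relative to \(X(P\psi)\), which is where the \(i\) in the commutation relation comes from.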

The situation is much more complicated in field theory. There is not a particularly obvious analogue of the recipe we just described in which all the integrals can be made to actually converge. Furthermore, there is no Stone–von Neumann Theorem in the continuous setting; there are lots of non-isomorphic representations of the canonical commutation relations.

Finally, even in the free-field setting where we will be able to construct everything, there is an additional difficulty owing to the presence of delta functions in this whole discussion. Naively treating \(\widehat\phi(\mathbf{x})\) as an operator on \(\mathcal{H}\) will turn out to produce an operator that, when applied to essentially any vector in \(\mathcal{H}\) at all, would produce a vector of infinite norm. The solution (in the cases where it can be made to work at all) will be to treat \(\widehat\phi(\mathbf{x})\) as an operator-valued distribution, that is, \(\int d^3\mathbf{x}\ f(\mathbf{x})\widehat\phi(\mathbf{x})\), rather than \(\widehat\phi(\mathbf{x})\) itself, will be a well-defined operator on \(\mathcal{H}\) for sufficiently well-behaved functions \(f\). Just as with ordinary delta functions, this introduces difficulties when we try to multiply distributions; we will, for example, need to be very careful about what we mean by \(\widehat\phi(\mathbf{x})^2\) in our expression for the Hamiltonian.

With the discrete version of this story in mind, it’s tempting to wonder why we can’t just stuff everything into some giant \(\mathcal{L}^2\) as usual and call it a day. So I mention all these difficulties now not because the details of the issues involved are supposed to be especially clear at this point, but rather to impress on the reader why it is important to be careful how we carry out this quantization procedure. We will instead take a somewhat more roundabout route to building the Hilbert space that our quantum theory will take place in. In addition to being much easier to construct, this method will make it very easy to find all the eigenstates of the Hamiltonian, which means the physical content of the theory will be especially transparent. Let’s see how this works.

Harmonic Oscillators and Fock Space

It will be simpler if we start by confining everything to a cube \(A=[-\frac12L,\frac12L]^3\subseteq\mathbb{R}^3\) and imposing periodic boundary conditions; we’ll look at what happens when \(L\to\infty\) at the very end. With that restriction in place, the functions \(f_\mathbf{p}(\mathbf{x}) = L^{-3/2}e^{i\mathbf{p}\cdot\mathbf{x}}\) form an orthonormal basis of (complex-valued) eigenfunctions for the operator \(\nabla^2=\partial_x^2+\partial_y^2+\partial_z^2\), where \(\mathbf{p}\) ranges over the lattice \(\Lambda\) of all vectors in \(\mathbb{R}^3\) whose coordinates are integer multiples of \(2\pi/L\). The eigenvalue associated to \(f_{\mathbf{p}}\) is \(-|\mathbf{p}|^2\).

Take an arbitrary function \(\phi:\mathbb{R}\times A\to\mathbb{R}\) and, for each time \(t\), expand the corresponding time slice of \(\phi\) in terms of this basis. That is, write \[\phi(t,\mathbf{x})=L^{-3/2}\sum_{\mathbf{p}\in\Lambda}q_{\mathbf{p}}(t)e^{i\mathbf{p}\cdot\mathbf{x}}.\] (The fact that \(\phi\) is real-valued means that \(q_{\mathbf{p}}\) and \(q_{-\mathbf{p}}\) have to be complex conjugates.) I then encourage you to check that \(\phi\) satisfies the Klein–Gordon equation if and only if the \(q_{\mathbf{p}}\)’s satisfy \[q_{\mathbf{p}}''(t)+\omega_{\mathbf{p}}^2q_{\mathbf{p}}(t)=0,\] where \(\omega_{\mathbf{p}}=\sqrt{|\mathbf{p}|^2+m^2}\).
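If you would like to compare your check against a sketch, here is one way it can go, assuming \(\phi\) is smooth enough that we may differentiate the series term by term. Since \(\nabla^2e^{i\mathbf{p}\cdot\mathbf{x}}=-|\mathbf{p}|^2e^{i\mathbf{p}\cdot\mathbf{x}}\), applying the Klein–Gordon operator to the expansion gives \[(\partial_t^2-\nabla^2+m^2)\phi(t,\mathbf{x}) = L^{-3/2}\sum_{\mathbf{p}\in\Lambda}\left[q_{\mathbf{p}}''(t)+\left(|\mathbf{p}|^2+m^2\right)q_{\mathbf{p}}(t)\right]e^{i\mathbf{p}\cdot\mathbf{x}}.\] Because the functions \(L^{-3/2}e^{i\mathbf{p}\cdot\mathbf{x}}\) form an orthonormal basis, the right side vanishes for all \(\mathbf{x}\) exactly when every bracketed coefficient vanishes, which is the oscillator equation with \(\omega_{\mathbf{p}}^2=|\mathbf{p}|^2+m^2\).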

In other words, a Klein–Gordon field in a box is the same as a collection of countably many uncoupled harmonic oscillators, one for each \(\mathbf{p}\in\Lambda\), where the period of the oscillator corresponding to \(\mathbf{p}\) is \(2\pi/\omega_{\mathbf{p}}\). This will be the key observation that will guide us on our quest to quantize the Klein–Gordon equation: our ultimate goal will be to replace each of these classical harmonic oscillators with a quantum harmonic oscillator, and the fact that this is possible will turn out to be the reason free fields are so much easier to analyze than interacting fields.
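To get a feel for the oscillators involved, here is a small sketch that enumerates a finite window of the lattice \(\Lambda\) and computes \(\omega_{\mathbf{p}}\) and the period \(2\pi/\omega_{\mathbf{p}}\) for each mode. The function names `lattice_momenta` and `omega`, the cutoff `n_max`, and the sample values of \(L\) and \(m\) are all illustrative assumptions of mine; the actual lattice is infinite.

```python
import math
from itertools import product

def lattice_momenta(L, n_max):
    """Momenta in Lambda = (2*pi/L) * Z^3 whose integer coordinates are
    at most n_max in absolute value (a finite window of the lattice)."""
    step = 2 * math.pi / L
    window = range(-n_max, n_max + 1)
    return [(step * a, step * b, step * c)
            for a, b, c in product(window, repeat=3)]

def omega(p, m):
    """Angular frequency omega_p = sqrt(|p|^2 + m^2) of the mode p."""
    return math.sqrt(sum(c * c for c in p) + m * m)

if __name__ == "__main__":
    L, m = 10.0, 1.0  # illustrative box size and mass
    modes = sorted(lattice_momenta(L, 1), key=lambda p: omega(p, m))
    for p in modes[:4]:
        # momentum, frequency, and period of the corresponding oscillator
        print(p, omega(p, m), 2 * math.pi / omega(p, m))
```

Note that the slowest oscillator is the one with \(\mathbf{p}=0\), whose frequency is exactly \(m\), and that \(\omega_{\mathbf{p}}=\omega_{-\mathbf{p}}\).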

The first question we need to address is which Hilbert space the states of our quantized theory should live in. To answer this, suppose for just a moment that we had only a finite number of harmonic oscillators, say \(N\) of them, with periods \(2\pi/\omega_1,\ldots,2\pi/\omega_N\). Then we could take the Hilbert space to be \(\mathcal{L}^2(\mathbb{R}^N)\), with the Hamiltonian \[\widehat H=\frac12\sum_{j=1}^N\left(P_j^2+\omega_j^2X_j^2\right).\]

The reader should recall that it is possible to find a basis of eigenfunctions for this Hamiltonian by using annihilation and creation operators, defined as \[A_j=\sqrt{\frac{\omega_j}{2}}X_j + i\frac{1}{\sqrt{2\omega_j}}P_j,\qquad A_j^\dagger=\sqrt{\frac{\omega_j}{2}}X_j - i\frac{1}{\sqrt{2\omega_j}}P_j,\] which makes \[X_j = \frac{1}{\sqrt{2\omega_j}}(A_j+A_j^\dagger),\qquad P_j=-i\sqrt{\frac{\omega_j}{2}}(A_j-A_j^\dagger).\] Every eigenfunction of \(\widehat H\) is then of the form \[\psi_{n_1,\dots,n_N}(x) = \frac{(A_1^\dagger)^{n_1}\cdots (A_N^\dagger)^{n_N}}{\sqrt{n_1!\cdots n_N!}}\psi_0(x),\] where \[\psi_0(x)=\prod_{j=1}^N\left[\left(\frac{\omega_j}{\pi}\right)^{\frac14}\exp\left(-\frac12\omega_jx_j^2\right)\right]\] is the lowest-energy eigenfunction. The Hamiltonian itself can then be written as \(\widehat H=\sum_j\omega_j(A_j^\dagger A_j+\frac12)\).
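As a quick check of the last formula, here is a sketch for a single oscillator (dropping the index \(j\)), using only the definitions above and the commutation relation \([X,P]=i\): \[\begin{aligned} A^\dagger A &= \left(\sqrt{\frac{\omega}{2}}X - \frac{i}{\sqrt{2\omega}}P\right)\left(\sqrt{\frac{\omega}{2}}X + \frac{i}{\sqrt{2\omega}}P\right)\\ &= \frac{\omega}{2}X^2 + \frac{1}{2\omega}P^2 + \frac{i}{2}[X,P]\\ &= \frac{1}{2\omega}\left(P^2+\omega^2X^2\right) - \frac12.\end{aligned}\] Multiplying by \(\omega\) and moving the constant to the other side gives \(\omega(A^\dagger A+\frac12)=\frac12(P^2+\omega^2X^2)\), and summing over the oscillators recovers the expression for \(\widehat H\).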

This suggests an alternative way of describing our Hilbert space, one that will turn out to generalize nicely to the case of infinitely many oscillators. Consider the Hilbert space \(\mathcal{F}_N\) with an orthonormal basis indexed by all \(N\)-tuples of nonnegative integers. Writing \(|n_1,\ldots,n_N\rangle\) for one of these basis vectors, define operators \(A_j\) and \(A_j^\dagger\) on \(\mathcal{F}_N\) via the rules \[\begin{aligned} A_j|n_1,\ldots,n_N\rangle =& \sqrt{n_j}\,|n_1,\ldots,n_j-1,\ldots,n_N\rangle,\\ A_j^\dagger|n_1,\ldots,n_N\rangle =& \sqrt{n_j+1}\,|n_1,\ldots,n_j+1,\ldots,n_N\rangle,\end{aligned}\] with the understanding that if any of the \(n_j\)’s becomes negative then the corresponding vector is zero. I encourage you to check that, as the notation suggests, these two operators are indeed adjoints of each other, and that sending \(\psi_{n_1,\ldots,n_N}\) to \(|n_1,\ldots,n_N\rangle\) gives an isomorphism of Hilbert spaces from \(\mathcal{L}^2(\mathbb{R}^N)\) to \(\mathcal{F}_N\) which takes the operators \(A_j\) and \(A_j^\dagger\) on \(\mathcal{L}^2(\mathbb{R}^N)\) to the operators on \(\mathcal{F}_N\) of the same names. This just amounts to checking that the relations in the two equations above are also true of the \(\psi\)’s, for which it will probably be helpful to first prove the commutation relations \([A_j,A_k^\dagger]=\delta_{jk}\) and \([A_j,A_k]=[A_j^\dagger,A_k^\dagger]=0\).
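The rules above are easy to experiment with on a computer. Here is a minimal sketch in which a state is stored as a dictionary from occupation tuples \((n_1,\ldots,n_N)\) to amplitudes; the representation and the names `annihilate`, `create`, `subtract`, and `commutator_A_Adag` are my own illustrative choices, and oscillators are indexed from 0 as is usual in Python.

```python
import math

def annihilate(j, state):
    """Apply A_j to a state given as {occupation tuple: amplitude}."""
    out = {}
    for n, amp in state.items():
        if n[j] == 0:
            continue  # A_j sends a state with n_j = 0 to the zero vector
        lowered = n[:j] + (n[j] - 1,) + n[j + 1:]
        out[lowered] = out.get(lowered, 0.0) + math.sqrt(n[j]) * amp
    return out

def create(j, state):
    """Apply A_j^dagger to a state given as {occupation tuple: amplitude}."""
    out = {}
    for n, amp in state.items():
        raised = n[:j] + (n[j] + 1,) + n[j + 1:]
        out[raised] = out.get(raised, 0.0) + math.sqrt(n[j] + 1) * amp
    return out

def subtract(s, t):
    """Return s - t, dropping amplitudes that are zero up to rounding."""
    out = dict(s)
    for n, amp in t.items():
        out[n] = out.get(n, 0.0) - amp
    return {n: a for n, a in out.items() if abs(a) > 1e-12}

def commutator_A_Adag(j, k, state):
    """Return [A_j, A_k^dagger] applied to the given state."""
    return subtract(annihilate(j, create(k, state)),
                    create(k, annihilate(j, state)))

if __name__ == "__main__":
    basis_state = {(2, 0, 1): 1.0}
    # Same oscillator: should return the state itself, up to rounding.
    print(commutator_A_Adag(0, 0, basis_state))
    # Different oscillators: should return the zero vector, i.e. {}.
    print(commutator_A_Adag(0, 1, basis_state))
```

Because the operators here act on individual basis vectors rather than on truncated matrices, no cutoff on the occupation numbers is needed.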

This space \(\mathcal{F}_N\) is a special case of an object called a bosonic Fock space, which it’s worth taking a bit of time to define abstractly.

Let \(\mathcal{H}\) be any separable Hilbert space, and, for any \(k\ge 0\), consider the symmetric power \(\mathrm{Sym}^k\mathcal{H}\), that is, the quotient of \(\mathcal{H}^{\otimes k}\) by the relations \[v_1\otimes\cdots\otimes v_i\otimes v_{i+1}\otimes\cdots\otimes v_k = v_1\otimes\cdots\otimes v_{i+1}\otimes v_i\otimes\cdots\otimes v_k.\] We will follow the usual convention of omitting the tensor product symbol when talking about elements of \(\mathrm{Sym}^k\mathcal{H}\), for example writing \(v_1v_2\cdots v_k\) instead of \(v_1\otimes v_2\otimes\cdots\otimes v_k\). We can use the inner product on \(\mathcal{H}\) to define one on \(\mathrm{Sym}^k\mathcal{H}\) by setting \[\langle v_1\cdots v_k|w_1\cdots w_k\rangle = \sum_{\sigma\in S_k}\langle v_1|w_{\sigma(1)}\rangle\cdots\langle v_k|w_{\sigma(k)}\rangle,\] where the sum is over all permutations of \(\{1,\ldots,k\}\). If \(e_1,e_2,\ldots\) is an orthonormal basis for \(\mathcal{H}\), then the set of all monomials \((1/\sqrt{n_1!n_2!\cdots})e_1^{n_1}e_2^{n_2}\cdots\) with \(\sum n_i=k\) forms an orthonormal basis for \(\mathrm{Sym}^k\mathcal{H}\).
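To see where the factorials in the normalization come from, here is a small sketch that evaluates this inner product on products of vectors in \(\mathbb{C}^N\) by summing over permutations directly. Vectors are plain tuples of numbers, and the names `inner`, `sym_inner`, and `monomial` are illustrative choices of mine; the brute-force sum over \(S_k\) is only practical for small \(k\).

```python
import math
from itertools import permutations

def inner(v, w):
    """Inner product on C^N, conjugate-linear in the first slot."""
    return sum(complex(a).conjugate() * b for a, b in zip(v, w))

def sym_inner(vs, ws):
    """<v_1...v_k | w_1...w_k>, computed as a sum over all permutations."""
    if len(vs) != len(ws):
        raise ValueError("both products must have the same number of factors")
    total = 0
    for sigma in permutations(range(len(ws))):
        term = 1
        for i, s in enumerate(sigma):
            term *= inner(vs[i], ws[s])
        total += term
    return total

def monomial(basis, exponents):
    """Factors of the unnormalized monomial e_1^{n_1} e_2^{n_2} ... as a list."""
    factors = []
    for e, n in zip(basis, exponents):
        factors.extend([e] * n)
    return factors

if __name__ == "__main__":
    e1, e2 = (1, 0), (0, 1)
    mono = monomial([e1, e2], (3, 2))
    # Squared norm of e_1^3 e_2^2, to be compared with 3! * 2!.
    print(sym_inner(mono, mono), math.factorial(3) * math.factorial(2))
```

The only permutations that contribute to the squared norm of a monomial are the ones that shuffle equal factors among themselves, and there are \(n_1!n_2!\cdots\) of those.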

The bosonic Fock space is then defined to be the Hilbert space \[\mathcal{F}(\mathcal{H})=\bigoplus_{k=0}^\infty \mathrm{Sym}^k\mathcal{H}.\] This should be taken as an orthogonal direct sum, that is, vectors in different summands are orthogonal to each other. The first summand \(\mathrm{Sym}^0\mathcal{H}\) is taken to be \(\mathbb{C}\). Recall that the Hilbert space direct sum is defined as the completion of the vector space direct sum; that is, an arbitrary element of \(\mathcal{F}(\mathcal{H})\) is a formal sum \(\sum_{k=0}^\infty s_k\) where each \(s_k\in\mathrm{Sym}^k\mathcal{H}\) and \(\sum_{k=0}^\infty |s_k|^2<\infty\).

The space \(\mathcal{F}_N\) where we put the states of our finite system of harmonic oscillators can then be recovered as \(\mathcal{F}(\mathbb{C}^N)\). If \(e_1,\ldots,e_N\) is an orthonormal basis for \(\mathbb{C}^N\), then the state we called \(|n_1,\ldots,n_N\rangle\) above can be identified with \((1/\sqrt{n_1!\cdots n_N!})e_1^{n_1}\cdots e_N^{n_N}\in \mathrm{Sym}^{n_1+\cdots+n_N}\mathbb{C}^N\subseteq \mathcal{F}(\mathbb{C}^N)\).

We can define annihilation and creation operators from this perspective. For any \(v\in\mathcal{H}\), we can define \(A^\dagger_v:\mathrm{Sym}^k\mathcal{H}\to\mathrm{Sym}^{k+1}\mathcal{H}\) by the rule \[A^\dagger_vw_1\cdots w_k=vw_1\cdots w_k.\] The adjoint \(A_v\) is then given by \[A_vw_1\cdots w_{k+1}=\sum_{i=1}^{k+1}\langle v|w_i\rangle w_1\cdots\widehat w_i\cdots w_{k+1},\] where the hat on a \(w_i\) means we leave that factor out of the product. I encourage you to check that this agrees with the rules we gave above.
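The claim that \(A_v\) really is the adjoint of \(A_v^\dagger\) can be spot-checked numerically. The sketch below compares \(\langle A_v^\dagger w_1\cdots w_k|u_1\cdots u_{k+1}\rangle\) with \(\langle w_1\cdots w_k|A_vu_1\cdots u_{k+1}\rangle\) for sample vectors in \(\mathbb{C}^2\), using the permutation-sum inner product on symmetric powers. The function names and the list-of-pairs representation of a sum of products are my own illustrative choices, and a numerical check on a few vectors is of course not a proof.

```python
from itertools import permutations

def inner(v, w):
    """Inner product on C^N, conjugate-linear in the first slot."""
    return sum(complex(a).conjugate() * b for a, b in zip(v, w))

def sym_inner(vs, ws):
    """<v_1...v_k | w_1...w_k> as a sum over permutations."""
    if len(vs) != len(ws):
        raise ValueError("both products must have the same number of factors")
    total = 0
    for sigma in permutations(range(len(ws))):
        term = 1
        for i, s in enumerate(sigma):
            term *= inner(vs[i], ws[s])
        total += term
    return total

def create(v, ws):
    """A_v^dagger: multiply the product w_1...w_k by v."""
    return [v] + list(ws)

def annihilate(v, ws):
    """A_v: the sum over i of <v|w_i> times the product with w_i omitted,
    returned as a list of (coefficient, product) pairs."""
    ws = list(ws)
    return [(inner(v, ws[i]), ws[:i] + ws[i + 1:]) for i in range(len(ws))]

def adjoint_gap(v, ws, us):
    """|<A_v^dagger w | u> - <w | A_v u>|, which should be zero up to rounding."""
    lhs = sym_inner(create(v, ws), us)
    rhs = sum(c * sym_inner(ws, rest) for c, rest in annihilate(v, us))
    return abs(lhs - rhs)

if __name__ == "__main__":
    v = (1, 2j)
    ws = [(0.5, 1), (1j, -1)]
    us = [(1, 1), (2, -1j), (0, 3)]
    print(adjoint_gap(v, ws, us))
```

The identity being tested is the expansion of the permutation sum along its first factor: every permutation pairs \(v\) with exactly one \(u_i\), and what is left over is a permutation sum for the remaining factors.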

This setup is straightforward to generalize to the case of infinitely many harmonic oscillators: we can simply consider the Fock space \(\mathcal{F}(\mathcal{H})\), where \(\mathcal{H}\) is a Hilbert space with a countable orthonormal basis with one vector for each oscillator. This will be the state space we’ll want to use for the system of countably many harmonic oscillators arising from our quantum Klein–Gordon field.

Putting the Klein–Gordon Field in Fock Space

Our next task will be to relate this description to the one in terms of fields that we started with. Earlier we wrote an arbitrary solution of the classical Klein–Gordon equation in the form \[\phi(t,\mathbf{x})=L^{-3/2}\sum_{\mathbf{p}\in\Lambda}q_{\mathbf{p}}(t)e^{i\mathbf{p}\cdot\mathbf{x}},\] where each \(q_{\mathbf{p}}(t)\) is a harmonic oscillator with period \(2\pi/\omega_{\mathbf{p}}\). Our goal, once again, will be to take each of these classical harmonic oscillators and replace it with a quantum harmonic oscillator.

There are two small wrinkles in this process that we’ll need to be somewhat careful about. First, \(q_{\mathbf{p}}\) is a complex-valued function, whereas the thing we know how to quantize is a real-valued harmonic oscillator. Second, the fact that \(\phi\) is real-valued means that \(q_{-\mathbf{p}}=\overline{q_{\mathbf{p}}}\), and so when we index the degrees of freedom by \(\mathbf{p}\) we are double-counting.

We can deal with the first issue by splitting \(q_{\mathbf{p}}\) into real and imaginary parts, writing \(q_{\mathbf{p}}=r_{\mathbf{p}}+is_{\mathbf{p}}\). Note that \(r_{\mathbf{p}}\) and \(s_{\mathbf{p}}\) are then independent real-valued harmonic oscillators with the same period as \(q_{\mathbf{p}}\). It will be more convenient to deal with the second issue about double-counting after quantizing than before; for now, we’ll just need to remember that \(r_{\mathbf{p}}=r_{-\mathbf{p}}\) and \(s_{\mathbf{p}}=-s_{-\mathbf{p}}\).

So we ought to work in the Fock space \(\mathcal{F}(\mathcal{H})\), where \(\mathcal{H}\) is a Hilbert space with an orthonormal basis consisting of one vector \(v_{\mathbf{p}}\) for each \(r_{\mathbf{p}}\) and one vector \(w_{\mathbf{p}}\) for each \(s_{\mathbf{p}}\), modulo the relations \(v_{\mathbf{p}}=v_{-\mathbf{p}}\) and \(w_{\mathbf{p}}=-w_{-\mathbf{p}}\). Let’s write \(B_{\mathbf{p}},B_{\mathbf{p}}^\dagger\) for the annihilation and creation operators corresponding to \(v_{\mathbf{p}}\) and \(C_{\mathbf{p}},C_{\mathbf{p}}^\dagger\) for the ones corresponding to \(w_{\mathbf{p}}\). By looking back at our earlier expressions relating \(X_j\) and \(P_j\) to \(A_j\) and \(A_j^\dagger\), I encourage you to convince yourself that in terms of these new operators our double-counting takes the form \(B_{\mathbf{p}}=B_{-\mathbf{p}}\) and \(C_{\mathbf{p}}=-C_{-\mathbf{p}}\).

We can now write our field in terms of these creation and annihilation operators. Our \(r_{\mathbf{p}}\) and \(s_{\mathbf{p}}\) are the analogues of the \(x_j\)’s from the version with finitely many harmonic oscillators, so they should correspond to operators \(R_{\mathbf{p}}=\frac{1}{\sqrt{2\omega_{\mathbf{p}}}}(B_{\mathbf{p}}+B_{\mathbf{p}}^\dagger)\), and analogously for \(s\). So, if we start with the classical field \[\phi(t,\mathbf{x})=L^{-3/2}\sum_{\mathbf{p}\in\Lambda}\left[r_{\mathbf{p}}(t)+is_{\mathbf{p}}(t)\right]e^{i\mathbf{p}\cdot\mathbf{x}}\] and make these substitutions, we get the operator \[\widehat\phi(\mathbf{x})=L^{-3/2}\sum_{\mathbf{p}\in\Lambda}\frac{1}{\sqrt{2\omega_{\mathbf{p}}}}\left[B_{\mathbf{p}}+B_{\mathbf{p}}^\dagger+iC_{\mathbf{p}}+iC_{\mathbf{p}}^\dagger\right]e^{i\mathbf{p}\cdot\mathbf{x}}.\]

This expression will turn out to give us a nice way to solve our double-counting problem. If we write \(A_{\mathbf{p}}=B_{\mathbf{p}}+iC_{\mathbf{p}}\), note that there is no longer any redundancy — unlike the \(B\)’s and \(C\)’s, the \(A\)’s are all linearly independent. A bit of playing around with the sum above will turn it into \[\widehat\phi(\mathbf{x})=L^{-3/2}\sum_{\mathbf{p}\in\Lambda}\frac{1}{\sqrt{2\omega_{\mathbf{p}}}}\left[A_{\mathbf{p}}e^{i\mathbf{p}\cdot\mathbf{x}} + A_{\mathbf{p}}^\dagger e^{-i\mathbf{p}\cdot\mathbf{x}}\right],\] which is the expression we’ll want for \(\widehat\phi\).

This move corresponds to a change of basis in the Hilbert space \(\mathcal{H}\) that we used to build the Fock space we are working in: if we write \(e_{\mathbf{p}}=v_{\mathbf{p}}-iw_{\mathbf{p}}\), then the \(A\)’s are the annihilation and creation operators with respect to this new basis. This means that we can now completely forget about the \(B\)’s and \(C\)’s in favor of the \(A\)’s! Our state space still describes a system of countably many quantum harmonic oscillators, with the caveat that now they no longer correspond directly to the real-valued classical harmonic oscillators we started with.

There are two options for seeing what the Hamiltonian should look like. The first is to use the expression \(H = \frac12\int d^3\mathbf{x} \left[\pi(\mathbf{x})^2 + |\nabla\phi(\mathbf{x})|^2 + m^2\phi(\mathbf{x})^2\right]\) for the classical Hamiltonian mentioned above. A derivation analogous to the one we just did for \(\widehat\phi\) will produce the expression \[\widehat\pi(\mathbf{x})=-iL^{-3/2}\sum_{\mathbf{p}\in\Lambda}\sqrt{\frac{\omega_{\mathbf{p}}}{2}}\left[A_{\mathbf{p}}e^{i\mathbf{p}\cdot\mathbf{x}} - A_{\mathbf{p}}^\dagger e^{-i\mathbf{p}\cdot\mathbf{x}}\right],\] which, after a long and tedious computation, gives us \[\widehat H_{\text{problematic}}=\frac12\sum_{\mathbf{p}\in\Lambda}\omega_{\mathbf{p}}\left[A_{\mathbf{p}}A_{\mathbf{p}}^\dagger + A_{\mathbf{p}}^\dagger A_{\mathbf{p}}\right].\]

The other option is simpler but a bit more disconnected from the field theory we started with: if we start from the expression \(\widehat H=\sum_j\omega_j(A_j^\dagger A_j+\frac12)\) for the Hamiltonian of our system of finitely many harmonic oscillators, we see that the analogous expression for us should be \[\widehat H_{\text{still problematic}}=\sum_{\mathbf{p}\in\Lambda}\omega_{\mathbf{p}}\left[A_{\mathbf{p}}^\dagger A_{\mathbf{p}}+\frac12\right].\]

These expressions are equivalent to each other — one application of the commutation rule \([A_{\mathbf{p}},A_{\mathbf{p}}^\dagger]=1\) will take one to the other — but, as my extremely subtle use of notation indicated, they both suffer from a problem: these sums diverge. Indeed, we have \(\omega_{\mathbf{p}}=\sqrt{|\mathbf{p}|^2+m^2}\ge m\), so the second term of that last sum is obviously trouble! The solution is quite simple, though. On the grounds that adding a constant to the Hamiltonian can never affect the physics, we simply redefine the Hamiltonian to be \[\widehat H=\sum_{\mathbf{p}\in\Lambda}\omega_{\mathbf{p}}A_{\mathbf{p}}^\dagger A_{\mathbf{p}}.\] (If the idea of subtracting off an “infinite constant” feels dodgy, it might make more sense to think of subtracting it all the way back in the finite system of harmonic oscillators, and only then let the number of oscillators go to infinity.)

With this final definition in place we have completed the task of quantizing the Klein–Gordon field, at least the version that’s confined to a box. The states live in the Fock space \(\mathcal{F}(\mathcal{H})\) where \(\mathcal{H}\) is (after the change of basis discussed above) a Hilbert space with an orthonormal basis consisting of one vector \(e_{\mathbf{p}}\) for each \(\mathbf{p}\in\Lambda\). We can then define observables for the value of the field at a point \(\widehat\phi(\mathbf{x})\), its conjugate momentum \(\widehat\pi(\mathbf{x})\), and the Hamiltonian \(\widehat H\) in terms of annihilation and creation operators using the formulas above. Every vector of the form \(e_{\mathbf{p}_1}e_{\mathbf{p}_2}\cdots e_{\mathbf{p}_k}\in\mathrm{Sym}^k\mathcal{H}\subseteq\mathcal{F}(\mathcal{H})\) is an eigenvector of the Hamiltonian, and I encourage you to check that the eigenvalue is \(\sum_{i=1}^k\omega_{\mathbf{p}_i}\). (In particular, it’s finite, so our redefinition of \(\widehat H\) was successful!)

The structure of the Hilbert space we ended up with suggests an interpretation we can attach to the states: we think of the states in \(\mathrm{Sym}^k\mathcal{H}\subseteq\mathcal{F}(\mathcal{H})\) as consisting of \(k\) particles. The unique state in \(\mathrm{Sym}^0\mathcal{H}\cong\mathbb{C}\) is written \(|0\rangle\) and is called the vacuum. Because every \(A_{\mathbf{p}}\) kills this state, it is an eigenvector of \(\widehat H\) with eigenvalue 0, which makes it the unique lowest-energy state in \(\mathcal{F}(\mathcal{H})\).

The one-particle states are spanned by vectors of the form \(e_{\mathbf{p}}=A^\dagger_{\mathbf{p}}|0\rangle\in\mathrm{Sym}^1\mathcal{H}\), and such a state is an eigenvector of \(\widehat H\) with eigenvalue \(\omega_{\mathbf{p}}\). This is in fact part of what makes the particle interpretation a natural one to use: because we defined \(\omega_{\mathbf{p}}=\sqrt{|\mathbf{p}|^2+m^2}\), if we combine \(\mathbf{p}\) and \(\omega_{\mathbf{p}}\) into a four-vector \(p=(\omega_{\mathbf{p}},\mathbf{p})\) then \(p\) is the energy-momentum of a particle of mass \(m\)! This interpretation also carries over to each \(\mathrm{Sym}^k\mathcal{H}\): a state of the form \(A^\dagger_{\mathbf{p}_1}\cdots A^\dagger_{\mathbf{p}_k}|0\rangle\) is an eigenvector of \(\widehat H\) with eigenvalue \(\omega_{\mathbf{p}_1} + \cdots + \omega_{\mathbf{p}_k}\), which is what we should get for the total energy of a collection of \(k\) noninteracting particles with those momenta.
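To make the bookkeeping concrete, here is a small Python sketch of the two facts just used: that \(p=(\omega_{\mathbf{p}},\mathbf{p})\) lies on the mass shell, and that the energy of a product state is the sum of the individual \(\omega_{\mathbf{p}_i}\). The function names and the sample momenta are hypothetical choices of mine, and momenta are plain 3-tuples of floats.

```python
from math import isclose, sqrt


def omega(p, m):
    """omega_p = sqrt(|p|^2 + m^2) for a spatial momentum p = (px, py, pz)."""
    return sqrt(sum(c * c for c in p) + m * m)


def four_momentum(p, m):
    """The energy-momentum vector (omega_p, px, py, pz)."""
    return (omega(p, m),) + tuple(p)


def minkowski_sq(q):
    """q^2 with signature (+, -, -, -)."""
    return q[0] ** 2 - sum(c * c for c in q[1:])


def state_energy(momenta, m):
    """Energy eigenvalue of the state built from one creation operator per momentum."""
    return sum(omega(p, m) for p in momenta)


if __name__ == "__main__":
    m = 1.5
    p = (0.3, -1.2, 0.7)
    # On the mass shell, p^2 should equal m^2 up to rounding.
    print(isclose(minkowski_sq(four_momentum(p, m)), m * m))
    # The vacuum corresponds to the empty list and has energy zero.
    print(state_energy([], m))
```

Nothing here knows about operators or Hilbert spaces; it only records the eigenvalues, which is all the particle interpretation needs at this stage.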

The fact that these states are all eigenstates of \(\widehat H\) is the sense in which our field theory is a free field theory: any number of particles can coexist forever without interacting with each other. In the interacting field theories we will study later, we will still want the vacuum and the one-particle states to be eigenstates of \(\widehat H\) (at least the ones corresponding to stable particles), but we will not want this for states with more than one particle: we want the particles to interact with each other, which means a state involving at least two particles should change over time! This will turn out to complicate the story considerably.

Fields as Operator-Valued Distributions

There is one technical issue that it is probably worth talking about here. Let’s try to compute the norm of the state \(\widehat\phi(\mathbf{0})|0\rangle\). (Remember that the two zeroes here refer to different things! The former is the point \(\mathbf{0}\in\mathbb{R}^3\), and the latter is the name of the vacuum state.) Using the fact that all the \(A_{\mathbf{p}}\)’s kill the vacuum state, we see that \[\widehat\phi(\mathbf{0})|0\rangle=L^{-3/2}\sum_{\mathbf{p}\in\Lambda}\frac{1}{\sqrt{2\omega_{\mathbf{p}}}}A_{\mathbf{p}}^\dagger|0\rangle.\] Then, using the commutation relations for the \(A\)’s and \(A^\dagger\)’s, the squared norm can be computed as \[\begin{aligned} \langle 0|\widehat\phi(\mathbf{0})\widehat\phi(\mathbf{0})|0\rangle &= L^{-3}\sum_{\mathbf{p}\in\Lambda}\sum_{\mathbf{q}\in\Lambda}\frac{1}{2\sqrt{\omega_{\mathbf{p}}\omega_{\mathbf{q}}}}\langle 0|A_{\mathbf{p}}A^\dagger_{\mathbf{q}}|0\rangle \\ &= L^{-3}\sum_{\mathbf{p}\in\Lambda}\frac{1}{2\omega_{\mathbf{p}}}.\end{aligned}\]

Because \(1/\omega_{\mathbf{p}}=(|\mathbf{p}|^2+m^2)^{-1/2}\sim|\mathbf{p}|^{-1}\), this sum clearly diverges. That is, while it’s not as bad a divergence as the one that afflicted our first attempt at the Hamiltonian, the operator \(\widehat\phi(\mathbf{x})\) seems to be problematic. This problem is very closely related to the fact that, in ordinary quantum mechanics, position and momentum eigenstates aren’t honest elements of our Hilbert space — in both cases, we are trying to localize a state “infinitely finely” in space and are ending up with a delta-function-like object.

The solution is similar as well: rather than \(\widehat\phi(\mathbf{x})\) being an actual operator on \(\mathcal{F}(\mathcal{H})\), \(\widehat\phi\) is an operator-valued distribution, that is, a map that takes a Schwartz function \(f\) to an actual operator \(\widehat\phi(f)\), where \(\widehat\phi(f)\) is meant to be treated as though it were \(\int d^3\mathbf{x}\ f(\mathbf{x})\widehat\phi(\mathbf{x}).\) Our definition of \(\widehat\phi\) should then technically be \[\widehat\phi(f) = L^{-3/2}\sum_{\mathbf{p}\in\Lambda} \frac{1}{\sqrt{2\omega_{\mathbf{p}}}}\left[ A_{\mathbf{p}}\overline{\widetilde{f}(\mathbf{p})} + A^\dagger_{\mathbf{p}}\widetilde{f}(\mathbf{p}) \right],\] where \(\widetilde f\) means the Fourier transform of \(f\).
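To get a feel for the difference smearing makes, here is a rough numerical sketch in Python. It compares truncated versions of the two lattice sums: the one for \(\widehat\phi(\mathbf{0})|0\rangle\), and the one for \(\widehat\phi(f)|0\rangle\), whose squared norm works out to \(L^{-3}\sum_{\mathbf{p}}|\widetilde f(\mathbf{p})|^2/2\omega_{\mathbf{p}}\) by the same computation as before. The Gaussian choice of \(\widetilde f\), the cutoff `n_max`, and all the names are assumptions made for illustration, and a finite computation can of course only suggest divergence or convergence, not prove it.

```python
from itertools import product
from math import exp, pi, sqrt


def lattice(L, n_max):
    """Points of the momentum lattice whose integer coordinates satisfy |n_i| <= n_max."""
    step = 2 * pi / L
    rng = range(-n_max, n_max + 1)
    return [(step * a, step * b, step * c) for a, b, c in product(rng, repeat=3)]


def omega(p, m):
    return sqrt(sum(c * c for c in p) + m * m)


def unsmeared_norm_sq(L, m, n_max):
    """Truncation of L^-3 * sum_p 1 / (2 omega_p)."""
    return sum(1 / (2 * omega(p, m)) for p in lattice(L, n_max)) / L ** 3


def smeared_norm_sq(L, m, n_max, width=1.0):
    """Truncation of L^-3 * sum_p |f~(p)|^2 / (2 omega_p) for a hypothetical Gaussian f~."""
    def f_tilde_sq(p):
        # |f~(p)|^2 for f~(p) = exp(-width^2 |p|^2 / 2)
        return exp(-width ** 2 * sum(c * c for c in p))

    return sum(f_tilde_sq(p) / (2 * omega(p, m)) for p in lattice(L, n_max)) / L ** 3


if __name__ == "__main__":
    L, m = 2 * pi, 1.0  # chosen so that the lattice spacing is 1
    for n_max in (2, 4, 8):
        print(n_max, unsmeared_norm_sq(L, m, n_max), smeared_norm_sq(L, m, n_max))
```

As the cutoff grows, the first column of values should keep growing (roughly quadratically in the cutoff) while the second settles down almost immediately, which is the behavior the distributional point of view is designed to capture.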

We will for the most part adopt the usual notational fiction that \(\widehat\phi(\mathbf{x})\) is an actual operator. This issue is still worth keeping in mind, though! It will have a couple of consequences later on in our discussion.

Removing the Box

We have one more task ahead of us: we need to remove the restriction that our field is confined to a box. In addition to getting us closer to the physical situation we’re actually modelling, this will also give us an opportunity to see how the Lorentz symmetry reemerges in the quantum version of the theory.

What Hilbert space should this limiting theory live in? Our Fock space \(\mathcal{F}(\mathcal{H})\) was built out of a Hilbert space \(\mathcal{H}\) with a basis vector for each point in the lattice \(\Lambda\) consisting of all points in \(\mathbb{R}^3\) whose coordinates are multiples of \(2\pi/L\), and we interpreted the vector in \(\mathrm{Sym}^1\mathcal{H}\subseteq\mathcal{F}(\mathcal{H})\) corresponding to \(\mathbf{p}\) as a one-particle state where the particle has momentum \(\mathbf{p}\). When we let \(L\) go to infinity, \(\mathcal{H}\) is going to need to be replaced by something that can represent one-particle states with arbitrary momenta.

Our operators will therefore act on the Fock space \(\mathcal{F}(\mathcal{L}^2(\mathbb{R}^3))\), where we think of the underlying \(\mathbb{R}^3\) as momentum space. The fact that one-particle states with definite momentum correspond to delta functions, which are distributions rather than honest elements of \(\mathcal{L}^2(\mathbb{R}^3)\), adds some complications if one wants to do everything rigorously (just as it does in ordinary quantum mechanics), but since that is not our primary goal here we will only touch on these issues briefly. Folland’s book mentioned in the introduction is an excellent source for a careful, rigorous version of this construction.

Our goal will be to express \(\widehat\phi\), \(\widehat\pi\), and \(\widehat H\) in terms of annihilation and creation operators corresponding to arbitrary momenta, which are usually written \(a(\mathbf{p})\) and \(a^\dagger(\mathbf{p})\). We will use the expressions we already have for the theory confined to a box as our starting point. There are a few choices one could make for how to relate these new operators to the old \(A_{\mathbf{p}}\) and \(A_{\mathbf{p}}^\dagger\) as we let \(L\) go to infinity, and unfortunately there isn’t a universally agreed-upon convention — different choices lead to factors of \(2\pi\) and/or \(\omega_{\mathbf{p}}\) showing up in various different places. Our choice will be to start with the goal of having the commutation relations look like \[[a(\mathbf{p}),a^\dagger(\mathbf{p}')]=\delta(\mathbf{p}-\mathbf{p}'),\qquad [a(\mathbf{p}),a(\mathbf{p}')] = [a^\dagger(\mathbf{p}),a^\dagger(\mathbf{p}')] = 0\] and allow this to dictate everything else.

With this in place, we can give at least a rough plausibility argument for how the continuous \(a(\mathbf{p})\)’s should relate to the discrete \(A_{\mathbf{p}}\)’s. Let’s suppose we have some function \(f\in\mathcal{L}^2(\mathbb{R}^3)\), and consider the state \[|f\rangle=\int d^3\mathbf{p}\, f(\mathbf{p})a^\dagger(\mathbf{p})|0\rangle\in\mathrm{Sym}^1(\mathcal{L}^2(\mathbb{R}^3))\subseteq \mathcal{F}(\mathcal{L}^2(\mathbb{R}^3)),\] where \(|0\rangle\) is the vacuum state. We can compute \[\begin{aligned} \langle f|f\rangle &= \iint d^3\mathbf{p}d^3\mathbf{q}\,\overline{f(\mathbf{p})}f(\mathbf{q})\langle 0|a(\mathbf{p})a^\dagger(\mathbf{q})|0\rangle \\ &= \int d^3\mathbf{p}\, |f(\mathbf{p})|^2,\end{aligned}\] where to get to the second line we’ve used the fact that \(a(\mathbf{p})a^\dagger(\mathbf{q})=a^\dagger(\mathbf{q})a(\mathbf{p}) + \delta(\mathbf{p}-\mathbf{q})\) and that \(a(\mathbf{p})|0\rangle=0\).

What state should \(|f\rangle\) correspond to in the discrete version of this story? If we pick some \(L\) and write down a Riemann sum approximation of the definition of \(|f\rangle\) over the corresponding lattice \(\Lambda\), we get that \[|f\rangle\approx \sum_{\mathbf{p}\in\Lambda} f(\mathbf{p})a^\dagger(\mathbf{p})|0\rangle \left(\frac{2\pi}{L}\right)^3,\] because \((2\pi/L)^3\) is the volume of one of the little cubes making up the lattice \(\Lambda\). On the other hand, we can build a similar state with approximately the same norm as \(|f\rangle\) out of \(A_{\mathbf{p}}^\dagger\)’s: if we set \[|\tilde f\rangle=\sum_{\mathbf{p}\in\Lambda} f(\mathbf{p})A_{\mathbf{p}}^\dagger|0\rangle\left(\frac{2\pi}{L}\right)^{\frac32},\] then \[\begin{aligned} \langle\tilde f|\tilde f\rangle &= \sum_{\mathbf{p}\in\Lambda}\sum_{\mathbf{q}\in\Lambda}\overline{f(\mathbf{p})}f(\mathbf{q})\langle 0|A_{\mathbf{p}}A_{\mathbf{q}}^\dagger|0\rangle \left(\frac{2\pi}{L}\right)^3 \\ &= \sum_{\mathbf{p}\in\Lambda} |f(\mathbf{p})|^2 \left(\frac{2\pi}{L}\right)^3 \\ &\approx \int d^3\mathbf{p}\, |f(\mathbf{p})|^2.\end{aligned}\] Comparing the expressions for \(|f\rangle\) and \(|\tilde f\rangle\), the conclusion from all of this is that, as we let \(L\) go to infinity, the operator we replace with \(a(\mathbf{p})\) should be \((L/2\pi)^{3/2}A_{\mathbf{p}}\).
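The Riemann sum step is easy to test numerically. The following Python sketch assumes a particular test function, the Gaussian \(f(\mathbf{p})=e^{-|\mathbf{p}|^2/2}\), for which \(\int d^3\mathbf{p}\,|f(\mathbf{p})|^2=\pi^{3/2}\), and truncates the lattice at a hypothetical cutoff `n_max`; neither choice comes from the discussion above.

```python
from itertools import product
from math import exp, pi


def f(p):
    """A hypothetical Gaussian test function on momentum space."""
    return exp(-sum(c * c for c in p) / 2)


def discrete_norm_sq(L, n_max):
    """sum_p |f(p)|^2 (2 pi / L)^3 over lattice points with |n_i| <= n_max."""
    step = 2 * pi / L
    rng = range(-n_max, n_max + 1)
    total = sum(
        f((step * a, step * b, step * c)) ** 2 for a, b, c in product(rng, repeat=3)
    )
    return total * step ** 3


# Exact value of the integral of exp(-|p|^2) over R^3.
CONTINUUM_NORM_SQ = pi ** 1.5


if __name__ == "__main__":
    for L, n_max in ((1.0, 2), (3.0, 6), (10.0, 12)):
        print(L, discrete_norm_sq(L, n_max), CONTINUUM_NORM_SQ)
```

For a small box the lattice is far too coarse and the two numbers disagree badly, while for larger \(L\) the discrete norm should approach \(\pi^{3/2}\). The cutoff has to grow with \(L\) so that the truncated lattice still covers the region where \(f\) is not negligible.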

Let’s look again at our expression for the operator \(\widehat\phi\): \[\widehat\phi(\mathbf{x})=L^{-3/2}\sum_{\mathbf{p}\in\Lambda}\frac{1}{\sqrt{2\omega_{\mathbf{p}}}}\left[A_{\mathbf{p}}e^{i\mathbf{p}\cdot\mathbf{x}} + A_{\mathbf{p}}^\dagger e^{-i\mathbf{p}\cdot\mathbf{x}}\right].\] Recall that this is a sum over the lattice \(\Lambda\) consisting of all points in \(\mathbb{R}^3\) whose coordinates are multiples of \(2\pi/L\). As \(L\to\infty\), this lattice gets finer and finer, and it is therefore tempting to read this equation as a Riemann sum for an integral over \(\mathbb{R}^3\). As in the computation we just did, it will be helpful to isolate the volume element \((2\pi/L)^3\) in this sum, and so we can write \[\widehat\phi(\mathbf{x})=\sum_{\mathbf{p}\in\Lambda}\frac{1}{\sqrt{2\omega_{\mathbf{p}}}}\left[\left(\frac{L}{2\pi}\right)^{\frac32}A_{\mathbf{p}}e^{i\mathbf{p}\cdot\mathbf{x}} + \left(\frac{L}{2\pi}\right)^{\frac32}A_{\mathbf{p}}^\dagger e^{-i\mathbf{p}\cdot\mathbf{x}}\right]\frac{(2\pi/L)^3}{(2\pi)^{3/2}}.\] With the conventions we’ve chosen, this gives us as our limiting expression \[\widehat\phi(\mathbf{x})=\int \frac{d^3\mathbf{p}}{(2\pi)^{3/2}}\frac{1}{\sqrt{2\omega_{\mathbf{p}}}} \left[ a(\mathbf{p})e^{i\mathbf{p}\cdot\mathbf{x}} + a^\dagger(\mathbf{p})e^{-i\mathbf{p}\cdot\mathbf{x}}\right].\] The corresponding expression for \(\widehat\pi\) is \[\widehat\pi(\mathbf{x}) = -i\int\frac{d^3\mathbf{p}}{(2\pi)^{3/2}}\sqrt{\frac{\omega_{\mathbf{p}}}{2}} \left[ a(\mathbf{p})e^{i\mathbf{p}\cdot\mathbf{x}} - a^\dagger(\mathbf{p})e^{-i\mathbf{p}\cdot\mathbf{x}}\right].\] I encourage you to check that, with these definitions, \(\widehat\phi\) and \(\widehat\pi\) satisfy the desired commutation relations.

Applying the same logic to the Hamiltonian produces the formula \[\widehat H=\int d^3\mathbf{p}\,\omega_{\mathbf{p}}a^\dagger(\mathbf{p})a(\mathbf{p}).\] Just as we discussed for the field operators \(\widehat\phi(\mathbf{x})\) in the last section, \(a(\mathbf{p})\) and \(a^\dagger(\mathbf{p})\) are, strictly speaking, not operators but operator-valued distributions. How, then, do we know that it’s okay to multiply them by each other in this way? One answer is that you can naively apply that operator to an honest, well-normalized state like \[\int\cdots\int d^3\mathbf{p}_1\cdots d^3\mathbf{p}_n f(\mathbf{p}_1,\ldots,\mathbf{p}_n) a^\dagger(\mathbf{p}_1)\cdots a^\dagger(\mathbf{p}_n) |0\rangle\] and see that the result still has finite norm. It is a nice exercise to define \(\widehat H\) directly as an operator without referring to \(a\)’s and \(a^\dagger\)’s, after which you are free to treat the above expression as merely a suggestive notational shorthand for the “real” definition.

In any case, as we did with the field operators, we are going to continue to use notation like \(a(\mathbf{p})\) even though this is a distribution.

Getting Symmetry Back

Everything up to this point has required us to ignore the fact that the classical theory we started with was invariant under all the symmetries of special relativity — we had to pick a privileged time direction to write down a Hamiltonian. Now that we have it in its final version, we are in a position to see how these symmetries act on our system.

The symmetries we care about are spacetime translations, rotations, and Lorentz boosts. Together they form a 10-dimensional Lie group called the Poincaré group. Because the Poincaré group acts on both the space and time coordinates, and we already have the three space coordinates as parameters to \(\widehat\phi\), it will be much more convenient if \(\widehat\phi\) can be thought of as a function of time as well. That is to say, the time has come for us to switch from the Schrödinger picture to the Heisenberg picture, allowing our operators to depend on time rather than our states.

Because we have an explicit expression for our Hamiltonian, and the Hamiltonian is the generator of time translations, it’s relatively straightforward to add this time dependence to our operators. (We could in fact have done this part even before removing the box.) I encourage you to use the commutation relations to verify that \[\frac{da(\mathbf{p};t)}{dt}=-i[a(\mathbf{p};t), \widehat H]=-i\omega_{\mathbf{p}}a(\mathbf{p};t),\qquad \frac{da^\dagger(\mathbf{p};t)}{dt}=-i[a^\dagger(\mathbf{p};t), \widehat H]=i\omega_{\mathbf{p}}a^\dagger(\mathbf{p};t).\] This means that \(a(\mathbf{p};t)=e^{-i\omega_{\mathbf{p}}t}a(\mathbf{p})\), and similarly with the opposite sign for \(a^\dagger\), which gives us \[\widehat\phi(t,\mathbf{x})=\int \frac{d^3\mathbf{p}}{(2\pi)^{3/2}}\frac{1}{\sqrt{2\omega_{\mathbf{p}}}} \left[ a(\mathbf{p})e^{-i\omega_{\mathbf{p}}t + i\mathbf{p}\cdot\mathbf{x}} + a^\dagger(\mathbf{p})e^{i\omega_{\mathbf{p}}t - i\mathbf{p}\cdot\mathbf{x}}\right].\] Pleasingly, the exponents in both terms of this expression contain the inner product of the four-vector \(x=(t,\mathbf{x})\) with the energy-momentum vector \(p=(\omega_{\mathbf{p}},\mathbf{p})\). With these definitions in place, we can write \[\widehat\phi(x)=\int \frac{d^3\mathbf{p}}{(2\pi)^{3/2}}\frac{1}{\sqrt{2\omega_{\mathbf{p}}}} \left[ a(\mathbf{p})e^{-ip\cdot x} + a^\dagger(\mathbf{p})e^{ip\cdot x}\right].\]
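The commutator computation can be sanity-checked in a toy setting. The Python sketch below is not the continuum theory: it assumes a single mode of the boxed theory with frequency `omega`, truncates the occupation number to `dim` levels, and represents the annihilation operator and the one-mode Hamiltonian \(\omega A^\dagger A\) as plain nested lists. All names are illustrative. Because the Hamiltonian is diagonal in this basis, the truncation does not spoil this particular commutator.

```python
from math import sqrt


def lowering(dim):
    """Matrix of A in the basis |0>, ..., |dim-1>, using A|n> = sqrt(n)|n-1>."""
    a = [[0.0] * dim for _ in range(dim)]
    for n in range(1, dim):
        a[n - 1][n] = sqrt(n)
    return a


def hamiltonian(dim, omega):
    """Matrix of omega * A^dagger A, which is diagonal with entries omega * n."""
    return [[omega * n if row == n else 0.0 for n in range(dim)] for row in range(dim)]


def matmul(x, y):
    dim = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(dim)) for j in range(dim)]
        for i in range(dim)
    ]


def commutator(x, y):
    xy, yx = matmul(x, y), matmul(y, x)
    return [[xy[i][j] - yx[i][j] for j in range(len(x))] for i in range(len(x))]


def max_abs_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(len(x)) for j in range(len(x)))


if __name__ == "__main__":
    dim, w = 6, 1.7
    a, h = lowering(dim), hamiltonian(dim, w)
    w_times_a = [[w * entry for entry in row] for row in a]
    # [A, H] should agree with omega * A up to rounding.
    print(max_abs_diff(commutator(a, h), w_times_a))
```

With \([A,\widehat H]=\omega A\) in hand, the equation of motion becomes \(dA/dt=-i\omega A\), whose solution is the phase \(e^{-i\omega t}\) appearing above.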

Applying this recipe to \(\widehat\pi\) produces \[\widehat\pi(x) = -i\int\frac{d^3\mathbf{p}}{(2\pi)^{3/2}}\sqrt{\frac{\omega_{\mathbf{p}}}{2}} \left[ a(\mathbf{p})e^{-ip\cdot x} - a^\dagger(\mathbf{p})e^{ip\cdot x}\right].\] Note that we in fact have, as operators, \(\partial_t\widehat\phi(x) = \widehat\pi(x)\). This has the effect of allowing us to eliminate \(\widehat\pi\) from playing an explicit role in our discussion of the action of the Poincaré group; this is very nice, because \(\pi\) depends on our chosen time direction which does not play nicely with Lorentz boosts.

Our goal is to define an action of the Poincaré group on our Hilbert space \(\mathcal{F}(\mathcal{H})\). Having the Hamiltonian in hand already tells us how time translations should work: translating forward by \(t\) corresponds to the action of \(e^{-it\widehat H}\). To determine the rest of the action, we’ll be guided by two principles. First, we know how \(\phi\) transforms in the classical theory; this gives us a prescription for how the Poincaré group should affect the operators \(\widehat\phi(x)\), and therefore (after solving for them in terms of \(\widehat\phi\) and its derivatives) how it affects the operators \(a(\mathbf{p})\) and \(a^\dagger(\mathbf{p})\). (If a group element \(g\) acts on the Hilbert space via some unitary map \(U(g)\), then it acts on an operator \(O\) by \(O\mapsto U(g)^{-1}OU(g)\).) Second, the vacuum state \(|0\rangle\) should be fixed by our action, which when combined with an idea for how the action affects the \(a^\dagger(\mathbf{p})\)’s will tell us what happens to every state in \(\mathcal{F}(\mathcal{H})\).

Because our classical theory was a scalar field theory, the action of the Poincaré group on \(\phi\) is especially simple: we simply have \((g.\phi)(x)=\phi(g^{-1}x)\). (If we were working with a vector or spinor field, which we may discuss in a later article in this series, this recipe would be more complicated.) We can therefore postulate that the same ought to be true of the operator \(\widehat\phi\), that is, \(U(g)^{-1}\widehat\phi(x)U(g)=\widehat\phi(g^{-1}x)\), and ask what this means about what the group does to \(a(\mathbf{p})\) and \(a^\dagger(\mathbf{p})\).

We have already described the action of time translations above. For space translations and rotations, the answer is quite straightforward, because these transformations don’t mess with the direction of time derivatives. I encourage you to check that, if \(U(\mathbf{r})\) is the map corresponding to translation by \(\mathbf{r}\in\mathbb{R}^3\), then \[U(-\mathbf{r})a(\mathbf{p})U(\mathbf{r}) = e^{-i\mathbf{p}\cdot\mathbf{r}}a(\mathbf{p}), \qquad U(-\mathbf{r})a^\dagger(\mathbf{p})U(\mathbf{r}) = e^{i\mathbf{p}\cdot\mathbf{r}}a^\dagger(\mathbf{p}),\] and that, if \(U(R)\) is the map corresponding to some rotation \(R\in SO(3)\), then \[U(R)^{-1}a(\mathbf{p})U(R)=a(R^{-1}\mathbf{p}), \qquad U(R)^{-1}a^\dagger(\mathbf{p})U(R)=a^\dagger(R^{-1}\mathbf{p}).\]

Just as in ordinary quantum mechanics, if we know how to translate and rotate states, then we can define momentum and angular momentum observables as the infinitesimal generators of these transformations. That is, for example, if \(U(a\mathbf{e}_1)\) is the operator that translates by \(a\) in the positive \(x\) direction, then the operator corresponding to the \(x\) component of momentum is \[P_x=i\left.\frac{dU(a\mathbf{e}_1)}{da}\right|_{a=0}.\] In the Poincaré group, all spatial and temporal translations commute with each other, which means that the operators \(\widehat H\), \(P_x\), \(P_y\), and \(P_z\) must commute with each other as well.

In the spirit of respecting the symmetries of special relativity, it’s often helpful to group these four observables together into a four-vector-valued operator which we call \(P\), writing \(P=(\widehat H,P_x,P_y,P_z)=(\widehat H,\mathbf{P})\). Because the components of \(P\) all commute with each other, they are simultaneously diagonalizable, and so we can talk about eigenvectors and eigenvalues of \(P\) as a whole, where we think of the eigenvalues as living in \(\mathbb{R}^4\). I encourage you to check that, in this language, the eigenstates of \(P\) are exactly the \(n\)-particle states \(a^\dagger(\mathbf{p}_1)\cdots a^\dagger(\mathbf{p}_n)|0\rangle\), and that the eigenvalue of such a state is \(p_1+\cdots+p_n\). (Here we again follow our usual convention that \(\omega_{\mathbf{p}}\) is always the time component of \(p\).) It’s a nice exercise to work out how to write \(\mathbf{P}\) in terms of annihilation and creation operators.

Lorentz boosts are a bit more involved, and we won’t work out all the details of the computation here. It turns out that, with the way we have chosen to normalize our operators, the \(a\)’s and \(a^\dagger\)’s don’t behave quite as nicely with respect to boosts as they do with respect to rotations. Suppose \(U(B)\) is the unitary map corresponding to a Lorentz boost \(B\), and we allow boosts to act on momentum vectors \(\mathbf{p}\in\mathbb{R}^3\) by treating \(\mathbf{p}\) as the spatial components of the energy-momentum vector \(p=(\omega_{\mathbf{p}},\mathbf{p})\). Then we have \[U(B)^{-1}a(\mathbf{p})U(B)=\sqrt{\frac{\omega_{B^{-1}.\mathbf{p}}}{\omega_{\mathbf{p}}}}a(B^{-1}.\mathbf{p}),\] and similarly for \(a^\dagger\). (The inverse appears for the same reason it does in the rotation formula above.) This amounts to a similarly ugly expression for how boosts act on particle states.

It is sometimes nice to have a more Lorentz-invariant version of the annihilation and creation operators. So let’s define, for any energy-momentum vector \(p\) with \(p^2=m^2\), the operator \[\alpha(p)=(2\pi)^{3/2}\sqrt{2\omega_{\mathbf{p}}}a(\mathbf{p}).\] We then have \[U(B)^{-1}\alpha(p)U(B)=\alpha(B^{-1}p),\] and similarly for \(\alpha^\dagger\). These versions of the operators play more nicely with Lorentz transformations than the original ones at the cost of having more complicated commutation relations. In terms of these operators, our expression for \(\widehat\phi(x)\) looks like \[\widehat\phi(x)=\int \frac{d^3\mathbf{p}}{(2\pi)^32\omega_{\mathbf{p}}} \left[ \alpha(p)e^{-ip\cdot x} + \alpha^\dagger(p)e^{ip\cdot x}\right].\]

This expression is in fact even more Lorentz-invariant than it might first appear! I encourage you to check that the measure \(d^3\mathbf{p}/2\omega_{\mathbf{p}}\) is itself preserved by Lorentz boosts. In fact, this is the surface area measure on the hypersurface \(\{p\in\mathbb{R}^4:p^2=m^2,\ p_t>0\}\). This hypersurface is sometimes called the mass shell; note that, per our earlier discussion, the mass shell is also the spectrum of the energy-momentum operator \(P\) when restricted to the one-particle states.
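Here is one way to check the invariance of the measure numerically rather than by hand. The Python sketch assumes a boost along the \(x\)-axis with rapidity `eta` (a parametrization I’m choosing for convenience) and looks at the map it induces on spatial momenta. Since \(p_y\) and \(p_z\) are unchanged, the Jacobian determinant of that map is just \(\partial p_x'/\partial p_x\), which the code estimates by a central finite difference; invariance of \(d^3\mathbf{p}/2\omega_{\mathbf{p}}\) is the statement that this equals \(\omega_{\mathbf{p}'}/\omega_{\mathbf{p}}\). The sample momentum, mass, and step size are arbitrary.

```python
from math import cosh, sinh, sqrt


def omega(p, m):
    return sqrt(sum(c * c for c in p) + m * m)


def boost_x(p, m, eta):
    """Spatial part of the boosted energy-momentum vector, for a boost along x."""
    px, py, pz = p
    return (px * cosh(eta) + omega(p, m) * sinh(eta), py, pz)


def jacobian_x(p, m, eta, h=1e-6):
    """Central-difference estimate of d(px') / d(px), the Jacobian determinant here."""
    px, py, pz = p
    plus = boost_x((px + h, py, pz), m, eta)[0]
    minus = boost_x((px - h, py, pz), m, eta)[0]
    return (plus - minus) / (2 * h)


if __name__ == "__main__":
    p, m, eta = (0.3, -1.2, 0.7), 1.5, 0.8
    q = boost_x(p, m, eta)
    # These two numbers should agree to within the finite-difference error.
    print(jacobian_x(p, m, eta), omega(q, m) / omega(p, m))
```

The same function also lets you confirm that the boosted momentum stays on the mass shell: `omega(q, m)` should match \(\omega_{\mathbf{p}}\cosh\eta+p_x\sinh\eta\), the time component of the boosted four-vector.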

A Preview of Mathematical Difficulties to Come

It is possible, with a bit more work than we’ve expended here, to make this entire story of the free scalar field completely rigorous. The same cannot be said of the interacting theories we’re about to encounter.

The fact that our fields are represented by operator-valued distributions rather than honest operators causes more problems than you might expect. In interacting theories, the classical Hamiltonian will contain terms with degree higher than two. This makes it very difficult to write down a quantum version of the Hamiltonian, because doing so would involve multiplying distributions, which is not a sensible operation in general. In our free field theory, we were able to write down a sensible quantum version of our Hamiltonian even though it involved a \(\phi^2\) by taking advantage of the structure of our Fock space with its creation and annihilation operators.

All separable Hilbert spaces are isomorphic to each other, so there would be nothing stopping you from trying to stick your favorite interacting quantum field theory in Fock space as well. But, sadly, it’s just not possible to follow the program we laid out here in general if your goal is to end up with a well-defined operator to serve as your Hamiltonian with a unique lowest-energy state to serve as the vacuum.

The operator-valued distribution perspective lies behind probably the most prominent scheme for formalizing quantum field theory, the so-called Wightman axioms. Because this is not a series on attempts to build mathematically rigorous versions of quantum field theory, we won’t discuss these axioms in detail. I bring them up only to report the unfortunate fact that no one has managed to construct a fully rigorous version of a quantum field theory with interactions in four dimensions, and as far as I know this is equally true of every other such formalization program.

The fact of the matter is that building a formal mathematical model of a physically realistic quantum field theory remains an open problem. Therefore, as we go forward with this series, we’re going to have to abandon the expectation that, every time we refer to an operator on a Hilbert space, someone somewhere has actually constructed a Hilbert space and an operator with the desired properties. Instead, we’ll have to adopt the physicist’s perspective that any computation that produces a number that can be checked by experiment must have something behind it, even if no one knows how to build the things that the symbols are meant to refer to.

This tends to make mathematicians somewhat uncomfortable. If you are in this position, please know that I share this discomfort, and that as this series continues I will do my best to describe the symbols that appear in a way that at least acknowledges the hypothesized shape of the as-yet-nonexistent mathematical objects they’re supposed to represent. I find it helpful to imagine the foundations of quantum field theory as a series of conjectures rather than definitions, and the details of the field as the study of the consequences of these conjectures. There is still a very beautiful picture to see from this perspective, which I hope to share with you as this series continues.