Quantum Mechanics I - Foundations | Nicolas James Marks Ford

This article is also available as a PDF.

Introduction

This article is the second in a series about physics for a mathematically trained audience. I’m going to make an attempt to write an introduction to quantum mechanics. I remain a bit apprehensive about putting this article out in the world. Due, I think, to a combination of its unintuitiveness and its centrality to the modern conception of physics, this is a difficult subject to write well about. There are many overly confusing presentations of it out there and I hope this one manages to at least rise above the worst of them.

This article has fewer mathematical prerequisites than the previous one; Hilbert spaces are probably the most complicated mathematical objects we’ll be invoking. Still, I will be referencing some aspects of the Hamiltonian mechanics story, so it will be helpful if you’ve read it.

This is a topic that’s been tackled many times by many different people, including several writing for an audience of mathematicians. Some books in this category that I found useful when writing this are:

Quantum Theory, Groups and Representations: An Introduction by Peter Woit (available on his website)
Quantum Mechanics for Mathematicians by Leon Takhtajan
Lectures on Quantum Mechanics for Mathematics Students by L. D. Faddeev and O. A. Yakubovskiı̆
Quantum Field Theory: A Tourist Guide for Mathematicians by Gerald Folland
Quantum Theory for Mathematicians by Brian Hall.

I am thankful to Jake Levinson for reading and commenting on many drafts of this article. Basically nothing in this article is original, including the complaint at the top about how badly this topic is usually explained. Still, after thinking about how to teach this material a couple of times I’ve arrived at a presentation that I like, and I hope it’s helpful to you.

States and Observables

In order for a physical theory to produce predictions about the results of experiments and not just be a pile of math, it’s necessary to specify some correspondence between the theory’s mathematical objects and the world’s physical objects. There are many ways you could imagine doing this, but a common scheme in physics is to pick out three things:

The states of the theory, which represent all the information about a physical system that could affect the result of an experiment.
The observables, which represent experiments that could be performed on a physical system. We’ll think of observables as specifically referring to an experiment whose outcome is a single real number.
A function that takes a state and an observable and produces a probability measure on \(\mathbb{R}\), which we’ll interpret as giving the result of performing the corresponding experiment when the system is in the corresponding state.

In some theories, this probability measure will always just be concentrated at one point, that is, the theory predicts a definite result for any possible experiment. Such a theory is called deterministic. This is true, for example, of the description of classical mechanics given in the previous article in this series. In that theory, you might use points in phase space as your states and real-valued functions on phase space as your observables. Then the result of performing experiment \(f\) in state \(s\) is just given by a probability measure that has all its mass at \(f(s)\).

Even in classical physics it can be very useful to extend this picture to allow for nondeterminism. This comes up, for example, in thermodynamics, where we’d like to be able to describe states in terms of quantities like temperature and pressure. Knowing the temperature of, say, some gas sitting in a chamber tells you something about the positions and momenta of all of its constituent molecules, but it’s completely infeasible to imagine knowing all those numbers.

So it’s crucial when thinking about these sorts of things to allow states that don’t specify definite positions and momenta for everything. One way to do this is to expand the set of states to include probability measures on phase space, leaving the definition of observables the same. If \(f\) is an observable and \(\mu\) is the measure on phase space corresponding to some state, the probability of seeing a result in some set \(E\subseteq\mathbb{R}\) is then given by \(\mu(f^{-1}(E))\). So if \(x\) and \(y\) were two points in phase space, we might imagine a state which has a \(\frac13\) probability of being \(x\) and a \(\frac23\) probability of being \(y\); measuring \(f\) in this state would then yield \(f(x)\) with probability \(\frac13\) and \(f(y)\) with probability \(\frac23\).
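Here is a minimal sketch of this rule in Python for a state supported on finitely many points. The function name `outcome_distribution`, the two phase-space points, and the two observables are all made up for illustration; the only thing taken from the text is the formula \(\mu(f^{-1}(E))\).

```python
from collections import defaultdict
from fractions import Fraction


def outcome_distribution(state, observable):
    """Push a finitely supported probability measure forward along an observable.

    `state` maps phase-space points to probabilities and `observable` is a
    real-valued function on phase space. The result maps each possible
    measurement value v to mu(f^{-1}({v})).
    """
    distribution = defaultdict(Fraction)  # Fraction() is zero
    for point, probability in state.items():
        distribution[observable(point)] += probability
    return dict(distribution)


# Hypothetical phase-space points (q, p) for one particle on a line.
x = (0.0, 1.0)
y = (2.0, -1.0)
state = {x: Fraction(1, 3), y: Fraction(2, 3)}


def position(point):
    q, p = point
    return q


def kinetic_energy(point):
    q, p = point
    return p * p / 2  # unit mass assumed


position_distribution = outcome_distribution(state, position)
energy_distribution = outcome_distribution(state, kinetic_energy)
```

The position observable separates the two points, so its distribution has weights \(\frac13\) and \(\frac23\). The kinetic energy takes the same value at both points, so even though the state is uncertain, that particular measurement has a definite outcome.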

There are a lot of popular descriptions of quantum mechanics out there that sound sort of like this, and it has left a lot of people with the impression that quantum mechanics is what you get when you take classical mechanics and add in this sort of nondeterminism. But while the theory I just described can be useful, it’s not quantum mechanics.

In particular, in statistical classical mechanics, it’s possible to interpret a state as merely representing uncertainty on the part of the experimenter; if you knew what the state “really was,” then the theory would predict a definite outcome for every possible experiment. This is not true in quantum mechanics. In fact, for every quantum state \(s\), there is an observable which, when measured against \(s\), does not yield a deterministic result. We’ll have more to say about this and other differences once we have formal definitions in front of us.

The definitions of states and observables in quantum mechanics will definitely seem strange and arbitrary if you’ve never seen them. Nonetheless, I think it’s easier to see everything laid out at once and then spend some time talking about how they’re usually interpreted, so that’s what we’ll do here.

The state of a quantum mechanical system is described in terms of a complex Hilbert space \(\mathcal H\). (As a reminder, this is a complex vector space with a positive definite Hermitian inner product which is complete under the corresponding metric.) We’ll call \(\mathcal H\) the state space of the system. We’ll always write the inner product as \(\langle\cdot,\cdot\rangle\), and use the convention that it’s linear in the second coordinate and conjugate-linear in the first. For now, we’ll only worry about the case where \(\mathcal{H}\) is finite-dimensional.

A pure quantum state is then a point in the projective space \(\mathbb{P}\mathcal H\). That is, it’s a nonzero vector in \(\mathcal{H}\), except that we regard two vectors as representing the same state if one is a scalar multiple of the other. We’ll often go back and forth between a vector in \(\mathcal H\) and its equivalence class in \(\mathbb{P}\mathcal H\) without much comment.

It’s important to emphasize from the start that the physical interpretation of vectors in \(\mathcal H\) is very different from the way we interpreted points in phase space when discussing classical physics. You should not imagine points moving around in \(\mathcal H\) in a way that somehow corresponds to objects moving around in space.

The picture to have in mind instead is a bit more abstract. To any physical quantity you might measure, there is an associated orthonormal basis of \(\mathcal H\). (This is strictly true only in the finite-dimensional case.) For example, imagine that a particle might be found in any one of \(n\) different places depending on the outcome of some experiment. Every vector in the basis corresponds to one possible outcome of the corresponding experiment, so in keeping with our convention that observables correspond to experiments that produce a single real number as output, we also attach a real number to each basis vector. Whenever you perform this measurement, you’ll see one of those real numbers as the result. In our example, we might say the outcome of the experiment is \(i\) if the particle is found in the \(i\)’th location.

The vectors in the orthonormal basis should then be interpreted as the states in which our experiment will have a definite outcome: if our basis is \(v_1,\ldots,v_n\) and our chosen numbers are \(\lambda_1,\ldots,\lambda_n\), then performing this measurement on a system in state \(v_i\) will yield \(\lambda_i\) with probability 1. In general, if we have some state \[s=\alpha_1v_1+\cdots+\alpha_nv_n,\] the outcome of the experiment will be \(\lambda_i\) with probability \(|\langle v_i,s\rangle|^2/\langle s,s\rangle=|\alpha_i|^2/\langle s,s\rangle\). Note that these numbers necessarily sum to 1, and that multiplying \(s\) by a nonzero scalar leaves them unchanged. For this reason, we usually use our freedom to rescale the vector to pick a representative for \(s\) of norm 1, in which case the probability is just \(|\alpha_i|^2\).
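The probability rule is short enough to write out directly. The following is a sketch in plain Python, with vectors stored as lists of coordinates; the names `inner` and `born_probabilities` and the particular state are hypothetical choices for this example.

```python
def inner(u, v):
    """Hermitian inner product, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def born_probabilities(state, basis):
    """Probability of each outcome: |<v_i, s>|^2 / <s, s> for each basis vector v_i."""
    norm_sq = inner(state, state).real
    return [abs(inner(v, state)) ** 2 / norm_sq for v in basis]


# Standard orthonormal basis of C^3.
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A deliberately unnormalized state; <s, s> = 2 + 4 + 1 = 7.
state = [1 + 1j, 2, -1j]
probs = born_probabilities(state, basis)

# Rescaling by any nonzero complex number gives the same probabilities.
rescaled = [(3 - 4j) * a for a in state]
probs_rescaled = born_probabilities(rescaled, basis)
```

Up to rounding, `probs` is \(\frac27,\frac47,\frac17\), and `probs_rescaled` agrees with it, which is the statement that only the point in \(\mathbb{P}\mathcal H\) matters.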

In light of all this, we define a quantum observable to be a self-adjoint map \(A:\mathcal H\to \mathcal H\). Recall that the Spectral Theorem says that every such map has an orthonormal basis of eigenvectors with real eigenvalues. So, at least when all the eigenvalues are distinct, specifying \(A\) is the same as specifying an orthonormal basis of \(\mathcal H\), up to scalar multiples, together with a real number for each basis vector.

We’ll also allow operators with repeated eigenvalues. The corresponding experiment has fewer possible outcomes than the dimension of the state space and doesn’t distinguish any of the states in an eigenspace from each other. The rule in general, then, is that the probability of seeing the result \(\lambda_i\) when measuring \(A\) in state \(s\) (chosen to have norm 1 as before) is given by \(\langle P_is,P_is\rangle\), where \(P_i\) is the orthogonal projection onto the \(\lambda_i\) eigenspace. In particular, measuring any state in that eigenspace will produce \(\lambda_i\) with probability one.
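As a small worked example of the projection rule (the operator and state are chosen purely for illustration), take \(\mathcal H=\mathbb{C}^3\) with orthonormal basis \(v_1,v_2,v_3\), and let \(A\) be given by \(Av_1=v_1\), \(Av_2=v_2\), \(Av_3=2v_3\), so that the eigenvalue 1 is repeated. Write \(P_1\) and \(P_2\) for the orthogonal projections onto the eigenspaces for the eigenvalues 1 and 2. For the unit vector \(s=\frac1{\sqrt3}(v_1+v_2+v_3)\) we get \[P_1s=\frac1{\sqrt3}(v_1+v_2),\qquad P_2s=\frac1{\sqrt3}v_3,\] so the measurement yields 1 with probability \(\langle P_1s,P_1s\rangle=\frac23\) and 2 with probability \(\langle P_2s,P_2s\rangle=\frac13\). The experiment has only two possible outcomes even though the state space is three-dimensional, and it cannot tell \(v_1\) apart from \(v_2\).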

Note that if the state space is \(n\)-dimensional, then any experiment can only have at most \(n\) different outcomes. So to talk about physical quantities like position and momentum, which can take on infinitely many values, we’ll need \(\mathcal{H}\) to be infinite-dimensional. The mathematical setup there is slightly more complicated, and we’ll discuss it in more detail in a later section. But there are many physical systems — we’ll see a few examples over the course of this article — which really are described by a finite-dimensional state space, and therefore really do display this sort of discreteness.

We should take a moment to note the similarities and differences between this picture and the statistical classical mechanics I told you we were going to reject. If we only ever considered one observable and we only measured it once, then the two pictures would be very similar. You would have your orthonormal basis \(v_1,\ldots,v_n\) of states in which your experiment has a definite outcome, and any other state could be interpreted as just a probability distribution over the \(v_i\)’s. In fact, if we had some general state \(s=\alpha_1v_1+\cdots+\alpha_nv_n\), then we could only observe the numbers \(|\alpha_i|^2\), and so our description would even be somewhat redundant, since I could multiply any \(\alpha_i\) by a unit complex number without changing anything. In particular, it would be possible to imagine that a state like \(s\) just reflects ignorance on the part of the experimenter, and that the state “really is” one of the \(v_i\)’s.

The thing that prevents this picture from working is that we are not restricted to worrying about just one observable at a time. As tempting as it is to think of \(s\) as merely representing uncertainty over which \(v_i\) you have, this interpretation doesn’t make sense when considering an observable in which \(s\) itself is an eigenvector. When you do this, you see that if the state is \(s\) we’ll get the same result every time, but if the state is “some \(v_i\), I just don’t know which,” then we’ll get some other result with probability \(1-|\alpha_i|^2\).

This is also the reason why it’s important to keep track of the individual \(\alpha_i\)’s and not just their squared absolute values. These \(\alpha_i\)’s are normally called amplitudes. Due to the scalar symmetry on our state vectors, we are free to multiply all the amplitudes by the same number without affecting anything physical. But the relative phase of two different amplitudes — the ratio \(\alpha_i/\alpha_j\) — definitely does matter.

This is all probably easiest to see in an example. Electrons have a property called “spin” which behaves in many ways like angular momentum. In particular, like classical angular momentum, the spin depends on which axis you measure it around; one way to perform this measurement is to shoot the electron through a magnetic field aligned with the axis in question and see which way it moves.

The spin of an electron can be described using a two-dimensional state space, meaning in particular that any measurement of spin can only have two possible outcomes. We call one of these outcomes “spin up” and the other “spin down.” (There is a way to use an orientation on \(\mathbb{R}^3\) to assign these labels in a meaningful and consistent way, but for the purposes of this example they’re just arbitrary labels.) We’ll write \((\uparrow_z)\) and \((\downarrow_z)\) for the states in which the electron has spin up or spin down measured around the \(z\)-axis. It then turns out that:

\[(\uparrow_x)=\frac1{\sqrt2}((\uparrow_z)+(\downarrow_z));\quad(\downarrow_x)=\frac1{\sqrt2}((\uparrow_z)-(\downarrow_z));\] \[(\uparrow_y)=\frac1{\sqrt2}((\uparrow_z)+i(\downarrow_z));\quad(\downarrow_y)=\frac1{\sqrt2}((\uparrow_z)-i(\downarrow_z)).\]

Note they only differ from each other in the coefficient on \((\downarrow_z)\) and that each up state is orthogonal to its corresponding down state. This second fact reflects the fact that they are mutually exclusive outcomes from the same experiment. But the inner product of two of these spin states corresponding to different axes always has squared absolute value \(\frac12\). This tells you that, for example, if you start with an electron which you know has spin up around the \(x\) axis, you are equally likely to get either result when measuring its spin around the \(y\) axis.
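These claims are easy to check numerically. Below is a sketch in plain Python that stores each spin state by its coordinates in the basis \((\uparrow_z),(\downarrow_z)\); the variable and function names are my own.

```python
import math


def inner(u, v):
    """Hermitian inner product, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def transition_probability(u, v):
    """|<u, v>|^2 for unit vectors u and v."""
    return abs(inner(u, v)) ** 2


r = 1 / math.sqrt(2)

# Coordinates with respect to the basis (up_z, down_z).
up_z, down_z = [1.0, 0.0], [0.0, 1.0]
up_x, down_x = [r, r], [r, -r]
up_y, down_y = [r, r * 1j], [r, -r * 1j]

# Up and down along the same axis are orthogonal.
same_axis = [inner(up_z, down_z), inner(up_x, down_x), inner(up_y, down_y)]

# States along different axes overlap with squared absolute value 1/2.
different_axes = [
    transition_probability(up_x, up_y),
    transition_probability(up_x, down_y),
    transition_probability(up_z, up_x),
    transition_probability(down_z, up_y),
]
```

Every entry of `same_axis` vanishes and every entry of `different_axes` is \(\frac12\), up to floating-point rounding.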

This last fact is why the relative phases of amplitudes are actually important. Even though they look the same when measuring spin around the \(z\) axis, \((\uparrow_x)\) and \((\uparrow_y)\) are not the same state, because they don’t look the same when measuring spin around the \(x\) axis. We would have been free to pick a representative for \((\uparrow_y)\) that differs by a global factor, like \(\frac{1}{\sqrt2}(i(\uparrow_z)-(\downarrow_z))\), but the fact that those two coefficients differ by a factor of \(i\) does actually carry physical information.

Taking a linear combination of states, then, can’t just represent uncertainty on the part of the experimenter about the “true state” of the particle. One way to see this in our example is to note that \((\uparrow_z)=\frac1{\sqrt2}((\uparrow_x)+(\downarrow_x))\) and \((\downarrow_z)=\frac1{\sqrt2}((\uparrow_x)-(\downarrow_x))\). When we plug these back into the definition of \((\uparrow_x)\) we see that the coefficients on \((\downarrow_x)\) cancel out, something that obviously can’t happen with classical probabilities.

So amplitudes are not the same thing as classical probabilities. This is, again, the essential difference between quantum mechanics and the statistical classical mechanics we rejected earlier. Saying the state is \((\uparrow_x)\) does not just mean “it could be \((\uparrow_z)\) or \((\downarrow_z)\) and I don’t know which,” and the way to tell the difference is to measure the spin around the \(x\) axis, which will always give you the same answer.
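To put numbers on this, here is a sketch comparing the state \((\uparrow_x)\) with a classical coin flip between \((\uparrow_z)\) and \((\downarrow_z)\). The code is plain Python; the helper names are hypothetical, and the “mixture” is modeled simply by averaging probabilities with weights \(\frac12\).

```python
import math


def inner(u, v):
    """Hermitian inner product, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def probability(outcome, state):
    """Chance that a unit state yields the outcome with unit eigenvector `outcome`."""
    return abs(inner(outcome, state)) ** 2


r = 1 / math.sqrt(2)
up_z, down_z = [1.0, 0.0], [0.0, 1.0]
up_x = [r, r]

# Measuring spin around z cannot tell the two situations apart...
z_up_superposition = probability(up_z, up_x)
z_up_mixture = 0.5 * probability(up_z, up_z) + 0.5 * probability(up_z, down_z)

# ...but measuring spin around x can.
x_up_superposition = probability(up_x, up_x)
x_up_mixture = 0.5 * probability(up_x, up_z) + 0.5 * probability(up_x, down_z)
```

Both situations give spin up around \(z\) half the time. Around \(x\), the superposition gives spin up with probability 1, while the coin flip gives it with probability \(\frac12\).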

So, to summarize:

To every physical system we associate a Hilbert space \(\mathcal H\). The states are elements of the projective space \(\mathbb{P}\mathcal H\).
An observable is represented by a self-adjoint operator \(A:\mathcal H\to \mathcal H\). The possible results of performing the experiment are given by the eigenvalues of \(A\).
Suppose the system is in a state corresponding to a vector \(s\in \mathcal H\), which we may choose to have norm 1, and we measure the observable \(A\). If \(\lambda_i\) is one of the eigenvalues of \(A\) and \(P_i\) is the orthogonal projection onto the corresponding eigenspace, then the outcome of the experiment will be \(\lambda_i\) with probability \(\langle P_is,P_is\rangle\).

There is one element of the standard presentation of quantum mechanics that is not in this list; I want to mention it only briefly now and postpone a more detailed discussion until the end of this article. It might have occurred to you to wonder what happens after a measurement is performed. Suppose we measure the spin of an electron around the \(z\) axis and see that it’s spin up. Does that mean that it’s now in the state \((\uparrow_z)\), or is it still in whatever state it was in before the measurement?

The very surprising answer given by the standard formulation of quantum mechanics is the former: if you measure the observable \(A\) in the state \(s\) and get the result corresponding to \(\lambda_i\), the state is from then on the projection \(P_is\). This process is what’s being referred to in many popular accounts of how “quantum mechanics means that measurement changes the result of the experiment.”

Whatever philosophical interpretation one might want to add on top of this, it does in fact accurately predict the results of experiments. For example, if we measure our electron’s spin around the \(z\) axis again after seeing spin up the first time, we’ll also see spin up the second time. A philosophically cautious reader might want to treat this merely as an operational account (measurement causes the state to behave as if it has been projected onto the corresponding eigenspace) and hold off on worrying about what’s “really happening.” We’ll briefly discuss some of the more popular attempts to answer this question in the final section of this article.
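The projection rule can be traced through the spin example in a few lines. This is a sketch in plain Python with hypothetical helper names; it handles only the case of a one-dimensional eigenspace, where projecting means taking the component along a single unit eigenvector.

```python
import math


def inner(u, v):
    """Hermitian inner product, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(inner(v, v).real)
    return [a / norm for a in v]


def project_onto(e, s):
    """Orthogonal projection of s onto the line spanned by the unit vector e."""
    c = inner(e, s)
    return [c * a for a in e]


def probability(e, s):
    """Chance that a unit state s yields the outcome with unit eigenvector e."""
    return abs(inner(e, s)) ** 2


r = 1 / math.sqrt(2)
up_z = [1.0, 0.0]
up_x = [r, r]

x_up_before = probability(up_x, up_x)        # x-measurement on the initial state
p_first_up = probability(up_z, up_x)         # chance the z-measurement reads up
after_measurement = normalize(project_onto(up_z, up_x))
p_second_up = probability(up_z, after_measurement)  # repeat the z-measurement
x_up_after = probability(up_x, after_measurement)   # x-measurement afterwards
```

Starting from \((\uparrow_x)\), the first \(z\)-measurement reads up half the time. If it does, repeating it reads up with probability 1, and a subsequent \(x\)-measurement reads up only half the time, even though it would have read up with certainty on the initial state.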

The Schrödinger Equation

The quantum description of how physical systems evolve in time has a lot in common with the classical one presented in the Hamiltonian mechanics article. It will be helpful to understand that story in order to make sense of this one.

It’s not possible to formally derive the rules of quantum mechanics from the rules of classical mechanics in a completely rigorous way. (In fact, wanting to do so is somewhat backwards; quantum mechanics is supposed to be the more fundamental theory!) What we can do, though, is take the classical description and, starting from our description of quantum states and observables, try to find quantum analogues of the parts of the classical story that give rise to the physics. In fact, when we do this, we’ll see that the resulting recipe is actually more fundamental to the quantum setup than its classical counterpart.

The classical state space is phase space — the cotangent bundle of configuration space — and classical observables are real-valued functions on phase space. Using the natural symplectic structure on phase space, we can turn any classical observable \(f\) into a Hamiltonian vector field \(X_f\), which gives rise to a one-parameter group of symplectomorphisms that we called a Hamiltonian flow. For example, we saw that the one-parameter group of spatial translations in some direction arose from applying this procedure to the momentum observable in that same direction.

Our quantum states live in a projective Hilbert space \(\mathbb{P}\mathcal{H}\), so the analogue of a symplectomorphism should be an automorphism of \(\mathbb{P}\mathcal{H}\). The analogue of a Hamiltonian flow, then, is a one-parameter group of projective unitary maps, that is, a homomorphism from \(\mathbb{R}\) to the projective unitary group \(PU(\mathcal{H})=U(\mathcal{H})/\{zI:|z|=1\}\). Because \(\mathbb{R}\) is simply connected, such a map can always be lifted to a continuous map \(\mathbb{R}\to U(\mathcal{H})\), and it turns out that when \(\mathcal{H}\) is finite-dimensional, we can always choose this lift to be a homomorphism. (The simply-connected restriction does matter, though. In quantum mechanics it is often important to consider symmetries of \(\mathbb{P}\mathcal{H}\) that come from groups that aren’t simply connected — \(SO(3)\) is a very prominent example — and in this case not every projective unitary representation will lift to a unitary one. In this setting it is the projective representations which are important, whether or not they lift.)

So in fact we can restrict our attention to one-parameter groups of unitary maps on \(\mathcal H\). A map \(A:\mathcal H\to \mathcal H\) is self-adjoint if and only if \(e^{iAt}\) is unitary for every real \(t\), though, so up to a factor of \(i\) our observables are already infinitesimal generators of one-parameter groups of unitary maps. (If this fact is new to you I encourage you to prove it.) The family of symmetries we associate to \(A\) will be given by \(U_A(t)=e^{-iAt/\hbar}\). The choice of sign in the exponent is arbitrary, but this choice is pretty standard.
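A numerical sanity check is no substitute for the proof, but it can be reassuring. The sketch below uses plain Python with matrices as nested lists, takes \(\hbar=1\), and computes the exponential by a truncated power series, which is adequate only for small matrices of modest norm. The helper names and the two test matrices are hypothetical.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def mat_exp(m, terms=40):
    """Truncated power series for exp(m)."""
    n = len(m)
    result = identity(n)
    term = identity(n)
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def flow(a, t):
    """U_A(t) = exp(-i A t), in units where hbar = 1."""
    return mat_exp([[-1j * t * entry for entry in row] for row in a])


def unitarity_defect(u):
    """Largest entry of |U* U - I|; zero exactly when U is unitary."""
    n = len(u)
    product = matmul(dagger(u), u)
    eye = identity(n)
    return max(abs(product[i][j] - eye[i][j]) for i in range(n) for j in range(n))


self_adjoint = [[1, 2 - 1j], [2 + 1j, -3]]
not_self_adjoint = [[0, 1], [0, 0]]

defect_good = unitarity_defect(flow(self_adjoint, 0.7))
defect_bad = unitarity_defect(flow(not_self_adjoint, 0.7))
```

For the self-adjoint matrix the defect is zero up to rounding. For the nilpotent matrix the flow is \(I-itN\), which visibly fails to be unitary.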

The number \(\hbar\) is a fundamental physical constant called the reduced Planck constant (often just called Planck’s constant). It’s about \(1.055\times 10^{-34}\) joule-seconds; we’ll say more about its presence here in a bit.

In fact, any one-parameter group \(U(t)\) of unitary operators which is strongly continuous, meaning \(\lim_{t\to 0}U(t)v=U(0)v\) for all \(v\), is of the form \(e^{iAt}\) for some self-adjoint operator \(A\). So if we impose this restriction on our families of symmetries we get a complete quantum version of the correspondence between symmetries and observables that we had classically: the observable \(A\) is associated to the infinitesimal symmetry \(-iA/\hbar\), which we integrate to get the one-parameter family of symmetries \(e^{-iAt/\hbar}\), and every strongly continuous one-parameter group of symmetries arises in this way.

As in the classical picture, we declare that there is a special observable \(H\), which we still call the Hamiltonian and which still should be interpreted as the total energy, whose flow gives the actual dynamics of the system. That is, the way states evolve in time is through the rule \[\psi(t)=e^{-iHt/\hbar}\psi(0),\] or equivalently \[\frac{d\psi}{dt}=-\frac{i}{\hbar}H\psi.\] This second equation is called the Schrödinger equation, and it is the quantum analogue of Hamilton’s equations.
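To see what this evolution looks like concretely, here is a sketch in the finite-dimensional case, assuming \(H\) has already been diagonalized. (The symbols \(E_j\) for its eigenvalues are notation introduced just for this example.) If \(Hv_j=E_jv_j\) for an orthonormal basis \(v_1,\ldots,v_n\) and \(\psi(0)=\alpha_1v_1+\cdots+\alpha_nv_n\), then \[\psi(t)=e^{-iHt/\hbar}\psi(0)=\alpha_1e^{-iE_1t/\hbar}v_1+\cdots+\alpha_ne^{-iE_nt/\hbar}v_n.\] Each amplitude is only multiplied by a unit complex number, so the probabilities \(|\alpha_j|^2\) for the outcomes of an energy measurement are constant in time. What does change is the relative phase: whenever \(\alpha_k\ne0\), \[\frac{\alpha_je^{-iE_jt/\hbar}}{\alpha_ke^{-iE_kt/\hbar}}=\frac{\alpha_j}{\alpha_k}e^{-i(E_j-E_k)t/\hbar}.\] Since relative phases carry physical information, measurements of other observables can perfectly well have time-dependent probabilities.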

Classically, we can define the Poisson bracket \(\{f,g\}=\omega(X_f,X_g)\), which tells us how the values of \(f\) change as we move along \(g\)’s Hamiltonian flow using the rule \(df/dt=\{f,g\}\). Note that when we write the expression “\(df/dt\)” we are treating the observable \(f\) as a quantity that’s evolving in time. Physicists have names for these two perspectives: when the states evolve in time and the observables don’t we are using the Schrödinger picture; when the observables evolve in time and the states don’t we are using the Heisenberg picture.

While the Heisenberg picture is maybe a bit more abstract, the two pictures are completely equivalent: they are two different ways of answering the question “what happens if I run time forward by this amount and then perform this measurement?” Classically, when observables are just functions, the translation is straightforward: if \(\phi^t\) is the map on phase space that moves time forward by \(t\), we can simply write \(f_t(x)=f(\phi^t(x))\).

If \(U_B(t)\) is a one-parameter group of unitary operators, then the way to evolve a quantum observable through time is to conjugate it, that is, \(A_t=U_B(-t)AU_B(t)\). There are many ways to see this; one is to note that the way to get data out of a quantum observable is to measure the inner product \(\langle\phi,A\psi\rangle\), where \(\psi\) is the state being measured and \(\phi\) is one of the eigenvectors of \(A\). But \[\langle U_B(t)\phi,AU_B(t)\psi\rangle=\langle\phi,U_B(-t)AU_B(t)\psi\rangle,\] so we see that \(U_B(-t)AU_B(t)\) is the observable corresponding to running time forward by \(t\) and then measuring \(A\). (Note that we used the fact that \(U_B(t)\) is unitary here.) We can use this to get the quantum analogue of the Poisson bracket. We define \[\{A,B\}_\hbar=\left.\frac{d}{dt}\right|_{t=0}(U_B(-t)AU_B(t))=\left.\frac{d}{dt}\right|_{t=0}(e^{iBt/\hbar}Ae^{-iBt/\hbar})=-\frac{i}{\hbar}(AB-BA).\]
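The last equality in that definition is just the product rule. Spelled out, differentiating each exponential factor at \(t=0\) gives \[\left.\frac{d}{dt}\right|_{t=0}(e^{iBt/\hbar}Ae^{-iBt/\hbar})=\frac{i}{\hbar}BA+A\left(-\frac{i}{\hbar}B\right)=-\frac{i}{\hbar}(AB-BA).\] As a concrete illustration, with two \(2\times2\) self-adjoint matrices picked only for this example, take \[A=\begin{pmatrix}1&0\\0&-1\end{pmatrix},\qquad B=\begin{pmatrix}0&1\\1&0\end{pmatrix}.\] Then \[AB-BA=\begin{pmatrix}0&2\\-2&0\end{pmatrix},\qquad\{A,B\}_\hbar=-\frac{i}{\hbar}(AB-BA)=\frac{1}{\hbar}\begin{pmatrix}0&-2i\\2i&0\end{pmatrix}.\] The commutator itself is skew-Hermitian, and the factor of \(-i/\hbar\) is exactly what turns it back into a self-adjoint matrix, so the bracket of two observables is again an observable.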

Just as in the classical case, if \(\{A,B\}_\hbar=0\) we conclude that \(A\) is preserved by \(B\)’s flow and vice versa. This happens if and only if \(A\) and \(B\) commute. In particular, we get a quantum version of Noether’s theorem: an observable is preserved by the laws of physics if and only if it commutes with the Hamiltonian.

In summary:

A classical observable \(f\) has an associated vector field \(X_f\), and integrating the vector field gives a one-parameter family of symplectomorphisms. A quantum observable \(A\) gives rise to the infinitesimal symmetry \(-iA/\hbar\), which is integrated to give the one-parameter family of unitary maps \(U_A(t)=e^{-iAt/\hbar}\).
Conversely, given a one-parameter family of unitary maps \(e^{Bt}\) for some skew-Hermitian operator \(B\), the associated observable is \(i\hbar B\).
Classically, the Poisson bracket can be used to determine how the flow corresponding to one observable affects another — we have \(df/dt=\{f,g\}\) when we move along \(g\)’s Hamiltonian flow. The quantum analogue is given by \(\{A,B\}_\hbar=(-i/\hbar)[A,B]\).
In both settings there is a privileged observable \(H\) called the Hamiltonian whose flow tells us how the system evolves in time.

There are a couple more points I’d like to make about this story before concluding this section. First, the correspondence between observables and infinitesimal symmetries of the state space is “baked in” to the quantum story more deeply than in the classical story: to go between observables and symmetries one just multiplies or divides by \(i\hbar\). In fact, the factors of \(i\) and \(\hbar\) we introduced are less fundamental than the story I told here might suggest.

We could replace our Hermitian operators \(A\) with the skew-Hermitian operators \(iA\) and have an equivalent story about states and observables; the eigenvalues would then be purely imaginary instead of real, which is why I imagine this choice isn’t the one physicists make. But there is a sense in which the skew-Hermitian operators are the more mathematically fundamental objects: they are the elements of the Lie algebra \(\mathfrak{u}(n)\), and if we use them we can eliminate all the factors of \(i\) that appear in this section.

Also, \(\hbar\) is, in a certain sense, less essential than it might seem. It has units of energy times time, which is exactly what’s needed to make the exponent unitless in \(U_H(t)=e^{-iHt/\hbar}\). Physicists often work in a system of units in which \(\hbar\) is equal to 1. From this perspective, the value of \(\hbar\) isn’t really a separate fact about the universe; it’s merely a conversion factor from units of energy to units of inverse time, with the same status as the factor of 100 used to convert centimeters to meters. If we choose to adopt this perspective, and combine it with the use of skew-Hermitian operators for observables, then quantum observables and their corresponding symmetries are literally the same mathematical objects.

Position and Momentum

The last aspect of the story that we haven’t talked about is the treatment of position and momentum. While I might have introduced this earlier, I wanted to wait until now because it will seem more natural in light of the analogy with classical Hamiltonian mechanics from the last section.

The theory of unbounded operators on an infinite-dimensional Hilbert space — which both the position and momentum observables will turn out to be — is one of the less well-covered aspects of functional analysis. Going through this story in detail here would take us too far afield, but I’ll mention a couple of the relevant aspects as they pertain to position and momentum when they come up. There will probably be an article in this series in the future about the infinite-dimensional spectral theorem, but if you see this sentence instead of a link to it then I haven’t written it yet.

Suppose we want to consider the observable corresponding to the position of a particle moving in \(\mathbb{R}\). (Following the classical case, we’ll write \(q\) for the coordinate on \(\mathbb{R}\) and call the observable \(Q\).) Since the result of measuring the particle’s position can be any real number, there’s no way to represent this situation with a finite-dimensional Hilbert space. In the finite-dimensional case, every state has an associated amplitude for each possible outcome the measurement can have, and taking the squared absolute values of these numbers gives a probability distribution on this set of possible outcomes.

For our position observable, we need a continuous version of this story: to every state we associate a complex-valued function \(\psi\) on \(\mathbb{R}\) (up to a global scalar multiple) which we call the wavefunction of the state. The values of the wavefunction should be interpreted slightly differently from our amplitudes from earlier, because if \(|\psi(q)|^2\) is going to give a probability distribution on \(\mathbb{R}\), then we should interpret its values as probability densities, not probabilities. So the probability that the particle will be found in some measurable set \(E\) is given by \(\int_E|\psi(q)|^2dq\), provided we’ve used our freedom to multiply by scalars to ensure that \(\int_{\mathbb{R}}|\psi(q)|^2dq=1\). Despite this distinction, it’s common to also refer to the values of \(\psi\) as amplitudes (no one says “amplitude densities”) and I will do so as well.

So at least as far as position is concerned, we can take our state space to be \(L^2(\mathbb{R})\). (Recall that this is the space of measurable functions \(f:\mathbb{R}\to\mathbb{C}\) for which \(\int|f|^2\) is finite, where we identify two functions if they agree except on a set of measure zero. The inner product is given by \(\langle f,g\rangle=\int\overline{f}g\).) The values of \(\psi\) serve the same role as the coefficients in a basis in which the position operator \(Q\) is diagonal. Since the value of \(\psi(q)\) is supposed to be the amplitude corresponding to the position \(q\), our diagonal operator should multiply it by the corresponding “eigenvalue,” which is \(q\). We therefore should take \(Q\) to be the multiplication-by-\(q\) operator, that is, \((Q\psi)(q)=q\psi(q)\).
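For a concrete feel for these definitions, here is a sketch in plain Python that takes a normalized Gaussian as a hypothetical wavefunction and approximates the relevant integrals with the midpoint rule. The names, the choice of state, and the cutoff at \(\pm10\) are assumptions made for the example.

```python
import math


def psi(q):
    """A normalized Gaussian wavefunction, chosen only as a convenient example."""
    return math.pi ** -0.25 * math.exp(-q * q / 2)


def density(q):
    """Probability density |psi(q)|^2 for position."""
    return abs(psi(q)) ** 2


def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


# The density is negligible outside [-10, 10], so we truncate there.
total = integrate(density, -10, 10)

# Probability of finding the particle in E = [-1, 1].
prob_in_unit_interval = integrate(density, -1, 1)

# <psi, Q psi> is the integral of q |psi(q)|^2, the mean of the position distribution.
mean_position = integrate(lambda q: q * density(q), -10, 10)
```

Here `total` comes out to 1, `prob_in_unit_interval` agrees with the exact value \(\operatorname{erf}(1)\), which is about 0.843, and `mean_position` vanishes by symmetry.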

Note that it is not the case for every \(\psi\in L^2\) that \(q\psi(q)\) is in \(L^2\). We have to expand our definition of “operator” to include maps like \(Q\) which are only defined on a dense subspace of \(\mathcal{H}\). This subspace will be called the domain of the operator. It is probably not surprising that there are quite a few mathematical subtleties involved when worrying about operator domains. For example, much more care needs to be taken with the definition of self-adjointness. We will not worry about most of those issues here; I will leave them to the future article on the spectral theorem.

One complication that arises here is that \(Q\) doesn’t have any eigenvectors in \(L^2(\mathbb{R})\). Still, physicists often speak as though it does, introducing the Dirac delta function \(\delta(q-\lambda)\) as the eigenvector of \(Q\) with eigenvalue \(\lambda\). The defining property of this fictional function is that taking the inner product with it is the same as evaluating a function at \(\lambda\), that is, \(\int\delta(q-\lambda)\psi(q)dq=\psi(\lambda)\). There is in fact mathematical machinery one could invoke to rigorously construct a definition of a “generalized eigenbasis” for a self-adjoint unbounded operator, but it is also fine to treat the delta function as just a cognitive crutch. It is possible to answer all the physical questions that will matter to us here without ever having to expand a vector in terms of an eigenbasis for \(Q\). Nothing of any physical consequence depends on whether the delta function “really exists” or not.

How should we account for momentum? Following our intuition from the classical case, momentum should be the observable corresponding to translation, that is, to the one-parameter group \(U_P(t)\) defined by \([U_P(t)\psi](q)=\psi(q-t)\). We see that \[\left.\frac{d}{dt}\right|_{t=0}\psi(q-t)=-\frac{d}{dq}\psi,\] so the infinitesimal generator of this unitary group is \(-d/dq\). Using our recipe for building observables from symmetries, we then see that our momentum observable is \[P=-i\hbar\frac{d}{dq}.\] (Alternatively, one can directly show that \([e^{-t(d/dq)}\psi](q)=\psi(q-t)\).)
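If you’d like to see the parenthetical claim in action, here is a small sketch in Python. It assumes \(\psi\) is a polynomial, which is of course not in \(L^2\) but makes the exponential series terminate, so the check is exact. The helper names are mine and purely illustrative.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Derivative of a polynomial stored as [c0, c1, c2, ...] (c_k multiplies q**k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, q):
    return sum(c * q**k for k, c in enumerate(coeffs))

def translate_by_series(coeffs, t):
    """Apply exp(-t d/dq) = sum_k (-t)^k / k! (d/dq)^k; the sum is finite for a polynomial."""
    result = [Fraction(0)] * len(coeffs)
    term = [Fraction(c) for c in coeffs]  # k-th derivative of psi, starting with k = 0
    k = 0
    while term:
        scale = Fraction(-t) ** k / factorial(k)
        for j, c in enumerate(term):
            result[j] += scale * c
        term = derivative(term)
        k += 1
    return result

psi = [1, -2, 0, 3]   # psi(q) = 1 - 2q + 3q^3, an arbitrary test polynomial
t = Fraction(1, 2)
shifted = translate_by_series(psi, t)

# The series should reproduce psi(q - t) exactly at every sample point.
for q in [Fraction(-1), Fraction(0), Fraction(2, 3), Fraction(5)]:
    assert evaluate(shifted, q) == evaluate(psi, q - t)
```

This is just Taylor’s theorem in disguise, but it makes the sign convention concrete: the generator \(-d/dq\) moves the graph of \(\psi\) to the right by \(t\).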

A first guess might have been, remembering how we passed from configuration space to its cotangent bundle in the classical case, that we should introduce another coordinate and use, say, \(L^2(\mathbb{R}^2)\) as the state space and let \(P\) just be the multiplication-by-\(p\) operator. But this could not have been the answer if momentum were still going to be related to spatial translation in the same way. If \(Q\) and \(P\) were both multiplication operators then they would commute, which would mean that the symmetry corresponding to \(P\) — that is, spatial translation — preserves position, which it of course doesn’t. Instead, even when considering both position and momentum, the state space remains \(L^2(\mathbb{R})\), with \(P\) given by the formula above.

Just as \(Q\) is “diagonal” as an operator on \(L^2\), we can “diagonalize” \(P\) using a Fourier transform. We choose a convention for the Fourier transform that incorporates \(\hbar\), setting \[\hat\psi(p)=\int e^{-ipq/\hbar}\psi(q)dq.\] I encourage you to check for yourself that under this convention, we indeed have \(\widehat{P\psi}(p)=p\hat\psi(p)\), and that the appropriate measure to use in \(p\) space (that is, the one that makes our map unitary) is \(dp/(2\pi\hbar)\), so the probability of finding that the particle’s momentum is in the set \(E\) is given by \(\int_E|\hat\psi(p)|^2dp/(2\pi\hbar)\). It is useful to think of the Fourier transform as a continuous analogue of a change of basis, from the “position basis” \(\{\delta(q-q_0):q_0\in\mathbb{R}\}\) to the “momentum basis” \(\{e^{ip_0q/\hbar}:p_0\in\mathbb{R}\}\), but again, remember that none of these functions actually lives in \(L^2\).
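As a numerical sanity check on the normalization, here is a rough sketch that approximates the Fourier transform of a Gaussian wave packet by a Riemann sum. The value of `HBAR`, the mean momentum `P0`, and the grid sizes are arbitrary choices of mine, and the Gaussian is just a convenient test state.

```python
import cmath
import math

HBAR = 0.7   # arbitrary value, so that the factors of hbar are actually exercised
P0 = 1.5     # mean momentum of the test state (an assumption for illustration)

def psi(q):
    """A norm-1 Gaussian wave packet with mean momentum P0."""
    return math.pi ** -0.25 * cmath.exp(-q * q / 2 + 1j * P0 * q / HBAR)

def riemann_grid(lo, hi, n):
    """Midpoints of n equal cells covering [lo, hi], together with the cell width."""
    step = (hi - lo) / n
    return [lo + (i + 0.5) * step for i in range(n)], step

qs, dq = riemann_grid(-8.0, 8.0, 400)
ps, dp = riemann_grid(P0 - 8 * HBAR, P0 + 8 * HBAR, 200)
psi_values = [psi(q) for q in qs]

def psi_hat(p):
    """Riemann-sum approximation of the integral of exp(-ipq/hbar) psi(q) dq."""
    return sum(cmath.exp(-1j * p * q / HBAR) * v for q, v in zip(qs, psi_values)) * dq

hat_values = [psi_hat(p) for p in ps]
weight = dp / (2 * math.pi * HBAR)   # the measure dp / (2 pi hbar)

norm_q = sum(abs(v) ** 2 for v in psi_values) * dq
norm_p = sum(abs(v) ** 2 for v in hat_values) * weight
mean_p = sum(p * abs(v) ** 2 for p, v in zip(ps, hat_values)) * weight

print(norm_q, norm_p, mean_p)   # both norms should be close to 1, the mean close to P0
```

With the measure \(dp/(2\pi\hbar)\) the two norms agree, and the momentum distribution is centered at `P0`, which is what the phase factor \(e^{iP_0q/\hbar}\) was supposed to accomplish.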

This highlights a fact about position and momentum in quantum mechanics that I found confusing the first time I encountered it: knowing all the amplitudes for position — all the complex numbers \(\psi(q)\) — is in fact enough information to completely specify the state, and in particular the amplitudes for momentum. This is very different from the classical setting, where position and momentum were two totally independent variables. This doesn’t mean that knowing the probability distribution of positions — the real numbers \(|\psi(q)|^2\) — is enough; there are different ways to pick the phase for each \(\psi(q)\) and they’ll lead to different distributions of momentum. But for a given distribution of positions, not every distribution of momenta is achievable just by plugging in the right phases; the famous Heisenberg uncertainty principle, which we discuss below, gives one limitation.

This whole story extends straightforwardly to multiple dimensions. When dealing with an \(n\)-dimensional configuration space, we use \(L^2(\mathbb{R}^n)\) as our state space and introduce position operators \((Q_i\psi)(q)=q_i\psi(q)\) and momentum operators \(P_i\psi=-i\hbar(\partial\psi/\partial q_i)\). The Fourier transform \(\hat f(p)=\int e^{-ip\cdot q/\hbar}f(q)dq\) moves us from the generalized basis that simultaneously diagonalizes the position operators to one that simultaneously diagonalizes the momentum operators, and the right measure to use in momentum space is \(dp/(2\pi\hbar)^n\).

We can use this to write down our first quantum Hamiltonian. Recall that the classical Hamiltonian for a particle moving in a potential was given by \(H=|\mathbf{p}|^2/2m+V(\mathbf{q})\). The obvious quantum analogue of this is simply the function we get by plugging in our expressions for \(P_i\) and \(Q_i\). When we do this, the resulting operator is given by \[H=-\frac{\hbar^2}{2m}\nabla^2+V,\] where \(\nabla^2=\sum_i(\partial/\partial q_i)^2\) and \(V\) is the multiplication-by-\(V(q)\) operator. The corresponding Schrödinger equation is \[i\hbar\frac{\partial\psi}{\partial t}=-\frac{\hbar^2}{2m}\nabla^2\psi+V\psi.\] We’ll investigate a special case of this equation in some detail in the next section.

The question of how to turn a classical observable into a quantum one is called quantization. There isn’t a general recipe for quantization that works in all cases, and in fact one probably shouldn’t expect one: classical mechanics is a special case of quantum mechanics, so specifying the quantum version of a physical system should require providing strictly more information. The more one worries about issues like convergence and the domains of unbounded operators the fuzzier the classical-quantum analogy becomes, and in general the best thing one can hope for is to use the classical case as an intuition pump, like we did just now. But the question of whether the resulting physics is correct is not something you can hope to formally derive just from the knowledge that it’s true classically.

Some Quantum Phenomena

This article is not meant to be anything like a comprehensive treatment of quantum mechanics. Still, there are some topics which many readers might have heard something about that it would be a shame not to cover.

The Heisenberg Uncertainty Principle

The Heisenberg uncertainty principle is probably the thing that someone will have heard about quantum mechanics if they’ve heard nothing else. The form in which it’s commonly stated in popular accounts is usually something like “you can’t know the position and the momentum of a particle at the same time,” and it often comes with a story about measurement: measuring the position precisely, for example, would involve hitting the particle with something really big, which would destroy any information about its momentum. But this story is incomplete in a couple ways; the uncertainty principle is both more general and more fundamental than this measurement story would suggest.

The uncertainty principle is a statement about statistics, so we need to briefly discuss how to write down the relevant quantities using the formalism we’ve developed here. First, suppose we are measuring the observable \(A\) in the state \(\psi\). I claim that, assuming we’ve chosen \(\psi\) to have norm 1, the expected value of the result is given by \(\langle \psi,A\psi\rangle\). Assuming the state space is finite-dimensional for simplicity, expand \(\psi\) in terms of a basis of eigenstates of \(A\), say \(\psi=\sum_i\alpha_iv_i\). Then \[\langle\psi,A\psi\rangle=\sum_i\langle\psi,\lambda_i\alpha_iv_i\rangle=\sum_i\lambda_i|\alpha_i|^2,\] which is indeed the expected value, since \(|\alpha_i|^2\) is the probability of getting \(\lambda_i\) as the result of the measurement. Note that, as in classical probability, the expected value is a linear function of \(A\).

The uncertainty principle concerns the variance of an observable. If \(\mathbb{E} A\) is the expected value of \(A\), the variance of \(A\), which we’ll write \(\sigma_A^2\), is defined as the expected value of \((A-\mathbb{E} A)^2\). (Since there’s only going to be one state under consideration, we’ll often not include it in the notation, but it still definitely affects all the quantities we’re talking about.)

We’re now ready to state the theorem. Consider two observables \(A\) and \(B\) that might be measured in the state \(\psi\). Then \[\sigma^2_A\sigma^2_B\ge\frac14|\langle\psi,[A,B]\psi\rangle|^2.\] There are a couple of ways to prove this; I’ll sketch one here. Write \(\bar A=A-\mathbb{E} A\) and note that \([A,B]=[\bar A,\bar B]\). Then since \(A\) and \(B\) are self-adjoint, the right-hand side can be rewritten \[\frac14|\langle \bar A\psi,\bar B\psi\rangle-\langle \bar B\psi,\bar A\psi\rangle|^2=|\mathrm{Im}\langle\bar A\psi,\bar B\psi\rangle|^2\le|\langle\bar A\psi,\bar B\psi\rangle|^2\le||\bar A\psi||^2||\bar B\psi||^2\] which gives us the result, the last inequality following from Cauchy-Schwarz.
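To make the inequality concrete, here is a small finite-dimensional check. I’ve taken \(A\) and \(B\) to be the Pauli matrices \(\sigma_x\) and \(\sigma_y\) on \(\mathbb{C}^2\) and picked a few arbitrary unit vectors; these choices, and the function names, are mine and are only meant as an illustration.

```python
import cmath
import math

def apply(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def inner(u, v):
    """Inner product, conjugate-linear in the first argument as in the text."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def variance(matrix, psi):
    image = apply(matrix, psi)
    mean = inner(psi, image).real            # real because the matrix is self-adjoint
    centered = [y - mean * x for x, y in zip(psi, image)]
    return inner(centered, centered).real    # ||(A - E A) psi||^2

def commutator_bound(a, b, psi):
    """(1/4) |<psi, [A, B] psi>|^2, using <psi, AB psi> = <A psi, B psi>."""
    a_psi, b_psi = apply(a, psi), apply(b, psi)
    return abs(inner(a_psi, b_psi) - inner(b_psi, a_psi)) ** 2 / 4

A = [[0, 1], [1, 0]]        # sigma_x
B = [[0, -1j], [1j, 0]]     # sigma_y

def unit_state(theta, phase):
    return [complex(math.cos(theta)), cmath.exp(1j * phase) * math.sin(theta)]

for theta, phase in [(0.0, 0.0), (0.3, 1.1), (math.pi / 4, 0.0), (1.0, 2.5)]:
    psi = unit_state(theta, phase)
    lhs = variance(A, psi) * variance(B, psi)
    rhs = commutator_bound(A, B, psi)
    print(f"{lhs:.6f} >= {rhs:.6f}")
    assert lhs >= rhs - 1e-12   # small tolerance for floating-point rounding
```

Note that here the bound depends on the state, since \([\sigma_x,\sigma_y]\) is not a scalar. For the first state in the list the two sides are equal.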

In the special case where \([A,B]\) is a scalar, this result gives us a bound on the product of the variances that’s independent of \(\psi\). This happens for position and momentum: if \([Q\psi](q)=q\psi(q)\) and \(P=-i\hbar(d/dq)\), then a quick computation shows that \([Q,P]=i\hbar I\), which means that our bound becomes simply \(\sigma_Q^2\sigma_P^2\ge\frac14\hbar^2\). (Once again, a very careful exposition would have to get around issues of operator domains, but this result does in fact survive.)

The position-momentum version of the uncertainty principle is, again, the one that’s talked about the most often, but I think a lot of the popular accounts miss the point a little bit. It’s often described in terms of a limitation on one’s knowledge — the more information you get about the position, the less you have about the momentum. But, at least if you take the identification of physical states with elements of the Hilbert space seriously, the problem is deeper than that: it’s not that you can’t get the knowledge about what the position or momentum “really is,” it’s that the information isn’t there to be had in the first place. There is no state at all for which the variances in position and momentum violate that inequality.

Entanglement

The examples we’ve discussed so far, if we’ve been explicit at all one way or the other, all involve just one particle. Suppose we want to extend the story to multiple particles. If the state of one particle is described by a Hilbert space \(\mathcal{H}\) and another by \(\mathcal{H}'\), how do we model a system consisting of both particles considered together?

Probably the easiest way to convince yourself of the right answer is to imagine measuring some observable \(A\) for the first particle and some observable \(B\) for the second. Again assuming everything is finite-dimensional for simplicity, these measurements each have associated orthonormal bases \(v_1,\ldots,v_n\) of \(\mathcal{H}\) and \(w_1,\ldots,w_m\) of \(\mathcal{H}'\). Since we could measure both observables, one for each particle, the combined state space should have one basis vector for each pair of outcomes and the corresponding probabilities should multiply.

The Hilbert space that accomplishes this is simply the tensor product \(\mathcal{H}\otimes\mathcal{H}'\), which comes with the inner product given by \(\langle v\otimes w,v'\otimes w'\rangle=\langle v,v'\rangle\langle w,w'\rangle\). This is also the answer to the question of how to model two independent properties of the same particle. For example, as we mentioned near the beginning of this article, the spin of an electron can be represented as an element of (the projective space of) a two-dimensional Hilbert space. The state of an electron moving in space, then, ought to live in \(\mathbb{P}(L^2(\mathbb{R}^3)\otimes\mathbb{C}^2)\).

For the rest of this subsection, we’ll focus on the simplest nontrivial example, the tensor product \(\mathcal{H}\otimes\mathcal{H}\) where \(\mathcal{H}\) is two-dimensional. We’ll pick a basis \(e_1,e_2\) of \(\mathcal{H}\); you can think of two electrons, except that we are only keeping track of the spin. (Under this interpretation, \(e_1\) and \(e_2\) are the spin-up and spin-down states around some chosen axis.)

Some states come from taking two states \(s\) and \(s'\) from \(\mathcal{H}\) and considering them together — that is, they describe a system in which the first particle is in state \(s\) and the second is in state \(s'\). These states correspond to the pure tensors \(s\otimes s'\) and they’re called separable. Any other state is called entangled; a good example is the state \(e=\frac1{\sqrt{2}}(e_1\otimes e_1+e_2\otimes e_2)\).

What happens if we measure one of the particles when the pair of them is in a state like \(e\)? Suppose we have an observable \(A\) on \(\mathcal{H}\) for which \(e_1\) and \(e_2\) are eigenvectors, say with eigenvalues \(-1\) and \(1\) respectively. We can form an observable on \(\mathcal{H}\otimes\mathcal{H}\) by tensoring \(A\) with the identity; \(A\otimes I\) can be interpreted as measuring \(A\) just for the first particle. Any vector of the form \(e_1\otimes w\) will be an eigenvector of \(A\otimes I\) with eigenvalue \(-1\), and similarly for \(e_2\). In particular, if we measure \(A\otimes I\) in state \(e\), then the state collapses to \(e_1\otimes e_1\) with probability \(\frac12\) and to \(e_2\otimes e_2\) with probability \(\frac12\). Therefore, if we then measure \(A\) for the second particle we will always get the same result we got for the first particle, even if the two measurements are performed at the same time while the particles are very far apart.
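Here is a minimal sketch of that computation, with a separable state thrown in for contrast. I’m storing a vector in \(\mathcal{H}\otimes\mathcal{H}\) as a \(2\times2\) table of amplitudes, where entry `[i][j]` multiplies \(e_{i+1}\otimes e_{j+1}\). This representation and the function names are just my choices for the example.

```python
import math

R = 1 / math.sqrt(2)
entangled = [[R, 0.0], [0.0, R]]        # e = (e_1 (x) e_1 + e_2 (x) e_2) / sqrt(2)
separable = [[0.5, 0.5], [0.5, 0.5]]    # s (x) s with s = (e_1 + e_2) / sqrt(2)

def measure_first(state, i):
    """Probability that measuring A (x) I picks out e_{i+1}, and the collapsed state."""
    prob = sum(abs(amp) ** 2 for amp in state[i])
    collapsed = [[amp / math.sqrt(prob) if k == i else 0.0 for amp in row]
                 for k, row in enumerate(state)]
    return prob, collapsed

def second_distribution(state, i):
    """Outcome distribution for the second particle after the first one gave e_{i+1}."""
    _, collapsed = measure_first(state, i)
    return [abs(amp) ** 2 for amp in collapsed[i]]

for name, state in [("entangled", entangled), ("separable", separable)]:
    for i in range(2):
        prob, _ = measure_first(state, i)
        rounded = [round(x, 6) for x in second_distribution(state, i)]
        print(name, "first outcome", i + 1, "prob", round(prob, 6), "second:", rounded)
```

For the entangled state the second distribution is concentrated entirely on the outcome the first particle gave, while for the separable state the second particle is still equally likely to give either answer.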

On its own this doesn’t have to be so surprising. One could imagine that the process that produced the entangled particles simply either put them both in state \(e_1\) or both in state \(e_2\) with equal probability. If this were true, then when we measure one we are simply discovering which of these two things happened, which allows us to deduce the state of the other particle. But, as we saw at the very beginning when we attempted to interpret amplitudes this way, this interpretation doesn’t work. There is a famous example, called the CHSH game after the initials of the people who first wrote it down, that provides a good reason why not.

Alice and Bob will play a game. Each of them is assigned a separate room containing a coin and a button, but they have a chance to agree on a strategy before they’re separated. When they’re ready to play, they’ll go to their appointed rooms and flip the coin. After that, they’ll have a chance to either press the button or not.

The goal is as follows: they want exactly one of them to press the button if and only if both coins land heads. (So if at least one coin lands tails, then they want either to both press the button or both not press the button.) Notice that each player has only four strategies to choose between (the only choices are what to do if the coin lands heads and what to do if it’s tails) so there are 16 strategies total. It’s not difficult to check that no matter what they do, they can’t do better than a \(\frac34\) chance of winning. Furthermore, choosing their strategy randomly can’t help: this just amounts to randomly choosing one of the 16 possible strategies according to some probability distribution, and randomly choosing among strategies that can’t win more than \(\frac34\) of the time can’t result in a strategy that wins more than \(\frac34\) of the time.
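If you’d rather not check all 16 cases by hand, a brute-force enumeration is short. This sketch assumes both coins are fair and independent, and the encoding of a strategy as a dictionary from coin result to “press or not” is my own.

```python
from fractions import Fraction
from itertools import product

COINS = ("H", "T")

def wins(a_coin, b_coin, a_press, b_press):
    """The pair wins when 'exactly one presses' matches 'both coins are heads'."""
    exactly_one = a_press != b_press
    both_heads = a_coin == "H" and b_coin == "H"
    return exactly_one == both_heads

def win_probability(alice, bob):
    """alice and bob map a coin result to True (press) or False (don't press)."""
    won = sum(wins(a, b, alice[a], bob[b]) for a, b in product(COINS, repeat=2))
    return Fraction(won, 4)   # four equally likely coin outcomes

# The four deterministic strategies available to a single player.
single_player = [dict(zip(COINS, choice)) for choice in product((False, True), repeat=2)]

best = max(win_probability(a, b) for a, b in product(single_player, repeat=2))
print(best)  # 3/4
```

One way to see why a perfect strategy is impossible: each player’s choice appears in exactly two of the four win conditions, so the four “exactly one presses” bits always have an even number of ones, whereas winning every time would require exactly one.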

Now suppose that Alice and Bob have a pair of particles that they’ve placed into the state \(e\) we discussed above. Then if Alice takes one of the particles and Bob takes the other, they have one more thing they can do after the coin has been flipped: each of them can choose which basis to use to measure the particle they have. This turns out to be enough to win the game with a probability greater than \(\frac34\). I’ll describe a strategy that does this below, but you might also enjoy trying to come up with one on your own before reading on.

It will be useful to have some notation to talk about these bases. Write \(\{e_1^\alpha,e_2^\alpha\}\) for the basis you get by rotating \(\{e_1,e_2\}\) counter-clockwise by \(\alpha\), so \[e_1^\alpha=(\cos\alpha)e_1+(\sin\alpha)e_2,\] and \[e_2^\alpha=(-\sin\alpha)e_1+(\cos\alpha)e_2.\] This notation makes it easy to take inner products: we get that \[|\langle e_1^\alpha,e_1^\beta\rangle|^2=|\langle e_2^\alpha,e_2^\beta\rangle|^2=\cos^2(\alpha-\beta)\] and \[|\langle e_1^\alpha,e_2^\beta\rangle|^2=|\langle e_2^\alpha,e_1^\beta\rangle|^2=\sin^2(\alpha-\beta);\] one quick way to see this is to rotate both bases by \(-\beta\) so that one of them is \(\{e_1,e_2\}\) and the other is \(\{e_1^{\alpha-\beta},e_2^{\alpha-\beta}\}\).

(If we think of these states as representing spins of electrons, then it turns out that we can assign these labels so that the basis \(\{e_1^\alpha,e_2^\alpha\}\) corresponds to measuring the spin about the axis in the \(x\)-\(z\) plane formed by rotating the \(z\) axis counter-clockwise by \(2\alpha\). In particular, note that when \(\alpha=\pi\), we get \(\{-e_1,-e_2\}\), which is the same pair of states as \(\{e_1,e_2\}\) and so had better correspond to the same measurement! We won’t actually need to invoke this physical interpretation at any point in this section, though.)

Say Alice has the first particle and Bob has the second. If Alice’s coin lands heads, she’ll measure her particle in the basis \(\{e_1^{\pi/4},e_2^{\pi/4}\}\). If she gets tails, she’ll use \(\{e_1,e_2\}\). Bob will use \(\{e_1^{-\pi/8},e_2^{-\pi/8}\}\) for heads and \(\{e_1^{\pi/8},e_2^{\pi/8}\}\) for tails. In each case, they’ll press the button if and only if the measurement resulted in the first basis vector.

I encourage you to check that if you use one of these rotated bases to construct the state \(e\) under discussion, the result is the same. That is, \[\frac1{\sqrt2}(e_1^\alpha\otimes e_1^\alpha+e_2^\alpha\otimes e_2^\alpha)=\frac1{\sqrt2}(e_1\otimes e_1+e_2\otimes e_2).\] Therefore, if the particles start in the entangled state \(e\) and Alice measures her particle in some basis \(\{e_1^\alpha,e_2^\alpha\}\) and sees, say, \(e_2^\alpha\), then the two particles together end up in the state \(e_2^\alpha\otimes e_2^\alpha\). In particular, after this happens, the second particle is in the state \(e_2^\alpha\), so if Bob now measures his particle in the basis \(\{e_1^\beta,e_2^\beta\}\) for some other \(\beta\), the probability that he sees \(e_1^\beta\) is \(|\langle e_2^\alpha,e_1^\beta\rangle|^2\), and likewise for \(e_2^\beta\).

So here are the possible outcomes:

Alice’s coin	Bob’s coin	\(P(\text{same})\)	\(P(\text{different})\)
H	H	\(\cos^2(3\pi/8)\)	\(\sin^2(3\pi/8)\)
H	T	\(\cos^2(\pi/8)\)	\(\sin^2(\pi/8)\)
T	H	\(\cos^2(\pi/8)\)	\(\sin^2(\pi/8)\)
T	T	\(\cos^2(-\pi/8)\)	\(\sin^2(-\pi/8)\)

The winning outcome is “different” in the top row and “same” everywhere else, so in every case, they have the same probability of winning, namely \(\cos^2(\pi/8)=\sin^2(3\pi/8)=\frac14(2+\sqrt 2)\approx 0.8536\). This beats the upper bound of \(\frac34\) we came up with before!
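The table can also be checked numerically without appealing to the rotation-invariance of \(e\): the sketch below computes each joint outcome probability directly as \(|\langle e_i^\alpha\otimes e_j^\beta,e\rangle|^2\). The data layout and function names are my own, and everything is real-valued, so no complex conjugates appear.

```python
import math
from itertools import product

def basis(alpha):
    """The rotated basis (e_1^alpha, e_2^alpha), in coordinates relative to (e_1, e_2)."""
    return [(math.cos(alpha), math.sin(alpha)), (-math.sin(alpha), math.cos(alpha))]

# Amplitudes of e = (e_1 (x) e_1 + e_2 (x) e_2) / sqrt(2); E[i][j] multiplies e_{i+1} (x) e_{j+1}.
E = [[1 / math.sqrt(2), 0.0], [0.0, 1 / math.sqrt(2)]]

def outcome_probability(v, w):
    """|<v (x) w, e>|^2 for real vectors v and w."""
    amplitude = sum(v[i] * w[j] * E[i][j] for i in range(2) for j in range(2))
    return amplitude ** 2

ALICE = {"H": math.pi / 4, "T": 0.0}            # Alice's measurement angles
BOB = {"H": -math.pi / 8, "T": math.pi / 8}     # Bob's measurement angles

def win_probability(a_coin, b_coin):
    a_basis, b_basis = basis(ALICE[a_coin]), basis(BOB[b_coin])
    both_heads = a_coin == "H" and b_coin == "H"
    total = 0.0
    for i, j in product(range(2), repeat=2):
        # Each player presses exactly when they see the first basis vector (index 0).
        exactly_one = (i == 0) != (j == 0)
        if exactly_one == both_heads:
            total += outcome_probability(a_basis[i], b_basis[j])
    return total

for a_coin, b_coin in product("HT", repeat=2):
    print(a_coin, b_coin, round(win_probability(a_coin, b_coin), 4))
```

All four rows come out to the same value, \(\cos^2(\pi/8)\), in agreement with the table.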

It’s very tempting to assume that performing a measurement on, say, an electron is just a way of extracting some information about that electron that was already present before the measurement was performed — after all, this is more or less how we think of measurements in classical physics. In this picture, a complete description of the state of an electron would include, for each axis, whether that electron has spin up or spin down around that axis. The fact that the element of \(\mathbb{P}(\mathbb{C}^2)\) that we’ve been calling the state doesn’t determine all this information would just mean that our description is incomplete. If this were true, then entanglement would not be that strange: all it would mean is that for each axis, either both electrons have an “up” in the corresponding entry in their lists or both have a “down.” It would be no more mysterious than someone rolling a die and then writing the result on two different slips of paper.

The CHSH game is one of many ways of demonstrating that this simply can’t be true. If each electron contained a predetermined answer to every possible measurement, then bringing the entangled electrons with them could not possibly have helped Alice and Bob win this game. It would be no better than carrying in notebooks that they filled out together beforehand; since they’re allowed to coordinate on a strategy ahead of time anyway this clearly adds nothing. We are forced to conclude that quantum mechanics can’t be reduced to a local hidden variable theory — a description in which the result of measuring something about a particle is always determined by some preexisting information contained just in that particle.

The existence of entangled particles like these is one of the more straightforwardly weird predictions of quantum mechanics. Nonetheless, they really do exist, and experiments essentially like the one described here have been performed.

The Harmonic Oscillator

I’ll conclude this section with one example that actually involves analyzing Schrödinger’s equation for an honest physical system: the harmonic oscillator. We analyzed the one-dimensional classical harmonic oscillator in the Hamiltonian mechanics article; recall that its Hamiltonian was given by \(H=p^2/2m+kq^2/2\) where \(m\) is the mass of the particle and \(k\) is the “spring constant” which controls the strength of the force pulling the particle toward the origin.

Following our quantization recipe, our quantum Hamiltonian is \[H=-\frac{\hbar^2}{2m}\frac{d^2}{dq^2}+\frac k2q^2.\] As always, states evolve according to the Schrödinger equation: \[\frac{d\psi}{dt}=-\frac{i}{\hbar}H\psi.\] A common strategy for studying the dynamics of a quantum-mechanical system is to first find all the eigenvectors of \(H\). If we can manage to expand some state in terms of eigenvectors of \(H\), then the dynamics are completely straightforward, since if \(\psi_E\) is an eigenvector of \(H\) with eigenvalue \(E\) we know that \(\psi_E(t)=e^{-iEt/\hbar}\psi_E(0)\).

To analyze the corresponding Schrödinger equation, it will be convenient to introduce a bit of notation. We will write \(\omega=\sqrt{k/m}\) and \(\tilde q=\sqrt{m\omega/\hbar}q\), which lets us write \[H=\frac12\hbar\omega\left(-\frac{d^2}{d\tilde q^2}+\tilde q^2\right).\] We introduce a new, non-self-adjoint operator on \(L^2(\mathbb{R})\), together with its adjoint: \[a=\frac{1}{\sqrt{2}}\left(\tilde q+\frac{d}{d\tilde q}\right),\qquad a^*=\frac{1}{\sqrt{2}}\left(\tilde q-\frac{d}{d\tilde q}\right).\] (We are again ignoring all issues involving operator domains. If the fact that \(a\) is not self-adjoint is surprising, recall that \(d/d\tilde q\) is skew-adjoint.) A straightforward computation shows that we can write \(H=\hbar\omega(a^*a+\frac12)\) and that \([a,a^*]=1\).

From this, it’s possible to conclude the key fact that made us want to introduce these operators in the first place. Suppose we have an eigenvector \(\psi\) of \(a^*a\) with eigenvalue \(n\). (It will then be an eigenvector of \(H\) with eigenvalue \(\hbar\omega(n+\frac12)\).) Then \[a^*a(a^*\psi)=a^*(a^*a+1)\psi=(n+1)a^*\psi,\] that is, \(a^*\psi\) is an eigenvector of \(a^*a\) with eigenvalue \(n+1\). A similar argument shows that \(a\psi\) is an eigenvector with eigenvalue \(n-1\). We call \(a^*\) the raising operator and \(a\) the lowering operator. Furthermore, if we define \[\phi_0(\tilde q)=\left(\frac{m\omega}{\pi\hbar}\right)^{\frac14}e^{-\tilde q^2/2},\] we get that \(a\phi_0=0\), and therefore \(\phi_0\) is an eigenvector of \(a^*a\) with eigenvalue 0. (The factor out front is just there to make \(\phi_0\) have norm 1.)

If we take \(\phi_n=(a^*)^n\phi_0/\sqrt{n!}\), this gives us an eigenvector of \(a^*a\) for every nonnegative integer. (The \(\sqrt{n!}\) is again there for normalization.) To complete our quest for the eigenvectors of \(H\), we just need to show that there aren’t any others. In fact, something stronger is true: the \(\phi_n\)’s are an orthonormal basis of \(L^2(\mathbb{R})\). Checking that they’re orthogonal is the sort of thing it’s better to do yourself than to have shown to you, but it’s worth sketching the proof of why they (topologically) span \(L^2\).

First, we can write \(\phi_n\) a bit more explicitly. One can see by induction that, for some polynomial \(H_n\) of degree \(n\), we must have \[\phi_n(\tilde q)=\frac{1}{\sqrt{2^nn!}}\left(\frac{m\omega}{\pi\hbar}\right)^{\frac14}H_n(\tilde q)e^{-\tilde q^2/2}.\] The \(H_n\)’s are called Hermite polynomials and are interesting in their own right, but the only fact we’ll need about them is that \(H_n\) has degree exactly \(n\) and that therefore they span the space of polynomials in \(\tilde q\).
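The induction is easy to carry out by machine. On a function of the form \(g(\tilde q)e^{-\tilde q^2/2}\) with \(g\) a polynomial, \(\sqrt2a^*\) acts on the polynomial part by \(g\mapsto2\tilde qg-g'\), and \(a^*a\) acts by \(g\mapsto\tilde qg'-\frac12g''\). The sketch below uses these two rules (with polynomials stored as coefficient lists, a representation I’ve chosen for convenience) to generate the first few \(H_n\) and confirm that they are eigenvectors with eigenvalue \(n\).

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def derivative(p):
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def times_q(p):
    return [Fraction(0)] + list(p)

def add(p, r, scale=1):
    """p + scale * r, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(r))
    p = list(p) + [Fraction(0)] * (n - len(p))
    r = list(r) + [Fraction(0)] * (n - len(r))
    return [x + scale * y for x, y in zip(p, r)]

def raise_sqrt2(g):
    """Polynomial part of sqrt(2) a* applied to g(q) exp(-q^2/2): g -> 2 q g - g'."""
    return trim(add([2 * c for c in times_q(g)], derivative(g), scale=-1))

def number_operator(g):
    """Polynomial part of a* a applied to g(q) exp(-q^2/2): g -> q g' - g''/2."""
    second = derivative(derivative(g))
    return trim(add(times_q(derivative(g)), second, scale=Fraction(-1, 2)))

hermite = [[Fraction(1)]]              # H_0 = 1
for _ in range(5):
    hermite.append(raise_sqrt2(hermite[-1]))

for n, h in enumerate(hermite):
    assert len(h) == n + 1                                     # degree exactly n
    assert number_operator(h) == trim([n * c for c in h])      # eigenvalue n
    print(n, [int(c) for c in h])
```

The coefficient lists are in increasing order of degree, so for example `[-2, 0, 4]` stands for \(H_2(\tilde q)=4\tilde q^2-2\).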

Suppose \(\langle\phi_n,f\rangle=0\) for every \(n\). Then we in fact have that \(f\) is orthogonal to \(g(\tilde q)e^{-\tilde q^2/2}\) for every polynomial \(g\) and, by continuity, also orthogonal to \(e^{ip\tilde q}e^{-\tilde q^2/2}=\sum_{k=0}^\infty\frac{(ip\tilde q)^k}{k!}e^{-\tilde q^2/2}\) for any real \(p\). But that means \[\int f(\tilde q)e^{-\tilde q^2/2}e^{ip\tilde{q}}d\tilde q=0,\] that is, the Fourier transform of \(f(\tilde q)e^{-\tilde q^2/2}\) is zero. So \(f(\tilde q)e^{-\tilde q^2/2}\), and therefore \(f\) itself, are zero almost everywhere, so \(f\) is the zero vector in \(L^2\) as desired.

So we’ve shown that the only eigenvalues of \(a^*a\) are nonnegative integers \(n\), meaning that the only eigenvalues of \(H\) are \(\hbar\omega(n+\frac12)\), and each eigenspace is one-dimensional. This situation is, once again, very different from anything that happens classically. The energy levels of the harmonic oscillator are discrete; they only come in whole lumps of size \(\hbar\omega\). The fact that this can happen is yet another fundamental difference between classical and quantum mechanics, and in fact it’s this phenomenon that gave rise to the name “quantum” in the first place.
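Now that we have the full list of eigenvalues, the dynamics of any state are easy to write down in terms of its coefficients in the basis \(\phi_n\): each coefficient just picks up the phase \(e^{-iE_nt/\hbar}\). Here is a small sketch; the values of `HBAR` and `OMEGA` and the initial coefficients are arbitrary choices of mine.

```python
import cmath
import math

HBAR = 1.0    # units chosen so that hbar = 1 (an assumption for the example)
OMEGA = 2.0   # arbitrary oscillator frequency

def energy(n):
    return HBAR * OMEGA * (n + 0.5)

def evolve(coeffs, t):
    """coeffs[n] is the coefficient of phi_n at time 0; returns the coefficients at time t."""
    return [c * cmath.exp(-1j * energy(n) * t / HBAR) for n, c in enumerate(coeffs)]

initial = [0.6, 0.0, 0.8j]            # a norm-1 combination of phi_0 and phi_2
later = evolve(initial, 0.37)
period = 2 * math.pi / OMEGA
after_period = evolve(initial, period)

# The probability of finding each energy never changes...
assert all(abs(abs(a) - abs(b)) < 1e-12 for a, b in zip(initial, later))
# ...and after one classical period every coefficient has been multiplied by -1.
assert all(abs(a + b) < 1e-12 for a, b in zip(initial, after_period))
```

The overall factor of \(-1\) comes from the \(\frac12\) in \(E_n=\hbar\omega(n+\frac12)\). Since states are only defined up to a scalar, the system is back in the same physical state after time \(2\pi/\omega\), which is the period of the classical oscillator.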

Some Words About Measurement

Once one has gotten used to everything else, probably the most confusing thing about quantum mechanics is the central role that seems to be played by measurement and probability. For two states \(v\) and \(w\), we interpreted \(|\langle v,w\rangle|^2\) as the probability that, when you perform a measurement corresponding to an observable for which \(v\) is an eigenvector on a system in the state \(w\), you see the corresponding eigenvalue as a result. This is called the Born rule. I referred briefly at the end of Section 2 to the standard story about what happens next: that if you perform this measurement and get the eigenvalue corresponding to \(v\), then the state of the system is \(v\) instead of \(w\) from then on. This hypothetical process, where the state suddenly jumps to \(v\) the instant the measurement is performed, is called collapse.

I call the process hypothetical because it is far from universally accepted within the field that collapse as described here literally happens. In fact, although I’m not a physicist myself, I think it’s fair to say that it’s the minority opinion. The question of what to make of all this is called the measurement problem and it’s a surprisingly thorny one. This is not an article on interpretations of quantum mechanics — I might write one in the future — but I still think it’s worth talking a little bit about what some of the positions are. (I am sure adherents of one or the other of the theories I’m about to describe will find something wrong with these descriptions; I’m far from an expert. If this is you, send me an e-mail and I’ll maybe correct it.)

It’s possible to take the collapse picture completely literally and assert that there is something about the measurement process that causes the initial state to collapse to the measured eigenstate. One posits that measurement involves the interaction between the quantum system and some separate, macroscopic object which behaves classically, and that it’s this interaction that causes collapse. The earliest attempts to make sense of the predictions of quantum mechanics took this form, and something like it is what’s commonly called the Copenhagen interpretation, although this seems to be a term that different people tend to use to refer to a large class of slightly different ideas.

Copenhagen-like interpretations are the ones that tend to “leak out” into discussions of quantum mechanics aimed at the general public, sometimes being used to justify somewhat garbled attempts at a sort of quantum mysticism. One of the most exuberant is probably the 2004 movie What the Bleep Do We Know!? in which Marlee Matlin learns that water has feelings and the universe is made of consciousness, all because of quantum mechanics.

It’s not really fair to blame the Copenhagen interpretation for the loopy ways it’s interpreted, but I think it does have some problems as a fundamental description of reality. It seems to assert that there is a qualitative difference between quantum objects and classical objects and that only the latter can cause collapse. The idea that a bright line like this exists in nature seems implausible and it immediately raises a bunch of ridiculous-sounding questions. What determines how big an object has to be before it’s allowed to perform a measurement? If it’s just slightly smaller, does it behave totally differently? Moreover, large objects are of course made out of large numbers of small objects for which quantum phenomena are very real indeed. Experiments have been run that observe interference effects (that is, situations in which amplitudes can be seen to cancel out) in objects quite a bit larger than an electron, including large molecules, and there’s no evidence that the rules of quantum mechanics are going to suddenly stop working when an object reaches a certain size. There is another, broader problem: time evolution in quantum mechanics is unitary, but projecting onto a subspace is very much not unitary, so for a theory of quantum mechanics to include literal collapse, the collapse has to be “bolted on” to the rest of the laws of physics rather than somehow following from them.

There are other types of so-called “objective collapse theories” to choose from that try to get around the issue of when an object is macroscopic enough to be allowed to perform a measurement; for example, the Ghirardi-Rimini-Weber theory includes spontaneous random collapses that happen at some universal frequency. Many of these have the interesting property that, at least in principle, they’re testable: if collapse actually occurs in nature according to some definite law then we ought to be able to design experiments that could notice. But so far, no evidence has emerged to support anything like this, and even so there is still the aesthetic complaint about having a theory with a unitary and a non-unitary component glued together.

There is another class of interpretations, though, that tries to do away with the concept of collapse entirely. I’ll highlight two of them here, but there are many more to choose from.

One, called Bohmian mechanics, posits that to describe the world you need two ingredients: the wavefunction that we’ve discussed in this article that lives in a Hilbert space and evolves according to the Schrödinger equation, together with the position of every particle. There is an extra law of physics called the “guiding equation” that explains how the wavefunction pushes the particles around. The interaction only goes in that direction: the particles’ positions have no bearing on the future evolution of the wavefunction. If a particle could be observed in one of two spatially separated places, then the wavefunction still has some amplitude around both of them even after the measurement is performed, but the particle itself is “stuck” under one of the two pieces of the wavefunction. What we called collapse therefore never actually occurs; the whole wavefunction is still there, but one piece’s contribution to the particle’s future evolution is dominant, making it seem as though we’ve projected out the other half. In this way Bohmian mechanics is completely deterministic.

This might seem strange given that I’ve insisted repeatedly throughout this article that quantum mechanics can’t be explained merely by interpreting amplitudes as probabilities. Bohmian mechanics gets around this problem in a couple ways. First, we are not forced into paradoxical conclusions that come from having to pick an answer to all questions we might ask about any basis because the position basis is special; the theory only requires definite answers to questions about position, not any other observable. Second, even though we’ve added a determinate value for the position of every particle, the evolution of this value depends on the whole wavefunction, even the parts of it that are far away. At the end of the entanglement section I mentioned that the CHSH game excludes “local hidden variable theories” of quantum mechanics. Bohmian mechanics gets around this by not being local — you can’t explain what happens to one of your entangled particles without using your knowledge about what’s happening to the other one, because the time evolution of each particle depends on the entire, entangled wavefunction.

The other collapse-free interpretation I want to mention involves asserting that the wavefunction (the vector in the Hilbert space that evolves according to the Schrödinger equation) is enough all on its own to describe the state of the universe. The question then arises about how to interpret the results of a measurement. Suppose some particle is in a state like \(\frac{1}{\sqrt{2}}(s+s')\) and we measure an observable for which \(s\) and \(s'\) are eigenvectors. What happens according to this wavefunction-only interpretation is that the measurement process causes the universe to move from a state like \(\frac{1}{\sqrt{2}}M_0\otimes(s+s')\) to a state like \(\frac{1}{\sqrt{2}}(M_s\otimes s+M_{s'}\otimes s')\), where \(M_0\) is a state in which the measurement has yet to be performed and \(M_s\) and \(M_{s'}\) are states in which we have measured \(s\) and \(s'\) respectively. That is, measurement causes the measuring apparatus, which may even include your brain, to become entangled with the particle, and each outcome is just as “real” as the other.

For this reason, this description usually goes by the somewhat unfortunate name of the many-worlds interpretation, since one might choose to interpret the first term of this sum as “the world in which we measured \(s\).” (Note that this entanglement phenomenon occurs in Bohmian mechanics too, but in Bohmian mechanics the actual positions of the particles pick out only one term of the sum to correspond to the actual world.) But the name “many-worlds” is slightly misleading: the splitting-up of the wavefunction into “worlds” is not fundamental to the theory. We might choose to express the wavefunction as a linear combination of vectors like \(M_s\otimes s\) that seem especially “world-ish,” but the wavefunction itself is all that’s necessary to specify the state of the universe.
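The measurement step described above can be spelled out a little more. This is a sketch under one assumption that I’m adding for illustration: the interaction between apparatus and particle is a single unitary operator, which I’ll call \(U\), and it records the outcome faithfully whenever the particle starts in one of the two eigenvectors:

\[
U(M_0\otimes s) = M_s\otimes s, \qquad U(M_0\otimes s') = M_{s'}\otimes s'.
\]

Linearity of \(U\) then determines what happens to the superposition:

\[
U\left(\tfrac{1}{\sqrt{2}}\,M_0\otimes(s+s')\right)
= \tfrac{1}{\sqrt{2}}\bigl(U(M_0\otimes s) + U(M_0\otimes s')\bigr)
= \tfrac{1}{\sqrt{2}}\bigl(M_s\otimes s + M_{s'}\otimes s'\bigr).
\]

At no point was a choice made between \(s\) and \(s'\). The particle’s superposition is simply passed along to the apparatus, and the result is an entangled state rather than a collapsed one.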

From a mathematical perspective, the many-worlds interpretation is very appealing. Unlike any of the alternatives we’ve discussed, it doesn’t involve adding any extra mathematical objects to the bare bones of quantum mechanics: there is no such thing as collapse, the universe evolves through time completely deterministically, and there are no hidden variables or preferred bases; the wavefunction is all there is. Still, the idea that the other parts of the wavefunction (which probably include complete copies of slightly different versions of you) are just as real as the part we supposedly live in might be hard to swallow. Additionally, it’s difficult, though maybe not impossible, to square this story with the Born rule. Why, if all the branches of the wavefunction are equally real, do we see certain outcomes with probabilities equal to the squared absolute values of the corresponding amplitudes?

All of these stories come from a desire to explain the same confusing set of facts. On the one hand, quantum mechanics is an incredibly empirically successful theory; it provides very precise and correct predictions for a wide variety of experiments and there are many phenomena which can’t be explained at all without it. On the other hand, the description of the world it provides us is hard to reconcile with the fact that the macroscopic universe seems to behave much more like the description offered by classical mechanics. We can see the measurement problem as asking how these two things can both be true at the same time.

There is no widespread consensus in favor of any one perspective among either physicists or philosophers of physics. Some writers contend that the measurement problem might be resolved by some better yet-to-be-discovered physical theory that includes quantum mechanics as a limiting case. Others claim to have a reason why the measurement problem isn’t a problem at all and that everyone who thinks otherwise is making some kind of fundamental mistake, although none of these explanations has won over the physics world either. I think it’s no exaggeration to say that quantum mechanics is the single biggest shift since Newton in humanity’s understanding of the physical world. Even if one of these stories turns out to somehow be true, the fact that there has been so much disagreement for so long about the measurement problem is a testament to how different quantum mechanics is from any physical theory that came before it.