Note (November 3, 2005): There are many things I didn't understand very well when I wrote the article below back in 2003. What I am describing here can be explained much more succinctly this way: The motivation is to seek a Lorentz-invariant wave equation of a free particle. For this reason, we start with the (Lorentz-invariant) relativistic energy-momentum relation of a free particle, \(p^2=m^2\), where \(p\) is the four-momentum. We quantize this equation by formally replacing the momentum with the momentum operator, and appending the wave function thus: \(\hat p^2\psi=m^2\psi\). We then formally take the "square root" of this equation (this is Dirac's magnificent "trick"), taking into account that the momentum is a vectorial quantity: \(\alpha^i\hat p_i\psi=m\psi\). One way to make this work is to have \(\alpha^i\) that satisfy the equation \(\alpha^i\alpha^j+\alpha^j\alpha^i=\eta^{ij}+\eta^{ji}\), where \(\eta^{ij}\) is the Minkowski metric. (Implicit in the argument is that the \(\alpha^i\) and the momentum operator commute, but that would indeed be the case when the \(\alpha^i\) are constant and the momentum operator is a derivative operator.) This in particular means that \(\alpha^i\alpha^j+\alpha^j\alpha^i=0\) when \(i\ne j\), meaning that the \(\alpha^i\) cannot be ordinary numbers that commute under multiplication. Such noncommutative algebras can be represented using appropriately chosen matrices, but in four dimensions (where this algebra is the Clifford algebra \(Cl_{1,3}\)), a conceptually simpler(?) representation can be achieved using biquaternions (quaternions over the complex numbers), which is what I am exploring below. Now without further ado, here are my thoughts from two years ago.

Klein-Gordon equation

In order to satisfy the conditions of special relativity, the Schrödinger equation needs to be modified. It was originally derived from the (non-relativistic) relationship between momentum and energy:

\[E=\frac{p^2}{2m}+V.\]

The relativistic version should instead be based on

\[E^2=p^2+m^2.\]

It is, in fact, possible to derive just such a wave equation, but it isn't without problems. The equation is called the Klein-Gordon (KG) equation (in the following discussion, \(\hbar\) and \(c\) are assumed to be 1):

\[-\frac{\partial^2\phi}{\partial t^2}=(-\nabla^2+m^2)\phi.\]

Just like in the case of the Schrödinger equation, it is possible to derive from the KG equation a continuity equation, by multiplying the equation on the left by \(\phi^\star\), multiplying its complex conjugate on the left by \(\phi\), and subtracting one from the other:

\begin{align}-\nabla^2\phi+m^2\phi+\frac{\partial^2\phi}{\partial t^2}&=0,\\ 0&=\phi^\star\nabla^2\phi-\phi^\star m^2\phi-\phi^\star\frac{\partial^2\phi}{\partial t^2}-\phi\nabla^2\phi^\star+\phi m^2\phi^\star+\phi\frac{\partial^2\phi^\star}{\partial t^2}\\ &= \nabla(\phi^\star\nabla\phi-\phi\nabla\phi^\star)-\phi^\star\frac{\partial^2\phi}{\partial t^2}-\frac{\partial\phi^\star}{\partial t}\frac{\partial\phi}{\partial t}+\frac{\partial\phi}{\partial t}\frac{\partial\phi^\star}{\partial t}+\phi\frac{\partial^2\phi^\star}{\partial t^2}\\ &= \nabla(\phi^\star\nabla\phi-\phi\nabla\phi^\star)-\frac{\partial}{\partial t}\left(\phi^\star\frac{\partial\phi}{\partial t}-\phi\frac{\partial\phi^\star}{\partial t}\right).\end{align}

Substituting

\[{\bf\mathrm{j}}=-i(\phi^\star\nabla\phi-\phi\nabla\phi^\star),~~~~~~~~~~\rho=i\left(\phi^\star\frac{\partial\phi}{\partial t}-\phi\frac{\partial\phi^\star}{\partial t}\right),\]

we get

\[i\left(\nabla{\bf\mathrm{j}}+\frac{\partial\rho}{\partial t}\right)=0,~~~~~\nabla{\bf\mathrm{j}}+\frac{\partial\rho}{\partial t}=0.\]

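But herein lies the problem. The expression we get for the probability density \(\rho\) in this equation is not positive definite! Because the KG equation is second order in time, \(\phi\) and \(\partial\phi/\partial t\) can be specified independently, and nothing prevents \(\rho\) from being negative.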
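
To make this concrete, here is a small worked example. It is only a sketch: the plane-wave form and the normalization constant \(N\) are assumptions introduced for illustration, not part of the derivation above. Take

\[\phi=Ne^{-i(Et-{\bf p}\cdot{\bf x})},~~~~~\frac{\partial\phi}{\partial t}=-iE\phi,\]

which satisfies the KG equation whenever \(E^2=p^2+m^2\). Substituting into the expression for \(\rho\) gives

\[\rho=i\left(\phi^\star(-iE)\phi-\phi(iE)\phi^\star\right)=2E|N|^2.\]

The KG equation only fixes \(E^2\), so the root \(E=-\sqrt{p^2+m^2}\) is just as admissible as the positive one, and for that root \(\rho\) is negative everywhere.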

Dirac equation

The solution to this dilemma was first proposed by Dirac. First of all, he postulated another equation, linear in \(\partial/\partial t\) to ensure that the probability density will be positive definite, and also linear in \(\nabla\) for purposes of relativistic covariance:

\[i\frac{\partial\phi}{\partial t}=(-i{\bf\mathrm{\alpha}}\nabla+\beta m)\phi.\]

From this and the KG equation, we can derive some conditions for \(\alpha_i\) and \(\beta\). Applying the operator on each side of Dirac's equation twice (that is, squaring it), we get:

\begin{align}-\left(\frac{\partial}{\partial t}\right)^2\phi&=(-i{\bf\mathrm{\alpha}}\nabla+\beta m)^2\phi\\
&=-\sum\limits_{i=1}^3\alpha_i^2\frac{\partial ^2\phi}{(\partial x^i)^2}-\sum\limits_{i<j}(\alpha_i\alpha_j+\alpha_j\alpha_i)\frac{\partial^2\phi}{\partial x^i\partial x^j}-im\sum\limits_{i=1}^3(\alpha_i\beta+\beta\alpha_i)\frac{\partial\phi}{\partial x^i}+\beta^2 m^2\phi.\end{align}

But we assumed that φ also satisfies the KG equation:

\[-\left(\frac{\partial}{\partial t}\right)^2\phi=-\sum\limits_{i=1}^3\frac{\partial^2\phi}{(\partial x^i)^2}+m^2\phi.\]

This leads us to the following conditions:

\begin{align} \alpha_i\beta+\beta\alpha_i&=0,\\ \alpha_i\alpha_j+\alpha_j\alpha_i&=0~~~(i\ne j),\\ \alpha_i^2=1,~~~\beta^2&=1. \end{align}

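A contradiction? Perhaps, if you assume that \(\alpha_i\), \(\beta\) and \(\phi\) are ordinary (real or complex) scalar quantities: for commuting numbers, \(\alpha_i\alpha_j+\alpha_j\alpha_i=2\alpha_i\alpha_j=0\) with \(i\ne j\) would force one of the \(\alpha_i\) to vanish, which is incompatible with \(\alpha_i^2=1\). But what if they aren't?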

Quaternions

Most textbooks at this point proceed to introduce the Pauli matrices. But, in my opinion, those matrices are an unnecessary distraction when a much more powerful concept is available to us: Quaternions to the rescue!

Quaternions go where no complex number has gone before: instead of one, we have three imaginary units, all obeying the equation \({\bf\mathrm{i}}^2={\bf\mathrm{j}}^2={\bf\mathrm{k}}^2=-1\). Quaternion multiplication is non-commutative: \({\bf\mathrm{ij}}=-{\bf\mathrm{ji}}={\bf\mathrm{k}}\), \({\bf\mathrm{jk}}=-{\bf\mathrm{kj}}={\bf\mathrm{i}}\), \({\bf\mathrm{ki}}=-{\bf\mathrm{ik}}={\bf\mathrm{j}}\). The multiplication rules should make it evident that just as a complex number can be expressed using two real numbers, a quaternion can be expressed using two complex numbers:

\[a+b{\bf\mathrm{i}}+c{\bf\mathrm{j}}+d{\bf\mathrm{k}}=(a+b{\bf\mathrm{i}})+(c+d{\bf\mathrm{i}}){\bf\mathrm{j}}.\]

Quaternions have quaternionic conjugates:

\[(a+b{\bf\mathrm{i}}+c{\bf\mathrm{j}}+d{\bf\mathrm{k}})^\star=a-b{\bf\mathrm{i}}-c{\bf\mathrm{j}}-d{\bf\mathrm{k}}.\]

Like real and complex numbers, quaternions form a division algebra: simply put, for two quaternions \(a\) and \(b\), their product is zero iff at least one of \(a\) or \(b\) is zero.
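
The rules above are easy to check mechanically. The following is a minimal sketch, assuming a quaternion is stored as a plain 4-tuple `(a, b, c, d)` standing for \(a+b{\bf\mathrm{i}}+c{\bf\mathrm{j}}+d{\bf\mathrm{k}}\); the helper names `qmul`, `qconj` and `qnorm2` are hypothetical, chosen only for this illustration.

```python
# Quaternions as 4-tuples (a, b, c, d) standing for a + b i + c j + d k.
# All names here are illustrative assumptions, not part of the article.

def qmul(p, q):
    """Hamilton product p*q; the order of the factors matters."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )

def qconj(q):
    """Quaternionic conjugate: flip the sign of the three imaginary parts."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def qnorm2(q):
    """Squared norm a^2 + b^2 + c^2 + d^2."""
    return sum(x * x for x in q)

ONE = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

if __name__ == "__main__":
    print(qmul(I, J), qmul(J, I))                 # k and -k: non-commutative
    p, q = (1, 2, 3, 4), (5, 6, 7, 8)
    print(qmul(p, qconj(p)))                      # real and equal to |p|^2
    print(qnorm2(qmul(p, q)) == qnorm2(p) * qnorm2(q))
```

The last check illustrates the division-algebra property: since the norm of a product is the product of the norms, a product can only vanish if one of its factors does.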

Before proceeding any further, it is worthwhile to examine whether the quaternionic unit \({\bf\mathrm{i}}\) is the same as the imaginary unit \(i\) that appears in the Dirac equation. Sadly, the answer is no: if you try to use the quaternionic unit, it is not possible to simplify the equation as elegantly as it was done here, because the quaternionic unit does not, in general, commute with quaternion-valued \(\alpha_i\) or \(\beta\), whereas the derivation above moved \(i\) freely past them. In other words, we require both complex and quaternionic quantities: the algebra we use is the tensor product of C and H, also referred to as quaternions over the complex numbers.

Dirac matrices

Even with quaternions, all the conditions for the coefficients \(\alpha_i\) and \(\beta\) are difficult to satisfy at the same time. That is because quaternions provide only three quantities that satisfy an anticommutation relation, whereas we need four. Because of this, we need to take the next best thing, which is a \(2\times 2\) quaternionic matrix (which, as per the above, is the same as a \(4\times 4\) complex matrix). The choice is not unique; one specific set of matrices looks like this:

\[\alpha_1=\begin{pmatrix}0&{\bf\mathrm{i}}\\-{\bf\mathrm{i}}&0\end{pmatrix},~~~ \alpha_2=\begin{pmatrix}0&{\bf\mathrm{j}}\\-{\bf\mathrm{j}}&0\end{pmatrix},~~~ \alpha_3=\begin{pmatrix}0&{\bf\mathrm{k}}\\-{\bf\mathrm{k}}&0\end{pmatrix},~~~ \beta=\begin{pmatrix}1&0\\0&-1\end{pmatrix}.\]
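
The conditions on \(\alpha_i\) and \(\beta\) can be verified by brute force. The sketch below assumes quaternions stored as 4-tuples and \(2\times 2\) matrices stored as nested lists; every helper name (`qmul`, `mmul`, `satisfies_dirac_conditions`, and so on) is hypothetical. Since the choice of matrices is not unique, it tests the three \(\alpha_i\) against two candidate \(\beta\) matrices, a diagonal one and an off-diagonal one; both pass.

```python
# Brute-force check of the anticommutation conditions for 2x2 quaternionic
# matrices. Quaternions are 4-tuples (a, b, c, d) = a + b i + c j + d k.
# All names are illustrative assumptions, not part of the article.

def qmul(p, q):
    """Hamilton product p*q (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def qadd(p, q):
    return tuple(x + y for x, y in zip(p, q))

def qneg(q):
    return tuple(-x for x in q)

ZERO, ONE = (0, 0, 0, 0), (1, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

def mmul(A, B):
    """Product of 2x2 matrices with quaternion entries, keeping factor order."""
    return [[qadd(qmul(A[r][0], B[0][c]), qmul(A[r][1], B[1][c]))
             for c in range(2)] for r in range(2)]

def anticommutator(A, B):
    AB, BA = mmul(A, B), mmul(B, A)
    return [[qadd(AB[r][c], BA[r][c]) for c in range(2)] for r in range(2)]

def off_diagonal(q):
    """Matrix of the form [[0, q], [-q, 0]] used for the alpha_i."""
    return [[ZERO, q], [qneg(q), ZERO]]

ALPHAS = [off_diagonal(I), off_diagonal(J), off_diagonal(K)]
BETA_DIAGONAL = [[ONE, ZERO], [ZERO, qneg(ONE)]]
BETA_OFF_DIAGONAL = [[ZERO, ONE], [ONE, ZERO]]
ZERO_MATRIX = [[ZERO, ZERO], [ZERO, ZERO]]
TWICE_IDENTITY = [[(2, 0, 0, 0), ZERO], [ZERO, (2, 0, 0, 0)]]

def satisfies_dirac_conditions(alphas, beta):
    """True if alpha_i^2 = beta^2 = 1 and all distinct pairs anticommute."""
    for i, a in enumerate(alphas):
        if anticommutator(a, beta) != ZERO_MATRIX:
            return False
        for j, b in enumerate(alphas):
            expected = TWICE_IDENTITY if i == j else ZERO_MATRIX
            if anticommutator(a, b) != expected:
                return False
    return anticommutator(beta, beta) == TWICE_IDENTITY

if __name__ == "__main__":
    print(satisfies_dirac_conditions(ALPHAS, BETA_DIAGONAL))
    print(satisfies_dirac_conditions(ALPHAS, BETA_OFF_DIAGONAL))
```

Integer arithmetic keeps every comparison exact, so no floating-point tolerance is needed.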

The use of quaternionic matrices as coefficients in the Dirac equation implies that the wave function is no longer a (real or complex) scalar function either: instead, it itself becomes a quaternionic 2-vector.

This is a crucial thought: two seemingly contradictory conditions (the KG equation and the Dirac equation) are reconciled when we switch from using real or complex numbers to some other kind of quantity.

Probability current

The Dirac equation solves the problem with probability density. Multiplying the Dirac equation by the conjugate transpose of \(\phi\) on the left, multiplying the conjugate transpose of the Dirac equation (with its overall sign reversed) by \(\phi\) on the right, and adding the results, we get:

\begin{align} i\phi^\dagger\frac{\partial\phi}{\partial t}&=\phi^\dagger(-i{\bf\mathrm{\alpha}}\nabla+\beta m)\phi,\\ i\frac{\partial\phi^\dagger}{\partial t}\phi&=\phi^\dagger(-i{\bf\mathrm{\alpha}}\overleftarrow{\nabla}-\beta m)\phi,\\ i\left(\phi^\dagger\frac{\partial\phi}{\partial t}+\frac{\partial\phi^\dagger}{\partial t}\phi\right)&=-\phi^\dagger i{\bf\mathrm{\alpha}}\nabla\phi+\phi^\dagger\beta m\phi-\phi^\dagger i{\bf\mathrm{\alpha}}\overleftarrow{\nabla}\phi-\phi^\dagger\beta m\phi,\\ \frac{\partial}{\partial t}(\phi^\dagger\phi)&=-\nabla(\phi^\dagger{\bf\mathrm{\alpha}}\phi). \end{align}

The symbol \(\overleftarrow{\nabla}\) is borrowed from Aitchison & Hey (1996): \(\phi^\dagger\overleftarrow{\nabla}_x=\partial\phi^\dagger/\partial x\) is a row vector (recall that \(\phi^\dagger\) is a quaternionic row vector that is the conjugate transpose of the column vector \(\phi\)). The second line uses the fact that the \(\alpha_i\) and \(\beta\) are equal to their own conjugate transposes.

Substituting

\[\rho=\phi^\dagger\phi,~~~~~~~~~~{\bf\mathrm{j}}=\phi^\dagger{\bf\mathrm{\alpha}}\phi,\]

we get

\[\frac{\partial\rho}{\partial t}+\nabla{\bf\mathrm{j}}=0.\]

The expression for the probability density \(\rho\) is positive definite: for a two-component quaternionic vector \(\phi=(\phi_1,\phi_2)\), it is just \(|\phi_1|^2+|\phi_2|^2\). Therefore, the continuity equation we just derived from the Dirac equation is acceptable.

Negative energy

But what does a wave function that is a quaternionic 2-vector physically represent? Let's examine the energies associated with the two elements of this quaternionic wave function.

From the KG equation (which our wave function was postulated to satisfy) we know that energy can be positive or negative:

\[E=\pm\sqrt{p^2+m^2}.\]

In component form, with the \(\alpha_i\) chosen above and the diagonal choice \(\beta=\begin{pmatrix}1&0\\0&-1\end{pmatrix}\), the quaternionic 2-vector version of the Dirac equation (divided by \(i\) on both sides) looks like this:

\[\frac{\partial}{\partial t}\begin{pmatrix}\phi_1\\\phi_2\end{pmatrix}=\begin{pmatrix}-im&-({\bf\mathrm{i}}~{\bf\mathrm{j}}~{\bf\mathrm{k}})\cdot\nabla\\({\bf\mathrm{i}}~{\bf\mathrm{j}}~{\bf\mathrm{k}})\cdot\nabla&im\end{pmatrix}\begin{pmatrix}\phi_1\\\phi_2\end{pmatrix}.\]

Or, multiplied by \(i\) on both sides:

\[i\frac{\partial}{\partial t}\begin{pmatrix}\phi_1\\\phi_2\end{pmatrix}=\begin{pmatrix}m&-i({\bf\mathrm{i}}~{\bf\mathrm{j}}~{\bf\mathrm{k}})\cdot\nabla\\i({\bf\mathrm{i}}~{\bf\mathrm{j}}~{\bf\mathrm{k}})\cdot\nabla&-m\end{pmatrix}\begin{pmatrix}\phi_1\\\phi_2\end{pmatrix}.\]

Now is the time to recognize that \(i\partial/\partial t\) is none other than the energy operator, and \(\nabla\) is none other than the momentum operator multiplied by \(i\) (since \(\hat p=-i\nabla\)). For a particle at rest, the momentum is zero, therefore its energy will be:

\[E=\pm\sqrt{p^2+m^2}=\pm m.\]

Choosing \(E\) to be positive, we get

\[m\begin{pmatrix}\phi_1\\\phi_2\end{pmatrix}=\begin{pmatrix}m&0\\0&-m\end{pmatrix}\begin{pmatrix}\phi_1\\\phi_2\end{pmatrix},\]

which implies

\[\phi_2=0,~~~~~~~~~~\phi=\begin{pmatrix}\phi_1\\0\end{pmatrix}.\]

Similarly, if we choose \(E\) to be negative, we get

\[\phi_1=0,~~~~~~~~~~\phi=\begin{pmatrix}0\\\phi_2\end{pmatrix}.\]

What this means is that we have decomposed our quaternionic two-vector wave function into two quaternionic scalar wave functions: one corresponding to a positive energy solution, the other corresponding to a negative energy solution. Dirac's interpretation is that these two solutions represent a particle and an antiparticle; the fact that they show up inseparably together in the Dirac equation implies that particles and antiparticles can only be created and annihilated together.
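
The rest-frame argument extends to a particle in motion. As a sketch, assume a momentum eigenstate with real components \(p_1,p_2,p_3\), so that \(-i\nabla\) acts as multiplication by \({\bf p}\), and write \(Q=p_1{\bf\mathrm{i}}+p_2{\bf\mathrm{j}}+p_3{\bf\mathrm{k}}\); the symbols \(Q\) and \(H\) are introduced here only for illustration. With the \(\alpha_i\) given earlier and the diagonal \(\beta\), the operator on the right-hand side of the Dirac equation becomes

\[H={\bf\mathrm{\alpha}}\cdot{\bf p}+\beta m=\begin{pmatrix}m&Q\\-Q&-m\end{pmatrix}.\]

Squaring it, and using \(Q^2=-(p_1^2+p_2^2+p_3^2)=-p^2\) together with the fact that the real number \(m\) commutes with \(Q\), we get

\[H^2=\begin{pmatrix}m^2-Q^2&mQ-Qm\\mQ-Qm&m^2-Q^2\end{pmatrix}=(p^2+m^2)\begin{pmatrix}1&0\\0&1\end{pmatrix}.\]

Any energy eigenvalue therefore satisfies \(E^2=p^2+m^2\), the same two-signed relation that the KG equation gives; the rest-frame case is recovered by setting \(Q=0\).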

Spin

But why is the wave function of a particle itself a quaternionic function as opposed to an "ordinary" complex-valued function? If we map the quaternion to a pair of complex numbers, we have the answer: What it really tells us is that a particle represented by this wave function has an extra internal degree of freedom with two discrete values. This, in fact, is precisely what a spin-1/2 particle is. Significantly, the Dirac equation does not admit a solution describing a particle with no internal degrees of freedom; in other words, any particle that obeys it necessarily carries spin, and a spin-0 particle cannot be described by it.


References

Aitchison, I. J. R. & Hey, A. J. G., Gauge Theories in Particle Physics, Institute of Physics Publishing, 1996
  and, assuming it is not overly pretentious to reference one's own unpublished article from within another unpublished article:
Toth, Viktor T, Principles of Elementary Quantum Mechanics, http://www.vttoth.com/CMS/index.php/physics-notes/137, 2003