﻿INVESTIGATIONS IN QUANTUM COMPUTING
Causality and Graph Isomorphism
Thesis by
David Eugene Beckman
Advisor
John Preskill
In Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
California Institute of Technology
Pasadena, California
2004
(Defended May 14, 2004)
Acknowledgements
ii
First and foremost, I’d like to thank my advisor John Preskill, for immensely interest-
ing discussions and for the patience of a Buddhist in awaiting the completion of this
manuscript. In addition, I’d like to thank Michael Nielsen for especially useful discussions
on many topics. I’d also like to thank Dorit Aharonov and Amalavoyal Chari for helpful
discussions regarding graph isomorphism; and Charlene Ahn, John Cortese, Jim Harring-
ton, Alexei Kitaev, Andrew Landahl, Barry Simon, and Clint White for their comments
and discussions regarding causality and causal operations.
Also deserving of recognition are all those who, by encouraging comments or pointed
questions, helped to convince me to finally complete this thesis. These are too numerous
to list here, but special mention should be made of my parents, who mostly restricted
themselves to encouraging comments, and of Matt Matuszewski, who felt free to ask
pointed questions. Finally, I would like to thank the management of Toyon Research
Corporation for their patience and flexibility in allowing me the time to complete and
defend my dissertation.
Abstract
iii
INVESTIGATIONS IN QUANTUM COMPUTING
Causality and Graph Isomorphism
by
David Eugene Beckman
In Partial Fulfillment of the
Requirements for the Degree of
Doctor of Philosophy
In this thesis I explore two different types of limits on the time complexity of quantum
computation—that is, limits on how much time is required to perform a given class of
quantum operations on a quantum system. Upper limits can be found by explicit con-
struction; I explore this approach for the problem of determining whether two graphs
are isomorphic. Finding lower limits, on the other hand, usually requires appeal to some
fundamental principle of the operation under consideration; I use this approach to derive
lower limits placed by the requirements of relativistic causality on the time required for
implementation of some nonlocal quantum operations. In some situations these limits are
attainable, but for other physical spacetime geometries we exhibit classes of operations
which do not violate relativistic causality but which are nevertheless not implementable.
iv
Contents
v
1 Introduction 1
1.1 Why quantum computation? . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Physical limits to computation . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Computational power of physics . . . . . . . . . . . . . . . . . . . . 3
1.2 A physical model for computation . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Reversible computation . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.2.2 Computational complexity . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2.3 The quantum computer, revisited . . . . . . . . . . . . . . . . . . . . 19
1.3 In conclusion, an introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2 Ruminations on Graph Isomorphism 23
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.1.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.1.2 Classical algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2 Some primitives for quantum computation . . . . . . . . . . . . . . . . . . . 30
2.3 Quantum algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.3.1 Solving GI given a succinct quantum certificate . . . . . . . . . . . . 40
2.3.2 Examples of succinct certificates . . . . . . . . . . . . . . . . . . . . 43
2.3.3 An observable solving GI . . . . . . . . . . . . . . . . . . . . . . . . 46
2.4 Trying to construct � �G � . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4.1 Structured Grover search . . . . . . . . . . . . . . . . . . . . . . . . 49
2.4.2 Stabilizer formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.5 Related problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
vi
2.5.1 Group non-membership . . . . . . . . . . . . . . . . . . . . . . . . . 56
3 Causal Restrictions on Nonlocal Measurement Superoperators 61
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.2 Complete projective superoperators . . . . . . . . . . . . . . . . . . . . . . . 70
3.2.1 Causality and 1-causality . . . . . . . . . . . . . . . . . . . . . . . . 70
3.2.2 Localizability and 1-localizability . . . . . . . . . . . . . . . . . . . . 77
3.2.3 Causal, nonlocalizable superoperators . . . . . . . . . . . . . . . . . 79
3.2.4 DiVincenzo’s conjecture . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.3 Unitary superoperators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
3.4 More general superoperators . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.5 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Bibliography 112
List of Figures
vii
1.1 A quantum circuit equivalence . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.1 The label gadget for a graph . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1 Two different causal geometries . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 Causal and semicausal geometries . . . . . . . . . . . . . . . . . . . . . . . . 66
3.3 States used in the completions of E . . . . . . . . . . . . . . . . . . . . . . . 105
Chapter 1
Introduction
1.1 Why quantum computation?
1
Research in classical complexity theory, the branch of computer science which studies the
power of algorithms and the limitations of computers, has generally followed one of two
complementary approaches. The first and most obvious line of research is the development
of efficient algorithms to solve classes of problems. This is the most straightforward method
of determining how difficult a problem is, and, in particular, whether it can be solved
with the allotted resources. But this approach, of course, only provides upper bounds
on resource requirements; an as-yet-undiscovered algorithm for the problem may require
fewer resources than the best algorithm currently known.
Finding nontrivial lower bounds on the resources required to solve a given problem
often requires more subtlety. The theories of computability and complexity (briefly dis-
cussed in §1.2.2 below) have been developed to place limits on whether a given problem
can be solved by a well-defined algorithm and, if so, what resources are required for the
solution. For apparently difficult problems, in which significant effort has not produced
an efficient solution algorithm, these formalisms can be a useful way of quantifying how
hard the problem is. Even for tractable problems, it may be useful to know whether more
efficient algorithms are possible in principle. 1
1 In fact, few nontrivial bounds on the difficulty of solving practical problems are known; in many cases
all that can be done is to reduce a problem which seems hard to a special case of another problem that is
also believed to be hard. Even this is valuable information, though. As Garey and Johnson point out [50,
2
The concept of a quantum computer, the central idea of quantum computation, arises
somewhat naturally as the result of two different lines of thought somewhat analogous
to these two motivations of classical computer science. The quantum computer, a com-
putational device using the axioms of quantum mechanics in some fundamental way to
perform its operations, 2 may first be thought of as a framework, based on physical theory,
for describing precisely what operations it is possible for a computer to perform—that is,
for determining what a computer cannot do. Second, it may be considered as a particular
computational model which is seemingly difficult to simulate on a “classical” computer
and hence (in a sense described in §1.2.2 below) apparently more powerful than a classical
computer.
1.1.1 Physical limits to computation
Historically, the first approach was pioneered by researchers such as Landauer. As early
as 1961, Landauer [69, 70] argued, using a simple physical model for a unit of memory,
that the irreversible erasure of information is necessarily dissipative; that is, it necessarily
results in heat generation of at least kT ln 2 per erased bit. (Earlier references to the value
kT ln 2, such as von Neumann’s [100, p. 469 in [8]], do not explicitly identify erasure as
the fundamental heat-generating process in computation. See [70] and [16] for summaries
of the historical evolution of this argument.) Landauer’s original paper [69] is instructive
as a demonstration of the usefulness of physical models for computation. This utility
is of course demonstrated by the power of his main result, in providing a fundamental
pp. 2–4], once we know that exact solution of a given problem is “probably hard” we know that we can
more profitably direct our efforts to changing the problem requirements than to solving the given problem.
2 A “quantum computer” should be distinguished from a device which, though its size is such that
quantum-mechanical perturbations of classical effects become nonnegligible in designing its fundamental
components, is constructed so that the overall operation can be understood as a purely classical process.
For example, an integrated circuit may have transistors whose gate region is so small that the presence or
absence of a single electron may affect the transistor’s switching characteristics; thus charge quantization
is a nonnegligible effect. However, in a classical computer such a transistor may be driven, either at a high
enough voltage or for a long enough time, that this quantization effect is negligible in the amplified output
signal, and the time-averaged current flow can be approximated classically.
3
limit, on what a computer may do, of a very different sort than the purely-mathematical
Turing limits on “computable” functions. But it is also shown in the negative by the
unwieldiness of his sketched argument (later seen to rest on an implicit assumption not true
for all possible computational models) that logical irreversibility, with the corresponding
dissipation of heat, is necessary for the operation of any general-purpose computing device:
With the existence of a general model for computation, such an argument might be easily
formalized, and its truth and assumptions easily seen as such. (Arguments such as these,
giving physical constraints on the performance of a computer, are still being made. For
example, a review by Meindl et al. [79] collects performance limits for silicon integrated
circuits; Bekenstein [13, 14] and Lloyd [75] derive ultimate architecture-independent limits
on computational power—of course, the generality of these limits comes at the cost of
outlandish bounds.)
Thus quantum computation, being on a firmer physical foundation than classical com-
putation, also has the potential to place more strict fundamental limits on the power of
computation than classical computer science.
It is reasonable, since nature is currently believed to be fundamentally quantum, to
turn to quantum theory as the basis for a general framework for computation, and at-
tempt to use this quantum computational model to derive limits on what a computer may
compute.
1.1.2 Computational power of physics
The second approach, in which quantum effects in a computer are considered as desirable
properties allowing the performance of interesting operations different from those per-
formable on a classical computer, began with proposals such as those of Feynman [46, 47]
and Deutsch [35, 36]. (These in turn were motivated by work by Fredkin and Toffoli,
who discussed reversible computation from a physical point of view.) Development of new
algorithms for which no classical equivalents are known has driven much of the recent
interest in the field; quantum computers are likely to be much more difficult to construct
and operate than their classical equivalents, so they are likely to be practically useful only
if a correspondingly large gain in computational power can be achieved.
4
Feynman [46], noting that large quantum systems seem to be difficult to simulate on
a classical computer (apparently requiring time exponential in the size of the quantum
system), proposed designing a “quantum computer” made up, e.g., of a lattice of spins
in which a local Hamiltonian could be tailored to match the Hamiltonian of the system
under study, allowing the “quantum computer” to effortlessly simulate the system of
interest: efficient simulation of the dynamics of arbitrary quantum systems. Deutsch [35],
motivated both by Feynman’s suggestion of a more powerful model of computation and by
the persistence of the “Church-Turing thesis” 3 in classical computer science (suggesting to
him the presence of some physical law limiting computational power: thus his motivation
was apparently a combination of both of the reasons sketched here), expanded these
proposals to a full definition of a quantum computer. His original proposal was a quantum
analogue to the machine proposed originally by Turing [97], having a “head” (a coherent
quantum analogue of a classical finite state machine) moving along a linear “tape” (a
linear array of quantum memory registers, infinite in both directions), undergoing at each
discrete time step a local unitary operation involving the head’s state and the state of the
memory register at that point on the tape. In later work Deutsch [36] displayed a different
model for quantum computation, the quantum circuit model analogous to a description
of the various “gates” used as fundamental building blocks for creating a classical circuit
(where each gate represents a quantum operation acting at a specified time on a specified
set of quantum registers), and gave an explicit set of gates allowing universal quantum
computation. This latter model is more commonly used in practice today.
Further hints at the increased power of a quantum computer, after Feynman’s com-
ment, were provided by Deutsch and Jozsa. As an example demonstrating the utility
of a quantum computer, Deutsch proposed the problem of determining, for a black-box
function f : {0, 1} → {0, 1}, the value of f(0) ⊕ f(1). Classically this always requires
two evaluations of f; but Deutsch and Jozsa [37] showed that a quantum computer could
determine f(0) ⊕ f(1), with certainty, with a single (quantum) evaluation of f. 4 Deutsch
3 The Church-Turing thesis is the conjecture that the class of “computable” functions does not depend
much on the particular definition of a computer used; this is discussed more fully in §1.2.2 below.
4 Deutsch [35] originally showed only that the quantum computer could give the correct answer half the
5
and Jozsa extended this result to the more general case of determining whether a function
f : {0, . . . , 2N − 1} → {0, 1} is balanced (having |f −1 (0)| = |f −1 (1)| = N) or constant.
The quantum algorithm takes time 5 O(log N) plus exactly two evaluations of f, while the
classical algorithm takes time O(N) plus N +1 evaluations of f in the worst case (although
on average it requires only time O(log N) and approximately three evaluations of f)—a
further suggestion that quantum computers might be exponentially more powerful than
classical ones, at least for some problems.
The most famous result in the theory of quantum algorithms occurred in 1994, when
Shor [91, 92] proved that a quantum computer could factor integers in “polynomial” time, 6
in contrast to the classical case in which no polynomial-time algorithm is known. (The
best classical factoring algorithms known are called “superpolynomial,” meaning that they
grow more quickly than any polynomial in log N but more slowly than N = exp(log N).
The number-field sieve [72], the classical algorithm whose asymptotic complexity is the
lowest known, requires time O � exp(c(log N) 1/3 (log log N) 2/3 ) � .
Two years later, Grover [55, 56] demonstrated another result, less spectacular in its
speed increase but far more general, for finding roots of arbitrary black-box functions
in less time (measured by the number of function evaluations required) than classically
possible: A quantum computer using Grover’s algorithm requires only O( √ N) evaluations
of f where the classical algorithm requires O(N)). 7 Grover’s algorithm may immediately
be applied to many search problems; in particular, every problem in NP 8 can be recast
time; the other half the time it would return a “|fail〉” state, giving no information about the value of
f(0) ⊕ f(1). This solution does not improve the mean runtime, but it does improve the best-case runtime.
5 See §1.2.2 for a definition of O-notation.
6 That is, the algorithm can find nontrivial factors of a composite integer N in time polynomial in log N,
the length of the binary representation of N. A simple algorithm [11] suffices to perform the factoring
in O ˆ (log N) 3˜ operations using space linear in log N; more complicated algorithms can reduce the time
required even further.
7 Given an arbitrary function f(x) : X → {0, 1}, a solution f(x) = 0 may be found with high probability,
if one exists, in at most O( p |X|) evaluations of f, without knowing any further properties of f. If the
p
|X| evaluations of f are required; classically the solution takes
solution x is known to be unique, about π
2
|X|/2 evaluations of f on average, so the quantum advantage is quadratic.
8 NP is the class of nondeterministic polynomial-time algorithms, defined in §1.2.2 below.
6
in a form amenable to use of Grover’s algorithm, by performing a search for one of the
“succinct certificates” (further described in Chapter 2 below) demonstrating the solution
of the problem. 9
Thus, we can view the theory of quantum computation as a tool both for discovering
limits on computational power and for improving computational power. The following
chapters in this thesis explore both of these approaches. The remainder of this Introduction
provides background material and notation for the later chapters: The design and notation
of the quantum computers discussed here are outlined, and the subject of computational
complexity is briefly reviewed.
1.2 A physical model for computation
The core idea behind the quantum computer is that it is a physical model for computation.
That is, unlike the basically empirical or intuitive rules used in specifying the allowed fun-
damental operations for various classical models of computation, the quantum computer
should be designed so that its states and operations correspond closely to fundamental
physical laws—in particular, the laws of quantum mechanics. 10 Because this mapping
is from such fundamental elements of quantum mechanics, a statement about quantum
mechanics may immediately map to a statement about quantum computation, possibly
providing nontrivial information about the power or limitations of quantum computation.
9 Note, however, that this demonstrates an improvement over only the most naïve classical algorithm for
solution of the problem, a brute-force search over the problem space. For many interesting problems in NP
algorithms more efficient than brute-force searching are known. Grover’s algorithm can sometimes be used
with these classical algorithms to further reduce the complexity, but in some cases the classical algorithms
do not seem to reduce easily to the Grover black-box search form. Similarly, some harder problems (not
in NP) do not seem amenable to Grover reductions [84, 44].
10 A complete physical model should thus use a complete unified field theory as basis for the compu-
tational model; this is beyond the scope of this paper and unlikely to be relevant to the functioning of
experimental realizations of quantum computers in the near future. In this work we will generally assume
(as is common in quantum-computing research) that nonrelativistic quantum mechanics is a valid approx-
imation for the systems under consideration. We will, however, consider the effects of finite propagation
speeds, for signals traveling between widely-separated subsystems, on models of allowed operations.
7
At the most basic level, we thus define our quantum computer by reference to the ax-
ioms of quantum mechanics [102, 86]: The state of a quantum computer is the quantum-
mechanical state of the system, a ray in the computer’s Hilbert space. An allowed oper-
ation of a quantum computer is a quantum-mechanical operator acting on the system’s
state. In the standard formalism for quantum mechanics of closed systems, the state can
be represented by a vector |Ψ〉 in the Hilbert space H of the system; the operators can
be thought of as compositions of unitary operators U and projective (or “von Neumann”)
measurement operators M. The Hilbert space H of a quantum computer is typically
finite-dimensional and often considered to be composed of a tensor product of some num-
ber n of fundamental d-dimensional Hilbert spaces Hd called qudits (qubits for d = 2),
H = Hd ⊗ · · · ⊗ Hd. For the purposes of this paper we will not typically refer directly to
the qudit Hilbert spaces but to quantum “memory registers” composed of some number
of qudits and having some distingushed basis {|0〉, . . . , |N − 1〉} (in which state prepara-
tion and measurements are easy to perform), the computational basis (using the usual
Dirac bra-ket notation to denote states of a quantum register). The computational basis
states of a quantum register will often be interpreted as radix-N numbers, just as usual
in classical computation.
In general, however, there is no reason to restrict our computer to begin in such a “pure
state” uncorrelated with external states; and we may be unable in practice to keep our
quantum system perfectly isolated from the environment. So, more generally, the quantum
computer’s state can be described using the language of open quantum systems, where
the state of the system is described by a density matrix ρ. In this formalism, operators
are allowed to interact both with the system of interest and (either intentionally or para-
sitically) with the external environment. Unitary operators are replaced by more general
superoperators /S, and generalized measurements (or positive operator-valued measures,
POVMs) M replace projective measurement operators [88, §3.1–3.3]. Superoperators and
POVMs can both be written explicitly using a Kraus representation, 11 a set of linear
11 The Kraus representation, as well as being a convenient form for calculations, emphasizes a formal
similarity between “unitary” evolution and “measurements” which is kept rather mysterious in the usual
Copenhagen picture. A superoperator, in this language, can be thought of as a particular implementation
8
operators {Mi} satisfying � †
i M i Mi = 1. A superoperator /S[{Mi}] with Kraus representation
{Mi} acts on a density matrix ρ as /S[{Mi}](ρ) = �
i M †
iρM i . A POVM with
this Kraus representation is a measurement with |{Mi}| outcomes, outcome i occurring
with probability tr M †
i Miρ. The operation of the quantum computer will cause the initial
state ρ to evolve to some final state /S(ρ). If, as is usually the case when the goal is solu-
tion of a classical problem, we desire a classical answer at the end of the computation, a
measurement may be performed on this final state to read out the answer.
The computer’s interactions with the environment during the course of its computation
can be divided into “desirable” interactions (e.g., those in which the computer is observing
or interacting with some other system) and undesirable “parasitic” interactions. The
latter interactions can be suppressed using quantum error correction, in which a smaller,
protected subspace of the computer’s full Hilbert space is used as the computer’s state; the
most likely parasitic interactions of the computer with the environment cause excursions
of the computer’s state out of this subspace which can, for a properly-designed quantum
error-correcting code (see, e.g., [83, Ch. 10] and [88, Ch. 7] for introductory treatments),
be measured and corrected; so such interactions need not be considered in the theoretical
analysis of the quantum computer. In the former case, if the external system obeys
quantum mechanics, then we may incorporate it as part of the quantum system making
up the computer to remove this source of interaction with an external environment. (The
caveat “if the external system obeys quantum mechanics” probably seems rather weak.
However, as in classical computer science, it is sometimes interesting to consider oracle
problems, in which a black box of unknown design—hence, possibly computing difficult
or even uncomputable functions—may answer certain questions. A common goal in such
problems is to find out what extra power such an oracle could provide, if it existed.
Another use of oracles is to provide black-box devices that cannot be reverse-engineered.
The Grover result, for example, can be stated as a quadratic speedup over the best possible
classical algorithm if the function f, for which a root is to be found, is considered to be
a randomly-chosen oracle function. If, on the contrary, f were given by some explicit
formula, then a clever algorithm might be able to simplify the formula and find an algebraic
of a POVM in which the measurement results are ignored and discarded.
solution in less time than a brute-force search.)
9
So, as long as the environment is quantum-mechanical in nature, interactions between
the computer and the environment may be either suppressed, by embedding the computer’s
state space in a protected subspace of a larger Hilbert space, or modeled, by enlarging
our computer’s Hilbert space to include the interacting system. Thus we can usually
pretend that our quantum computer is isolated from the external environment: undergoing
only unitary evolution over the course of its computation, perhaps followed by a single
measurement at the end so that we can learn the result of the computation.
This can be shown in two steps. First we can convert each instance of a superoperator
or POVM into a unitary operator or von Neumann measurement operator (respectively)
on an augmented Hilbert space; for a superoperator or POVM with Kraus representation
{Mi} (acting on the computer’s Hilbert space S), an extra quantum register K of at least
log |{Mi}| qubits (able to hold the Kraus index i), initially in state |0〉 K , is adjoined to the
computer, and the operator (which turns out to be unitary—see [88, §3.2])
U[{Mi}] : |ψ〉 S |0〉 K ↦→ �
i
� �
Mi |ψ〉 |i〉K
S
(1.1)
is performed. 12 For a POVM, the register K is measured (in the computational basis);
for a superoperator, the register K is simply discarded. We are left with a quantum com-
puter (on an augmented Hilbert space) undergoing only unitary evolution and projective
measurements. Each measurement can be thought of as correlating the external environ-
ment with the state of the system via a unitary operator integrated from an interaction
Hamiltonian (see [88, §3.1]). (The measurement result may be used in constructing later
12 We will label quantum registers with letters in a roman font; thus |ψ〉X ∈ H X . Lowercase letters will
be used for single-qubit registers; uppercase letters denote longer registers. For a register X thought of
as containing natural numbers we may use the notation Xi to denote the notional qubit holding the ith
“bit” of register X, the coefficient of 2 i in its binary representation. (This qubit may not exist as part
of the register in all implementations: for example, if our quantum computer is based on 3-dimensional
“qutrit” systems we may be using trinary representation instead of binary. But base conversions are easy
to compute, so we may assume that this qubit exists without greatly affecting our estimates of the difficulty
of a computation.)
10
operations, and it may also be of interest as part of the computer’s output.) We can
replace the coupling to the environment by a coupling to another new quantum regis-
ter M (again, initially in the state |0〉 M ) of sufficiently large dimension; after performing
this unitary operation, M, instead of the external environment, holds the measurement
results. Any (unitary) operator later in the computation which was originally classically
conditioned on the outcome of this measurement may be replaced by a controlled-unitary
operator (defined in §1.2.3 below) with control register M. Finally we have a quantum sys-
tem undergoing only unitary evolution. At the end of the computation, we can read out
all of the measurement results by measuring the ancilla systems M added in unitarizing
the measurements.
1.2.1 Reversible computation
This is already a theoretically useful result. As a simple example of its power, we can
already see that, since its unitary evolution U may be perfectly undone by the evolu-
tion U −1 = U † , such a quantum computer is reversible. Thus (apart from uncorrectible
errors arising from interactions with an inaccessible environment during the computer’s
operation) we can simulate any quantum computer with a reversible quantum computer
undergoing only unitary evolution. This is a very different sort of computation than that
performed by a standard classical computer. Among even the simplest sorts of operations
possible on a classical computer are those, such as erasing memory or replacing two num-
bers with their sum, which are not reversible and which therefore cannot be performed
with a unitary quantum computer. Because of this strong restriction on the allowable op-
erations it is not immediately obvious that a reversible computer can efficiently perform
all of the operations of a normal classical computer. However, as shown by Bennett [15]
and Fredkin and Toffoli [49], in fact a reversible computer can simulate the action of an
(irreversible) classical computer fairly efficiently. 13
Of course, this simulation would not be difficult if our reversible computer had unlim-
ited storage space. In this case an arbitrary computation can be performed reversibly by
13 Exactly what is meant by “efficiency”—the notion of the complexity of a computation—will be dis-
cussed in §1.2.2 below.
11
saving all of the intermediate values which are destroyed by the irreversible computer; for
example, whenever the irreversible computer performs an operation 14 [x] ↦→ [f(x)], then
the reversible computer, simulating this operation, performs instead 15 U[f] : |x〉 X |y〉 Y ↦→
|x〉 X |y ⊕ f(x)〉 Y . The reversible computer, limited to performing only bijective operations,
can still compute arbitrary functions by preserving the register containing the input state
and writing its output to a second register. Typically the initial value of this second
register is y = 0 so that the final state is just |x〉 X |f(x)〉 Y , but to fully define the oper-
ation we must define its action on each state |x〉 X |y〉 Y of the computational basis. (For
a quantum computer, where arbitrary superpositions of states are allowed, we can use
linearity to extend this definition to the entire Hilbert space.) To maintain reversibil-
ity we store the function in the second register using some simple-to-compute bijection
(here, as is common, we use addition). We will often define the reversible operation as
|x〉 X |0〉 Y ↦→ |x〉 X |f(x)〉 Y , implicitly defining its action on states with initial output register
y �= 0 by addition, as above.
In a long classical computation many irreversible operations will generally be per-
formed, and maintaining all of the intermediate results, to preserve reversibility, may
become infeasible. However, Bennett found a general method for restraining this memory
usage and periodically erasing (reversibly!) most of the intermediate results. Suppose we
have a computer U[f] reversibly calculating f(x); however, in the process, it is forced to
save various intermediate results g(x),
U[f]
|x〉 |0〉 |0〉 ↦−→ |x〉 |f(x)〉 |g(x)〉 .
X Y G
X Y G
14 We use the notation [x] to represent a register of an irreversible classical computer with value x, to
stress the difference between reversible and irreversible computations.
15 Operations are usually considered to be performed modulo the dimension N of the register’s Hilbert
space when necessary; thus the addition operation in |y ⊕ f(x)〉 Y is understood as mod N addition. N will
typically be chosen implicitly to be large enough to hold all of the values of interest. Subscripts on bras
and kets are used to label the various register subspaces so that the entire state of the computer need
not be given for an operation acting nontrivially only on a small subsystem of the computer. To avoid
ambiguity, an operation may also be labeled with the registers on which it acts nontrivially; these will,
however, usually be omitted for clarity of notation.
12
Since U[f] is reversible, we can erase the intermediate results g(x) by merely performing
U[f] −1 , running our reversible computer backwards. Of course, this erases the desired
output f(x) as well, so before doing this we should save the desired result in yet another
register:
|x〉 X |0〉 Y |0〉 G |0〉 F
U[f]
↦−→ |x〉 X |f(x)〉 Y |g(x)〉 G |0〉 F
U[⊕] Y,F
↦−→ |x〉 X |f(x)〉 Y |g(x)〉 G |f(x)〉 F
U[f] −1
↦−→ |x〉 X |0〉 Y |0〉 G |f(x)〉 F .
At the close of this computation, the ancillary registers Y and G have been restored to
their initial states |0〉, so they can be reused in later computations. (The operation labeled
U[⊕], used to copy the result into the output register F, is just another addition operation,
adding the contents f(x) of register Y to those of register F, initialized to 0.) This process
can be repeated throughout the computation to clean up unneeded “scratch space.” It can
even be done recursively; for example, the computation U[f] may include some subroutine
U[h] which cleans up its own intermediate results. It should be noted, though, that since
the clean version of a subroutine requires two iterations of the messy version U[h], the
clean version of U[f] now requires four iterations of U[h]. Thus cleaning up recursive
subroutines using this process may be more time-consuming than expected, and it may
be more efficient to redesign the subroutine to use less scratch space.
The methods and results described so far apply to any reversible computer; although
we have been using the language and notation of quantum mechanics the results have been
quoted in terms of actions on individual basis states and so are not explicitly quantum.
The fundamental quantum nature of these equations only becomes explicit when we begin
to consider the action of these operators on superpositions of states in the computational
basis. For example, consider the operator U[⊕] used above to copy our final answer. This
operator only acts as a “copy” operator in the computational basis. For example, the
state16 � �
|x〉 A + |y〉 A |0〉B becomes, when acted on by U[⊕]α,β, � �
|x〉 A |x〉 B + |y〉 A |y〉 B , which
16 In these expressions the overall normalization factors are not shown explicitly, a difference from stan-
dard quantum-mechanics notation. In quantum computation sums over states in an orthonormal basis are
13
is not the same as � �� �
|x〉 A + |y〉 A |x〉B + |y〉 B . (The quantum no-cloning theorem [38, 105],
in fact, states that it is impossible to make a perfect copy of an unknown quantum state.
U[⊕] only makes perfect copies of states in a known basis, which does not violate the
theorem.)
1.2.2 Computational complexity
Although this formal equivalence between quantum-mechanical axioms and elements of
quantum computation already allows some useful discussions, it is not quite enough for a
fully satisfactory definition of a quantum computer; our descriptions so far have not given
us a complete method for determining how difficult a given quantum operation U is—or
if, indeed, it is even possible. These questions belong to the fields of complexity theory and
computability theory, respectively. A brief outline of these topics will let us define more
precisely what we mean by the “difficulty” or “efficiency” of an algorithm, already alluded
to several times above. (More complete introductions to computability and complexity
theory can be found in Papadimitriou’s two texts [73, 85] and Davis’ review [32].)
The primary goal of computability theory is to determine whether a given class of
computational problems is solvable using a well-defined algorithm. Development of this
theory started with results such as Turing’s proof [97] of the unsolvability of Hilbert’s
Tenth Problem (the Entscheidungsproblem 17 ) in 1936, and occurred in parallel with the
study of various computational models, attempting to formalize the notion of an algo-
rithm, then being developed. Turing’s model, the Turing machine, consisted of a finite
state machine (the head) moving along an unbounded linear array of memory registers
common; for a sum of N such terms the overall normalization factor is trivially 1
√ N and may often be
omitted without causing confusion.
17 Literally “decision problem,” referring to the problem of determining the validity of a formula in
predicate calculus. Hilbert [57] was interested in algorithmic solutions to Diophantine equations; Gödel
showed that arbitrary formulas in predicate calculus could be encoded as Diophantine equations; and
Turing showed (once his definition of an algorithm was accepted) that such a problem was equivalent to
the uncomputable halting problem, of determining whether a computation of a particular Turing machine
would ever terminate. (See [34, 78] for an elegant, self-contained number-theoretic proof that Hilbert’s
Tenth Problem is unsolvable.)
14
(the tape) on which it can read and write symbols from a finite alphabet. Turing showed
that the Entscheidungsproblem is equivalent to the halting problem, of determining
whether a computation of a particular Turing machine would ever terminate (i.e., reach
a particular head state), and that this problem is in general uncomputable by a Turing
machine. Another early model, apparently quite different, was the class of recursive func-
tions described by Church [29]; other models were proposed later. These models, some
invented for ease of theoretical analysis and some invented to model practical computing
machines, varied widely in their details; but it was soon noticed that all of the models
which seemed physically realizable agreed on the class of algorithmically-computable func-
tions. This empirical result, known as the Church-Turing thesis [73, Ch. 5], ascribes a
fundamental mathematical importance to the concept of an algorithm, deeper than any
of these particular models, which allows them all to agree on the same result.
Computability theory thus tries to answer the qualitative question of whether a par-
ticular problem can be solved at all. Complexity theory, on the other hand, attempts to
answer the quantitative question of how hard a particular problem is to solve (assuming
that it has been found to be computable); that is, how much of various limited compu-
tational resources must be applied to solve the problem. The computational resources
of greatest interest in most applications are Time, the amount of time required for our
algorithm to compute the problem’s solution; and Space, the memory required by the
computer over its course of execution. In some applications other resources such as Com-
munication, the amount of information necessary to transmit between distant points,
and Energy dissipated in the computer are important. In many cases there are trade-
offs between these resources; different solution algorithms will optimize different functions
of these resources.
To define these requirements, the costs of each individual operation of the computer
must all be defined. This is commonly done by defining a small set of computable fun-
damental operations (often called gates, especially in circuit models of computation); the
entire computation is then built by composing these fundamental operations. The time
complexity of the entire computation, for example, can then be found (for serial compu-
tations) by summing over all the fundamental gates in the computation.
15
Computational resource requirements may be expressed in varying degrees of abstract-
ness. In some cases the exact resource requirements are useful to know; since quantum
computers are expected to be difficult to construct, for example, pioneering experimen-
talists may want to know exactly how many qubits are required to perform a particular
calculation and for exactly how long their coherence must be maintained, to know whether
a proposed design is sufficient to solve the given problem. However, in many cases we are
interested not merely in a single problem but in an entire class of related problems—for
example, in finding the factors of a given integer. There may be wide variation in the
difficulty among problems of this class (22N+ 1 is much harder to factor than 22N ), and
it may be difficult (possibly as difficult as actually solving the problem) to determine ex-
actly how hard a given problem of the class is. Since the usefulness of these resource limits
comes from knowing them before we solve the problem, we may be forced to settle for
finding upper bounds on the resources required (average-case and worst-case bounds are
both of interest: average-case bounds when the problems are proposed by a neutral party,
but worst-case bounds if proposed by an adversary). These bounds are expressed in terms
of easily-computed functions of the input parameters; the simplest such function, and the
most common value used to express these complexity bounds, is simply some measure of
the total length of the input (most bounds on factoring an integer N, for example, are
given in terms of the length log N of N).
The actual values of these resources required will vary, of course, in obvious and
relatively uninteresting ways due to differences in parameters like the computer’s clock
speed. The resource requirements may be stated in units which try to normalize for
these differences; for instance, Time is measured by counting the number of fundamental
operations of the computer required to solve the problem. Alternatively, it is quite common
to express resource bounds using Knuth’s “big-O” asymptotic-growth notation O(·), in
which only the scaling of the leading-order term is expressed: For f, g : N → N, we say
f(n) = O(g(n)) iff there exist N ∈ N and c > 0 such that f(n) ≤ cg(n) for all n > N—that
is, whenever f grows no more quickly than g. (For example, for an algorithm requiring
Time(n) = � k
i=0 αin i with αk �= 0, we would say Time(n) = O(n k ).) Knuth also defined
the related growth classes Ω, Θ, and o. f(n) = Ω(g(n)) whenever g(n) = O(f(n)) (f
16
grows at least as quickly as g); f(n) = Θ(g(n)) whenever both f(n) = O(g(n)) and
g(n) = O(f(n)) (f and g have the same asymptotic rate of growth); and f(n) = o(g(n))
whenever f(n) = O(g(n)) but f(n) �= Θ(g(n)) (f grows strictly more slowly than g).
A more interesting variation is between different computational models—for example,
between a Turing machine, in which the memory cells can only be accessed in a fixed
sequence, and a computer with random-access memory. Because different computational
models have different sets of fundamental operations, one model may prove more efficient
than another at solving a particular problem. A somewhat surprising empirical result,
noticed in the course of investigating the many various reasonable 18 computational models
proposed, is that they are all polynomially reducible to Turing machines [85, p. 36]. That
is, given an algorithm for solving a class of problems on some reasonable computational
model which uses Resource f(n) for a problem of input size n, a Turing machine may
be programmed to simulate this computer and solve the problem using Resource at
most P1(f(P2(n))), where P1 and P2 are polynomial functions. This empirical result
(sometimes called the “strong Church-Turing thesis” [83, p. 140]) seems to argue that
searching for new and more powerful models of computation is not likely to be very
fruitful, the efficiency gains over existing models likely at best polynomial. It is common
in complexity theory not only to entirely ignore constant factors (as with O-notation) but
also to denigrate “merely” polynomial improvements in resource usage. This is somewhat
surprising at first, since a polynomial improvement from, say, O(n 1000 ) to O(n 3 ) can
make a difference between a theoretical curiosity and a practical algorithm. In practice,
though, such large exponents are rare. Furthermore, the strong Church-Turing thesis
implies that at least some polynomial reductions are plausible consequences of architecture
changes, so that differences in the polynomial degree may not be fundamental in the way
that Θ(2 n ) and Θ(n 2 ) algorithms differ. 19 The algorithms in the class P of polynomial-
time algorithms 20 are often considered for these reasons to be practically computable
18 Some computational models (e.g., the nondeterministic models) are proposed for theoretical reasons
and are not expected to be physically realizable; those are not considered “reasonable” for this discussion.
19 The class of polynomial-time algorithms is said to be stable, i.e., invariant, under polynomial-time
reductions.
20 Strictly speaking, P is defined as the class of polynomial-time decision algorithms, i.e., algorithms
17
(and, similarly, algorithms which require superpolynomial time to complete are often
considered impractical). A complexity class closely related to P, and often considered to
be practically equivalent to the set of practically-computable functions, is BPP, the set of
decision problems computable in polynomial time with bounded (i.e., bounded away from
1
2 ) probability of error, in the presence of a random-number generator.
There are many other complexity classes of interest besides P. (Algorithms can also
be classified, for example, based on how much ancillary memory they require.) One
other class of great interest, already mentioned above, is NP, the class of nondeterministic
polynomial-time algorithms. 21 A nondeterministic class can be described as one in which
guessing correctly is an allowable operation: so clearly this is not meant to directly describe
a realistic computational model. However, given the list of correct guesses a normal
computer can find and check the answer in polynomial time, so NP can also be described
as the class of problems with polynomial-time verification algorithms. The list of guesses
is sometimes called a certificate for the solution; certificates will be further discussed in
Chapter 2 below.
(More formally, we say the Boolean function f(x) ∈ NP if there exist functions g(x) and
ˆf(x, y), with f(x) ≡ ˆ f(x, g(x)), and there exists a polynomial-time algorithm computing
ˆf. Here g(x) represents the set of “guesses”—the certificate, an alleged proof that f(x)
is true—for x, and ˆ f can be thought of as checking the validity of the certificate. Clearly
P ⊆ NP; a major open question in classical complexity theory is the question of whether
P = NP. NP is not a symmetric class; it requires that there exist a single valid proof g(x)
whenever f(x) is true, but that there are no valid proofs g(x) when f(x) is false. The
complementary class coNP is the class of Boolean functions f for which ¬f ∈ NP.)
One implicit detail common to all of these different “reasonably-physical” computa-
tional models (NP is obviously excluded here) discussed so far is that they were all based
computing boolean functions. We say the Boolean function f(x) ∈ P if there exists a polynomial-time
algorithm for computing f(x).
21 The class NP appears to contain some very difficult problems. The most difficult of these problems are
called the NP-complete problems; a problem is NP-complete if any other problem in NP can be efficiently
reduced to an instance of this problem class: so solving this one problem is essentially equivalent to solving
all problems in NP.
18
on classical physics. Likewise when computers were actually constructed, although their
design details differed their operations, whether mechanical or electrical, were understand-
able using classical physics. As mentioned above, this assumption was not noted explicitly
during the first few decades of computer science research when the Church-Turing theses
were being noticed. This recognition, that physical theories form a foundation for theories
of computation, immediately suggests an important question: Do different physical the-
ories provide different computers with different computational power? —That is, do the
Church-Turing theses still hold for computers based on new physics?
In this thesis our primary interest, as already mentioned, will be in utilizing nonrel-
ativistic quantum-mechanical effects in computation. For such quantum computers, the
class of computable functions remains unchanged, and the (weak form of the) Church-
Turing thesis still holds, because quantum-mechanical systems can be simulated on clas-
sical computers to any desired level of fidelity (though the Space and Time required
may be large). The interesting question is whether the strong form of the Church-Turing
thesis is true for quantum as well as classical computers. At this time there are some
indications that it may not be true; Shor’s factoring algorithm [91, 92], the Deutsch-
Jozsa algorithm [37], and Feynman’s early idea of simulating quantum systems [46, 47]
all provide better-than-polynomial improvements in time complexity over the best clas-
sical algorithms known. (Since the classical algorithms are not proven optimal, however,
this does not prove a superpolynomial separation between the computational power of
quantum and classical mechanics. 22 )
22 The Deutsch-Jozsa result is, strictly speaking, provably exponentially faster than the optimal classical
algorithm (in the worst case), as stated above. However, the Deutsch-Jozsa problem is an oracle problem,
presuming the existence of a black-box function f (the function to be tested for constant/balanced behav-
ior); since the oracle may not be strictly part of classical (or quantum-mechanical) theory, these results,
“relative to an oracle,” do not necessarily relate strictly to physically-based “absolute” complexity classes,
though they may provide hints as to the relationship of the corresponding absolute complexity classes.
The random-number generator mentioned in our definition of BPP can also be thought of as an oracle,
although since quantum mechanics allows the preparation of truly random bits this oracle, at least, can
be considered a physical device.
19
1.2.3 The quantum computer, revisited
In our earlier description of a quantum computer, its allowable operations were described
by reference to the axioms for evolution of quantum systems. In a sense this description
follows the first approach, of finding limits to the power of a computer. Now we complete
the description of the quantum computer as a model for computation, by explaining the
assignment of complexity costs to the individual quantum operations.
For simplicity we will use the quantum circuit model of computation throughout the
work. In this model, all of the qudits (or quantum registers) within a computer are
considered mutually local, or, equivalently, the cost of transporting qudits around the
system is negligibly small, so that (for example) it is equally easy to perform a given
n-qudit operation Un on any n-qudit subset of the entire system. 23 In the quantum circuit
model of computation, as in the classical circuit model, an operation will be considered
as composed of a sequence of fundamental gates, each having known resource costs. In
the classical case, there are several essentially equivalent universal gate sets 24 , i.e., sets
of fundamental gates which allow construction of a circuit for any computable function,
with equivalent complexities (up to the usual uninteresting constant factors). Similarly
in the case of quantum circuits one can construct different interesting universal gate sets.
Because Hilbert space is continuous, in contrast with the discrete Boolean space of classical
computation, a gate set which allows exact construction of a arbitrary quantum operators
must be infinite; however, finite gate sets which allow approximate construction with any
desired fidelity also exist. (See [83, §4.5] for several such constructions.) An example of
such a universal gate set is the set of all one-qubit unitary operators together with a single
two-qubit operator, the controlled-not gate cnot ≡ Λ(not) ≡ Λ(σx) (the operator not
23 This model can be contrasted with various proposed physical models such as Lloyd’s lattice model [74],
in which (usually) qubits are localized in a periodic lattice of some sort and able to interact, via some
local coupling, only with neighboring qubits. In such models the cost of qubit transport may be nontrivial
(though they can be bounded by a factor at most linear in the size of the system); such considerations
complicate the design and discussions.
24 Two universal classical gate sets in common use, for example, are {and(a, b) ≡ a ∧ b, or(a, b) ≡
a ∨ b, not(a) ≡ ā} and {nand(a, b) ≡ a ∧ b}.
20
is of course identical to the Pauli operator σx) defined by 25
cnot [A],B |0〉 � � � �
α |0〉B + β |1〉 = |0〉A α |0〉B + β |1〉 A
B
B
cnot
[A],B |1〉 � � � �
α |0〉B + β |1〉 = |1〉A α |1〉B + β |0〉
A
B
B
(and, of course, acting linearly on superpositions); that is, cnot leaves its second qubit
(the target) unchanged when the first qubit (the control) is |0〉 and flips the second
qubit when the first qubit is |1〉. Here we have introduced Kitaev’s notation Λ(U) for a
controlled-U operator,
Λ(U) [A],B
�
�
�
|a〉 |ψ〉
A B
= |a〉
A
U a
B |ψ〉 B
�
. (1.2)
Note that this generalizes the original definition; we allow the “control” Hilbert space H A
to have basis states a > 1, in which the unitary operation is performed multiple times. 26
As mentioned above, we may usually assume that the computer performs only unitary
actions over the course of its evolution, preceded by an initial state preparation and
followed by a final measurement. We must take care to consider the costs of the state
preparation and measurement (if necessary) in computing the resource costs. This may
be done by assuming that state preparation and measurement in the computational basis
is easy; we take the computer’s initial state to be the state |0 · · · 0〉 (in the computational
basis) and require any final measurement to be performed in the computational basis. In
analogy with the classical complexity class BPP, the class of algorithms computable on a
quantum computer in polynomial time with bounded probability of error is denoted BQP.
The relations between BQP and the various classical complexity classes is of obvious in-
terest. One interesting result—the oracle-relative exponential separation for the Deutsch-
Jozsa problem—has already been briefly mentioned above. Bernstein and Vazirani, in [20],
investigated BQP more generally, finding classical complexity classes bounding BQP both
25 We put brackets around the control qubit labels to distinguish them from the targets.
26 This is not practically very useful unless U d A is efficiently implementable (or d A ≡ dim H A is small)
but it is a useful notation.
21
HA H �
H
HB H ❣ H
≡
H A
H B
Figure 1.1: An interesting quantum circuit equivalence.
above and below: BPP ⊆ BQP ⊆ P #P . 27 Other work, such as the paper by Adleman et
al. [2] considering quantum computers with restricted operation sets, has slightly tight-
ened these bounds but has not sufficed to answer the question of whether BQP contains
problems which are classically difficult to solve. Bennett et al. [17], Simon [93], and Vazi-
rani [98] have proved several oracle-relative results, giving both upper and lower (relative)
bounds. These indicate that BQP, while showing occasional surprising strength, cannot
provide superpolynomial reductions for black-box problems: the quantum computer is
apparently well suited to attacking some special structure in particular types of problems.
For future use, we close this section with the definition of a unitary operator particu-
larly useful for quantum computation, the Hadamard gate (so named because it performs
the one-bit Walsh-Hadamard transform on the vector of amplitudes in the computational
basis) H:
H |0〉 = 1
� �
√ |0〉+ |1〉
2
H |1〉 = 1
√ 2
� |0〉− |1〉 � .
❣
�
(1.3)
Quantum circuits are often represented pictorially (in much the same way as classical
circuits), with qubits flowing along “wires” and boxes and other symbols representing
various operations. For example, the circuits describe graphically the operator equivalence
H A H B cnot [A],B H A H B ≡ cnot [B],A .
27 #P denotes, essentially, the class of functions counting the number of solutions of a function in P; it is
apparently a very powerful class. The superscript notation P #P denotes the class of functions computable
in polynomial time given a #P oracle, i.e., a black box capable of efficiently computing the answers to all
problems in #P.
22
The circuit is to be read left-to-right, with the qubits entering from the left as shown
by the two wires. The boxed H indicate the Hadamard gates acting on each qubit; the
vertically-aligned pairs commute and can be performed in either order, H A H B or H B H A
(or even simultaneously, if parallelization is possible). The symbol in the center represents
the cnot gate, with the filled circle indicating the control qubit and the open circle the
target qubit. Thus we have the somewhat surprising result that the sense of the controlled-
not gate, with its misleadingly-named “source” and “target” qubits, can be changed by
surrounding it with Hadamard gates, implementing changes of basis on the source and
target Hilbert spaces. (This equivalence may trivially and inelegantly be proved by writing
the operators H A , H B , cnot A,B explicitly as 4 × 4 matrices and computing their product.
A more elegant proof is possible using the language of stabilizers.)
This introduction to quantum computing and the quantum-circuit model has neces-
sarily been much abbreviated. For more background and more detailed descriptions of
various quantum algorithms see Nielsen and Chuang [83, Ch. 4–7] and Preskill [88, Ch. 6].
1.3 In conclusion, an introduction
The following two chapters contain examinations of quantum computation from the dual
perspectives outlined above. In Chapter 3 I explore the first approach, showing how
physical laws—quantum mechanics plus causality—result in restrictions on nonlocal mea-
surements; some such measurements are disallowed as violating causality. The question
of whether all measurements not forbidden by these physical laws can be implemented is
considered. In Chapter 2 I explore the second approach, with an emphasis on the clas-
sical combinatoric problem of determining whether two graphs are isomorphic. Various
quantum approaches to solving this problem are proposed.
The magic words are QUEASY LAMMERGEIER.
Chapter 2
Ruminations on Graph Isomorphism
2.1 Introduction
23
The development of efficient quantum algorithms to solve interesting problems is one of
the major driving forces behind the growth of the field of quantum information theory,
just as in the classical case the development of more efficient algorithms is a major source
of interest. Sadly, there are as yet few quantum algorithms providing substantial gains
over their classical counterparts. In this chapter we approach the subject of quantum
computation with the goals of understanding quantum algorithms and developing new
ones. Only a few distinctively-quantum algorithms have been discovered so far, giving us
a fairly small set of known essentially-quantum tools to use in new algorithms. We begin
with a review of some of these primitives, to try to gain hints as to the source of whatever
extra computational power quantum evolution can give us.
To guide our discussions we have chosen to consider a particular classical problem
of interest, that of determining whether two graphs are isomorphic to one another. This
problem is of interest in classical complexity theory because it, like Factoring, is believed
to be a relatively easy problem in NP (see, e.g., [66]) which is, however, not known to lie
in BPP. We present several potentially-fruitful quantum approaches to a solution of the
problem.
2.1.1 The problem
24
We begin with definitions of a few terms. An undirected graph (in this work, when
we refer to a graph we will be considering an undirected graph) is a set V of vertices
together with a set E of edges, unordered pairs of distinct vertices. (We can think of E
as a symmetric, antireflexive binary relation on V .) Two graphs G = (V, E), G ′ = (V, E ′ )
are called isomorphic (we write G ∼ G ′ or G φ ∼ G ′ ) if there is a bijection φ ∈ S |V |, the
isomorphism, such that (e1, e2) ∈ E iff (φ(e1), φ(e2)) ∈ E ′ . Intuitively, two graphs are
isomorphic if they represent the same pattern of vertices and edges, or if the drawing of one
graph can be topologically distorted (allowing edges to pass through one another, unlike
in knot theory) into the drawing of the other. A topological graph G is an equivalence
class of isomorphic graphs, G ≡ {H : H ∼ G}. The automorphism group Φ(G) of a
graph G is the group of automorphisms of G, Φ(G) = {φ : G φ ∼ G}. A graph is rigid if
it has trivial automorphism group, Φ(G) = e.
Finally, we define a particular representation of a graph useful for considering al-
gorithms on graphs. The adjacency matrix A for a graph G = (V, E) with vertices
V = {1, . . . , n} is the n × n matrix A = (aij) with aij = 1 if {i, j} ∈ E and aij = 0
otherwise. 1 (Thus the adjacency matrix of an undirected graph is a symmetric, traceless
0 − 1 matrix.)
Graph Isomorphism (GI) is the following decision problem in computer science:
Given two graphs G, H over the same set of vertices, is G φ ∼ H? In terms of the adjacency
matrices AG and AH, we may equivalently ask whether there is a permutation matrix
Pφ such that A G = P φ A H P −1
φ . Clearly GI ∈ NP: the permutation Pφ is a polynomial
certificate for the isomorphism. Whether GI ∈ P, and even whether GI ∈ coNP (i.e., that
there exists a succinct proof that two graphs are not isomorphic) are open problems.
The isomorphism problem for graphs is well studied in classical computer science as
1 Another common representation of graphs is as an edge list (e1, . . . , en) (where E ≡ {e1, . . . , en}).
(This representation is strictly valid only as long as G has no isolated vertices, since V is defined only im-
plicitly as the set of all edge endpoints, the domain of the relation E.) This representation is more efficient
for sparse graphs, having o(n 2 ) edges, but for general graphs is equivalent in power to the adjacency-matrix
representation.
25
an interesting problem in its own right—many problems of equivalence of algebraic struc-
tures, such as GroupIso, the problem of isomorphism of two abstract groups (explicitly
defined by their multiplication tables), can be reduced to GI, essentially by encoding the
multiplication table in the structure of the graph; see [81] for a general review of the prob-
lem. GI is also interesting as an example of a problem in NP which is neither known to be
NP-complete nor known to be in P. In this it is rather similar to the problem of Factor-
ing 2 (given a positive integer N, find its prime factors), although unlike GI, Factoring
is known to lie in NP ∩ coNP. (This is because there is a succinct proof that an integer is
prime; thus the list of factors multiplying to N, together with proofs that each is prime,
provide a succinct certificate for N.) Factoring is known to have a polynomial-time
quantum algorithm (Shor’s algorithm [92]); whether the same is true for GI is currently
an open question.
GI is one of a number of related isomorphism problems, many of which are easily
reducible to each other [67]. GA, the problem of determining whether a graph has a non-
trivial automorphism, is (polynomial-time) reducible to GI, but it is not known whether
GI is reducible to GA. However, the related counting problems #GI (find the number
of distinct permutations taking G to H) and #GA (find the number of distinct permuta-
tions taking G to itself, i.e., the order of the automorphism group of G) are polynomially
equivalent to each other and to GI. The precise definition of “graph” is not very impor-
tant to the complexity of GI: GI for (undirected) graphs, directed graphs (where edges
are ordered pairs of vertices), multigraphs (where two vertices may have more than one
edge joining them), and pseudographs (where loops, edges joining a vertex to itself, are
allowed) are all polynomially equivalent. (These equivalences can be proved by mapping
the vertices and edges of these graph variants onto gadgets, clusters of vertices and edges
of an undirected graph having the required properties. Another example of such a gadget
is shown below, in Figure 2.1.) The Subgraph Isomorphism problem SubGI (given two
2 Strictly speaking, it is the decision-problem version of Factoring (given N, k > 1, does N have a prime
factor less than k?) that is in NP. This distinction is not crucial: both the version of Factoring defined
here and GI (and many other problems in NP) have self-computable solutions: the function problems
are polynomial-time reducible to the decision problems, by a sort of binary search on the solution space.
An example of such a search for the graph isomorphism problem is given in this article.
✉
v
26
�
✉
n + 1 �� vertices
✉ ♣ ♣ ♣ ✉
�
✉ ✉
�
✉
n vertices
��
✉ ♣ ♣ ♣ ✉
�
✉ ✉
❅
❅❅
✉
�
✉ ♣ ♣ ♣
��
✉ ✉
�
j vertices
Figure 2.1: A label j (1 ≤ j ≤ n) for the vertex v, in a graph with n vertices.
graphs G = (V, E) and G ′ = (V ′ , E ′ ), determine whether there is an injective mapping
ϕ : V → V ′ such that if {e1, e2} ∈ E, {ϕ(e1), ϕ(e2)} ∈ E ′ ) is apparently harder than
GI, though: it is known to be NP-complete. 3 (Recall that a problem X in NP is called
NP-complete if any problem in NP has a polynomial-time reduction to X.)
One way of looking at the difficulty in solving GI is to think of it as a difficulty
in finding one of an exponentially large number of permutations. Consider the closely
related function problem FGI: Given two graphs G, H, return a permutation P such
that AG = P AHP −1 if G ∼ H (otherwise return an arbitrary P ). Clearly there is a
polynomial reduction from GI to FGI: we merely check the returned P to see if it is in
fact an isomorphism. There is also a polynomial reduction from FGI to GI: Suppose we
have an algorithm that determines whether two graphs G, H are isomorphic. To determine
an isomorphism with polynomially many uses of this algorithm, we inductively construct
partially labelled graphs from G and H and test these graphs for isomorphism. We label
a graph by attaching labels (gadgets made up of extra vertices and edges) to vertices in
such a way that any isomorphism must map a vertex with a label to another vertex with
an identical label; when the algorithm is complete, the isomorphism can be read directly
from the labelled graphs. It is easy to construct such a gadget. One example is given here
as Figure 2.1 (from [67]): The idea is that this set of gadgets is distinguishable from all
structures appearing in the original graph (having more vertices than the base graph) and
recognizable as the label j; any isomorphism must therefore map a label to an equivalent
label.
The algorithm is as follows:
3 This is easy to see; in particular, reduction of the well-known NP-complete problem HamiltonCycle
to SubGI is trivial.
27
BEGIN with two isomorphic graphs G, H on n vertices.
G0 := G
H0 := H
for j := 1 to n
Choose an unlabeled vertex v of Gj−1 and give it the label j.
for each unlabeled vertex w of Hj−1 do
endif
Label w with label j. [try partial matching v ↦→ w]
if Gj−1 ∼ Hj−1 then
else
Gj := Gj−1
Hj := Hj−1
G ′ := Gn
H ′ := Hn
break [exit inner loop, having found that φ : v ↦→ w]
remove label j from vertex w. [fail this partial matching]
END; the identical labels of G ′ and H ′ explicitly define φ : G φ ∼ H.
[save this partial matching and continue]
This algorithm has n = |V | steps; each step involves at most n queries to the GI subrou-
tine, on a graph with at most O(n 2 ) vertices (since each label has O(n) vertices). Thus
FGI is polynomial-time reducible to GI.
2.1.2 Classical algorithms
As mentioned above, GI seems to be hard because the space Sn of potential solutions is
so large. It is not so surprising, then, that when this space can be reduced substantially,
the problem becomes tractable. Polynomial-time algorithms are known for several types
of graphs in which some graph invariant allows this search space to be reduced: Graphs
with bounded degree [77] (the degree of a vertex v is the number of edges incident on v),
graphs with bounded eigenvalue degeneracy [10] (the eigenvalues of an undirected graph
are the (real) eigenvalues of its adjacency matrix), and graphs with bounded genus [48, 80]
(the genus of a graph is that of the minimum-genus orientable surface on which the graph
28
may be embedded with no edge crossings; for the particular case of planar graphs, a more
readable algorithm is in Kučera [68]) all have polynomial-time isomorphism algorithms.
(Typically the time bound for such results is of the form O(n ax+b ), where x is the bounded
quantity and a and b are constants.)
A simple instructive example is the algorithm for TreeGI where the graphs are re-
stricted to be trees (connected acyclic graphs). The algorithm is as follows:
BEGIN with an unlabeled tree.
Give each leaf (vertex of degree 1) the label 1.
while the tree contains unlabeled vertices do
for each unlabeled vertex v adjacent to a vertex labeled in a
previous iteration, attach the label {ℓe}, the set of all
labels of labeled vertices adjacent to v.
END with a labeled tree; the list of labels, sorted lexicographically, is a
faithful invariant for the tree.
To determine whether two trees are isomorphic, one computes each tree’s lexicographi-
cally-sorted list of labels and compares them. If the two lists are identical, the trees are
isomorphic; if the lists differ, the trees are not isomorphic. 4
This example illustrates an important concept in isomorphism problems, also men-
tioned in passing above: the idea of invariants. We now formalize this concept, since it
will be useful in later discussion. A function r(x) : X → R (where the domain X is the
presentation space, on which the isomorphism ∼ is defined, and the range R is the
representation space) is an invariant, or a canonical representation, if it is constant
on equivalence classes of X: i.e., r(x) = r(y) if x ∼ y. An invariant is complete, or
faithful, if it is constant only on equivalence classes of X: i.e., r(x) = r(y) iff x ∼ y.
A faithful invariant (often called a certificate) preserves all the information about an
object’s equivalence class, and so an efficient method of finding faithful invariants allows
4 This algorithm may be improved somewhat, if all we care about is whether G ∼ H, by noting that
the lexicographically-sorted lists must agree at each stage of labeling; so we may compare the partial lists
after each iteration of the while loop, returning not isomorphic if the lists ever differ. The algorithm is
presented as shown to give an example of an invariant for a graph.
29
a solution to the isomorphism problem. In the example above, the set of labels for the
completely-labelled tree is just such a faithful invariant. Unfortunately, no deterministic
(polynomial-time computable) certificate r is known for a general graph ([31], for example,
computes a certificate in exponential, but subfactorial, time). Succinct (polynomial-size)
certificates are known: e.g., first{H : H ∼ G}, where first X gives the element of X which
is lexicographically minimal. 5 Unfortunately, succinct certificates for graphs seem to be
difficult to compute in general.
Note that a general isomorphism problem reduces polynomially to the problem of
finding succinct certificates, but that these two problems are not necessarily equivalent:
it may be easier to determine whether two objects are isomorphic than to find a succinct
certificate for an object. It may, however, be more useful to find a succinct certificate.
For instance, consider the application of finding a match to a previously-classified graph
in a database of N graphs. If we only have an algorithm to solve the decision-problem
version of GI for a graph, we must apply it on average O(N) times to the graphs in the
database (we cannot sort the database, since we don’t have a useful ordering function) to
find a match. On the other hand, if we have an algorithm to find a succinct certificate
for a graph, we can apply it once to find a certificate G for the unknown graph, and then
compare (in average time O(|G| log N)) the certificate to the certificates in the database
(which is sorted, of course, on the certificate). Thus for large databases, finding succinct
certificates is more useful than merely determining whether two graphs are isomorphic.
It is important, in considering the quantum case, to examine carefully what is needed
for this method to work. The definition of an invariant needs no modification for the
quantum case, but we must decide what we mean by a quantum certificate: Let us call
a quantum representation (i.e., a representation by elements of a Hilbert space) faithful
if distinct objects map to orthogonal states 6 , 〈r(x) |r(y)〉 = 0 if x �∼ y. In the classical
case, of course, it is easy to compare the two certificates, and they may be copied at
5 It is easy to see that in fact all problems in P have deterministic certificates, and all problems in NP
have succinct certificates.
6 This restriction, at least, seems necessary to preserve the idea of a faithful representation; otherwise
one cannot perfectly distinguish different certificates. Further restrictions on the allowable classes of basis
states may also be useful to consider.
30
will; so the comparison routine is perfect and information-preserving. But if we have two
quantum certificates, neither of these conditions necessarily holds: I know of no efficient
ways to implement either a perfect comparison routine or a perfect copier for states chosen
from an arbitrary (but known) orthogonal set, so a database of quantum certificates is
more difficult to use than a database of classical certificates. 7 Thus, it seems that generic
quantum certificates have less power than classical certificates.
2.2 Some primitives for quantum computation
Only a few generic primitive operations seem to differentiate the currently known quantum
algorithms from classical ones. (This is not intended as a rigorous statement; it is merely
an observation that very few extra generally-interesting tools seem to be available thus
far to the quantum computer programmer.) These are the useful primitives which differ
from the tools available classically, as I see them:
(i) The simplest obviously quantum primitive is the creation of superpositions
�
|i〉 , (2.1)
N−1
√1 N
i=0
which can be implemented in time O(log N). (This primitive can be viewed as a
special case of the quantum fast Fourier transform, (iv) below, for input |0〉. We
consider it to be its own primitive, though, both because it seems more fundamental
than the QFFT, appearing in more algorithms, and because it is somewhat easier
to implement than the general QFFT.)
(ii) Since unitary evolution is necessarily reversible, there is generally no exact analog
of the classical primitive of function evaluation, x ↦→ f(x). Instead, to ensure
7 One way to perform these operations, since the orthonormal set is assumed known, is to rotate this
set into the computational basis, perform the measurement or copy operation, and then rotate back into
the set of interest. However, these rotations may be difficult to perform (because the certificates may be
difficult to construct) so this is an unsatisfactory solution. In §2.3.1 we will discuss some simple comparison
methods which are independent of the given set but which provide only a probabilistic answer.
31
reversibility we must in general preserve the input value:
U[f]
|x〉 |0〉 −→ |x〉 |f(x)〉 , (2.2)
X Y
X Y
which can be implemented efficiently whenever f is efficiently computable classically.
Because U(f) is necessarily linear, it acts linearly on superpositions; this is the
operation often pointed to as “quantum parallelism.” But although this primitive,
along with superposition creation ((i) above) is often credited with giving quantum
computers their added power, these primitives alone are insufficient; the ability
for both constructive and destructive interference between different “computational
paths” is also necessary. 8
(iii) If f is injective, then the operator mapping |x〉 to |f(x)〉 is unitary. If we want to
save space by reusing qubit registers, we need to erase the input registers containing
x after computing the outputs f(x). (If we have a state �
i |xi〉 X |f(xi)〉 Y , we can’t
merely discard the x register and prepare new qubits; such an operation would
cause the coherent superposition to decohere into a mixed state.) The quantum
implementation of such an operator seems to require, in general, an algorithm to
evaluate f −1 as well as f. (f −1 is needed to “uncompute” the input |x〉 X after
the output |f(x)〉 Y has been computed.) Thus, a bijection evaluation can be
implemented in two steps,
U[f]
U[f
|x〉 |0〉 −→ |x〉 |f(x)〉
X Y
X Y
−1 ] −1
−→ |0〉 |f(x)〉 , (2.3)
X Y
which can be implemented efficiently whenever both f and f −1 are efficiently com-
putable classically.
(iv) The quantum (fast) Fourier transform (QFFT), defined (for the simplest case
8 Exactly what characteristics of a quantum computer are responsible for its apparent power beyond
that of a classical computer is a subject of some debate. A few interesting discussions can be found in the
manifesto of the Quantum Computation Collective [30] and in articles by Steane [95] and by Jozsa [59]
and Ekert and Jozsa [42], but many other opinions are there for the asking.
of a cyclic group Zn) by
32
|x〉 QFFT
−→ 1
N−1
√
N
i=0
�
e 2πixy/N |y〉 , (2.4)
can be used to sample the eigenvalues e iφn of an operator U, with probability dis-
tribution for an eigenvalue given by the corresponding eigenstate’s overlap with the
given initial state, if the controlled-U operator
Λ(U) [A],B : |k〉 A |ψ〉 B ↦→ |k〉 A (U B )k |ψ〉 B
can be implemented. 9 For instance, if we are given an eigenstate |φ〉 for U, with
U |φ〉 = e iφ |φ〉, the unitary operator Eig(U) A,B ≡ QFFT A · Λ(U) [A],B · QFFT A
9 The basic definition of the controlled-U operator, as mentioned previously, has a two-dimensional
Hilbert space for the control register (so that U is performed when the control qubit is |1〉, in the com-
putational basis, and not performed when the control qubit is |0〉). Here we extend the definition to an
N-dimensional control-qubit Hilbert space, where the unitary operation performed is U k if the control
register is in state |k〉. This can be performed efficiently by, for example, performing the sequence of
operations
a−1 Y
Λ(U 2i
)
[Ai ],B
i=0
where Ai refers to the Hilbert space of bit i of register A.
allows estimation of φ:
|0〉 A |φ〉 B
33
QFFTA ↓ Perform the QFFT on state |0〉 A
�
|k〉 |φ〉
A B
N−1
√1 N
k=0
to create the superposition
Λ(U) [A ],B
↓ Perform U k , where k is the value in register A, on register B
�
�
N−1
√1 N
|k〉 (U
A
k=0
k ) |φ〉 =
B B 1
N−1
√
N
e
k=0
ikφ |k〉 |φ〉
A B
1
N
QFFT A ↓ Perform a second QFFT on register A
N−1 �
j=0
� N−1
�
k=0
2πj
ik(φ+
e N )
�
|j〉 |φ〉
A B
Measure register A
↓
|j〉 A |φ〉 B , with probability
�
�N−1
��
� e
�
ikθj
�2
�
�
� =
�
sin2 N
N 2 sin
k=0
2 θj
2 1
2θj , where θj ≡ φ + 2πj
N .
This probability is small unless θj is small, i.e., unless j ≈ − Nφ
2π ; so sampling j can
give a good estimate (to O( 1
N )) of φ. In particular, if Λ(U) [A],B can be efficiently
implemented (i.e., in time polynomial in n = lg N, for an n-qubit register A), then
we may estimate φ with n bits of precision in time polynomial in n. (If U N = 1,
so that the eigenvalues are Nth roots of unity, then the correlation is exact; the
bracketed sum is zero except for a single value of j.) Sampling the eigenvalue also
has the happy side effect of projecting out (approximately) an eigenstate with that
eigenvalue, so we can use the Eig(U) to sample eigenstates of U.
The QFFT allows, for example, solution of the problem of period finding, the
determination of the period r of a strictly periodic black-box function f (i.e., one
for which f(n + m) = f(n) if and only if m = kr, r ∈ Z), in time O � (log r) 2� .
(The optimal classical algorithm, for black-box f, requires average time O(r 1/2 ) and
34
worst-case time O(r).) PeriodFinding is closely related to the above eigenvalue-
estimation problem; the operator Λ(U) is replaced with operator U[f] having action
U[f] [X],Y |n〉 X |y〉 Y = |n〉 X |y + f(n) − f(0)〉 Y . Now U[f] acts as the identity on the
subspace spanned by {|kr〉 X |y〉 Y : k ∈ Z}, so (using the same argument as above)
the phases measured by the QFFT must have the form φk = 2πi k
r , for some integers
k. Large divisors of the denominator r (the desired period) can be found with
high probability, assuming the values k are uniformly distributed in {0, . . . , r − 1},
using continued-fraction expansions of the measured phases φk
2π . (If k and r have
common divisors, the continued-fraction expansion of the phase will actually find
the fraction k′
r ′ reduced to lowest form, giving only a divisor of r; sampling the result
a few times, together with guessing at small missing factors, suffices to determine
r with high probability. A detailed consideration of PeriodFinding, including
the distribution of the measured denominators r ′ , is somewhat involved; see Shor’s
pioneering papers [91, 92] for the full details. 10 ) The QFFT is quite fast: it can be
implemented in time O � (log N) 2� , compared with O(N log N) for an implementation
of the classical FFT. 11 The reason the QFFT is so fast is that the vector of complex
amplitudes is encoded directly in the phases of the elements of the superposition, so
that the O(N) complex additions are performed automatically by the linear rules of
quantum mechanics. (The Fourier transform often shows up in describing changes of
basis, e.g., between position and momentum bases, so it might not be too surprising
10 Shor’s algorithm uses a probabilistic reduction of IntegerFactoring to PeriodFinding. The over-
all goal, following a method originated by Fermat and refined by Kraitchik (and shared with many
classical factoring algorithms such as the quadratic sieve—see [26] for a review), is to find a nontriv-
ial solution of x 2 ≡ 1 (mod N); given such x, gcd(x ± 1, N) yields a nontrivial factor of N. This can
be probabilistically reduced to the problem of finding the order of an element a of the multiplicative
group modulo N, an instance of PeriodFinding: the order r of a is even with reasonable probability,
r = 2s; so a r ≡ (a s ) 2 ≡ 1 (mod N). The modular-exponentiation operator whose period we want to
find, Exp(a, N) [X],Y : |x〉 X |1〉 Y ↦→ |x〉 X |ax mod N〉 Y , is efficiently implementable (see [11] for an explicit
O ˆ (log N) 3˜ implementation). Again with reasonable probability we have a s �≡ ±1 (mod N), giving a
nontrivial factor of N. Preskill [88, §§6.9-6.12] also has a nice explanation of the QFFT and its uses in
PeriodFinding and IntegerFactoring.
11 See, e.g., [83, Chapter 5] for a detailed implementation.
35
that it has a simple implementation in quantum computation.) Note, however,
that the QFFT is less powerful than the classical FFT: after a classical FFT, the
sampled values of the transformed function F (k) are all known classically; after a
QFFT, the sampled values are represented as state amplitudes, which cannot be
measured directly. In addition, for the complexity reduction to be important there
must be an efficient—polynomial in log N—method for preparing the initial state;
in many classical applications of the FFT, the preparation algorithm is O(N) and so
improvement of the FFT to o(N) will not substantially reduce the overall complexity.
The QFFT is in fact a solution to a particular case of a more general problem,
the Abelian stabilizer problem (given an Abelian group A, a finite set X acted
on by A, and x ∈ X, find the stabilizer Ax = {g ∈ A : g(x) = x} of x in
A, e.g., by giving a basis for Ax), which is also efficiently solvable on a quantum
computer [61, 62]. Unfortunately, though, the nonabelian stabilizer problem, to
which GI easily reduces, has no known polynomial-time algorithm. (An algorithm
for GI based on this reduction, developed by Ettinger and Høyer [43], is discussed in
§2.3.3. The algorithm does not appear to be polynomial-time; one of the observables
used seems to be difficult to implement.)
(v) If a search doesn’t seem to be reducible to a period-finding problem, all is not lost.
Given an arbitrary function f : {0, . . . , N − 1} → {0, 1}, the Grover search algo-
rithm [55] (the presentation here follows Preskill [88, §6.4] in a formalism originally
due to Brassard and Høyer [24]) allows determination of a solution i to f(i) = 1,
if one exists, in O( √ N) evaluations of f, a quadratic improvement over the O(N)
evaluations of f required classically. The Grover search can be thought of as a ro-
tation of a quantum state in a two-dimensional space, spanned by the two vectors
�
|i〉
|α〉 = 1
|ω〉 =
√
N
N−1
i=0
1 √
n
�
i:f(i)=1
|i〉 , n ≡ � � {i : f(i) = 1} � � .
(2.5)
Such a rotation can be implemented as the product of reflections along the axes
defined by |α〉〈α| and |ω〉〈ω|, i.e., the product −UωUα of the unitary operators Uα =
36
1 − 2 |α〉〈α| and −Uω = 2 |ω〉〈ω| − 1 (for details see, e.g., Jozsa [60]). Uα can
be constructed if we can detect the state |α〉 (e.g., by a unitary operator U with
U |α〉|0〉 = |α〉|1〉 and U |α ⊥ 〉|0〉 = |α ⊥ 〉|0〉 for any 〈α |α ⊥ 〉 = 0); similarly Uω can be
constructed if we can detect the state |ω〉. Thus the Grover search is feasible if we
can reliably detect (on the two-dimensional Hilbert space spanned by |α〉 and |ω〉) the
initial state |α〉 and the desired final state |ω〉. In particular, these two operations
can be efficiently implemented to find a succinct certificate for any problem in NP:
using the terminology of §1.2.2, we wish to solve the equation ˆ f(x, y) = 1 for a valid
certificate y (if one exists), given a problem instance x.
The operator −UωUα rotates a state in this two-dimensional space by the angle
2θ = 2 sin −1 |〈α |ω〉| (acting trivially on the orthogonal complement of this space), so
rotating the initial state |α〉 to the final state |ω〉 with the Grover rotation (−UωUα) M
will take about M ∼ π
2|〈α|ω〉| = O� |〈α |ω〉| −1 � iterations. 12 For the simplest case of
finding the unique solution to f(i) = 1 presented above, the Grover algorithm with
�
iterations. The initial
projectors on the states (2.5) takes O � |〈α |ω〉| −1 � = O � 1
√N
state may also be some state other than |α〉, as analyzed by Brassard et al. [25] and
by Biron et al. [23, 22]; clever choice of initial state, if partial information is known
about the solution set, can reduce the time required to find a solution.
There are a number of possible generalizations to the form of Grover’s algorithm as
presented above. For instance, note that |α〉 and |ω〉 need not have the specific form
of (2.5); given any two noncommuting unitary matrices Uα, Uω, each performing
an inversion about a single axis (i.e., each having one eigenvalue −1, the remainder
being +1), the Grover iteration will perform a rotation in the plane spanned by
these two axes, so the Grover algorithm can be used to construct states |ω〉 besides
the equally-weighted superpositions of (2.5).
12 π
Unless the Grover rotation angle 2θ is commensurate with , M will not be an integer, and no integral
2
number of Grover iterations will take |α〉 exactly to the desired state |ω〉. However, for large N the state
after M iterations will have high fidelity with the state |ω〉, so measurement will give the desired result
with high probability. In addition, one can construct a modified Grover iteration with rotation angle 2λθ
for any λ ∈ [0, 1]; using one of these iterations, a rotation by any desired angle can be implemented exactly.
This technique is discussed in Høyer [58] but is not necessary for this paper.
37
In fact, Uα and Uω need not be single-axis inversions. For instance, suppose more
generally that Uα = 1 − 2Pα and Uω = 1 − 2Pω, where Pα and Pω are projectors
(idempotent Hermitian operators). In the simple case where either Pα or Pω is
still a one-dimensional projector, the situation is essentially the same as before; for
example, if Pα is one-dimensional, we can decompose Pω = �
i Pωi as a sum of one-
dimensional projectors, all but one of which are orthogonal to Pα. The remaining
one-dimensional projector Pω0 may be substituted for Pω in the Grover algorithm
with essentially no change (a single iteration −UωUα will effect a reflection along the
axes specified by Pωi with i �= 0; but a pair of Grover iterations is unaffected:
(−UωUα) 2 =
=
�
1 − 2 �
i
Pωi
�
�
(1 − 2Pα) 1 − 2 �
i
Pωi
�
(1 − 2Pα)
� �
�
� �
�
(1 − 2Pωi) (1 − 2Pα) (1 − 2Pωi) (1 − 2Pα)
i
= (1 − 2Pω0)(1 − 2Pα)(1 − 2Pω0)(1 − 2Pα)
since the orthogonal projectors commute). The case in which both projectors are
multidimensional is more complicated. In this general case it is easy to see that UαUω
still decomposes as a direct sum of rotations in orthogonal two-dimensional subspaces
(the eigenvalues of UαUω other than ±1 appear in complex-conjugate pairs). Gener-
ically there will be one such rotation for each dimension of the lower-dimensional
projector 13 ; the rotation angles will generally be incommensurate, leading to com-
plicated trajectories for generic input states. 14 Another generalization, studied by
Brassard et al. [25] and by Biron et al. [23, 22], is the removal of the requirement
that the initial state be a uniform superposition over the solution space.
As a prelude to the discussion of GI, I would like to discuss one other primitive which
does not make the list above (because it has not yet proved generically useful in the most
interesting quantum algorithms). The notion of a set, an unordered collection of elements,
is of course a very useful mathematical primitive. In classical computer programming, a
13 More precisely, in a space of dimension N with nα ≡ tr Pα and nω ≡ tr Pω, there will generically be
min{nα, nω, N − nα, N − nω} rotation angles.
14 I don’t know of any situations where this case is useful.
i
38
faithful, canonical representation of a set with polynomially many elements is trivial to
create: We merely arrange the elements in lexicographic order. This reduces a set to a
list, which is a far more natural structure to implement algorithmically. The problem
is somewhat more subtle for the quantum computer programmer: The transformation
from an unsorted list to a sorted list is not reversible (since different orderings of the
same elements map to the same sorted list) and so not unitary, so we cannot expect set
construction and manipulation operations to be trivial to implement.
Ignoring for the moment this difficulty, there are two essentially different constructions
for implementing a set. In the first method, we use the classical representation as a list
(sorted according to an arbitrary but fixed total order), or, in a pretty quantum variant,
we represent the set as a superposition over all orderings of the list of elements:
|{x1, . . . , xn}1〉 ≡ 1
√ n!
�
|xσ1 〉· · · |xσn〉 . (2.6)
This representation is always faithful (that is, distinct objects map to orthogonal states):
two sets X, Y with |X| = |Y | = n have 〈X |Y 〉 �= 0 iff X = Y , in which case 〈X |Y 〉 = 1.
(The requirement that |X| = |Y | is only necessary so that the inner product 〈X |Y 〉 is
a scalar. We could alternately “pad” the smaller set with values guaranteed not to be
in either X or Y to perform the comparison even in the case |X| �= |Y |.) Thus this
representation is in fact a certificate for the set.
σ∈Sn
A second method represents the set as a coherent superposition of its elements:
|{x1, . . . , xn}2〉 ≡ 1
√ n
n�
|xi〉 . (2.7)
This method has an advantage over the first, and over the classical representation above:
The restriction to polynomial-sized sets is no longer necessary (since we only require
enough space to hold a single element). But there is a corresponding disadvantage. This
representation is not always faithful: two sets X, Y have 〈X |Y 〉 =
i=1
|X∩Y |
√ |X|·|Y | . (Thus this
representation, although an invariant, is not a certificate for general sets. It is, however,
a certificate for classes of disjoint sets, though.) Which representation is useful in a
particular instance depends on the desired behavior and on the class of sets we wish to
39
represent. 15 Below I will present a faithful certificate for a topological graph which uses
both of these set representations.
2.3 Quantum algorithms
The simplest quantum technique to speed up a classical search algorithm is Grover’s
algorithm. Given a classical algorithm which uses an initial guess to try to find a solution
to an instance of GI and which requires O(N) guesses on average to find a correct solution,
Grover’s algorithm can reduce the average run-time (measured by the number of times
a comparison of the form H ? = σ(G) is made) quadratically, from O(N) to O( √ N). For
an instance of GI on n vertices, the solution space has size Sn = n!; thus a brute-force
search of the solution space requires time O(n!), and Grover’s algorithm reduces this time
to O �√ n! � . (Of course, for a graph with more structure, e.g., one in which not all vertices
have the same degree, the search space can be easily reduced by classical methods before
the Grover search is started.) Unfortunately, since the speedup is only quadratic, such a
search is not polynomial in n unless the size of the original classical search space is also
polynomial.
Other attempts at quantum algorithms for GI make use of extra structure in the
problem. For instance, GI reduces (as mentioned above) to the problem of finding a
succinct (classical) certificate for a topological graph G (given a graph G). We may try to
solve GI by considering the analogous quantum solution method: Suppose that we have
an algorithm producing a succinct quantum certificate � �G � given a graph G. Then we
may determine whether two graphs G, H are isomorphic by determining whether the two
quantum certificates � �G � , � �H � are identical or orthogonal. Two probabilistic algorithms to
make this measurement are presented here. (In each case, however, the given certificate
states may end up entangled with the ancilla registers, so we cannot guarantee that they
are reusable.) Later, in §2.4, we consider the problem of actually constructing such states.
15 Since the time of writing, this set primitive has found use in Watrous’ certificate for the group non-
membership problem (discussed in §2.5.1 below).
2.3.1 Solving GI given a succinct quantum certificate
40
In the first construction, we begin with the state |0〉 � �G � + |1〉 � �H � , a superposition of two
succinct certificates. (Given a quantum algorithm to construct the certificates � � G � , such
a state can be created by running the algorithm twice, conditioned on the state of the
control qubit |·〉 c .)
⎧
⎨
⎩
�
√1 |0〉
2 c
H1
�
1
2 |0〉
c
�
�G �
A + |1〉 �
�H
c
� �
A
Perform a Hadamard gate on qubit c
↓
� ��G �
A + � �H � �
� ��G
+ |1〉
A
c
�
A − � �H � ��
A
Measure qubit c
↓
|0〉 c , with probability 1, if G ∼ H, so that � G � �H � = 1
|0〉 c or |1〉 c , each with probability 1
2 , if G �∼ H, so that � G � �H � = 0
If we repeat the state preparation and measurement n times, then conclude that G ∼ H
if only |0〉s were measured and G �∼ H if at least one |1〉 was measured, our conclusion will
be in error with probability at most 2 −n (but will be certain if a |1〉 is ever measured). 16
A prettier method [63] uses the controlled swap (a series of Fredkin gates). This time
16 This construction can be thought of as a special case of the QFFT algorithm for PeriodFinding,
§2.2(iv), where the period is 1 if the graphs are isomorphic and 2 if not. Similar HΛ(U)H constructions
occur in many contexts in quantum information theory.
41
we may begin with the certificates in a product state � � G �� �H � .
|0〉 c
�
�G �
H c
A
�
�H �
B
Perform a Hadamard gate on qubit c
↓
� � �
√1 |0〉c + |1〉 �G
2
c
�
1
√ 2
� �
|0〉c �G �
H c
� ��
1
2 |0〉c �G �
⎧
⎨
⎩
A
�
�H �
B
Swap registers A, B conditioned on qubit c
↓
A
�
�H �
B + |1〉 �
�H
c
�
A
�
�G � �
B
Perform a second Hadamard gate on qubit c
↓
�
�H
A
�
B + � �H � �
�G
A
� � ��
+ |1〉c �G
B
� �
�H
A
�
B − � �H � �
�G
A
� ��
B
Measure qubit c
↓
|0〉 c , with probability 1, if G ∼ H (so � �G � = � �H � )
|0〉 c or |1〉 c , each with probability 1
2 , if G �∼ H (so � G � �H � = 0)
This construction has the same failure probability as the first and can be subjected to
the same probability amplification process. Note that this second construction (unlike
the first) only requires the states � �G � , � �H � to be equal up to phase: it works as long as
�
� � G � �H �� � = 1.
These constructions both provide only probabilistic answers. It is easy to show that
no construction can decide with probability 1 whether a given pair of states chosen from
an unknown orthonormal set. In fact, for the case of an unknown set of orthonormal
states, if we require the algorithm to return |0〉 with certainty if the states are parallel
and allow the state to return √ 1 − p |0〉+ √ p |1〉 if the states are orthogonal (as in the two
constructions above), then we must have p ≤ 1
2 : the two algorithms above are optimal.
(As mentioned previously, we may be able to do better in this case, because we know the
form of the states � �G � . It is not clear whether there is a computationally efficient method
for improving the accuracy of this state comparison for these graph certificates.)
In the constructions above, we actually only require a reasonable approximation � �� G �
to � � G � , since the methods above will succeed as long as the two probabilities
42
py ≡ P {|0〉 measured : G ∼ H}
pn ≡ P {|0〉 measured : G �∼ H}
are polynomially distinguishable, i.e., |py − pn| ≥ 1
p(n)
for some polynomial p(n), so that
with polynomially many trials we can resolve the two distributions. For the first construc-
tion described above,
P {|0〉} = 1
�
�
� � � �
��
G + �H � �� �2 �
4
= 1
�
2 1 + ℜ � G
� �
�H � �� ;
if we require that ℜ � G � � � G � > 1 − p for all G, then some simple algebra gives worst-case
bounds on � G � � � G � :
and so
ℜ � � G � � � H � ≥ 1 − 4p + 2p 2 , G ∼ H
ℜ � � G � � � H � ≤ 2 � 2p − p 2 , G �∼ H ;
|py − pn| ≥ 1
2 − � 2p − p 2 − 2p + p 2 ,
and the distributions can be efficiently resolved as long as p < 1 − 4
�
3
4
≈ 0.07 (with at
least a polynomial gap). For the second construction (in which the relative phase of the
certificates does not matter),
P {|0〉} = 1
�
�
� � ��
��
G �H � � �
+ �H � �� �
��
G � �2 �
4
= 1
�
2 1 + � � � G
� �
�H � ��
�2 �
if we require that � � � G � � � G �� � = 1 − p, the worst-case bounds on � � � G � � � G �� � are
and
�
� � � G � � � H �� � ≥ 1 − 4p + 2p 2 , G ∼ H
�
� � � G � � � H �� � ≤ 2 � 2p − p 2 , G �∼ H ;
|py − pn| ≥ 1
2 − 8p + 12p2 − 8p 3 + 2p 4 = 2(1 − p) 4 − 3
2 ,
(as long as p ≤ 1 − 1
√ �
4 3
2 2), so the distributions can again be resolved as long as p < 1 − 4
;
43
(with at least a polynomial gap). (These are worst-case estimates assuming that all the
phases conspire against us; it is likely that far worse fidelities for the states � �G � will suffice
for distinguishability in reality.)
2.3.2 Examples of succinct certificates
We now give an explicit example of a succinct certificate � �G � : Given a graph G, represented
by an n × n adjacency matrix AG, let 17
�
�G � = �
σ∈Sn
�
�
�PσAGP −1
σ
�
. (2.8)
It is clear that this state is a faithful representation of the topological graph G. A sim-
ilar construction works for other representations of the graph; the representation of the
topological graph is made canonical by summing over all possible representations of the
topological graph. For example, using the edge-list representation � (v1, w1), . . . , (ve, we) �
for a graph G = (V, E) with no isolated vertices (so that V = {v1, . . . , ve, w1, . . . , we} and
E = {(v1, w1), . . . , (ve, we)}),
�
�G � = �
�
σ∈Sn π∈Se
�
� (σvπ 1 , σwπ 1 ), . . . , (σvπe , σwπe )� , (2.9)
summing both over all permutations of vertex labels and over all orderings of the edges
in the list. Note that this construction uses both representations of sets discussed above:
The set of edges of the labelled graph is represented using the first method |{· · · }1〉, while
the set of all graphs in the topological graph’s equivalence class is represented using the
second method |{· · · }2〉. We can use the second form here because the set of topological
graphs is a partition of the set of graphs; so G ∩ H �= ∅ iff G = H.
Unfortunately, such representations seem to be difficult to construct, even approx-
imately. Several methods for constructing G (none of which seem to be efficient) are
considered in §2.4 below.
The above arguments show that there is a probabilistic polynomial-time reduction
17 We omit the normalization factors where this will not cause confusion.
44
from GI to the problem of constructing the states � � G � . In fact, it is easy to see that a
solution to GI allows construction of such states, so the reduction also goes the other
way: GI is equivalent to the problem of constructing the states � �G � . As discussed above,
a solution to FGI is polynomial-time reducible to a solution to GI, so we may assume
wlog that we have an algorithm for FGI. Suppose the algorithm can be written as a
unitary operator U[FGI] A,B,P,x with action
⎧
⎨
U[FGI] |G〉 |H〉 |0〉 |0〉 =
A,B,P,x A B P x ⎩
|G〉 A |H〉 B |0〉 P |1〉 x if G �∼ H .
|G〉 A |H〉 B |σ〉 P |0〉 x if G ∼ H (where G σ ∼ H)
(We assume that G is rigid, so that the isomorphism σ is well-defined. This assumption
is not necessary but makes the construction easier to describe.)
We can use this operator to construct the state � �G � for a given G. We begin with the
state |G〉:
|G〉 A |0〉 B |0〉 P |0〉 x
|G〉 A |0〉 B
|G〉 A
U[FGI] −1
|G〉 A
Create a superposition of all permutations in Sn, where n is
the number of vertices in G (an easy construction)
↓
� �
�
σ∈Sn
|σ〉 P
|0〉 x
Act on G with σ, putting the result σ(G) in the second
register
↓
� �
�
σ∈Sn
↓
� �
σ∈Sn
|σ(G)〉 B |σ〉 P
|0〉 x
Note that for a rigid graph G,
U[FGI] |G〉 A |σ(G)〉 B |0〉 P |0〉 x = |G〉 A |σ(G)〉 B |σ〉 P |0〉 x .
|σ(G)〉 B
�
� �
|0〉 |0〉 = |G〉 �G P x A B |0〉 P |0〉 x .
(Since this is a tensor-product state, we may discard the states |G〉 A |0〉 P |0〉 x without
affecting state of � �
�G B .) Thus a unitary algorithm implementing GI is polynomial-time
45
equivalent to creation of the succinct certificates � � G � . As is not too surprising, algorithms
that approximately solve FGI can be used to approximate � �G � : Suppose we have a unitary
operator � U[FGI] A,B,P,x satisfying, for all graphs G, H,
� �
�
�1 −
A 〈G| B 〈σ(G)| P 〈σ| x 〈0| � �
U[FGI] |G〉 |σ(G)〉 |0〉 |0〉
A,B,P,x A B P x
σ
�
�
� < ɛ
A 〈G| B 〈H| P 〈0| x 〈1| � U[FGI] A,B,P,x |G〉 A |H〉 B |0〉 P |0〉 x = 1 , G �∼ H .
Because the isomorphism σ is a certificate for FGI, we can easily check that a purported
isomorphism is actually an isomorphism; this is why we can require that the second
condition hold exactly, when G �∼ H, even for an approximation � U[FGI]. A randomized
FGI algorithm, however, may only find the isomorphism σ, if one exists, with some finite
probability; because GI is not known to lie in coNP, there is no known certificate allowing
this result to be checked. The first requirement is that, averaged over all isomorphisms
σ (and over all internal random draws used by the algorithm), the algorithm fails to
find σ only ɛ of the time. (In these cases the algorithm can be assumed to transform
|G〉 A |σ(G)〉 B |0〉 P |0〉 x to |G〉 A |σ(G)〉 B |0〉 P |1〉 x ; any other spurious results could be easily
checked and rejected.) Then 18
�
�
�
�
�A〈G| � �
G� 〈0| 〈0| ·
B P x � �
�
�
−1 1
�
U[FGI] · √n! |G〉 |σ(G)〉 |σ〉 |0〉
A,B,P,x
A B P x�
�
σ∈Sn
= 1
�
�
� �
�
n! �
〈G, ρ(G), 0, 0| ·
�ρ,σ∈Sn
� U[FGI] −1 �
�
�
· |G, σ(G), σ, 0〉 �
�
�
= 1
�
�
� �
�
n! �
〈G, σ(G), σ, 0|
�ρ,σ∈Sn
� �
�
�
U[FGI] |G, ρ(G), 0, 0〉 �
�
�
�
�
�
�
1
= �
� n! 〈G|〈σ(G)|〈σ|〈0|
σ∈Sn
� �
�
�
U[FGI] |G〉|σ(G)〉|0〉|0〉 �
�
��
�
= � 〈G|〈σ(G)|〈σ|〈0| � � �
�
U[FGI] |G〉|σ(G)〉|0〉|0〉 �
≥ 1 − ɛ ,
18 We explicitly include the normalization factors here. We concatenate the registers, |α〉|β〉 ≡ |α, β〉 so
that there aren’t so many |·〉s floating around.
σ
46
so we can use the operator � U[FGI] −1 (on the state �
|G〉|σ(G)〉|σ〉|0〉, which, as previ-
ously shown, is easily constructed) to create a good approximation � �� G � to � �G � .
This is an interesting difference from the classical case: Although GI is of course
reducible to the problem of finding a succinct certificate for a topological graph G given
the graph G, classically it is not known whether a reduction in the other direction is
possible. With a quantum computer, however, these two problems become equivalent
in complexity. [I have not seen this result explicitly noted before, although others [3]
have independently constructed succinct certificates � �G � in the same manner as described
below.] Unfortunately, as noted above, these certificates do not seem as powerful as
classical certificates; it seems to be difficult to do operations on the basis states { � � G � }.
This equivalence is also potentially interesting from the other point of view described
in the Introduction: we have shown that (a class of) quantum algorithms for GI are
equivalent in power to construction of a class of quantum states. If we can find lower
bounds on the difficulty of constructing such states, we have immediate lower bounds on
the quantum complexity, and hence on the classical complexity, of GI. (I know of no such
bounds, however.)
2.3.3 An observable solving GI
The above state-construction method defines states which are apparently difficult to con-
struct but uses operators which are easy to implement. It is possible to transfer the
difficult operations from the state construction to the operator implementation. This al-
lows a conceptual simplification: in the state-construction method, construction of a class
of states { � �G � }, one state for each distinct topological graph, is required; Ettinger and
Høyer [43] define a single observable, essentially by the direct sum of the operators over
this entire class. (It is not clear whether there is any real gain in this, though, since there
seems to be no reason to believe that their observable is any easier to implement than the
state construction.)
Before we describe the actual construction, a little terminology is in order: Consider
the graph G, the disjoint union of two connected graphs G1, G2 each on n vertices (where
the two graphs are placed “side by side”, sharing no vertices). The automorphism group
σ
47
Φ ≡ Φ(G), a subgroup of the wreath product group 19 Γ ≡ Sn ≀ S2, contains an involutive
swap, an element of the form s(φ) ≡ (φ −1 , φ, (1 2)) (i.e., a bijection between the vertices
of G1 and those of G2), if and only if G1 ∼ G2 (in which case φ is the isomorphism,
G1
φ
∼ G2). For convenience let us define Γ0 ≡ Sn ≀ {e} (� Sn × Sn) to be the index-2
normal subgroup of Γ (the subgroup Γ0 ⊳ Γ which does not swap the vertices of the Gi).
Now if G1
φ
∼ G2, then Φ is of the form Φ = Φ0 ∪ Φ0s(φ), where Φ0 ≡ Φ ∩ Γ0 is
the subgroup of Φ which does not swap the Gi; so the cosets of Φ all have the form
γΦ = Ψ0 ∪ Ψ0s(φ) (where, similarly, none of the elements of Ψ0 swap the Gi). On the
other hand, if G1 �∼ G2, then Φ = Φ0 ⊂ Γ0, and the elements of a coset γΦ either all
swap the Gi or all leave the Gi fixed. The essential point, and the main result used in
the construction, is that if G1 ∼ G2, elements of γΦ appear in pairs, γ and γs(φ); but if
G1 �∼ G2, only one of these can appear in γΦ.
Let us represent a coset γΦ using the set representation |{· · · }2〉 of (2.7), in the 2(n!) 2 -
dimensional Hilbert space H ≡ span{|γ〉 : γ ∈ Γ}:
|γΦ〉 = 1 √
|Φ|
�
|γξ〉 ; (2.10)
note that distinct coset states (of the same group Φ) are orthogonal. 20 Now if G1
19 The wreath product group Sn ≀ S2 ∼ = (Sn × Sn) ⋊ S2 is defined by the multiplication
and its action on G is induced by (α1, α2, a)v (j)
i
ξ∈Φ
(α1, α2, a)(β1, β2, b) ≡ (α1βa(1), α2βa(2), ab)
φ
∼ G2,
≡ v(a(j)) , i ′ ≡ αa(j)(i). The two Sn subgroups should be
i ′
thought of as acting on the vertices of G1 and G2, and the S2 subgroup as swapping the vertices of G1
with those of G2 using a particular fixed bijection. (Notice that the action on G is read right-to-left, as
conventional for composition of permutations.)
20 Although finding Φ is not easy—FGI, and hence GI, reduce to finding Φ—random coset states |γΦ〉
can be efficiently constructed, as follows: Begin with a superposition of all elements γ ∈ Sn ≀ S2, together
with a representation of the graph G. Act on G with the γ, creating the entangled state
X
|γ〉|G〉 ↦→ X
|γ〉|γ(G)〉 .
γ∈Γ
γ∈Γ
Now measure the second register; a measurement result of |γ0(G)〉 creates a superposition (in the first
48
then |γΦ〉 is a superposition of states of the form |k(φ, γ)〉 ≡ 1
�
�
√ |γ〉+ |γs(φ)〉 . Thus |γΦ〉
2
lies in the (n!) 2 -dimensional Hilbert subspace H1(φ) ≡ span{|k(φ, γ)〉 : γ ∈ Γ}, and
write
〈γΦ| P H1(φ) |γΦ〉 = 1 if G1
φ
∼ G2 .
(2.11)
The states |k(φ, γ)〉 are, for fixed φ, orthonormal (since s(φ) 2 = 1). Hence we may
PH1(φ) = �
Pk(φ,γ) = �
|k(φ, γ)〉〈k(φ, γ)| . (2.12)
γ∈Γ0
Let φ be an arbitrary element of Sn. If G1 �∼ G2, then for each term |γξ〉, ξ ∈ Φ, in
the superposition |γΦ〉, exactly one projector P k(φ,ζ) of this sum has nonzero projection
P k(φ,ζ) |γξ〉: namely, ζ = γξ if γ ∈ Γ0, and ζ = γξs(φ −1 ) if γ �∈ Γ0. Since 〈γξ| P k(φ,ζ) |γξ〉 =
1
2 , this implies
γ∈Γ0
〈γΦ| P H1(φ) |γΦ〉 = 1
2 if G1 �∼ G2 . (2.13)
Hence the binary measurement observable {P H1(φ), 1−P H1(φ)} (a two-valued von Neumann
measurement) acting on a random coset state |γΦ〉 can distinguish (with bounded error
probability) whether the graphs Gi are isomorphic.
Unfortunately, we don’t know φ (this is the isomorphism we’re trying to find), so we
can’t expect to implement P H1(φ). We can solve this problem by considering a projec-
tor onto a larger Hilbert space H1 = span{H1(ϕ) : ϕ ∈ Sn}. Of course we still have
〈γΦ| PH1 |γΦ〉 = 1 if G1 ∼ G2. It is difficult to find an exact value when G1 �∼ G2, but
�
we can set a trivial bound: Clearly PH1 ≤
ϕ∈Sn P H1(ϕ). For each term an argument
like the one above applies, 〈γΦ| PH1(ϕ) |γΦ〉 = 1
2 , and there are n! terms in the sum; so
〈γΦ| PH1
|γΦ〉 ≤ n!
2 .
This bound is unfortunately too trivial; we have solved one problem but created an-
other. But this problem can also be solved, to make the bound interesting. The trick
now is to consider tensor products of the coset states, |�γΦ〉 ≡ |γ1Φ〉· · · |γmΦ〉, and of the
subspaces Hm(φ) ≡ H1(φ) ⊗m (note that, as before, |�γΦ〉 ∈ Hm(φ) if G1
φ
∼ G2). Again we
register) of all γ ∈ Sn≀S2 with the same action on G. Thus we create the state |{γ : γ(G) = γ0(G)}2〉 = |γ0Φ〉
with probability |γ0Φ|
|Γ|
= |Φ|
2n! 2 , a uniform distribution over all cosets.
49
consider the projector onto the span of all these subspaces, Hm ≡ span{Hm(ϕ) : ϕ ∈ Sn}.
By the arguments above, 〈�γΦ| P Hm(φ) |�γΦ〉 = 1 if G1
φ
∼ G2, but 〈�γΦ| P Hm(φ) |�γΦ〉 = 1
2 m if
G1 �∼ G2. The difference between the expectation values of PHm has thus been amplified:
〈�γΦ| PHm |�γΦ〉 = 1 if G1 ∼ G2
〈�γΦ| PHm |�γΦ〉 ≤ n!
2 m
if G1 �∼ G2 .
(2.14)
For sufficiently large m � log 2 n!, this separation is measurable. (The point is that the
dimension of Hm, the Hilbert space spanned by the coset states of all groups Φ containing
an involutive swap, grows more slowly with m than the dimension of H ⊗m , the Hilbert
space spanned by all coset states of all subgroups of Γ:
dim Hm
dim H
2m n!
= . (2.15)
(n!) 2m 2m ⊗m ≤ (n!)2m+1
So we can make m large enough that random coset states |�γΦ〉 have small overlap with
the subspace Hm unless Φ contains an involutive swap, i.e., G1 ∼ G2.) However, there
is no obvious reason to believe that the desired measurement observable {PHm, 1 − PHm}
can be efficiently implemented.
2.4 Trying to construct � � G �
As mentioned above, I know of no efficient method of constructing the states � �G � . Here I
outline a few ideas and try to understand why they don’t work.
2.4.1 Structured Grover search
First, we can use the Grover search to construct the state. Recall that implementation of
Grover’s algorithm requires that we have efficient detectors for the initial and final state.
Here, detection of the initial state |G〉 is easy, since |G〉 is a known computational basis
state. We can also detect the final state � � G � , by detecting the +1 eigenvalues of a complete
set of generators of the group Γ being summed over 21 , since the state � �G � = �
γ∈Γ |γ(G)〉
˛
21
This actually detects not the state ˛G ¸ but the space spanned by all such states; so we are using
Grover’s algorithm with Pα = |G〉〈G| and Pω = P ˛ ¸˙ ˛
˛G G˛. Pα is one-dimensional, so (as discussed in
G
50
is a simultaneous +1 eigenstate of the operators Uσ (σ ∈ Γ) which implement the group
action Uσ |x〉 = |σ(x)〉. For convenience in the following discussion, let us assume that G
is rigid, and that we are using the adjacency-matrix representation (2.8) for the graph;
then Γ = Sn. Thus if we begin with the system in a state |G〉, application of Grover’s
algorithm will rotate the system to the state � �G � in O � | � G � �G � | −1� ∼ √ n! operations.
Of course, this is not a polynomial-time algorithm. We can try to remedy this by
using a structured Grover search [28, 45, 56], in which we use extra information
about the structure of the search space to perform a multi-stage search. For example,
suppose that in a search problem we are able to efficiently and reliably detect (in the sense
explained in §2.2(v)) not only the initial state |α〉 ≡ |φ0〉 and the final state |ω〉 ≡ |φn〉,
but also intermediate states |φ1〉, . . . , |φn−1〉, where the inner products of consecutive
terms 〈φk |φk+1〉 are large relative to 〈α |ω〉. Then we can construct the desired state
|ω〉 in n stages of Grover rotation; stage i (i ∈ {1, . . . , n}), from |φi−1〉 to |φi〉, requires
�
O |〈φi−1 |φi〉| −1�
�
��
�
applications of the operator 1 − 2 |φi−1〉〈φi−1| 1 − 2 |φi〉〈φi| , for a
total time of O
� �n−1
i=0 |〈φi |φi+1〉| −1�
.
In the optimal (but unrealistic) case, for example, the presence of a single efficiently-
detectable intermediate state |φ〉 ∝ |α〉 + |ω〉 (bisecting the angle between |α〉 and |ω〉,
so that |〈α |φ〉| = |〈φ |ω〉| ≈ 1
√ 2 ) would reduce the time required from O(|〈α |ω〉| −1 ) to
O(|〈α |φ〉| −1 + |〈φ |ω〉| −1 ) = O(1). In more realistic cases (examples of which are given in
the references listed above) the intermediate states are not in the plane spanned by |α〉
and |ω〉 but still have larger transition rates than for the unstructured search problem.
The problem structure, for instance, might allow rejection of some classes of solutions as
impossible (in the examples above, these classes represent partial solutions in a partially-
prunable search tree), giving rise to a descending chain X ≡ X0 ⊃ X1 ⊃ · · · ⊃ Xn ≡ Z
of candidate solution sets, beginning with the given domain X and ending with the set Z
of actual solutions. In this case we may define |φk〉 ≡ �
|x〉; the structured Grover
x∈Xk
search may take much less time than the unstructured search if the sets Xi efficiently
resolve X (i.e., if all the ratios
|Xi|
|Xi+1|
are about equal).
§2.2(v)) this performs precisely as if Pω were the one-dimensional projector ˛ ˛ G ¸˙ G ˛ ˛ (the one projector in
the subspace Pω which does not commute with Pα).
51
Suppose in our case, for instance, that we have a set of generators for the group of
interest, Γ = 〈γ1, . . . , γn〉. Let us set Γi = 〈γ1, . . . , γi〉 (with Γ0 ≡ {1} and Γn ≡ Γ). If
we can efficiently detect the intermediate states |Gi〉 = �
|γ(G)〉, then we can use
the structured Grover search to construct the desired state |G〉; the search will take time
�
O
. It is possible to choose the generators γi such that this time is
� �n−1
i=0 |Γi : Γi+1|
polynomial in n (for instance, using the generators of (2.23) or of (2.25) below). The
remaining question is then whether each intermediate state |Gi〉, a sum over the subgroup
Γi, can be efficiently detected, so that each may be efficiently constructed in turn from
the previous one.
One tempting idea is to use, instead of projectors onto the states |Gi〉, projectors PH
onto the states transforming as the trivial representation of some subgroup H of Γ: that is,
PH projects onto the simultaneous +1 eigenstate of all η ∈ H. PH is the projection onto
the span, over all graphs G, of all states of the form �
η∈H |η(G)〉(so, for example, P {e} ≡ 1,
and PΓ ≡ PSn ≡ �
G |G〉〈G|). Such a projector is easy to implement, since it is merely the
projection onto the simultaneous +1 eigenstate of all η ∈ H (where η acts by permutation
of the vertices of G). Unfortunately, use of these easily-implemented projectors alone does
not suffice to construct the desired state: Suppose that only such projectors are used,
so that at each stage the Grover iteration is of the form (1 − 2PHi+1 )(1 − 2PHi ). We
show below that this iteration preserves the subspaces corresponding to the irreducible
representations of the group Γ = Sn.
First let us define, for all γ ∈ Γ, the unitary operator V (γ) acting by the group action
on the graph states: V (γ) |G〉 ≡ |γG〉. The projector PH onto a subgroup H ⊂ Γ can be
written as a linear combination of V (γ):
γ∈Γi
PH = 1
�
|H| V (η) .
η∈H
(2.16)
(Clearly PH acting on a graph state prepares a state which is invariant under action by
V (η), for all η ∈ H; it is also easy to verify that (PH) 2 = PH, so PH is a projector.)
Now recall the orthonormality and completeness relations from classical representation
theory [96],
nµ
�
nΓ
γ∈Γ
nµ
�
µij
nΓ
52
D † µ(γ) k iD ν (γ) j l = δ ν µδ j
i δk l
(2.17)
D µ (γ) j iD † µ(γ ′ ) i j = δγ,γ ′ (2.18)
where D µ (γ) is the representation matrix for group element γ in the nµ-dimensional
unitary irreducible representation µ of Γ and nΓ ≡ |Γ|, and D † µ(γ) j i ≡ [D µ (γ) i j] ∗ . We can
use these relations to define an alternate basis for the space spanned by {|γG〉 : γ ∈ Γ}:
�
�φG(µ) i �
� nµ
j ≡
nΓ
�
γ∈Γ
D µ (γ) i j |γG〉 . (2.19)
The orthonormality and completeness of these basis states follows immediately from the
orthonormality (2.17) and completeness (2.18) relations above.
�
preserves the representation µ, only mixing the states
The action of V (γ) on � � φG(µ) i j
within each irreducible representation:
V (γ) � �φG(µ) i �
j =
=
�
nµ
�
D
nΓ
η∈Γ
µ (η) i j |γηG〉 =
�
nµ
�
nΓ
η∈Γ
�
nµ
� �
D µ (γ −1 ) i kD µ (γη) k j |γηG〉
= �
k
nΓ
η∈Γ
k
D µ (γ −1 ) i k
� D µ (γ −1 )D µ (γη) � ij |γηG〉
�
�φG(µ) k �
j . (2.20)
So the action of the unitary operator 1 − 2PH (one of the two reflections composing a
Grover iteration), in the representation basis, is
� � �
1 − 2PH
�φG(µ) i � �
j = �φG(µ) i � �
2
j − |H|
= � �φG(µ) i �
2
j − |H|
�
�
�
k
η∈H
D µ (η −1 ) i k
V (η) � �φG(µ) i �
j
η∈H
�
��φG(µ) k j
� , (2.21)
acting only within the representation µ. Any product of such unitary operators will thus
preserve the amplitude of the projection onto a representation subspace Pµ (the space
53
spanned by all � �φG(µ) i � � �
j ). The desired state �G is just the state |φG(1)〉 of the trivial
representation 1 (defined by D 1 (γ) ≡ 1), so the amplitude in this state is preserved
throughout the structured Grover search. (This amplitude, if we begin in the state |G〉, is
just | � G � �G � | = 1 √ =
|Γ| 1 √ .)
n!
Thus such projectors (or any projectors which can be written as linear combina-
tions of the V (γ)) do not suffice for implementing the Grover search. What we would
like is a more powerful detector which projects onto the space � �� �
�Gi Gi
� (for the given
graph G) instead of onto the span of all such states, so that the Grover iteration is
�
1−2 � �� �
�Gi+1 Gi+1�
��
1−2 � �� �
�Gi Gi�
�
. If we could efficiently implement all of these projec-
tors, then each Grover iteration in the structured search could be implemented efficiently.
�
��
(In fact, 1 − 2PHi+1 1 − 2 � �� �
�Gi Gi
� �
would have exactly the same action, as discussed
in §2.2(v).)
Unfortunately, these projectors seem to be about as difficult to implement (for i ∼ n)
as a classical solution to GI: We can decompose |Gi〉〈Gi| = PHi PΓiG, where PΓiG projects
onto the space of graphs isomorphic to G under the group Γi (i.e., onto the space spanned
by |H〉 with G φ ∼ H and φ ∈ Γi). But implementing this projector then requires being
able to solve GI for the groups Γi; for large i, |Γi| ∼ n!, and this is still believed to be a
hard problem.
2.4.2 Stabilizer formalism
An abelian stabilizer code for a quantum system is a subspace ¯ H of a Hilbert space
H characterized by its stabilizer subgroup S (the subgroup of the Pauli group 22 P ⊗n
leaving states in ¯ H invariant). The normalizer subgroup N(S), another subgroup of
P ⊗n , is the group of operators in P ⊗n commuting with all s ∈ S, and thus mapping ¯ H
onto itself; elements in the normalizer subgroup represent “encoded operations” on the
logical subspace ¯ H of a stabilizer code. S is an abelian subgroup of the Pauli group P ⊗n ,
which in turn is a discrete nonabelian subgroup of U(n). The motivation for requiring
22 The n-qubit Pauli group P ⊗n
2
is, P (i)
2
is the group generated by the single-qubit Pauli matrices and −1; that
≡ P(i) ≡ {±1 (i) , ±σ (i)
x , ±σ (i)
y , ±σ (i)
z } = 〈σ (i)
x , σ (i)
z 〉. Elements of the qubit Pauli group are thus
both Hermitian and unitary.
54
S to be an abelian group is that this allows simultaneous diagonalization of all stabilizer
operators; hence there is a simultaneous eigenbasis (for H) of all stabilizer operators. ¯ H
is just the subspace of H with all eigenvalues +1.
The theory of abelian stabilizer codes has proved useful in the development of quantum
error-correcting codes (see [52, 90, 64, 65, 94] for several examples) and is well developed
(Preskill [88, Ch. 7] and Nielsen and Chuang [83, Ch. 10] have two introductory treat-
ments). Various useful operations, including stabilizer measurements (projecting onto
the code subspace and its coset spaces), error-basis operations (operations in P ⊗n per-
muting among these subspaces), and encoded operations (operations in N(S) acting in
interesting ways within the code subspace), can be easily implemented.
The techniques used in quantum error correction can also be used in state prepara-
tion. Construction of a standard state |¯0〉 (the ¯0 representing an encoded, “logical” qubit,
possibly constructed as a superposition of several physical qubits) in an abelian stabilizer
code S can be thought of as an error-correction operation applied to an abelian stabilizer
S ′ ⊲ S having a one-dimensional code subspace (the stabilizer S ′ being generated by the
elements of the original stabilizer S together with all the logical σ (i)
¯z operators of N(S)):
For any basis {Mi} of S ′ there is an error operator Ei ∈ P ⊗n such that {Ei, Mi} = 0
and [Ei, Mj] = 0 for i �= j. (There are actually many such operators; for each stabilizer
generator Mi there is a coset EiN(S) of the normalizer which anticommutes with Mi
and commutes with the other generators. Typically the error operator used is the one
which represents the most likely error, according to some assumed error model.) The
error-correction technique for such an abelian stabilizer is to apply a measurement corre-
sponding to each of the (Hermitian) Mi in turn, and then apply the (unitary) operator Ei
iff the measurement result was −1. It is easy to see that because the stabilizer is abelian,
the result is a state in the simultaneous +1 eigenspace of all stabilizer operators, i.e., a
state in the stabilizer subspace.
For an example of why we might want to extend this theory to nonabelian groups,
consider the state � �G � defined above. All states � �G � (for graphs G on v vertices) are
simultaneous +1 eigenstates of all permutation operators Λσ defined (for the adjacency-
matrix representation of � � G � ) by its action on the computational basis:
55
Λσ |AG〉 = � �Pσ AGP −1�
σ . (2.22)
The Λσ clearly do not commute (the group {Λσ} is isomorphic to Sv), but on the subspace
symmetrized as above they all have trivial action. Of course, even in the abelian case there
are operators in U(n) which do not commute with S but which do have trivial action on
¯H; these are just the unitary operators on ¯ H which are not in the (discrete) normalizer
subgroup. But the subspace spanned by the � �G � does not seem to have a convenient
representation as an abelian stabilizer code. And although the stabilizer operators are easy
to implement (they are just permutations) the normalizer operators seem to be difficult
to implement, and the error basis operators are hard to even define.
Suppose we want to modify this technique to allow for projection into the simultaneous
+1 eigenspace of a nonabelian stabilizer group. (For concreteness, and because it is the
group of interest, let us assume that the nonabelian group is just Sn, the symmetric
group.) The important facts used above were the existence of two sets of operators: the
stabilizer generators Mi (which all commute) and the error operators Ei (which commute
with all of the Mj except that EiMi = −MiEi). Of course, in the nonabelian case our
stabilizer generators Mi do not all commute, and so operation with all the Mi in turn
(even if we can find Ei satisfying the condition above) will not generally project us into
the stabilizer subspace; but perhaps it will get us closer to it, and repeated action of all
of these operators will get us close enough to use the above algorithms.
It is not clear that all sets of generators for Sn will give the same results. We mention
here three different generating sets, each having some nice properties (stated without
proof). The first generating set contains exactly one k-cycle for each 2 ≤ k ≤ n:
{c2 ≡ (1 2), c3 ≡ (1 2 3), . . . , cn ≡ (1 · · · n)} . (2.23)
These generators are potentially interesting because of the following identity on the algebra
over Sn:
�
σ =
σ∈Sn
56
n� �k−1
(ck) i . (2.24)
k=2 i=0
Thus each element of Sn can be uniquely expressed as a product of the ck (in increasing
order of k) to some power.
The second generating set contains only 2-cycles:
{(1 2), (2 3), . . . , (n−1 n)} . (2.25)
Since 2-cycles square to the identity, operators implementing 2-cycles have eigenvalues ±1
and so are both Hermitian and unitary.
Finally, the third generating set contains only two generators:
{(1 2), (1 2 · · · n)} . (2.26)
This is a subset of the first set mentioned; it may be useful for explicit calculations.
2.5 Related problems
2.5.1 Group non-membership
The algorithms discussed above for GI have the obvious shortcoming that they have no
obvious, efficient implementation; they use an “oracle,” performing an operation that
seems to be difficult to implement on a quantum computer, for one of the computational
steps. In particular, we have no algorithm for creating the state � �G � (2.8) in polynomial
time given a graph G, or for measuring the observable (2.14) given a pair of graphs.
One way of weakening the oracle condition is to consider an untrustworthy oracle.
Specifically, we consider the standard complexity class MA introduced by Babai (defined,
e.g., in [67]) of problems which Arthur can solve if he is allowed to consult a powerful
oracle, Merlin, who can solve arbitrarily difficult problems but who may try to trick poor
Arthur by giving him a false proof. Arthur must therefore verify that Merlin’s utterances
(his certificate for the problem) are correct, as well as using the certificate to solve the
57
problem. The algorithms for GI above do not obviously fall into this class: Arthur does
not know how to completely verify the correctness of Merlin’s certificate � �G �� �H � for the
graph isomorphism problem. (He can check that the states are correctly symmetrized, and
he can also measure some graph invariants; but since no complete, efficient, deterministic
set of graph invariants is known, he cannot be certain that Merlin is not tricking him.)
Of course, if the graphs are isomorphic, Arthur can compute an isomorphism between
them and use this to check that Merlin was truthful; the problem is that Arthur cannot
easily check that Merlin was truthful when he finds that the graphs are not isomorphic.
Classically the same problem exists: Succinct, verifiable classical certificates are known
for GI but not for GNI (the problem of proving that two graphs are not isomorphic). 23
Watrous [101] has independently used many of the ideas above to find a verifiable
certificate for GNM (Group Non-Membership), the problem of proving that a particular
element g of some finite group G is not in the subgroup H = 〈h1, . . . , hk〉 with given
generators. As with the quantum certificate for GI given above, this certificate may be
difficult to construct; the major difference from the problem above is that for GNM,
Watrous has found a way to verify that the given certificate is valid. Thus GNM is in
QMA, the class of problems solvable in polynomial time on a quantum computer given a
powerful but untrustworthy oracle. (The problem GM (Group Membership) is, like GI,
in MA; if g ∈ H, Arthur need only ask for a list of generators of H whose product is g.
Merlin can’t trick Arthur, because Arthur can easily multiply the generators together to
verify that their product is g.)
Watrous’ quantum certificate for GNM (to be provided by Merlin) is the state
|H〉 = 1
�
√ |h〉 , (2.27)
|H|
h∈H
the representation |{· · · }2〉 of the set of elements of H. Given such a certificate, one can
23 This is just a restatement of the fact that GI is in NP but is not known to be in coNP.
easily check whether g ∈ H:
⎧
⎨
⎩
|0〉 c |H〉 A |g〉 B
H c
58
Perform a Hadamard gate on qubit c
↓
� �
√1 |0〉c + |1〉 |H〉A |g〉
2
c
B
Perform a left multiplication of register A by h (here stored
in register B), controlled by qubit c
↓
� �
√1 |0〉c |H〉 + |1〉 |gH〉 |g〉B
2
A c A
H c
Perform a second Hadamard gate on qubit c
↓
� � � � ��
1
2 |0〉c |H〉A + |gH〉 + |1〉c |H〉A − |gH〉 |g〉B
A
A
Measure qubit c
↓
|0〉 c , with probability 1, if g ∈ H so that gH = H
|0〉 c or |1〉 c , each with probability 1
2 , if g �∈ H so that H ∩ gH = ∅ .
(Note that this works because distinct cosets gH of a group H are disjoint.) This part of
the algorithm is quite similar to the certificate solution for GI above. The difference is that
prior to doing this measurement procedure, we can verify the validity of the certificate:
Any h ∈ H must satisfy hH = H, so the above procedure, run with h instead of g, should
always return |0〉 if the certificate is invariant under left multiplication by h. This doesn’t
mean that the certificate has to be |H〉, of course; it could, for instance, be any |G〉, where
G is any finite group containing H. But any certificate |ψ〉 which is invariant under left
multiplication by any element h ∈ H suffices to prove that g �∈ H. (If the algorithm always
returns |0〉, Arthur cannot guarantee that g ∈ H, since Merlin may be giving Arthur a
state such as |G〉 with H < G; but if it returns |0〉 and |1〉, each with probability 1
2 , then
Arthur can be certain, having verified the certificate, that in fact g �∈ H.)
So to verify the certificate |ψ〉 we should check that it is invariant under left multi-
plication by all h ∈ H. This can be done explicitly, in polynomial time, by generating
a superposition of all elements of H and using it in place of register B in the algorithm
above. A theorem of Babai [9] shows that elements in a finite group (defined by its gen-
59
erators) can be efficiently generated with a random distribution which is very close 24 to
uniform. This allows generation of a state which is very close to the uniform (incoherent)
superposition over elements of H,
B |0〉|0〉 ≡ �
h∈H
√
ph |h〉|sh〉 ≈ 1
�
√ |h〉|sh〉 ,
|H|
h∈H
where |sh〉 represents the random numbers and scratch registers used to compute h. (This
theorem does not immediately allow generation of a state | � H〉 close to |H〉, because it may
be difficult (or even impossible) to invert the map from the random numbers to the group
elements; hence each element will be correlated with some other state information that
cannot efficiently be erased—recall the discussion about bijection evaluation in §2.2(iii).)
Watrous actually uses a slightly cleaner technique than the one above to verify the
certificate, dispensing with the control register: Suppose Merlin supplies the certificate
|ψ〉 = �
x∈G cx |x〉. (We assume that Arthur can easily verify that the state is a superposi-
tion only of elements of G, so this is the most general certificate possible.) For clarity, we
assume that the Babai construction (with operator B) gives an exactly uniform superpo-
sition over elements of H; it is straightforward to work out the inequalities for the actual
state.
24 The algorithm is polynomial in log |G| and log ɛ, where we require that the probability of generating
h ∈ H satisfies
1
|H| − ɛ < ph < 1
+ ɛ.
|H|
|ψ〉 |0〉 |0〉 =
X H S �
cx |x〉 |0〉 |0〉
X H S
�
x∈G
B H,S
√ 1
|H|
cx |x〉 X
�
(B −1 ) H,S
√ 1
|H|
x∈G
60
Generate random elements of H in register H, approximately
uniformly (using register S for scratch space), using Babai’s
algorithm B
↓
�
√ 1
|h〉 |sh〉
|H|
H S
h∈H
Perform a left multiplication of register X by register H
↓
�
cx |hx〉 |h〉 |sh〉
X H S
x∈G h∈H
Uncompute the random superposition of elements (if
unentangled with register X)
↓
� �
x∈G h∈H
cx |hx〉 X (B −1 ) H,S |h〉 H |sh〉 S
Measure registers H and S; accept the certificate as a valid
certificate for H iff the measurement result is |0〉 H |0〉 S
↓
� �
√ C
cx |hx〉 〈0| 〈0| (B
|H|
X H S
x∈G h∈H
−1 ) |h〉 |sh〉 H,S H S
= C
� �
|H|
cx |hx〉
X
x∈G h∈H
for an accepting measurement, where C is a normalization constant. Note that if this
verification procedure accepts the certificate, then the certificate is placed in a uniform
superposition 25 over cosets of H; but this is exactly the condition required of the certificate,
if Arthur wants to determine whether g ∈ H.
This technique immediately gives Watrous verifiable, succinct certificates for several
related properties on finite groups. However, verifiable certificates for graph-isomorphism
problems do not seem to follow from his result.
25 The superposition is actually only approximately uniform since B only approximates a uniform super-
position.
Chapter 3
Causal Restrictions on Nonlocal Measurement
Superoperators
3.1 Introduction
61
The previous chapter contained several examples of operations on quantum systems which
can be used to solve interesting problems. But in our attempt to discover quantum algo-
rithms we may find that implementing a particular quantum operation seems difficult or
impossible. In such cases we may suspect that in fact some general principle is forbidding
implementation (or at least efficient implementation) of such an operation. We may ask
more generally, Given a quantum system, what operations on the system are physically re-
alistic? The basic formalism of quantum mechanics allows, in principle, the measurement
of any observable. But this does not provide a complete answer to our question, because
the answer may depend on other properties of the system—for example, the presence of
certain conservation laws may forbid the measurement of certain observables.
One limit on the sorts of operations we allow can arise as a consequence of symmetries
of the system. The WAY theorem [7, 103] (discussed in Peres [86, pp. 421–422]), for
example, states that it is impossible to implement exactly the von Neumann measurement
of any observable which does not commute with an additive conserved dynamical variable. 1
1 For example, since the x component of angular momentum Jx is an additive conserved quantity which
does not commute with the z component Jz, it is impossible to exactly measure Jx for a system using a
von Neumann measurement.
62
The idea is that such an operator has broken the symmetry implied by the conservation
law, and so an interaction Hamiltonian effecting such a measurement also must violate
the conservation law; only operators respecting this symmetry are measurable.
The proof is simple: Suppose (following [51]) that M is the observable (on H S )
to be measured, with orthonormal eigenstates {|m, α〉} satisfying M |m, α〉 = m |m, α〉.
(The index α accounts for any degeneracy in the eigenvalues of M.) Let U be the
measurement unitary operator on H S ⊗ H A , producing the desired ideal correlations
between the system H S and the apparatus H A while preserving the eigenstates of M:
U |m, α〉 S |0〉 A = |m, α〉 S |m, α〉 A (where |0〉 A is the initial state, and A 〈n, β |m, α〉 A ∝ δmn;
we do not require orthogonality of apparatus states corresponding to degenerate eigen-
states of M). Now suppose that Π is an additive conserved quantity, so that all allowable
operations preserve the value of Π = Π S ⊗1 A +1 S ⊗Π A . Then in particular the interaction
unitary U must preserve Π, U † ΠU = Π, and
S 〈n, β| [Π S , M] |m, α〉 S = S 〈n, β| A 〈0| [Π, M] |m, α〉 S |0〉 A
= (m − n) S 〈n, β| A 〈0| Π |m, α〉 S |0〉 A
= (m − n) S 〈n, β| A 〈0| U † ΠU |m, α〉 S |0〉 A
= (m − n) 〈n, β| 〈n, β| Π |m, α〉 |m, α〉
S A S A
�
= (m − n) S 〈n, β| ΠS |m, α〉 SA 〈n, β |m, α〉 A
�
+ 〈n, β |m, α〉 〈n, β| Π |m, α〉
S SA A A
= 0
for all m, n, α, β; hence we must have [Π, M] = 0. 2
Another simple example, demonstrating the subtlety required for a description of the
set of observables, is given by Nielsen [82]: There exist Hermitian operators which, if
measurable, could be used to compute functions which are not classically computable.
For example, the halting function h : N → {0, 1}, defined to be 1 if the Turing machine
2 Approximate measurements of observables M which do not commute with Π are still possible; how-
ever, lower bounds can be placed on the amount of distortion of the eigenstates caused by the mea-
surement operator U, as shown, e.g., in [7, 51]. That is, U cannot leave eigenstates of M unchanged:
ˆ 1 − |mα〉S S 〈mα| ˜ U |mα〉 S |0〉 A �= 0.
63
described by the integer n halts on input n, and 0 if not, was proved by Turing [97]
to be uncomputable. But we may define the “halting observable” ˆ h, for a system with
orthogonal basis states {|n〉 : n ∈ N}, as ˆ h = � ∞
n=0 h(n) |n〉〈n|.3 If such an observable
could be measured, then a measurement of h(n) could be effected, for arbitrary n ∈ N,
merely by preparing the system in the state |n〉 and measuring ˆ h. (One can, similarly,
find examples of unitary operators which allow by their implementation the determination
of uncomputable quantities.) While such an operator is an observable, and is therefore
theoretically measurable by the formal principles of quantum mechanics, it is hard to
believe that such a thing could actually exist—at any rate, ˆ h could not be implemented
on a quantum computer as a product of operators each acting on small numbers of qubits,
as the evolution of such a system is classically computable, while the evolution of a system
under ˆ h is not computable.
The requirement in special relativity that an operation preserve causality provides
another source of restrictions on the class of allowed operations; these are the restrictions
we will primarily consider in this chapter. In some situations the naïve definition of an
operator allows violations of causality (e.g., superluminal communication) and hence is not
physically realistic; so further restrictions must be placed on the class of physical operators.
Such restrictions depend, of course, on the spacetime regions on which the operators are
considered to act. For example, one can formally write down a superoperator /S on a
bipartite Hilbert space H AB ≡ H A ⊗ H B which allows communication from one half of
the Hilbert space (H A ) to the other (H B ): i.e., when a state in H A is acted on by one
of two different local operations on H A , and then by the superoperator /S, the results are
distinguishable by examining only H B . The requirements of causality imply that such
an operator is physical only if the region S A (in which we allow Alice to interact with
the Hilbert space H A ) intersects the causal past of the region S B . One of the purposes
of this paper is to consider the problem of characterizing the operations on a quantum
3 This construction is somewhat reminiscent of Chaitin’s constant
in classical computability theory.
χ =
∞X
h(n)2 −n
n=0
Alice Bob
light cone
(a)
t
64
Alice Bob
light cone
(b)
Figure 3.1: Two causal geometries with different characteristics, with information trans-
mission denoted by wavy lightlike lines. In (b) there is sufficient time for a roundtrip
transmission; in (a) there is not.
system which do not allow violations of causality. (This question has also been raised by
Aharonov, Albert, Popescu, and Vaidman [4, 5, 6, 87], who were largely concerned with
single-state nondemolition measurements. In this paper we consider a different class of
quantum operations.
It is also interesting to consider the related question of which superoperators Alice and
Bob are actually able to implement, using local operations, in such cases. In particular,
we may ask whether, for a given “causal geometry,” any operator whose generation is
not forbidden by causality considerations can actually be implemented. (For some such
geometries, as we shall see, the answer is No.)
Two examples of separated regions of spacetime are shown in Figure 3.1. In Fig-
ure 3.1(a), there is time for two one-way messages (which could consist of either classical
or quantum communication) to be sent: Alice can send a message from S A to Bob in
S B , and Bob can send a message to Alice; each receives the other’s message after sending
his own. In Figure 3.1(b), there is time for one round-trip message to be sent: Alice
can send a message to Bob, who can then send a return message to Alice after receiving
t
65
Alice’s message. The situation in (b) is, then, at least as powerful as that in (a): 4 Bob of
(b) can, after all, ignore Alice’s message until after sending his own, if he wishes. If we
allow Alice and Bob to share a quantum channel, then in fact the situation in (b) allows
performance of any unitary operation on H AB : Alice merely sends her system H A to Bob,
who performs the desired operation and sends her system back to her. 5 It seems unlikely
that all operators allowed by the geometry of (b) must also be allowed by the geometry
of (a), though; we may gain something by allowing Bob to send Alice information based
not only on his own local operations but also on her earlier communication.
3.1.1 The problem
In this paper we consider the simple geometries that result from assuming that S A and S B
are small, widely separated regions of Minkowski spacetime (i.e., with a Euclidean separa-
tion much larger than their Euclidean diameters). Figure 3.2 shows the two possible causal
structures given this assumption. In Figure 3.2(a), S A and S B can have no causal contact,
since neither intersects the other’s future light cone (they have spacelike separation); for
this geometry, an operator on H AB must not allow signals to be sent either from Alice to
Bob or from Bob to Alice. In Figure 3.2(b), S B intersects the causal future of S A , but
not vice versa (they have timelike separation); for this geometry, causality allows signals
to be sent from Alice to Bob, but not from Bob to Alice.
4 Here we ignore “complexity-based” constraints: that is, although Figure 3.1(b) seems to imply that
Bob has very little time to perform any computations after receiving a signal from Alice but before sending
a reply, we will assume that he has sufficient time to perform any desired operation on his system.
5 If we allow only a classical channel between Alice and Bob, then not all operations can be performed
even with arbitrarily many rounds of communication between Alice and Bob. For example, Alice and Bob
cannot generate shared quantum entanglement using only local operations and classical communication,
if they begin with a tensor-product state (and no shared entanglement). Even “separable” superopera-
tors (those for which the Kraus operators can all be written as tensor products, as defined in [89] and
discussed in [18, 99]) cannot all be simulated using local operations and arbitrarily many rounds of clas-
sical communication; this is proved for a specific superoperator in [18]. Below we exhibit another class of
superoperators for which the classical channel provides no aid in simulation; see Theorem 3.
Alice (S A )
(a)
t
Bob (S B )
light cone
66
Alice (S A )
t
light cone
(b)
Bob (S B )
Figure 3.2: The causal geometries considered in this paper. In (a), S A and S B are sep-
arated by a spacelike interval. In (b), the spaces have timelike separation. The dashed
lines represent the regions on which the superoperator is to act.
Causality
If an operator allows no communication in either direction, from Alice to Bob or from
Bob to Alice (as must be the case, for any physical operator, in the situation shown in
Figure 3.2(a)), we will say that the operator is causal. If it allows no communication
from Bob to Alice (as in Figure 3.2(b)), we say it is 1-causal or semicausal (more pre-
cisely, 1-causal from Bob to Alice); thus a causal operator is 1-causal “in both directions.”
Operators which can be written as tensor products of local operators clearly satisfy this
limitation, but there exist causal operators which are not of this form—for example, the
superoperator projecting onto the Bell basis is also a causal operator (a consequence of
Theorem 1 below).
Nonlocal operators which allow superluminal signalling if implemented on a space-
like section of spacetime (e.g., as suggested in Figure 3.2(a)) are easily found, how-
ever. For example, consider the superoperator which causes decoherence onto the basis
{|Φ + 〉, |Φ − 〉, |01〉, |10〉} (where Alice and Bob each have one qubit of the two-qubit system).
Suppose Alice and Bob begin with their system in the state |00〉. If the superoperator acts
on this state, Alice’s density matrix is then σ(1) = 1
21 (the entire system is an incoherent
67
superposition of the two Bell states |Φ ± 〉). But suppose that before the superoperator
acts, Bob applies the unitary operator σx to his qubit, changing the state to |01〉. If the
superoperator acts on this state instead, Alice’s density matrix is σ(σx) = |0〉〈0|, a pure
state. 6
These two results are different and hence distinguishable: Bob can send a message
to Alice. If Alice and Bob are far apart and the superoperator is viewed as acting on a
spacelike slice, this allows Bob to send a superluminal signal to Alice; such superoperators
must therefore be viewed as nonphysical. (Such superoperators may be considered physical
if they are sufficiently “smeared out,” acting on a timescale at least as long as (c −1 times)
the spacelike separation between Alice and Bob, so that no superluminal signalling can
occur.)
Let /S be a superoperator acting on the Hilbert space H AB ≡ H A ⊗H B , where dim H A =
dim H B = d. We wish to find conditions on /S for which Bob can send a superluminal
signal to Alice. Alice and Bob may, of course, have ancillary systems H R and H S (which
may initially be entangled or classically correlated, since such correlations may have been
created at some time in the past), on which /S acts trivially. 7 We can state the semicausality
condition simply, as follows:
Semicausality condition Signalling from Bob to Alice is possible utilizing superoperator
/S, and /S is therefore not 1-causal, iff there exist an initial pure state 8 |ψ〉 ARBS and a
superoperator B on H BS such that
tr BS
� � �
/S |ψ〉〈ψ|
AB
�
� �
�= tr /S (1AR ⊗ B )(|ψ〉〈ψ|)
BS AB
BS ��
, (3.1)
so that the two density matrices for Alice’s system H AR are different and hence dis-
tinguishable (by local actions of Alice).
6 We use the notation σ(B) throughout to denote the final value of Alice’s density matrix, after Bob has
applied the local operator B to his system, and after the global superoperator /S has acted.
7 It turns out that adding ancillary systems doesn’t change the class of causal measurements, at least
in the special case discussed in Theorem 1.
8 The requirement that the initial state be pure is no restriction: If signalling is possible with an initial
density matrix ρ AB , it must also be possible for one of the pure states in any decomposition of ρ AB .
Localizability
68
A closely related condition on a superoperator is that it be localizable: implementable
using only local operations (performable by Alice alone or by Bob alone) and previously-
shared resources. Alice and Bob, using their local quantum computers in the geometry
of Figure 3.2(a) and possibly using pre-shared entanglement, can effect an operator on
H AB iff it is localizable. 9 If Bob’s Hilbert space intersects the causal future of Alice’s,
as in Figure 3.2(b), it is also reasonable to allow one-way communication from Alice to
Bob; we call these superoperators 1-localizable (or semilocalizable). 10 (Of course, if
Alice just sends her state to Bob then Bob can perform any superoperator he wants, but
Alice ends up without her quantum system; we require that at the end of the operation
H A remains with Alice and H B with Bob.) Restricting the communications to a classical
channel does not tighten this class, as long as we allow Alice and Bob to share Bell pairs
as part of their pre-shared resources: Alice may use the Bell pairs and classical channel to
teleport quantum information to Bob. Since we are primarily interested in placing bounds
on what is possible, in this paper we will mostly confine our attention to the broad case
of quantum channels.
(We note in passing that all superoperators are “2-localizable” (in the sense of Fig-
ure 3.1(b)) when a round-trip quantum channel, or its simulation via teleportation, is
available. However, as shown by Lo and Popescu [76], two-way classical communication is
no more powerful than one-way classical communication in terms of entanglement manip-
ulation. Bennett et al. [18] also demonstrated a measurement operator on a bipartite state
which cannot be implemented using only local operations and classical communication.)
9 We distinguish localizable and bilocal superoperators: A localizable superoperator on HA ⊗ H B is one
which can be simulated as a bilocal operation /S AR ⊗/S BS on a Hilbert space H AR ⊗H BS : the original Hilbert
space augmented with a possibly-entangled ancilla system H R ⊗ H S .
10 In order to keep the notation from becoming too cumbersome, the term “1-localizable” and its earlier
analogue “1-causal” leave the Hilbert space labels implicit. Throughout this paper I have tried to maintain
a convention consistent with that suggested by Figure 3.2(b) and the example above: That is, a 1-causal
superoperator /S AB does not allow Bob to send a signal to Alice (though it may allow Alice to send a signal
to Bob); a 1-localizable superoperator /S AB is one that can be implemented with use of a one-way quantum
channel from Alice to Bob, with no communication from Bob to Alice.
69
We can immediately see the following relations on these four classes of superoperators:
{causal superoperators} � {1-causal superoperators}
�
{localizable superoperators} � {1-localizable superoperators}
The remaining two questions about the inclusion relations (the questions about whether
the two vertical inclusions are strict) can be stated as two conjectures:
Conjecture 1 The set of localizable superoperators is exactly the set of causal superoper-
ators.
Conjecture 2 (DiVincenzo [39]) The set of 1-localizable superoperators is exactly the
set of 1-causal superoperators.
DiVincenzo’s conjecture holds for the special case of complete projective superoperators, as
proved and discussed at length in §3.2 below; the general case was not proved here. After
publication of these results (in [12]), the general case was proved by Eggeling, Schlinge-
�
mann, and Werner [40]; thus DiVincenzo’s conjecture is true.
However, as will be shown below, the first conjecture is false. This indicates that some
further physical constraint, besides causality, is required to tighten the set of operations
which are not forbidden until it agrees with the set of operations which we can actually do.
Preskill has shown [12] that the CHSH and Cirel’son inequalities can be used to further
constrain the class of operations allowed by physical principles, for spacelike-separated
systems. It is unclear whether restrictions of this type suffice to reduce the class of
“allowed” superoperators to exactly the set of localizable superoperators.
11 That is (using our convention), the class of superoperators which can be implemented with a one-way
quantum channel from Alice to Bob is contained in the class of superoperators which do not allow signalling
from Bob to Alice.
11
70
3.2 Complete projective superoperators
Let us restrict our attention to a special case: Suppose /S is a complete von Neumann
measurement superoperator on H AB , 12
/S(ρ) ≡ �
EaρEa , (3.2)
a
where the {Ea ≡ |φa〉〈φa|} are a complete set of one-dimensional projectors on H AB (so
E † a = Ea, EaEb = δabEb, tr Ea = 1, and �
a Ea = 1 AB ). Let us define σa ≡ tr B Ea, Alice’s
density matrices for the basis states |φa〉.
3.2.1 Causality and 1-causality
For this special case, we can give a complete, intuitive characterization of the measure-
ments which allow signalling:
Theorem 1 A complete von Neumann measurement superoperator /S AB with (orthogonal,
one-dimensional) projectors {Ea ≡ |φa〉〈φa|}, having partial traces σa ≡ tr B Ea, is
1-causal (i.e., cannot be used for signalling from Bob to Alice) iff for all σa and σb,
either σa = σb or σaσb = 0.
We will say that Bob can cause a transition a→b (i.e., from state |φa〉 to state |φb〉) if, with
the system H AB initially in the pure state |φa〉, Bob can perform some local superoperator
B BS such that after /S acts (and Alice and Bob throw away their ancilla systems), there is
a nonzero probability for the system to be in the final state |φb〉, i.e.,
Pa→b ≡ tr RS
�
� � �
〈φb| /S B |φa〉 〈φa| ⊗ ρ
AB BS
AB AB RS
� �
|φb〉 �= 0 , (3.3)
where ρ RS is the state of the ancilla system shared between Alice and Bob. Intuitively,
the meaning of the theorem is clear: Bob can only induce transitions between subspaces
which have nontrivial overlap on Alice’s system (hence the requirement that σaσb �= 0).
12 We think of the measurement being implemented “by the environment”; the measurement result is
not told to Alice or Bob. That is, this superoperator causes decoherence, or “dephasing,” onto the basis
{|φa〉}.
71
But Bob can use such a transition to send a signal to Alice iff the transition is between
two subspaces that Alice can distinguish (hence the requirement that σa �= σb).
The proof of this theorem divides naturally into two parts, which we state as lem-
mas. First we find the conditions under which Bob can induce a transition a→b (namely,
precisely when Alice’s density matrices σa and σb have nontrivial overlap):
Lemma 1 Bob can induce a transition a→b iff σaσb �= 0. Furthermore, when such a
transition is allowed, he can always induce the transition with a local unitary operator
U acting only on H B —no ancillary system H RS is needed.
Proof of Lemma 1:
Write the Schmidt decompositions of /S’s eigenstates |φa〉 as
|φa〉 ≡ AB � �
λai |a, i〉 |a, i〉 σa = A B �
λai |a, i〉 〈a, i| ,
A A
i
where all λai > 0 (the sum implicitly runs only over the nonzero terms in the
Schmidt decomposition) and A 〈a, i |a, j〉 A = B 〈a, i |a, j〉 B = δij. So
σaσb = �
λaiλbj 〈a, i |b, j〉 |a, i〉 〈b, j|
A A A A
i,j
and σaσb = 0 iff all A 〈a, i |b, j〉 A = 0 (i.e., the two subspaces are orthogonal). But
local action by Bob (on H BS ) cannot change Alice’s density matrix. In particular,
if the subspaces of σa and σb are orthogonal, then for any local operator X BS (not
necessarily Hermitian or trace-preserving) on Bob’s system,
AB 〈φa| X |φb〉 =
BS AB �
λaiλbj 〈a, i |b, j〉 〈a, i| X |b, j〉
A AB BS B
ij
= 0 .
Let Bob’s superoperator B BS (ρ) ≡ �
c BcρBc † . /S will project onto the subspace
Eb with probability
Pa→b = tr RS AB 〈φb| B BS
� �
|φa〉 〈φa| ⊗ ρ |φb〉 ;
AB AB RS
AB
i
then if σaσb = 0,
72
� �
λbiλajλakλbl 〈b, i |a, j〉 〈a, k |b, l〉 A AA A
Pa→b = trRS cijkl
� �
〈b, i| Bc |a, j〉BB 〈a, k| ⊗ ρ Bc B RS
† |b, l〉
B
= 0 .
On the other hand, if σaσb �= 0 so that (say) A 〈a, 1 |b, 1〉 A �= 0, then we can just
define B to be a unitary operator U on HB which rotates the basis � �
|a, i〉 B into
the basis � �
|b, j〉 B , with appropriate phases so that the sum is necessarily positive,
then Pa→b > 0, and Bob may induce the transition a→b with nonzero probability.
(We may assume without loss of generality that A 〈a, i |b, i〉 A ≥ 0 by moving all
phases to the definition of the |a, i〉 B , a freedom of the Schmidt decomposition; for
this choice of basis we then define U |a, i〉 B = |b, i〉 B . The sum now consists only of
nonnegative terms, at least one of which is positive.) �
Next we show that a transition a→b allows signalling from Bob to Alice precisely if
Alice’s density matrices σa and σb differ.
Lemma 2 Signalling from Bob to Alice via the nonlocal superoperator /S AB is possible iff
Bob can induce a transition a→b with σa �= σb. Furthermore, if such a signalling
transition can be induced, it can be induced via a local unitary operator U acting only
on H B ; again, no ancillary system is needed.
Proof of Lemma 2:
First we show that if Bob can only cause transitions a→b for which σa = σb,
then Alice’s final density matrix is unaffected by any local operation performed by
Bob. Since the |φa〉 define an orthonormal basis for H AB , we may write the initial
state as |ψ〉 ARBS ≡ �
a αa |φa〉 AB |χa〉 RS (where the |χa〉 need not be orthogonal).
Suppose Bob performs the superoperator B BS (with Kraus operators {Bd}, acting
on his Hilbert space H BS ) prior to the action of the environment’s measurement
superoperator /S. Alice’s final density matrix, as a function of Bob’s operation B BS ,
is then
σ(B) ≡
�
tr /S B BS BS
�
= tr /S
BS
B
BS
= tr
BS
�
�
= tr S
� �
abcd
abcd
� |ψ〉〈ψ| � �
� �
ab
73
αaα ∗
b |φa〉
��
〈φb| ⊗ |χa〉 〈χa|
ABAB RS RS
α a α ∗
b (E c ) AB (B d ) BS |φa〉 AB |χa〉 RS AB 〈φb| RS 〈χb| (B †
d ) BS (E c ) AB
α a α ∗
b
�
〈φc| Bd |φa〉|χa〉〈φb|〈χb| B †
d |φc〉
�
�
σc
(in the final line we have dropped the subspace labels for compactness). But
each term in this sum has factors 〈φc| Bd |φa〉 and 〈φc| Bd |φb〉 ∗ , which can only be
nonzero if the transition probabilities Pa→c and Pb→c (respectively) are nonzero;
hence a term in the sum is nonzero only if both transitions a→c and b→c are
allowed. Then by assumption, a nonzero term in the sum has σa = σb = σc, and
� �
�
σ(B) = tr S
abcd
αaα ∗
�
b 〈φc| Bd |φa〉|χa〉〈φb|〈χb| Bd † �
|φc〉 σb
Now we may use, successively, the identities �
c |φc〉 AB AB 〈φc| = 1, �
d Bd † Bd = 1,
and AB 〈φb |φa〉 AB = δab to reduce this sum 13 to
σ(B) = �
a
|αa| 2 (σa) B ⊗ tr S
� |χa〉 RS RS 〈χa| � ,
which is independent of B. So Alice can learn nothing by any measurement on H BS
about whether Bob performed B, and no signalling from Bob to Alice is possible.
Suppose, on the other hand, that Bob can cause a transition a→b with σa �= σb.
We show that signalling from Bob to Alice is possible in this case, by showing that
there is a particular initial state |φ0〉from which he can, by a local unitary operation
on H B alone, affect Alice’s final density matrix and hence send Alice a signal.
Suppose Alice and Bob begin with their system in the state |φa〉. After Bob
13 It is awkward to write these transformations in bra-ket notation; the presence of the ancilla system
H RS prevents the two conjugate factors from being true scalars, though the ancilla’s presence does not
prevent us from commuting these identities through. Writing the equation in tensor notation may help
convince you if you don’t believe it.
.
�
performs a unitary U and the superoperator /S acts, Alice’s density matrix is
σ(U) =
�
� �
tr Eb 1 ⊗ U |φa〉〈φa| 1 ⊗ (U B
A B A
b
† �
�
) Eb
B
= � �
〈φb| 1 ⊗ U |φa〉〈φa| 1 ⊗ (U
A B A † �
) |φb〉
B
b
= � �
b
74
� 〈φb| 1 A ⊗ U B |φa〉 � � 2 σb . (3.4)
(In particular, if Bob performs the identity operation, then Alice’s final density
matrix is just σ(1) = σa.) So σ(U) is a convex sum of elements of the reachable
set Sa ≡ {σb : σbσa �= 0} of density matrices which have nonzero transition
amplitudes from σa. We know by Lemma 1 that all such transitions are possible
(i.e., for any σ ∈ Sa there is some U for which σ’s coefficient is nonzero).
Suppose that there is a subspace Ea for which σa is extremal in Sa, i.e., for
which σa cannot be expressed as a convex sum of elements in Sa � {σa}; then σa
can be written, as a convex combination of elements of Sa, only as the trivial sum
σa = σa. But if Sa contains at least two distinct elements (|Sa| > 1; we will call
such an Sa nontrivial), then by Lemma 1 there is a local unitary operator U on
H A inducing a transition to another σb ∈ Sa (with σb �= σa). For such a U, Alice’s
final density matrix σ(U) is a convex combination (3.4) of elements of Sa whose
coefficient of σb is nonzero. But σ(1) = σa cannot be written in this form; hence
σ(U) �= σ(1), and signalling is possible.
since
Now we must show that such an extremal σa exists. Note that if σa ∈ Sb, then
0 �= σbσa = (σ †
a σ†
b )† = (σ a σ b ) † ,
σb ∈ Sa as well; that is, if Bob can induce the transition a→b, he can also induce
the transition b→a. So if Sa is nontrivial and σb ∈ Sa, then Sb is nontrivial as well.
Now consider among all nontrivial reachable sets the set S0 having maximal tr σ 2 0
(i.e., tr σ 2 0 ≥ tr σ2 a for all a with nontrivial Sa). For all σa ∈ S0, Sa is nontrivial
(since σ0, σa ∈ Sa), so necessarily tr σ 2 0 ≥ tr σ2 a for all σa ∈ S0.
Suppose that σ0 is not extremal in S0. Then σ0 is a nontrivial convex combi-
σb
above.
nation of elements of S0,
75
σ0 = �
σa∈S0
paσa
(where pa ≥ 0 and �
a pa = 1), containing at least two nonzero terms. Hence
tr σ 2 0 < �
σa∈S0
pk tr σ 2 a ≤ max tr σ
σa∈S0
2 a = tr σ 2 0 ,
a contradiction. (The first inequality follows from Corollary IX.4.4 in Bhatia [21],
using the trace norm | · | ≡ tr | · |. 14 ) Hence such a σ0 is in fact extremal in S0.
Now if Alice and Bob begin in the state |φ0〉, and Bob performs either the
identity 1 or a unitary U inducing a (nontrivial) transition 0→a, where σa ∈
S0 � {σ0} (i.e., U satisfies 〈φa| 1 A ⊗ U B |φ0〉 �= 0; U may be chosen, e.g., via
Lemma 1), Alice’s two possible density matrices have tr σ(1) 2 > tr σ(U) 2 and so
σ(1) �= σ(U), and signalling from Bob to Alice is possible. �
Now we may prove Theorem 1 as a straightforward application of the two lemmas
Proof of Theorem 1:
By Lemma 2, a superoperator /S is 1-causal iff all allowed transitions a→b have
σa = σb. But by Lemma 1, a transition a→b is allowed unless σaσb = 0. Thus /S is
1-causal iff we have either σaσb = 0 (transition forbidden) or σa = σb (transition
doesn’t allow signalling). This completes the proof of Theorem 1. �
We can massage these conditions on the σa into a more interesting form. Let Pa project
onto the da-dimensional support subspace of σa, and define d A ≡ dim H A , d B ≡ dim H B .
Then we find
d 1
B
=
�
tr Eb B
=
b
�
�
σb =
b
�
Pad B = Pa
b
σb
b:σb=σa
14 It is also straightforward to prove this directly, by showing that
tr(pX + qY ) 2 ≤ p tr X 2 + q tr Y 2
σb = kσa
(where p, q > 0 and p + q = 1) with equality iff X = Y , and generalizing with induction.
76
(where k ≡ � � {b : σb = σa} � � ), since the σb which are not equal to σa are orthogonal to
σa. Taking the trace of both sides, and using tr σb = 1, we find that there are k = dad B
projectors Eb with σb = σa and thus that σa = 1
da Pa: the |φa〉 are da-dimensional Bell
states, and the σa are completely mixed states on their support subspaces.
(In general, we say that a bipartite state is a d-dimensional Bell state if it has Schmidt
decomposition 1
√ d
�d i=1 |i〉 A |i〉 B , where the |i〉 A and |i〉 B are orthonormal sets of vectors on
H A and H B . A Bell basis for a d × d-dimensional Hilbert space is an orthonormal
basis all of whose elements are d-dimensional—that is, full-rank—Bell states. The famous
standard Bell basis, for example, is the basis {|Φ ± 〉, |Ψ ± 〉}, with
�
�Φ ±� ≡ 1
�
� �
√ |00〉± |11〉
�Ψ
2
±� ≡ 1
�
�
√ |01〉± |10〉 ,
2
on a 2 × 2-dimensional Hilbert space.)
A causal superoperator is 1-causal in both directions, so such a superoperator induces
partitions {Pa} and {Qb} of both Alice’s and Bob’s Hilbert spaces H A and H B . Suppose
that not all the Pa and Qb have the same dimension d; then there must exist particular
Pa and Qb with m ≡ tr Pa �= tr Qb ≡ n. But now, since the |φi〉 form a basis for H AB ,
there must be mn states in {|φi〉} with support only on Pa × Qb. Such a state |φ〉 has (as
we saw in the discussion above)
tr B |φ〉〈φ| = 1
m Pa and tr A |φ〉〈φ| = 1
n Qb ;
but these density matrices have different Schmidt ranks, which is not possible. So all the
Pa and Qb must have the same dimension d (which, therefore, must divide both d A and
d B ), and we have:
Corollary 1 A causal, complete von Neumann measurement superoperator on H A ⊗ H B
partitions the space into d × d-dimensional subspaces, each of which contains d 2 pro-
jectors onto rank-d Bell states.
That is, on each d × d subspace, the superoperator acts by decoherence onto some Bell
basis for that subspace.
77
3.2.2 Localizability and 1-localizability
Thus we have a complete and convenient way to characterize those complete von Neumann
measurement superoperators which are causal or 1-causal. What can we say about the
localizable and 1-localizable superoperators?
Tensor-product measurement superoperators (of the form A A ⊗ B B , where both A A
and B B are measurement superoperators) are clearly localizable. 15 Less obvious is that
projections onto some Bell bases are also localizable. The following superoperator equiv-
alence demonstrates how this is done for the standard Bell basis {|Φ ± 〉, |Ψ ± 〉} in 2 × 2
dimensions; the extension to d × d dimensions is analogous. 16
/S Bell(ρ) = E Φ +ρE Φ + + E Φ −ρE Φ − + E Ψ +ρE Ψ + + E Ψ −ρE Ψ −
= 1
4 [ρ + (σx ⊗ σx)ρ(σx ⊗ σx) + (σy ⊗ σy)ρ(σy ⊗ σy) + (σz ⊗ σz)ρ(σz ⊗ σz)] .
(3.5)
Thus to implement this superoperator, Alice and Bob can use a pre-shared random number
or pre-shared entanglement to decide which of 1, σx, σy, σz (each with probability 1
4 ) to act
with. (Such a construction is not always possible; below we demonstrate that not all Bell-
basis measurement superoperators are localizable.) Note that this Bell basis “measurement
superoperator” differs from a measurement in the Bell basis; such a measurement of course
cannot be performed locally, since it is an entangling measurement.
This superoperator, causing decoherence in a Bell basis, can also be viewed as a special
15 Although the “measurement” (decoherence) of the causal superoperator is considered to be performed
by the environment, so that neither Alice nor Bob necessarily has any information about the particular
final state of the system (i.e., which projective result occurred), we will often speak loosely of simulating
such a measurement by actions which could result in Alice or Bob gaining some information about the
measurement’s result. In such a situation, rather than measure the system, Alice and Bob should correlate
it with an ancillary quantum system (postponing all measurements until the end of the computation, in
the usual way); at the end of the operation, instead of measuring the ancilla, it should just be discarded.
Similarly, if (as in the next example) a superoperator has a realization by bilocal Kraus operators, Alice
and Bob could gain information about which Kraus operator was implemented by reading their random
numbers; these should also be encoded in an ancilla and discarded after use. (Of course, in either case the
ensemble average of the resulting density matrix, over many such simulations, will agree with /S(ρ).)
16 This is closely related to the “twirling” operation of [19, 18]. I am grateful to David DiVincenzo for
pointing this out.
78
case of a superoperator which projects onto the stabilizer subspaces (eigenspaces) of an
arbitrary qudit stabilizer code: For a stabilizer subgroup S = {Sj : j = 1, . . . , n}, an
abelian subgroup of the Pauli group, 17 the superoperator
/S S (ρ) = 1
n
�
j
S j ρS †
j
(3.6)
is localizable (with the obvious extension of the definition to multipartite systems), since
elements of the Pauli group are tensor products of local operators; the technique is the
same as above. (This technique can be applied more generally; any superoperator which
can be written as a random local unitary superoperator,
/S(ρ) = �
i
piU i ρU †
i , U i ≡ (V i ) A ⊗ (W i ) B
�
, pi = 1 ,
is localizable. I do not know of an easy way of determining in general whether a given
superoperator has such a decomposition, though.)
A simple calculation shows that
Lemma 3 /S S , the superoperator implemented by applying a random unitary operator cho-
sen uniformly from the stabilizer group S < P ⊗n
d
into the stabilizer subspaces of S.
Proof of Lemma 3:
i
(as in (3.6)), induces decoherence
Let {|φi〉} be a simultaneous eigenbasis of all Sj ∈ S, with Sj |φa〉 = vaj |φa〉
(vaj ∈ {1, ω, . . . , ω d−1 }), and define the syndromes 18 va ≡ (vaj). Because S is a
17 We define the Pauli group Pd in U(d) as the group generated by the generalizations X and Z of the
Pauli matrices σx and σz, defined by their actions on the computational basis: X |a〉 ≡ |(a + 1) mod d〉 and
Z |a〉 ≡ ω a |a〉, where ω is a primitive dth root of unity. Pd has order d 3 ; as a projective group (ignoring
the U(1) phase factors) it has order d 2 . On a Hilbert space of n qudits, we define the Pauli group as P ⊗n
d .
(This group is commonly used in analyses of quantum error-correcting codes on d-dimensional Hilbert
spaces. There is a large body of work on this subject; see, e.g., [53, 64, 90] for more details.) There is
also extensive literature describing the theory of stabilizer codes; for example, see Gottesman’s readable
papers on binary [52] and nonbinary [53] stabilizer codes.
18 The syndromes vaj for a set of generators of S uniquely define the syndromes for the remaining group
elements, since S is abelian; in quantum error correction it is common to consider the syndromes over a
generating set. But for the proof it is useful to consider the syndromes of all group elements.
79
group, if va �= vb (so that |φa〉 and |φb〉 are in different stabilizer subspaces) then
va · vb ≡ �
c v∗
ac v bc = 0: Suppose va �= vb so that for some k, vak �= vbk. Then
va · vb |φb〉〈φa| = �
c
= �
c
S c |φb〉〈φa| S †
c
= �
c
(S c S k ) |φb〉〈φa| (S c S k ) †
(just a reordering of the sum, since S is a group)
ScSk |φb〉〈φa| S †
kS† c
0 = (1 − v ∗
ak v bk )va · vb |φb〉〈φa|
0 = va · vb .
So an initial pure state �
a αa |φa〉 decoheres to
� �
/S
ab
α ∗
a α b |φb〉〈φa|
�
= 1
n
= �
ab
= �
ab
�
jab
= v∗
ak v bk va · vb |φb〉〈φa|
α ∗
aαbSj |φb〉〈φa| S †
j
α ∗
a α b
� �
1
n
j
v ∗
ajv �
bj |φb〉〈φa|
α ∗
aα va · vb
b
|φb〉〈φa| :
n
the |φb〉〈φa| matrix element vanishes if |φa〉 and |φb〉 belong to different stabilizer
subspaces and is unaffected if they belong to the same stabilizer subspace. This is
precisely the operation of decoherence into the set of stabilizer subspaces. �
Measurement of stabilizer operators (e.g., the syndrome measurement for a stabilizer code)
is a 1-localizable operation, using the usual method of measuring a stabilizer operator
with one ancilla qudit, as long as the stabilizer generators are localizable. Alice does an
operation between her qudit and the ancilla and then passes the ancilla to Bob, who does
an operation on his qudit and the ancilla and then measures the ancilla qubit, allowing
Bob (but not, in general, Alice) to know the syndrome. We will discuss this further below.
3.2.3 Causal, nonlocalizable superoperators
However, not all causal measurement superoperators are localizable. Counterexamples
exist for d A × d B = 4 × 4. In order to demonstrate this, the following theorem will be
useful:
Theorem 2 Let /S be a localizable superoperator on a bipartite Hilbert space H ≡ H A ⊗H B ,
80
and suppose that |ψ〉, A ⊗ 1 |ψ〉, and 1 ⊗ B |ψ〉 are all eigenstates of /S (where A and B
are unitary operators on H A , H B respectively). Then A ⊗ B |ψ〉 is also an eigenstate
of /S.
(We say that a state |ψ〉 ∈ H is an eigenstate of superoperator /S iff |ψ〉〈ψ| is an eigenstate
(necessarily having eigenvalue 1) of /S. We will also say that an eigenstate of an operator
(or superoperator) is stabilized by that operator.) This result makes sense intuitively; it
means that local action by Alice, A ⊗ 1, and local action by Bob, 1 ⊗ B, can’t affect each
other.
The superoperators of interest to us here, derived from localizable protocols, can be
taken to have the form
/S(ρ) = tr RS /S ′ � ρ AB ⊗ |χ〉 RS RS 〈χ| �
for some localizable superoperator /S ′ = A AR ⊗ B BS and an ancilla state |χ〉 RS . An easy
lemma characterizing the eigenstates of /S ′ will be useful in further discussion of localizable
superoperators.
Lemma 4 A state |ψ〉 is an eigenstate of a superoperator /S of the form
/S(ρ) = tr R (/S ′ ) AR
iff for each Kraus operator M ′ i of /S′ we have
� |ψ〉A A 〈ψ| ⊗ |χ〉 R R 〈χ| �
M ′ i |ψ〉 A |χ〉 R = |ψ〉 A |χi〉 R
for some (not necessarily normalized) state |χi〉.
Proof of Lemma 4:
Suppose |ψ〉 is an eigenstate of such a superoperator. Then
|ψ〉〈ψ| = /S(|ψ〉〈ψ|) = tr R (/S ′ ) AR
= tr R
�
i
M ′ i |ψ〉 A |χ〉 RA 〈ψ| R 〈χ| (M ′ i) † .
� |ψ〉A A 〈ψ| ⊗ |χ〉 R R 〈χ| �
81
Define Pψ ≡ |ψ〉〈ψ| and P ⊥ ψ ≡ 1 − Pψ. Then
Hence all
so M ′ i |ψ〉 A |χ〉 R
0 = 〈ψ| P ⊥ ψ |ψ〉 = tr P ⊥ ψ |ψ〉〈ψ| = tr P ⊥ ψ /S(|ψ〉〈ψ|)
= �
i
= � ��
��
�
i
A 〈ψ| R 〈χ| (M ′ i) † P ⊥ ψ M ′ i |ψ〉 A |χ〉 R
�P ⊥ ψ M ′ i |ψ〉 A |χ〉 R
� P ⊥ ψ
�
� � ′
M A i
��
��
��
2
.
AR |ψ〉 A |χ〉 = 0,
R
has no component orthogonal to |ψ〉, and
M ′ i |ψ〉 A |χ〉 R = |ψ〉 A |χi〉 R
for some (possibly unnormalized) state |χi〉. (The converse is obvious.) �
In particular, we can take H R to be one-dimensional to see that |ψ〉 must be an eigenstate
of each Kraus operator Mi of /S.
With this result in hand we can prove Theorem 2:
Proof of Theorem 2:
Our first step is to write formally what we mean by a “localizable superoperator.”
We will account for entanglement (or classical randomness) previously shared be-
tween Alice and Bob by adjoining to the system a fixed pure state 19 |χ〉 RS in an
ancillary Hilbert space H RS ≡ H R ⊗ H S shared between them; Alice can operate
only on H AR and Bob can operate only on H BS . The action of /S on H AB can be
taken to consist of the action of a bilocal superoperator A AR ⊗ B BS on the aug-
mented bipartite system H AR ⊗ H BS , following which the ancillary system H RS is
discarded:
/S(ρ) = tr RS
�� �
AAR ⊗ B (ρAB ⊗ |χ〉 〈χ|)
BS
RS RS �
�� �
= tr
R
′
S
′ U
AR
′ ⊗ V
BS
′ (ρAB ⊗ � �χ ′�
R ′ S ′ R ′ S ′
� ′
χ � �) � . (3.7)
19 There is no loss of generality in restricting the ancilla to a pure state; it is easy to see that if a state
|ψ〉 is an eigenstate when a mixed-state ancilla ρ RS is used, then it must also be an eigenstate for each pure
state in any decomposition of ρ RS .
82
To get this second form, in which U and V are local unitary (super)operators and
|χ ′ 〉 ≡ |χ〉|0 · · · 0〉, we use the unitary representation [88, §3.2] of an arbitrary super-
operator (extending H R and H S as necessary by adding the ancilla states |0 · · · 0〉
required for the unitary simulations of the superoperators A and B). Henceforth
we will drop the primes on the ancillary system labels, and consider /S to consist
of a bilocal unitary superoperator U AR ⊗ V BS followed by a trace over the ancilla
space H RS .
Now the unitary superoperator U AR ⊗ V BS has a single Kraus operator, U AR ⊗
V BS . So by Lemma 4, a state |φi〉 ∈ H AB is an eigenstate of /S if and only if
for some |χi〉 RS ∈ H RS . 20
U AR V BS |φi〉 AB |χ〉 RS = |φi〉 AB |χi〉 RS
(3.8)
Now suppose that |ψ〉, A ⊗ 1 |ψ〉, and 1 ⊗ B |ψ〉 are all eigenstates of /S, so that
U V
AR BS
�
|ψ〉 |χ〉
AB RS
�
=
�
|ψ〉 |χ1〉
AB RS
�
U V A |ψ〉 |χ〉 AR BS A AB RS
�
�
= A |ψ〉 |χA〉 A AB
RS
�
�
U V AR BS B |ψ〉 B AB |χ〉 RS
= B |ψ〉
B AB
|χB〉 .
RS
The commutator ∆ AR of U AR V BS and A A has a particularly simple form when
acting on A A |ψ〉 AB |χ1〉 RS :
∆ AR
�
�
A |ψ〉 |χ1〉
A AB RS
=
�
U A U
AR A †
AR A†
�
A U V |ψ〉 |χ〉
A A AR BS AB RS
= U AR V BS A A |ψ〉 AB |χ〉 RS
= A A |ψ〉 AB |χA〉 RS
so ∆ AR acts on A A |ψ〉 AB |χ1〉 RS merely by changing the state of the ancilla sys-
tem, 21 from |χ1〉 to |χA〉. Thus we can write
�
�
A |ψ〉 |χA〉 = ∆ A AB RS AR A |ψ〉 |χ1〉 A AB RS = (1 ⊗ R )A |ψ〉 |χ1〉 A R A AB RS
20 Although the proof of this theorem does not require a characterization of the |χi〉, it is worth noting
that if /S is a complete projective measurement superoperator with (orthogonal) eigenstates |φi〉, then
necessarily 〈χi |χj〉 = δij.
21 Note that this is not an operator equation; it does not necessarily follow that ∆AR ≡ 1 A ⊗ R R .
83
for some unitary R R on H R . (The content of this statement is that |χA〉 RS =
R R |χ1〉 RS for some unitary operator R R acting trivially on H S , and nontrivially
only on H R .) Thus
U AR V BS A A |ψ〉 AB |χ〉 RS = A A R R U AR V BS |ψ〉 AB |χ〉 RS
U AR A A |ψ〉 AB |χ〉 RS = A A R R U AR |ψ〉 AB |χ〉 RS .
Similarly we can commute U AR V BS through B B at the cost of a unitary operator
S S acting only on H S :
So
V BS B B |ψ〉 AB |χ〉 RS = B B S S V BS |ψ〉 AB |χ〉 RS .
�
�
U V A B |ψ〉
AR BS A B AB
|χ〉 RS
= (V BS B B )(U AR A A ) |ψ〉 AB |χ〉 RS
= (V BS B B )(A A R R U AR ) |ψ〉 AB |χ〉 RS
= (A A R R U AR )(V BS B B ) |ψ〉 AB |χ〉 RS
= (A A R R U AR )(B B S S V BS ) |ψ〉 AB |χ〉 RS
= (A A B B )(R R S S )(U AR V BS ) |ψ〉 AB |χ〉 RS
= (A B )(R S ) |ψ〉 |χ1〉
A B R S AB RS
�
��
�
= A B |ψ〉 A B AB R S |χ1〉 R S RS ;
thus A ⊗ B |ψ〉 also has the form (3.8) and is also an eigenstate of /S. �
Example 1
With this result in hand, we discuss several cases of causal measurements which are not
localizable. First, a simple example will demonstrate the sort of difficulties encountered
in trying to bilocally simulate a causal operator. Consider a causal measurement super-
operator /S in 4 × 4 dimensions, with its 16 eigenstates consisting of four “Bell subbases,”
one rotated relative to the rest by a local unitary operator Υ B ∈ U(2) (acting nontrivially
84
only on the subspace spanned by |2〉 B and |3〉 B ):
Bob (H B )
� �
� ±
φ 00 ,
� �
� ±
φ 02 ,
Alice (H A )
� � � �
� ±
ψ �φ ±
00
20 ,
�
�ψ ± 02
� �
ΥB � ±
φ 22
� �
� ±
ψ 20
� �
, ΥB �ψ ± �
22
(3.9)
where we define |φ ± 1
ij 〉 ≡ √ (|i, j〉 ± |i + 1, j + 1〉) and |ψ
2 ± 1
ij 〉 ≡ √ (|i, j + 1〉 ± |i + 1, j〉).
2
(The boxes indicate the division of H AB into its four 2 × 2-dimensional Bell subbases.)
Theorem 1 immediately tells us that such a superoperator is causal.
�
But suppose that � �φ + � �
00 , �φ + �
20 = (X2 )A ⊗ 1 �φ B
+ � �
00 , and �φ + �
02 = 1A ⊗ (X2 ) �φ B
+ 00
�
� are all
eigenstates of a localizable superoperator /S ′ (where X |i〉 = |(i + 1) mod 4〉, an element of
� �
must also be an eigenstate
P4). Then Theorem 2 tells us that � �φ + �
22 = (X2 )A⊗(X 2 ) �φ B
+ 00
of /S ′ ; thus /S is not localizable unless Υ B merely permutes the Bell subbasis { � � φ ± 22
Corollary 2 Not all causal operators are localizable.
� � �
, �ψ ±
22 }.
Intuitively, if a causal measurement consists of several disjoint d × d-dimensional sub-
spaces of Bell-basis projectors, the projectors on each subspace may use Bell states in a
different basis: each subspace has complete freedom in the choice of Schmidt basis. For
generic choice of Schmidt bases, Alice and Bob cannot locally determine what the mea-
surement basis should be, so they are unable to implement the measurement superoperator
bilocally, even with shared entanglement.
/S is 1-localizable, however: Alice can make a partial measurement, with projectors
P01 = |0〉〈0| + |1〉〈1|, P23 = |2〉〈2| + |3〉〈3| (allowing her to determine which column the
state is in) and send the measurement result to Bob; then Bob, after making the anal-
ogous partial measurement to determine which row the state is in, knows what the Bell
measurement basis should be. Then Alice and Bob can apply the superoperator (3.5) on
the appropriate subspace, as determined by their measurement results; if Bob determines
that the state was in the P23 ⊗P23 subspace, he must rotate his basis by (Υ † ) B first. (This
construction is further generalized below.) Implementation of this superoperator requires
only a classical channel from Alice to Bob; they do not need to use any nonlocal quantum
resources.
Bell bases
85
In fact, not even all superoperators projecting onto a single Bell basis are localizable.
As noted above, the superoperator projecting onto the standard Bell basis in d ×
d dimensions, consisting of states |φab〉 ≡ X a Z b |Φ +
d
1
〉, where |Φ+ d 〉 ≡ √
d
�d−1 i=0 |i〉|i〉, is
localizable. However, in some dimensions there are other Bell bases inequivalent to the
standard basis under local unitary transformations; by mixing these inequivalent Bell bases
we can find a new Bell basis for which the measurement superoperator is not localizable.
Some examples in 4 × 4 dimensions are discussed below. 22
It will be convenient in further discussion of Bell bases to encode the basis information
as a set of operators, so that we need not explicitly work with the states |φi〉. Let us
characterize a Bell basis {|φi〉} in a d × d-dimensional Hilbert space H AB by the set W =
{Wi} of d2 unitary operators on HA with |φi〉 ≡ Wi ⊗ 1 |Φ +
d 〉. (Thus for the standard
Bell basis, we may take W = � X a Z b : a, b ∈ {0, . . . , d − 1} � .) The |φi〉 are orthonormal,
〈φi |φj〉 = δij, iff the Wi are orthonormal with respect to the Hilbert-Schmidt inner product
tr W †
i W j = d δij . (3.10)
Since we are interested here not strictly in Bell bases, but in projections onto these bases,
we will not concern ourselves with the U(1) phase of the Wi. (That is, we are interested in
U(1) cosets of elements of U(d): elements of the projective group U(d)�U(1) ≡ � {e iφ U :
φ ∈ [0, 2π)} : U ∈ U(d) � .) It will also be convenient to locally define the computational
bases in such a way that 1 ∈ W. We will call such a Bell basis normal.
The following well-known result will be useful in working further with Bell states:
Lemma 5 Let X and Y be arbitrary linear operators (X, Y ∈ GL(d, C)). Then
if and only if
X A ⊗ 1 B |Φ +
d 〉 AB = 1 A ⊗ Y B |Φ+
d 〉 AB
Y = X T ,
22 I don’t know whether there are examples in 3 × 3 dimensions.
86
where the transpose (a basis-dependent operation) is taken relative to the computational
bases (the Schmidt bases of |Φ +
d 〉 AB ).
Proof of Lemma 5:
We have
X A ⊗ 1 B |Φ +
d 〉 AB
1 A ⊗ Y B |Φ +
d 〉 AB
�
= X |j〉 |j〉 = A A B � � �
〈i| X |j〉 |i〉A |j〉 B
j
�
= |i〉 Y |i〉 =
A B B � � �
〈j| Y |i〉 |i〉A |j〉 ,
B
i
which are clearly equal if and only if 〈i| X |j〉 = 〈j| Y |i〉: that is, if Y = X T . �
Now we can derive a useful operator form of the stabilizer equation
S |φ〉 = e iχ |φ〉 ,
where |φ〉 is a full-rank Bell state with |φ〉 ≡ W ⊗ 1 |Φ +
d 〉, so that
SW |Φ +
d 〉 = eiχW |Φ +
d
Of particular interest to us will be the case where S is a bilocal unitary operator S AB =
U A ⊗ V B on H A ⊗ H B and W ∈ U(d) is a local unitary operator on H A . In this case we
have
which (via Lemma 5) implies
UW ⊗ V |Φ +
〉 .
ij
ij
d 〉 = eiχW ⊗ 1 |Φ +
d
V T = e iχ W † U † W .
If S also stabilizes |Φ +
d 〉 itself, S |Φ+ d 〉 = eiχ0 +
|Φ d
then we also have
V T = e iχ0 U † .
〉 ,
〉 (e.g., if S stabilizes a normal Bell basis),
We may take χ0 = 0 wlog by a local change of basis; this merely redefines the phases of
all the eigenstates of S, and these phases affect the projectors trivially. Thus we may take
V T = U † , and hence S = U ⊗ U ∗ (we call this the normal form for S), and arrive at our
desired stabilizer criterion
We will say that such a W is stabilized by S (and by U).
UW = e iχ W U . (3.11)
We make note now of the simple facts (which we will use later) that
87
Lemma 6 If a bilocal stabilizer (in its normal form, S = U ⊗ U ∗ ) stabilizes both of the
local unitary operators W and W ′ , with eigenvalues eiχ and eiχ′ , then S stabilizes their
product W W ′ , with eigenvalue e i(χ+χ′ ) ; and the operator inverse W −1 , with eigenvalue
e−iχ . Furthermore, if eiχ = eiχ′ , then any operator of the form αW + α ′ W ′ is also
stabilized by S, also with eigenvalue e iχ .
Proof of Lemma 6:
(Trivial.) �
Now suppose 1, U, V ∈ W, and suppose that superoperator /S induces decoherence in
the basis W; then the states |Φ + 〉, U ⊗1 |Φ + 〉, and V ⊗1 |Φ + 〉 = 1⊗V T |Φ + 〉 are eigenstates
of /S. If /S is localizable, then Theorem 2 applies, and (U ⊗1)(1⊗V T ) |Φ + 〉 = U ⊗V T |Φ + 〉 =
UV ⊗ 1 |Φ + 〉 is also an eigenstate of the superoperator. This means that if decoherence
onto the normal basis W ∋ U, V can be simulated by a localizable superoperator, then
for some phase φ, e iφ UV ∈ W—that is, the set U(1)W ≡ {e iφ W : φ ∈ [0, 2π), W ∈ W} is
closed under operator composition as long as 1 ∈ W. (This, along with the continuity of
/S, implies that U(1)W forms a group.)
Note that local rotations R and local basis transformations B on Alice’s Hilbert space
transform the Wi by Wi ↦→ RWiB † ; so local operations can ensure that in fact 1 ∈ W,
and Alice can choose a basis (or make a local unitary rotation) so that U(1)W is closed.
(In general, W forms a coset of a projective subgroup of U(d).) The condition that the
basis of d 2 operators {Wi} is projectively closed (i.e., up to phase) is equivalent to Knill’s
condition in [64] that an error basis be nice. We are primarily interested here in the
projectors |φi〉〈φi| and not the states |φi〉, so in fact the phases of the Wi are not relevant.
We may (partially) define this phase by the condition det Wi = 1 and restrict ourselves
(wlog) to Knill’s very nice bases.
Example 2
88
Now consider the superoperator /S in 4 × 4 dimensions projecting onto the (normal) Bell
basis
W = { 1 , Z , Z 2 , Z 3 ,
X , X Z , X Z 2 , X Z 3 ,
X 2 , X 2 Z , X 2 Z 2 , X 2 Z 3 ,
X 3 , X 3 Z1, X 3 Z2, X 3 Z1Z2 } ,
(3.12)
where X |a〉 ≡ |(a + 1) mod 4〉, Z |a〉 ≡ i a |a〉, Z1 |a〉 ≡ (−1) a |a〉 (so Z1 ≡ Z 2 ), and Z2 |a〉 ≡
(−1) ⌊a/2⌋ |a〉. (That is, X and Z are elements of the four-dimensional Pauli group P4, and
Z1 and Z2 are elements of two independent two-dimensional Pauli groups P2. This basis,
loosely speaking, is a mixture of the basis P4 and the basis P2 × P2.) It is easy to verify
that this set gives an orthogonal basis—that is, that the Wi ∈ W are orthonormal in
the Hilbert-Schmidt norm (3.10). However, W is not projectively closed under operator
composition: 23 The product X · X 2 Z = X 3 Z, for example, is not proportional to any
element of W. Hence /S is not localizable.
This superoperator is, however, 1-localizable (when we allow quantum communication
from Alice to Bob). Note that the basis states are all eigenstates of the stabilizer subgroup
with generators Z ⊗ Z −1 and X 2 ⊗ X 2 : the operators Wi all satisfy the stabilization
condition (3.11), both with U = Z and with U = X 2 . (The states in a standard Bell basis
are all eigenstates of the stabilizer subgroup generated by Z ⊗ Z −1 and X ⊗ X −1 . X and
Z2 do not satisfy (3.11), but X 2 and Z2 do.) Alice and Bob can measure these stabilizer
generators 1-localizably. 24 For example, to measure Z ⊗ Z −1 , Alice and Bob can perform
the unitary operation Λ(X −1 ) [B],R Λ(X) [A],R (where H R
sends to Bob after performing the Λ(X) [A],R
is an ancilla system which Alice
operation), after which Bob can perform
Λ(X −1 ) [B],R and measure H R in the computational basis to determine the value of Z⊗Z−1 .
Measurement of these stabilizer generators, a partial “syndrome measurement,” effects a
23 Thus more generally, normal bases which are not projectively closed (in the terminology of Knill [64],
bases which are not nice) are not localizable.
24 More precisely, Alice and Bob can perform a 1-localizable operation, after which Bob knows the
measurement result. The theory of stabilizer codes provides efficient algorithms for performing 1-local
measurements of stabilizer operators.
89
projection into one of eight two-dimensional subspaces, partitioning the set W above into
eight two-element subsets, tabulated here with the stabilizer generators’ eigenvalues as
shown:
X 2 ⊗ X 2 :
+1 −1
+1 {1, Z 2 } {Z, Z 3 }
Z ⊗ Z −1 : +i {X, XZ 2 } {XZ, XZ 3 }
−1 {X 2 , X 2 Z 2 } {X 2 Z, X 2 Z 3 }
−i {X 3 , X 3 Z 2 } {X 3 Z2, X 3 Z 2 Z2}
(3.13)
(the boxed subspace is the one causing all the problems). Now the remaining stabilizer
generator, which will remove the degeneracies of these eight subspaces and thus complete
the projection, depends on the subspace—if it didn’t, the operator would be localizable.
It can be written as X ⊗ X for seven of the eight subspaces; for the remaining (boxed)
subspace, with syndrome (X 2 ⊗ X 2 , Z ⊗ Z −1 ) = (−1, −i), the generator can be written
as X ⊗ Z2XZ2. Now Alice and Bob can perform the (localizable) projection onto the
subspaces of this stabilizer, using the construction of (3.6) above: Alice and Bob will
use a shared random number c, and Alice will always perform X c ; Bob, who does know
the result of the original measurement, can decide whether to perform Z2X c Z2 (for the
(−1, −i) subspace) or X c (for all others). 25 It’s important here that Alice’s stabilizer
generator X is the same for each of the subspaces, since she doesn’t know the results of
the projective measurements of the first two stabilizer generators and hence doesn’t know
what subspace the state is in.
For the superoperator in the first example, we exhibited a 1-localizable simulation
which used only classical resources. But any 1-localizable simulation of the superoperator
of Example 2 requires an “effective quantum channel” (that is, either a quantum channel
or a simulation of one using teleportation; thus, nonlocal quantum resources are required).
We can prove this using the following result:
Theorem 3 Suppose /S is a 1-localizable superoperator on H AB which can be implemented
25 Alternatively, they can perform a 1-localizable measurement of this stabilizer generator using the same
technique that they used to measure X 2 ⊗ X 2 and Z ⊗ Z † , allowing Bob to measure the eigenvalue of this
generator as well.
90
when the only nonlocal resources are classical (classical channel and classical shared
random bits), and suppose that /S has an eigenstate |φ〉 of Schmidt rank d A ≡ dim H A .
Then /S is localizable; the classical channel adds no power.
So the set of 1-localizable superoperators is strictly larger than the set of superoperators
which can be simulated when the only nonlocal resources available (i.e., the one-way
channel and the previously shared state) are classical resources. (The reason this argument
doesn’t apply to Example 1 is that it requires that the eigenstate |φ〉 be a full-rank Bell
state. The eigenstates (3.9) for Example 1 are Bell states, but not full-rank Bell states:
they have rank 2, not 4.)
Proof of Theorem 3:
Suppose we have only a one-way classical channel available (and no shared entan-
glement!). Now the most general operation that Alice and Bob can implement can
be written as a generalized measurement superoperator (POVM) by Alice, with
the measurement result sent to Bob, who performs a superoperator B i conditioned
on Alice’s measurement result. 26 We have left one possibility out here. The defini-
tion of a POVM does not specify the operator’s action on the Hilbert space being
measured (here H A ), and there are many ways that a POVM could be implemented
as the action of a superoperator on H AM (where H M is the space used to hold the
result of the POVM, initially in the state |0〉 M ). Clearly, however, these all differ
only by local actions of Alice, applied after her superoperator implementation of
the POVM, and so these will not affect whether the operator is 1-localizable.
We consider Alice’s POVM, with generalized projectors Fi = A †
i Ai , to act with
operator elements A i , and we define the Kraus operators B ij of Bob’s conditional
superoperator B i . Thus the most general operation satisfying these hypotheses
26 With a one-way classical channel, Alice can generate random bits as part of the POVM and send them
to Bob as part of her measurement result, so we need not explicitly include shared random bits in the
protocol; this is why H R does not appear here.
can be taken to have the form
/S(ρ) = �
where �
j B†
ij B ij
ij
= tr S
91
� � � � †
(Ai) (Bij) ρAB (Ai) (Bij)
A B
A B
� �
i
� ��
(Ai) (Vi) ρAB ⊗ |0〉 〈0|
A BS
S S �� � �
†
(Ai) (Vi)
A BS
= 1 for all i, �
i A†
i A i = 1, and Vi is a unitary implementation
on H BS for superoperator B i (implicitly extending H S as we did in the proof of
Theorem 2). An eigenstate |φ〉 of /S, by Lemma 4, is an eigenstate of each Kraus
operator (Ai) A (Vi) BS of /S:
(Ai) A (Vi) BS |φ〉 AB |0〉 S = αi |φ〉 AB |χi〉 S .
Now we can put restrictions on the allowed POVM operators Ai that Alice can
use: Suppose one of the eigenstates |φ〉 of /S is a rank-d A Bell state. 27 Then
AiA †
i = Ai1A† i
= d A (A i ) A tr BS
�
(V †
i ) BS (V i ) BS
� |φ〉AB AB 〈φ| |0〉 S S 〈0| ��
(A †
i ) A
��(Ai) ��
= d tr (Vi) |φ〉AB 〈φ| |0〉 〈0|
A BS
A BS
AB S S �� � �
†
(Ai) (Vi)
A BS
= d A tr BS
= |αi| 2 1
�
|αi| 2 �
|φ〉 〈φ| ⊗ |χi〉 〈χi|
AB AB S S
and hence, for a finite-dimensional Hilbert space H A , we may conclude that Ui ≡
1
αi Ai is unitary. 28 Thus /S has the form
� �
/S(ρ) = tr |αi|
S
2� ��
(Ui) (Vi) ρAB ⊗ |0〉 〈0|
A BS
S S �� � �
†
(Ui) (Vi)
A BS
i
But in this form, we see that /S can be simulated using random bilocal unitary
operations: with probability |αi| 2 , Alice applies Ui and Bob applies Vi. Such a su-
peroperator is localizable, not just 1-localizable (cf. the discussion following (3.6)).
27 By Theorem 1, of course, since /S is 1-causal, this implies that all of the eigenstates are rank-dA Bell
states.
28 If HA is infinite-dimensional, we can’t immediately conclude that 1
α i Ai is unitary; we can say only
that it is an isometry. I have not investigated this case further.
,
.
92
Thus Theorem 2 applies, and a one-way classical channel doesn’t allow implemen-
tation of any nonlocalizable Bell-basis measurement superoperators. �
This example illustrates a technique for simulating 1-localizable projective superoper-
ators which is more general than the stabilizer-decoherence technique of (3.6). The idea
is to recursively decompose a space into stabilizer subspaces. With a one-way quantum
channel, decoherence into stabilizer subspaces can be implemented directly, by ancilla-
based measurements of each of the stabilizer generators Si ≡ Ai ⊗ Bi in turn, instead
of by random bilocal unitary operators as in (3.6). Bob learns the eigenvalue si of each
stabilizer generator Si (the “syndrome”) and so implements the projection onto the sub-
spaces, as desired. But Bob can do something more general than this: Since, before the
measurement of the ith operator, Bob already knows the previous measurement results
si ≡ (s1, . . . , si−1), Bob can choose his unitary operator based on these measurement re-
sults; the ith operator measured by Alice and Bob can thus have the form Ssi ≡ Ai ⊗ Bsi .
(Alice’s operators Ai are still not allowed to depend on the measurement record si, because
she cannot in general know the result of the syndrome measurement; only Bob knows si.)
Instead of n bilocal stabilizer generators forming a commutative subgroup of U(d),
the stabilizers on different subspaces need not be the same. More precisely, consider a
tree of bilocal unitary stabilizer operators Si,si = Ai ⊗ Bi,si , where si ≡ (s1, . . . , si−1)
indicates the record of measurements made so far. (That is, si is the result measured
for the operator Si,si if the previous measurement results are si. Bob’s operators Bi,si
are allowed to depend on si, because he knows the measurement record; Alice’s are not,
because she cannot in general know the result of the syndrome measurement.)
For the above construction to work, the stabilizer operators Si,si
should have the
following characteristics: Each route from the root of the tree to a leaf should define
a stabilizer code (i.e., each operator in such a path should commute with all the other
operators along the path). The stabilizer subspace defined by a partial set of measurements
sk ≡ (s1, . . . , sk) (the measurement results of stabilizers S1, . . . , S k(s1,...,sk−1)) is precisely
the span of some subset Vsk of the set of eigenstates V = {|φi〉}. Furthermore, the stabilizer
S (k+1)sk
partitions Vsk by its measurement result. The stabilizer subspaces at the lowest
(leaf) levels have dimension at most 1.
Example 3
93
But for this “recursive decomposition” into stabilizer subspaces, we still need to be able
to find at least one nontrivial bilocal stabilizer element, in order that the “syndrome
tree” actually branches. It turns out that we can’t always do this; there are causal,
complete von Neumann measurement superoperators whose measurement bases cannot
be partitioned by any bilocal stabilizer. To show this we prove a result on the necessary
structure of eigenstates of a bilocal stabilizer operator.
Lemma 7 Suppose a unitary operator U ∈ U(d) stabilizes the linearly-independent uni-
tary operators P, P ′ ∈ U(d) (in the sense of (3.11)), with eigenvalues e iχ and e iχ′
respectively. Then either P and P ′ are orthogonal (i.e., tr P † P ′ = 0), or they have the
same eigenvalue (eiχ = eiχ′ ) and span a two-dimensional eigenspace of U.
That is, if |φ〉 ≡ (P ⊗ 1) |Φ +
d 〉 and |φ′ 〉 ≡ (P ′ ⊗ 1) |Φ +
d 〉 are both stabilized by S = U ⊗ U ∗ ,
then either 〈φ |φ ′ 〉 = 0, or |φ〉 and |φ ′ 〉 have the same eigenvalue and span a two-dimensional
eigenspace.
Proof of Lemma 7:
The proof is straightforward. We have
by assumption. Now we have
UP = e iχ P U
UP ′ = e iχ′
P ′ U
tr(P † P ′ ) = tr(P † U † UP ′ ) = tr � (UP ) † (UP ′ ) �
= e i(χ′ −χ) tr � (P U) † (P ′ U) � = e i(χ ′ −χ) tr(U † P † P ′ U)
= e i(χ′ −χ) † ′ †
tr(P P UU ) =
i(χ
e ′ −χ) † ′
tr(P P )
0 =
�
e i(χ′ �
−χ)
− 1 tr(P † P ′ )
from which we have either e iχ = e iχ′
(P and P ′ have the same eigenvalue) or
tr P † P ′ = 0 (P and P ′ are orthogonal in the Hilbert-Schmidt inner product), as
stated. �
Now we can use this result to prove
94
Theorem 4 There exist Bell bases for which the only bilocal stabilizer operators are pro-
portional to the identity.
Proof of Theorem 4:
To prove this, we want to find a Bell basis whose elements Wi are mixed in such
a way that Lemma 7 puts strong constraints on the form of any local stabilizer.
Now an arbitrary Bell basis W = {Wi} on a d × d-dimensional Hilbert space can
be written as a unitary transformation from the standard Bell basis Pd = {Pi},
Wi = �
UijPj , U ∈ U(d 2 ) . (3.14)
j
This is just because we can view the {Wi} and the {Pi} as two orthonormal bases
(with respect to the Hilbert-Schmidt inner product) for the vector space GL(d, C) of
d×d linear operators, and any two orthonormal bases on the same space are related
by a unitary transformation. Unfortunately, analysis of Bell bases is complicated
by the fact that not all U ∈ U(d 2 ) specify Bell bases: The Wi trivially continue
to satisfy the Hilbert-Schmidt orthogonality condition (3.10), but they are not
generally unitary. For the two-qubit Pauli group P ≡ P ⊗2
2
tensor product P ⊗n
2
(and in fact for any
of qubit Pauli groups), though, there is a simple class of
unitary transformations which preserves the unitarity of the elements, based on
the fact that for any P ∈ P ⊗n
2 , P 2 = ±1.
In particular, suppose P, Q ∈ P ≡ P ⊗2
2 . We can modify this standard basis by
replacing the elements P and Q by the two unitary operators
P cos θ + iQ sin θ
iP sin θ + Q cos θ
if P † Q − Q † P = 0 (i.e., (P † Q) 2 = +1); and by
P cos θ + Q sin θ
−P sin θ + Q cos θ
(3.15)
(3.16)
if P † Q + Q † P = 0 (i.e., (P † Q) 2 = −1). Here θ, in (3.15) and (3.16), is an
arbitrary free parameter. 29 This clearly gives us a lot of freedom in our choice of
29 Each of these transformations is a unitary transformation of the Pi, in the sense of (3.14); so the
95
bases: We can partition P into pairs in any way we like and mix the operators
in each two-element set by arbitrary and unrelated angles θ while maintaining the
orthogonality and unitarity of the elements.
Using these results we demonstrate a choice of basis for which the only bilocal
stabilizer elements are trivial:
W =
0 1 2 3,4
1 Z1 Z2 Z1Z2
5,6 7,8
X1 X1Z1 X1Z2 X1Z1Z2
9 10 11,12
X2 X2Z1 X2Z2 X2Z1Z2
13,14 15
X1X2 X1X2Z1 X1X2Z2 X1X2Z1Z2
(3.17)
In this graphical representation of the new Bell basis, ovals are drawn around
pairs of elements which are to be mixed according to the appropriate mixing equa-
tions (3.15), (3.16) above; thus, for instance, W3 = Z1Z2 cos θ + iX1Z1Z2 sin θ
and W4 = iZ1Z2 sin θ + X1Z1Z2 cos θ, since � (Z1Z2) † (X1Z1Z2) � 2 = +1. The mix-
ing angles are arbitrary (except that of course they should not be multiples of
π
2 ). (Ovals enclosing a single element indicate, of course, that that element is not
modified: e.g., W1 = Z1.)
Now we can follow a twisted path through this Bell basis to show that any
bilocal stabilizer S for the basis W acts only as a global phase. First, since W0 =
1 ∈ W (this Bell basis is normal), Lemma 5 tells us that any bilocal stabilizer has
the form S = e iχ0 U ⊗ U ∗ . As usual we will assume wlog that χ0 = 0. Now the
stabilization criterion (3.11) applies, and UWi = e iχi WiU for all the Wi ∈ W.
Since both W1 = Z1 and W2 = Z2 are stabilized by U, so is Z1Z2 (by Lemma 6).
orthogonality of these operators, with each other and with the rest of the Pi, is immediate in both cases.
The unitarity of these operators is what depends on these “commutation relations” of P and Q. For
higher-order Pauli groups, the same sort of relation will probably hold, but the linear combinations need
to have more elements in order to satisfy the unitarity constraints.
96
But now since W3 = α3Z1Z2 + β3X1Z1Z2 is also stabilized by U (where α3 and β3
are the nonzero mixing coefficients, here given by (3.15)), Lemma 7 implies that
both Z1Z2 and X1Z1Z2 are stabilized by U, with the same eigenvalue. Applying
Lemma 6 twice, we find that X1 = (X1Z1Z2)(Z1Z2) −1 is also stabilized by U, with
eigenvalue 1.
Now since both X1 and W5 = α5X1+β5X2 are stabilized by U, Lemma 7 shows
that X1 and X2 are both stabilized by U, with the same eigenvalue. Hence X2
also has eigenvalue 1. Next, considering W13 = α13X1X2 + β13X1X2Z1 and using
an argument analogous to that with W3 above, we find that Z1 is also stabilized
by U with eigenvalue 1. Finally, since X1Z1 is stabilized by U with eigenvalue 1,
the eigenstate W7 = α7X1Z1 + β7X1Z2 shows us that X1Z2 also has eigenvalue 1,
and hence Z2 has eigenvalue 1 as well.
So X1, X2, Z1, and Z2 all are stabilized by U with eigenvalue 1, and hence
every element in P is stabilized by U with eigenvalue 1. Since P forms a basis
for GL(4, C), all linear operators are stabilized by U with eigenvalue 1. Thus by
Schur’s lemma U must be only a phase, and the stabilizer S = U ⊗ U ∗ = 1 is
trivial. �
For this Bell basis, then, we can’t use the “recursive stabilizer” idea, because we can’t
even find a single nontrivial stabilizer to partition the eigenstates.
3.2.4 DiVincenzo’s conjecture
The above examples of causal operators have made a progression away from the set of lo-
calizable operators; each has been more complicated to simulate even 1-localizably. What
of DiVincenzo’s conjecture, then, that not only all causal operators but all 1-causal oper-
ators are 1-localizable?
For the restricted class of operators discussed here (the complete von Neumann mea-
surement superoperators), the conjecture holds true. Even the superoperator of Exam-
ple 3, for which there are no nontrivial bilocal stabilizer operators, can be simulated
1-localizably.
97
Theorem 5 All 1-causal complete von Neumann measurement superoperators (allowing
no communication from Bob to Alice) can be simulated 1-localizably (with a one-way
quantum channel from Alice to Bob).
Since of course all 1-localizable superoperators are 1-causal, this means that these two
classes of superoperators are equivalent.
To prove this, we will display a protocol for 1-localizably simulating any 1-causal
superoperator of the form (3.2). The idea behind the protocol is that since all full-rank
Bell states are locally (not just bilocally) interconvertible, Alice can send to Bob her half
of the original system, H A , along with H S , containing half of a newly-minted Bell state
|Φ +
d 〉 RS ; then Bob, having all of HAB , can perform the measurement onto the basis of
eigenstates of /S and locally (via some unitary operator on H S ) rotate the new Bell state
into the correct basis state, to match the result of his projective measurement. For the
more general case of Theorem 1, in which Alice’s Hilbert space is partitioned into disjoint
subspaces spanned by Bell subbases, Alice first does a partial measurement to determine
which subspace the system is in (and hence which Bell subbasis is needed); she then sends
to Bob her partially-projected system and an appropriate Bell state.
The complete proof is little more than a rewriting of this argument:
Proof of Theorem 5:
Let /S be a 1-causal complete von Neumann measurement superoperator, allowing
no signal to be sent from Bob to Alice. We wish to demonstrate a 1-localizable
protocol (allowing quantum communication from Alice to Bob) by which /S can be
simulated.
By Theorem 1, we know that the eigenstates 30 |φai〉 are rank-da Bell states on
orthogonal subspaces Ea × H B ; that is, tr B |φai〉〈φai| = 1
da Ea, where EaEb = δabEb
and �
a Ea = 1. Since the Bell state |φai〉 has support entirely on the subspace
Ea × 1, we may write (where Eai ≡ |φai〉〈φai|) EaiEa = Eai = EaEai.
30 Here we have given the eigenstates two indices. The first index a explicitly labels the subspace of HA
on which the projector has its support; the second runs over all of the states with support on this subspace.
Now by Lemma 5, we can write 31
98
B
|φai〉 = 1 ⊗ Wai |Φ A
a B
〉 , A
where |Φ a 〉 A B is a particular fixed Bell state on Ea × H B (say,
|Φ a da−1
B 1
〉 ≡ √da
A
i=0
�
|a, i〉 |a, i〉
A B ,
where {|a, i〉 A } is an orthonormal basis for Ea and {|a, i〉 B } is a set of da orthonormal
states in H B ).
The protocol is as follows:
B
ρA Alice performs the partial (local projective) measurement
defined by the projectors {Ea}, storing the (classical)
measurement result in a classical ancilla system HM .
↓
�
B
(Ea) ρ (Ea) ⊗ |a〉 〈a|
A A
A M M
a
�
a
�
a
Alice uses her measurement result to create the state |Φ a 〉 RS
on an additional bipartite Hilbert space H RS (a copy of the
original Hilbert space H A B , so that dim HR = dim H A and
dim HS = dim H
↓
B ), and swaps the states of systems A and R.
� B �
(Ea) ρ (Ea) ⊗ |a〉M 〈a| ⊗ |Φ
R R
R
M a 〉 〈Φ
AS AS a |
Alice sends her measurement result (system HM ), her half of
the partially-measured original system (now in HR ), and
Bob’s half of the new-minted Bell state (in HS ), to Bob.
↓
(Ea) R ρ RB (Ea) R ⊗ |a〉 M M 〈a| ⊗ |Φ a S S a
〉 〈Φ |
A A
31 Throughout the description of this protocol, the indices for systems which Alice holds will be sub-
scripted, while the indices for systems Bob holds will be superscripted.
�
ai
�
a
�
abi
99
(Ea) R ρ RB (Ea) R ⊗ |a〉 M M 〈a| ⊗ |Φ a S S a
〉 〈Φ |
A A
Bob performs the complete von Neumann measurement
{(Ebi)
↓
RB }, storing the measurement results (b, i) in a second
ancilla system HN .
�
Ebi RB Ea R ρ RB Ea R Ebi RB� ⊗|a〉 M M 〈a|⊗|b, i〉 N N 〈b, i|⊗|Φ a S S a
〉 〈Φ |
A A
= �
ai
�
Eai RB ρ RB Eai RB� ⊗ |a〉 M M 〈a| ⊗ |a, i〉 N N 〈a, i| ⊗ |Φ a S S a
〉 〈Φ |
A A
(since Ebi RB Ea R = δabEbi RB )
Bob performs the local unitary rotation Wai on H
↓
S
(conditioned on measurement result |a, i〉 N ), and swaps HB and HS .
(Wai) B |Φ a B B a †
〉 〈Φ | (W A A
ai )B⊗(Eai RS ρ RS Eai RS )⊗|a〉 M M 〈a|⊗|a, i〉 N N 〈a, i|
�
ai
= �
ai
B B
|φai〉 〈φai| ⊗ (Eai A A
RS ρ RS Eai RS ) ⊗ |a〉 M M 〈a| ⊗ |a, i〉 N N 〈a, i|
↓ Bob discards the ancilla systems HMNRS .
tr � Eaiρ � B B
〈a |a〉〈a, i |a, i〉|φai〉 〈φai|
A A
= �
= �
B B B B
〈φai| ρ |φai〉 |φai〉 〈φai|
A
A
A A
ai
B B
(Eai) ρ(Eai)
A
A
ai
which is precisely the action of /S.
The ancilla systems H M and H N are not actually necessary for the protocol, but
they make the argument a little easier to follow. �
This is clearly a much more powerful protocol than the earlier classical-only protocols
discussed; the simplicity of the protocol may make DiVincenzo’s conjecture more plausible.
The equivalence of local unitary operations on the two halves of a bipartite Bell state means
that Bob can perform operations that it initially seems must be performed by Alice.
Bell bases in 2 × 2 dimensions
As a final aside to this section, we note that
100
Lemma 8 In 2 × 2 dimensions, all (complete) Bell-basis measurement superoperators are
localizable.
Proof of Lemma 8:
As before, let us consider a set W ≡ {Wi} of elements of SU(2) such that the
eigenstates of the superoperator /S are |φi〉 ≡ Wi ⊗ 1 |Φ + 〉. Now any element
U ∈ SU(2) can be written (uniquely) as U = ˆvU · � P , where � P ≡ (Pβ) ≡ (P0, P ) ≡
(1, iσx, iσy, iσz) is the vector of SU(2) Pauli matrices and ˆvU is a unit vector in
R 4 . Matrices U, V ∈ SU(2) can be seen to be orthogonal in the Hilbert-Schmidt
norm (tr U † V = 0) iff the corresponding vectors are orthogonal in Euclidean R 4 ,
ˆvU · ˆvV = 0. Hence any Bell basis Wα in 2 × 2 dimensions can be derived by a
(real) orthogonal transformation M = (Mαβ) ∈ SO(4) from the Pauli basis {Pβ}:
Wα = �
and any such M defines a Bell basis.
β
MαβPβ ,
Let us characterize the set W. First, let us use our basis freedom W ↦→ WU to
ensure that 1 ≡ W0 ∈ W. Now we can (by permuting the rows of M if necessary)
consider 1 to be preserved by the orthogonal transformation M. Hence M is of
the form M = 1 ⊕ M ′ , where M ′ ∈ SO(3), and the three Wa are of the form
Wa = ma · P (where the ma are the three orthonormal row vectors of M ′ ).
The set W now can be seen to be isomorphic to the projective group {Pβ} (to
form a group we must take {±Pβ} ∼ Q8, the quaternionic group): We have (for
a, b ∈ {1, 2, 3}, a �= b) W †
a = −Wa , WaWa = −1, and
WbWa = �
mbimajPiPj = − �
ij
ij
mbimaj(δij1 + εijkPk)
= −(ma · mb + mb × ma · P ) = ma × mb · P
= �
εabcmc · P = �
εabcWc .
c
We see that the case of 2 × 2 dimensions is special because there are only three
c
101
nontrivial basis states, any two of them determining the third up to a sign. 32 This
is precisely the group formed by the {Pβ}; hence we can use the method of (3.5),
/S(ρ) = 1
4
�
α
(W α ⊗ W ∗
α )ρ(W α
⊗ W ∗
α )† ,
to implement the decoherence superoperator for this Bell basis. �
3.3 Unitary superoperators
Another special case, that of unitary superoperators /S(ρ) = UρU † , turns out to be trivial;
every causal unitary superoperator is in fact a tensor product of local unitary superoper-
ators (and so trivially localizable):
Theorem 6 (Nielsen [12]) A unitary operator U AB is causal iff it is a tensor product
operator, U AB = U A ⊗ U B .
Proof of Theorem 6:
The “if” direction is trivial. Now suppose that U AB is not a tensor product oper-
ator on H A ⊗ H B . Then the Schmidt decomposition for U AB ,
U AB ≡ �
has at least two terms.
i
ζ i A i ⊗ B i , tr(A †
i A j ) = d A δ ij , d A ≡ dim H A
(all ζ i > 0) tr(B †
i B j ) = d B δ ij , d B ≡ dim H B ,
We show that Bob may send a signal to Alice. Let Alice introduce an ancilla
system H R (of dimension d A ), and let the initial state be
|Ψ〉 ARB ≡ 1
� dA
d A�
i=1
|i〉 |i〉 |ψ〉 ≡ |Φ A R B +
d 〉 |ψ〉 .
A AR B
After action of U AB on |Ψ〉 ARB , Alice’s density matrix (on H AR ) is
σ(|ψ〉) ≡ tr
B
= �
ab
�
U |Ψ〉 〈Ψ| (U
AB ARB ARB † �
)
AB
ζaζbAa |Φ +
d 〉 〈Φ
A AR AR +
d | A
A
†
b 〈ψ| B†
B bBa |ψ〉 B .
32 In the proof we have assumed wlog that (m1, m2, m3) form a right-handed coordinate system (oth-
erwise we could just rename W2 ↔ W3 to get one).
102
The {Aa |Φ +
d 〉
A AR } are orthonormal states:
AR 〈Φ+ d | A
A
†
bAa |Φ+ d 〉 =
A AR 1
dA = 1
d A
Case I: There is some B i not unitary.
�
ij
�
i
〈j| A†
A bAa |i〉 AR 〈j |i〉 R
〈i| A†
A bAa |i〉 1
= tr A A dA †
bAa = δab .
Then Bi has two distinct singular values, and B †
i Bi has two distinct
eigenvalues, λ0 and λ1. Let |ψ0〉 B and |ψ1〉 B be eigenvectors of B †
i Bi with these two distinct eigenvalues.
Case II: All B i are unitary.
tr(B †
i Bj ) = 0 for i �= j. Then since B†
i Bj is unitary it must have two
distinct eigenvalues, λ0 and λ1. Let |ψ0〉 B and |ψ1〉 B be eigenvectors of
B †
i B j
with these two distinct eigenvalues.
Bob can prepare either |ψ0〉 B or |ψ1〉 B (clearly he can locally rotate between
these two states), and this affects Alice’s density matrix σ(|ψn〉), and in particular
the matrix element between Ai |Φ +
d 〉
A AR and Aj |Φ +
d 〉
A AR :
AR 〈Φ+ d | A
A
†
i σ(|ψn〉)A j |Φ +
d 〉 = ζiζj 〈ψn| B
A AR B †
j Bi |ψn〉
B
= ζiζjλn
distinct for n = 0, 1. �
3.4 More general superoperators
It remains to consider the case of more general superoperators. Let us consider only
incomplete von Neumann measurement superoperators (i.e., those in which the Kraus
operators, while still projectors, need not be one-dimensional). (Note that since a POVM
may be implemented as a von Neumann measurement on an augmented Hilbert space, we
may immediately say that if the POVM allows signalling, so must any implementation of
the POVM as a von Neumann measurement. In this sense, allowing POVMs does not add
any surprises.)
103
As an example of the new subtleties for this case, consider the three-outcome incom-
plete Bell measurement superoperator with
E1 = � � Φ + �� Φ +� �
E2 = � �Ψ +�� Ψ +� � + � �Ψ −�� Ψ −� �
E3 = � �Φ −�� Φ −� � ;
tr A Ea ∝ 1 for all Ea, but signalling is possible (e.g., with initial state |ψ〉 = |00〉 and Bob’s
local operator either 1 or U = σx), even though signalling was not possible in the case of
the complete Bell measurement superoperator (which was in fact localizable—recall (3.5)).
The difference is that this superoperator has another completion which (from Theorem 1)
can be used for signalling:
E2,1 = |01〉〈01|
E2,2 = |10〉〈10| .
It is true, though, that an incomplete measurement can be used for signalling only if
it has a completion which can be used for signalling (and so satisfies the hypotheses of
Theorem 1):
Theorem 7 A von Neumann measurement operator /S with (orthogonal) projectors Ea
can be used for signalling from Bob to Alice only if there exists a completion � /S of /S
(i.e., a superoperator derived from a set of one-dimensional orthogonal projectors � Eai
with Ea ≡ �
i � Eai) which allows signalling from Bob to Alice, with the same initial
state |ψ〉 AB and optional unitary operator U B .
Proof of Theorem 7:
Fix an initial state |ψ〉 AB and optional unitary operator U B , from a signalling
protocol for /S. Let us define |0a〉, |1a〉, the projections of the two possible initial
states onto the measurement eigenspaces: √ αa0 |0a〉 ≡ Ea |ψ〉 and √ αa1 |1a〉 ≡
Ea(1 ⊗ U) |ψ〉, where |0a〉 and |1a〉 are normalized states—but not necessarily
orthogonal—and the αai ≥ 0. Further, let σa0 = tr B |0a〉〈0a| and σa1 = tr B |1a〉〈1a|
be Alice’s density matrices corresponding to these states. Hence Alice’s density
104
matrix is either σ(1) = �
a αa0σa0 or σ(U) = �
a αa1σa1; these differ (so that Alice
can distinguish them, and this protocol allows signalling) iff
If the subspace E � a ≡
0 �= σ(1) − σ(U) = �
δσa ≡ �
(αa0σa0 − αa1σa1) . (3.18)
a
�
�
Ea |ψ〉, EaU |ψ〉
a
is one-dimensional (i.e., αa0 = 0, αa1 =
0, or |〈0a |1a〉| = 1), then we can trivially complete Ea to a set of one-dimensional
projectors while leaving the superoperator’s action on the states |ψ〉 and 1 ⊗ U |ψ〉
unchanged (we just take � Ea1 to be the projector onto this one-dimensional space,
and complete this basis in Ea). We call such an Ea a degenerate projector
(though clearly this “degeneracy” depends also on the chosen protocol (|ψ〉, U)).
Suppose now that E � a is two-dimensional (Ea a nondegenerate projector).
Again, we may complete the orthogonal complement (in Ea) of E � a however we
please, without affecting the superoperator’s action. So without loss of generality
we may assume that the projectors of the given incomplete von Neumann mea-
surement operator are all either one-dimensional, or two-dimensional and nonde-
generate (again, relative to the chosen signalling protocol).
Let E be a nondegenerate two-dimensional projector; as before let √ α0 |0〉 ≡
E |ψ〉 and √ α1 |1〉 ≡ EU |ψ〉. We will show that either we can complete E to one-
dimensional projectors E ≡ � Ea + � Eb in such a way that the completion has no
effect on Bob’s density matrix, or we can complete E to one-dimensional projectors
in two ways with different effects on Bob’s ability to distinguish the states |ψ〉 and
U |ψ〉. Then if all of the Ea are of the first type, we can complete all of them without
affecting Bob’s density matrices at all, and signalling is clearly still possible. If at
least one of the Ea is of the second type, we can complete all of the other projectors
however we want; then at least one of the two completions of Ea must allow Bob
to distinguish the resulting density matrices �σ(1), �σ(U).
For convenience, we define two particular complete von Neumann measure-
ments on this space: the randomizing measurement E ≡ � E+ + � E− (providing no
105
|+〉
✻
|¯1〉
❅■
|¯0〉
❅
❅
❅
❅
❅
❅
❅ ��������✒
❅�✂
✲ |−〉
✂✂✂✂✂✂✂✂✂✂✂✍
eiα e |0〉
❇❇▼
❇
❇
❇
❇
❇
❇
❇
❇
❇
❇
−iα |1〉
|¯0(φ)〉
◗❦
◗
◗
◗
◗
◗
◗
◗
◗
◗✡
✡✡✡✡✡✡✡✡✡✣
φ
|¯1(φ)〉 φ θ θ
❦
✢
π
4
Figure 3.3: States used in the completions of E.
information distinguishing |0a〉, and |1a〉), where � E± ≡ |±〉〈±| and
|±a〉 ≡ eiα |0a〉± e−iα |1a〉
�
� � �eiα |0a〉± e−iα |1a〉 � � � � , 〈0a |1a〉 ≡ e 2iα cos 2θ ; (3.19)
and the optimal von Neumann measurement E ≡ � E¯0 + � E¯1 ( � Eā ≡ |ā〉〈ā|), where
|¯0〉 ≡ 1
� �
√ |+〉+ |−〉
2
|¯1〉 ≡ 1
√ 2
� |+〉− |−〉 � .
(3.20)
The phases have been chosen so that the states e iα |0〉, e −iα |1〉, |+〉, |−〉, |¯0〉, and
|¯1〉 all lie in a real two-dimensional vector space (shown in Figure 3.3). Thus we
can define a one-parameter family of unitary operators V (φ) on H AB , acting by
rotation within E and as the identity on 1 − E. (Explicitly, take V (φ) to have
eigenvectors |¯0〉± i |¯1〉 with eigenvalues e ∓iφ , and eigenvalue 1 on 1 − E.)
We can now use V (φ) to define a family of complete measurements on the
subspace E: We take E ≡ � E¯0(φ) + � E¯1(φ), where
E ā(φ) = V (φ)EāV (φ) † = |ā(φ)〉〈ā(φ)| , (3.21)
the measurement whose projectors are rotated by φ from the projectors � E¯0 and
�E¯1, as shown in Figure 3.3.
Consider, for this completed measurement superoperator, the difference � δσ(φ)
106
in Alice’s density matrices, depending on whether Bob performed U:
�δσ(φ) ≡ α0�σ0(φ) − α1�σ1(φ)
�
= α0 tr B
− α1 tr B
= α0−α1
2
+ � α0−α1
2
cos 2 � π
4 − θ − φ� |¯0(φ)〉〈¯0(φ)| + sin 2 � π
4 − θ − φ� �
|¯1(φ)〉〈¯1(φ)|
�
cos 2 � π
4 + θ − φ� |¯0(φ)〉〈¯0(φ)| + sin 2 � π
4 + θ − φ� |¯1(φ)〉〈¯1(φ)|
tr B E
· � cos 2φ tr B
sin 2φ cos 2θ + α0+α1
2
cos 2φ sin 2θ �
� |¯0〉〈¯0| − |¯1〉〈¯1| � + sin 2φ tr B
� |+〉〈+| − |−〉〈−| �� .
If � δσ(φ) is a nonconstant function of φ, then we can certainly choose some φa
so that the sum �σ(1) − �σ(U) = �
a � δσa(φa) is nonzero, and Alice can determine
whether Bob performed U.
Thus it remains only to consider the case in which all � δσa(φa) are constant.
The above equation gives three conditions for which � δσ(φ) is constant:
I: cos 2θ = 0 and α0 + α1 = 0.
This case is immediately excluded since by assumption (E being nonde-
generate) α0 > 0 and α1 > 0.
II: sin 2θ = 0 and α0 − α1 = 0.
This case is also excluded, since E is degenerate if 2θ = nπ (n ∈ Z).
�
III: trB |¯0〉〈¯0| − |¯1〉〈¯1| � � �
= 0 and trB |+〉〈+| − |−〉〈−| = 0.
Only this final case remains; so suppose that
0 = tr B
� � �
|+〉〈+| − |−〉〈−| = trB |¯0〉〈¯1| + |¯1〉〈¯0| �
and so Alice’s density matrix, associated with a vector |¯0(φ)〉 in this subspace, is
tr B
�
V (φ) |¯0〉〈¯0| V (φ) †�
= tr B
� cos 2 φ |¯0〉〈¯0| + sin 2 φ |¯1〉〈¯1| + cos φ sin φ � |¯0〉〈¯1| + |¯1〉〈¯0| ��
= tr B |¯0〉〈¯0| ≡ τ ,
independent of φ. So for the incomplete measurement superoperator (with projec-
�
107
tor E), one of the terms in the Kraus expansion is
tr B E |ψ〉〈ψ| E ≡ α0σ0 ≡ α0 tr B |0〉〈0| = α0τ
tr B EU |ψ〉〈ψ| U † E ≡ α1σ1 ≡ α1 tr B |1〉〈1| = α1τ ,
(depending on whether or not Bob performs U). Meanwhile for the completed
measurement superoperator (with projectors � E¯0(φ), � E¯1(φ)), this term in the Kraus
expansion is replaced by the two terms
tr B
tr A
�
�E¯0(φ) |ψ〉〈ψ| � E¯0(φ) + � E¯1(φ) |ψ〉〈ψ| � �
E¯1(φ)
� � �
= tr �E¯0(φ) E |ψ〉〈ψ| E �E¯0(φ) + B
� � � �
E¯1(φ) E |ψ〉〈ψ| E �E¯1(φ)
�
�E¯0(φ) |0〉〈0| � E¯0(φ) + � E¯1(φ) |0〉〈0| � �
E¯1(φ)
= α0 tr B
= α0
�
cos 2 � π
4 − θ − φ� tr A |¯0(φ)〉〈¯0(φ)| + sin 2 � π
4 − θ − φ� tr A |¯1(φ)〉〈¯1(φ)|
= α0τ , and similarly
� �E¯0(φ)U |ψ〉〈ψ| U † � E¯0(φ) + � E¯1(φ)U |ψ〉〈ψ| U † � E¯1(φ)
= α1τ .
So these terms in the sum are unchanged by any such completion of E.
Thus if � δσa(φa) is constant for all (nondegenerate) Ea, we can complete the Ea
using (say) the optimal von Neumann measurement without changing the values
of σ(1) or σ(U); so the distinguishability of these states is preserved. �
In this sense, there are no real surprises in considering incomplete measurements; such
a measurement only allows signalling if it is part of a complete measurement scheme which
does. But this condition is less satisfactory than the condition for the complete case: it is
computationally more difficult to determine, and it is only a necessary condition.
3.5 Related work
In this paper we have concentrated mainly on proving results using the basic physical
principle of causality—information cannot be transmitted outside of the forward light
cone—for the restricted class of complete projective superoperators. For this case we were
�
�
108
able to find interesting and intuitive results; this class of superoperators also provided
enough generality to demonstrate the nonequivalence of causal and localizable superoper-
ators (unlike the class of unitary superoperators, §3.3).
Several papers by Aharonov, Albert, Popescu, and Vaidman have studied the issues
of causality and localizability for nonlocal measurements, with primary interest in nonde-
molition measurements of a single given (multipartite, pure) state |ψ〉: that is, for super-
operators satisfying
/S � |ψ〉 AB AB 〈ψ| � = |ψ〉 AB AB 〈ψ| ⊗ |1〉 RS RS 〈1|
/S � |ψ ⊥ 〉 〈ψ
AB AB ⊥ | � = | ˜ ψ ⊥ 〉 〈
AB AB ˜ ψ ⊥ � �
| ⊗ |0〉 〈0| , for any
RS RS RS ψ �ψ ⊥�
RS = 0 .
Note that the superoperator is only required to be nondemolition for the distinguished
state |ψ〉; we place no special restrictions on the states | ˜ ψ ⊥ 〉. The “measurement” is pre-
sumed to be nonlocal in time, as it is in this work, with prior state preparation extending
arbitrarily far back in time and the readout of the measurement result extending arbitrar-
ily far forward in time; only the quantum interaction between the system to be observed
and the measurement apparatus is required to be constrained in time. The “measurement
result,” in system RS, may be inaccessible to local measurements by either Alice or Bob.
Aharonov and Albert [5] investigated such measurements for the particular case of a pair
of spin- 1
2 systems; in [4] they considered measurements of the nonlocal observable x B − x A
(where x A is the position operator on system A). Aharonov, Albert, and Vaidman [6]
generalized the case of [5] to arbitrary finite-dimensional Hilbert spaces, obtaining a re-
sult for single-state nondemolition measurements analogous to Corollary 1. Popescu and
Vaidman [87] consider more general measurement superoperators, again for the case of a
pair of spin- 1
2
systems, and prove Corollary 1 for this case.
The inequivalence of causal and localizable operations (Corollary 2) is intriguing. Such
a class of forbidden operations hints at further physical principles or conservation laws,
beyond causality, restricting the class of operations we may perform. Another class of
restrictions is provided by Bell-type inequalities, which place general restrictions on ex-
pectation values of random variables arising from classical probability distributions or from
quantum-mechanical amplitude distributions. Some such results are presented in [12, §VI],
109
using the (classical) CHSH and (quantum-mechanical) Cirel’son inequalities to constrain
the fidelity with which a particular causal operator can be localizably implemented.
The constructions of nonlocalizable superoperators in this paper, especially the con-
struction (3.17) of Example 3, are reminiscent of the construction of the “domino ba-
sis” [18] of Bennett et al. Their prime consideration is the implementation of measurements
on a bipartite Hilbert space, using only local operations and classical communication with
no shared entanglement, rather than our “measurement superoperators” implemented us-
ing arbitrary prior entanglement. Nevertheless, the multipartite states considered in [18]
are excellent examples of the sorts of states which we wish to consider in understanding
the distinctions between causality and localizability. (Note, however, that their domino
basis is not a semicausal measurement basis.)
The restriction to the class of complete measurement operators means that our proof of
DiVincenzo’s conjecture on the equivalence of 1-causal and 1-localizable operators (The-
orem 5) also has restricted applicability. The general form of DiVincenzo’s Conjecture
(conjecture 2) has now been proved by Eggeling, Schlingemann, and Werner [40]. They
also remark on some limitations of the model of quantum mechanics used here: for exam-
ple, the tensor-product structure we have postulated breaks down at short distances.
Nielsen [12] has derived a formula for determining whether a general bipartite super-
operator is causal: /S AB is causal iff the mutual information, between systems H R and H S ,
of the state
� �� +
/SAB ⊗ 1 |Φ 〉AR 〈Φ
RS
AR + | ⊗ |Φ + 〉 〈Φ
BS BS + | �
is zero. This formula is computationally tractable, an advantage over the causality con-
dition described here. It is not clear whether this approach provides similarly simple
methods for determining (for example) localizability, however.
3.6 Discussion
As noted in [18], there are several different operations that can reasonably be called
measurement superoperators in the bipartite case considered here. In the most powerful
form of measurement, both Alice and Bob have local knowledge of the measurement results
after the superoperator acts:
110
/S(ρ) ≡ �
(EaρEa) ⊗ |i〉 〈i| ⊗ |i〉 〈i| , (3.22)
R R S S
a
where R and S are measurement records created by the superoperator and held by (respec-
tively) Alice and Bob. A more general, and less powerful, form of measurement causes the
measurement record to be split nonlocally, in a form which may be inaccessible to Alice
or Bob alone but which is readable when Alice and Bob bring their measurement records
together:
/S(ρ) ≡ �
(EaρEa) ⊗ |i〉 〈i| . (3.23)
RS RS
a
The “von Neumann measurement superoperators” considered here (3.2) are a third form of
measurement, in which neither Alice nor Bob have any information about the measurement
results. However, it is clear that if Alice and Bob may implement such a superoperator
then they may implement it by unitary operations, using the unitary implementation
of an arbitrary superoperator, while saving their ancilla states rather than discarding
them (as we have been assuming in taking the partial trace tr RS [· · · ]). Thus in fact they
may implement it in such a way as to preserve the measurement results, as in (3.23),
and this form of measurement is equivalent to the form we have considered. The first
form (3.22) is, however, strictly more powerful than this—and rather unrealistic in the
situation considered here: even the superoperator projecting on the computational basis
{|i〉 A |j〉 B } allows signalling between Alice and Bob, since Alice can locally modify i while
Bob can locally modify j, and both Alice and Bob learn the pair (i, j) as a result of
the measurement. Because it is a more powerful form of measurement, of course, the
superoperators which we have ruled out on grounds of causality immediately rule out the
corresponding measurements of the first form (3.22).
Another way of weakening the measurement is to consider measurements which need
not leave the measurement subspaces undisturbed; these have been considered by Gro-
isman and Reznik [54], for example. For this destructive measurement procedure to be
theoretically interesting, however, it must be restricted somewhat; otherwise arbitrary
measurements can easily be performed in this formalism by the “exchange measurement”
in which Alice and Bob merely exchange their subsystems with their ancilla systems,
111
transferring all information about the state into the ancillas (and placing the original sys-
tem H AB in a fixed state which is independent of the original state). When these ancillas
are brought together, an arbitrary measurement can trivially be performed. (This type of
measurement can be ruled out, for example, by requiring that the measurement records
be classical: Alice and Bob must measure their ancilla states, in some basis, at the close
of their measurement protocol.)
The related questions in the previous section have given some indications of oppor-
tunities for related research. In particular, the relation between causal and localizable
operators (i.e., the gap between operations allowed by special relativity and those allowed
by quantum mechanics) would be interesting to explore further. We have shown that not
all causal operators are localizable, and thus that the rules of quantum mechanics provide
stricter requirements on the local evolution of a state than those provided by causality.
The Bell-type inequalities provide one such source of stricter requirements; it would be
interesting to know whether this type of additional constraint is sufficient to close the gap
between the causal and the localizable superoperators, or whether some other physical
limitations are required.
The questions asked here have some of the same flavor as questions asked in the
field of quantum communication complexity, where we are concerned with the shared
resources required to solve a particular problem. Here we have considered primarily the
question of whether the quantum operation in question could be implemented. Another
extension of this work would be to consider the corresponding complexity bounds, i.e.,
to ask more quantitatively how much of each given resource is required to implement an
allowed operation.
112
Bibliography
113
[1] ACM. 12th Annual ACM Symposium on Theory of Computing, 1980.
[2] Leonard M. Adleman, Jonathan Demarrais, and Ming-Deh A. Huang. Quantum
computability. SIAM Journal on Computing, 26(5):1524–1540, 1997.
[3] Dorit Aharonov, 1997. Private communication.
[4] Yakir Aharonov and David Z. Albert. States and observables in relativistic quantum
field theories. Physical Review D, 21(12):3316–3324, 1980.
[5] Yakir Aharonov and David Z. Albert. Can we make sense out of the measurement
process in relativistic quantum mechanics? Physical Review D, 24(2):359–370, 1981.
[6] Yakir Aharonov, David Z. Albert, and Lev Vaidman. Measurement process in rela-
tivistic quantum theory. Physical Review D, 34(6):1805–1813, 1986.
[7] Huzihiro Araki and Mutsuo M. Yanase. Measurement of quantum mechanical oper-
ators. Physical Review, 120(2):622–626, 1960.
[8] William Aspray and Arthur Burks, editors. Papers of John von Neumann on Com-
puting and Computer Theory, volume 12 of Charles Babbage Institute reprint series
for the History of Computing. The MIT Press, 1987.
[9] László Babai. Local expansion of vertex-transitive graphs and random generation
in finite groups. In 23rd Annual ACM Symposium on Theory of Computing, pages
164–174. ACM, 1991.
114
[10] László Babai, D. Yu. Grigoryev, and David M. Mount. Isomorphism of graphs with
bounded eigenvalue multiplicity. In 14th Annual ACM Symposium on Theory of
Computing, pages 310–324. ACM, 1982.
[11] David Beckman, Amalavoyal N. Chari, Srikrishna Devabhaktuni, and John Preskill.
Efficient networks for quantum factoring. Physical Review A, 54:1034–1063, 1996.
Available from the LANL preprint archive [71]: quant-ph/9602016.
[12] David Beckman, Daniel Gottesman, M. A. Nielsen, and John Preskill. Causal and
localizable quantum operations. Physical Review A, 64:052309, 2001. Available from
the LANL preprint archive [71]: quant-ph/0102043.
[13] Jacob D. Bekenstein. Energy cost of information transfer. Physical Review Letters,
46(10):623–626, 1981.
[14] Jacob D. Bekenstein. Entropy content and information flow in systems with limited
energy. Physical Review D, 30(6):1669–1679, 1984.
[15] C. H. Bennett. Logical reversibility of computation. IBM Journal of Research and
Development, 17:525–532, 1973.
[16] Charles H. Bennett. The thermodynamics of computation—a review. International
Journal of Theoretical Physics, 21(12):905–940, 1982.
[17] Charles H. Bennett, Ethan Bernstein, Gilles Brassard, and Umesh Vazirani.
Strengths and weaknesses of quantum computing. SIAM Journal on Comput-
ing, 26(5):1510–1523, 1997. Available from the LANL preprint archive [71]:
quant-ph/9701001.
[18] Charles H. Bennett, David P. DiVincenzo, Christopher A. Fuchs, Tal Mor, Eric
Rains, Peter W. Shor, John A. Smolin, and William K. Wootters. Quantum nonlo-
cality without entanglement. Physical Review A, 59(2):1070–1091, 1999. Available
from the LANL preprint archive [71]: quant-ph/9804053.
[19] Charles H. Bennett, David P. DiVincenzo, John A. Smolin, and William K.
Wootters. Mixed state entanglement and quantum error correction. Physical Re-
115
view A, 54(5):3824–3851, 1996. Available from the LANL preprint archive [71]:
quant-ph/9604024.
[20] Ethan Bernstein and Umesh Vazirani. Quantum complexity theory. SIAM Journal
on Computing, 26(5):1411–1473, 1997.
[21] Rajendra Bhatia. Matrix Analysis. Springer, 1997.
[22] Eli Biham, Ofer Biham, David Biron, Markus Grassl, and Daniel A. Lidar. Grover’s
quantum search algorithm for an arbitrary initial amplitude distribution. 1999.
Available from the LANL preprint archive [71]: quant-ph/9807027.
[23] David Biron, Ofer Biham, Eli Biham, Markus Grassl, and Daniel A. Lidar. Gen-
eralized Grover search algorithm for arbitrary initial amplitude distribution. In
Williams [104], pages 140–147. Available from the LANL preprint archive [71]:
quant-ph/9801066.
[24] Gilles Brassard and Peter Høyer. An exact quantum polynomial-time algorithm for
Simon’s problem. In Fifth Israeli Symposium on Theory of Computing and Systems,
pages 12–23. IEEE, IEEE Computer Society Press, 1997. Available from the LANL
preprint archive [71]: quant-ph/9704027.
[25] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude
amplification and estimation. 2000. Available from the LANL preprint archive [71]:
quant-ph/0005055.
[26] David M. Bressoud. Factorization and Primality Testing. Springer, 1989.
[27] Arthur Burks, editor. Theory of Self-Reproducing Automata. University of Illinois
Press, 1966.
[28] Nicolas J. Cerf, Lov K. Grover, and Colin P. Williams. Nested quantum search
and NP-complete problems. Physical Review A, 61:032303, 2000. Available from the
LANL preprint archive [71]: quant-ph/9806078.
[29] Alonzo Church. An unsolvable problem of elementary number theory. American
Journal of Mathematics, 58(2):345–363, 1936. Included in [33].
116
[30] The Quantum Computation Collective. What makes quantum computers powerful?
1997. Available from John Preskill’s Quantum Computation home page as
http://www.theory.caltech.edu/people/preskill/ph229/manifesto5.ps.
[31] Derek Corneil and Mark Goldberg. A non-factorial algorithm for canonical number-
ing of a graph. Journal of Algorithms, 5:345–362, 1984.
[32] Martin Davis. Computability and Unsolvability. McGraw-Hill, 1958. Reprinted by
Dover, 1982.
[33] Martin Davis, editor. The Undecidable: Basic Papers on Undecidable Proposi-
tions,Unsolvable Problems and Computable Functions. Raven Press, Hewlett, N.Y.,
1965.
[34] Martin Davis. Hilbert’s tenth problem is unsolvable. American Mathematical
Monthly, 80:233–269, 1973. Explanation of result of Ju. V. Matijasevič.
[35] David Deutsch. Quantum theory, the Church-Turing principle, and the universal
quantum computer. Proceedings of the Royal Society of London A, 400(1818):97–
117, 1985.
[36] David Deutsch. Quantum computational networks. Proceedings of the Royal Society
of London A, 425(1868):73–90, 1989.
[37] David Deutsch and Richard Jozsa. Rapid solution of problems by quantum compu-
tation. Proceedings of the Royal Society of London A, 439(1907):553–558, 1992.
[38] D. Dieks. Communication by EPR devices. Physics Letters A, 92(6):271–272, 1982.
[39] David P. DiVincenzo, 2000. Private communication.
[40] T. Eggeling, D. Schlingemann, and R. F. Werner. Semicausal operations are semilo-
calizable. 2001. Available from the LANL preprint archive [71]: quant-ph/0104027.
[41] A. Ekert, R. Jozsa, and R. Penrose, editors. Proceedings of Royal Society Discussion
Meeting “Quantum Computation: Theory and Experiment”, volume 356(1743). The
117
Royal Society of London, Philosophical Transactions of the Royal Society (London)
A, 1998.
[42] Artur Ekert and Richard Jozsa. Quantum algorithms: entanglement-enhanced in-
formation processing. In Ekert et al. [41]. Available from the LANL preprint
archive [71]: quant-ph/9803072.
[43] Mark Ettinger and Peter Høyer. A quantum observable for the graph isomorphism
problem. 1999. Available from the LANL preprint archive [71]: quant-ph/9901029.
[44] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser. A limit
on the speed of quantum computation in determining parity. Physical Review
Letters, 81(24):5442–5444, 1998. Available from the LANL preprint archive [71]:
quant-ph/9802045.
[45] Edward Farhi and Sam Gutmann. Quantum mechanical square root speedup in a
structured search problem. 1997. Available from the LANL preprint archive [71]:
quant-ph/9711035.
[46] Richard P. Feynman. Simulating physics with computers. International Journal of
Theoretical Physics, 21(6/7):467–488, 1982.
[47] Richard P. Feynman. Quantum mechanical computers. Foundations of Physics,
16(6):507–531, 1986.
[48] I. S. Filotti and Jack N. Mayer. A polynomial-time algorithm for determining the
isomorphism of graphs of fixed genus. In 12th Annual ACM Symposium on Theory
of Computing [1], pages 236–243.
[49] Edward Fredkin and Tommaso Toffoli. Conservative logic. International Journal of
Theoretical Physics, 21(3/4):219–253, 1982.
[50] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to
the Theory of NP-Completeness. Freeman, 1979.
[51] G. C. Ghirardi, F. Miglietta, A. Rimini, and T. Weber. Limitations on quantum
measurements (i; ii). Physical Review D, 24(2):347–352; 353–358, 1981.
118
[52] Daniel Gottesman. Theory of fault-tolerant quantum computation. Physical Re-
view A, 57(1):127–137, 1998. Available from the LANL preprint archive [71]:
quant-ph/9702029.
[53] Daniel Gottesman. Fault-tolerant quantum computation with higher-dimensional
systems. In Williams [104], pages 302–313. Available from the LANL preprint
archive [71]: quant-ph/9802007.
[54] Berry Groisman and Benni Reznik. Measurements of semi-local and non-maximally
entangled states. Physical Review A, 66:022110, 2002. Available from the LANL
preprint archive [71]: quant-ph/0111012.
[55] Lov K. Grover. A fast quantum mechanical algorithm for database search. In 28th
Annual ACM Symposium on Theory of Computing, pages 212–219. ACM, 1996.
Available from the LANL preprint archive [71]: quant-ph/9605043.
[56] Lov K. Grover. Quantum search on structured problems. In Williams [104], pages
126–139. Available from the LANL preprint archive [71]: quant-ph/9802035.
[57] David Hilbert. Mathematical problems. Bulletin of the American Mathematical So-
ciety, 8:437–479, 1901–1902.
[58] Peter Høyer. On arbitrary phases in quantum amplitude amplification. 2000. Avail-
able from the LANL preprint archive [71]: quant-ph/0006031.
[59] Richard Jozsa. Quantum effects in algorithms. In Williams [104], pages 103–112.
Available from the LANL preprint archive [71]: quant-ph/9805086.
[60] Richard Jozsa. Searching in Grover’s algorithm. 1999. Available from the LANL
preprint archive [71]: quant-ph/9901021.
[61] A. Yu. Kitaev. Quantum measurements and the abelian stabilizer problem. 1995.
Available from the LANL preprint archive [71]: quant-ph/9511026.
[62] A. Yu. Kitaev. Quantum computations: algorithms and error correction. Russian
Mathematical Surveys, 52(6):1191–1249, 1997.
119
[63] A. Yu. Kitaev, 1998. Private communication.
[64] E. Knill. Non-binary unitary error bases and quantum codes. 1996. Available from
the LANL preprint archive [71]: quant-ph/9608048.
[65] Emanuel Knill and Raymond Laflamme. A theory of quanum error-correcting
codes. Physical Review A, 55(2):900–911, 1997. Available from the LANL preprint
archive [71]: quant-ph/9604034.
[66] Johannes Köbler, Uwe Schöning, and Jacobo Torán. Graph isomorphism is low
for PP. Journal of Computational Complexity, 2(4):301–330, 1992. Available at
http://citeseer.nj.nec.com/obler92graph.html.
[67] Johannes Köbler, Uwe Schöning, and Jacobo Torán. The Graph Isomorphism Prob-
lem: Its Structural Complexity. Birkhäuser, 1993.
[68] Luděk Kučera. Combinatorial Algorithms. Adam Hilger, 1990.
[69] R. Landauer. Irreversibility and heat generation in the computing process. IBM
Journal of Research and Development, 5:183–191, 1961.
[70] R. Landauer. Uncertainty principle and minimal energy dissipation in the computer.
International Journal of Theoretical Physics, 21(3/4):283–297, 1982.
[71] LANL arXiv: The Los Alamos National Laboratory e-print archive. At
http://xxx.lanl.gov/.
[72] A. K. Lenstra, H. W. Lenstra, Jr., M. S. Manasse, and J. M. Pollard. The number
field sieve. In 22rd Annual ACM Symposium on Theory of Computing, pages 564–
572. ACM, 1990. Available at http://citeseer.ist.psu.edu/lenstra90number.html.
[73] Harry R. Lewis and Christos H. Papadimitriou. Elements of the Theory of Compu-
tation. Prentice Hall, 1981.
[74] Seth Lloyd. A potentially realizable quantum computer. Science, 261:1569–1571,
1993.
120
[75] Seth Lloyd. Ultimate physical limits to computation. Nature, 406:1047–1054, 2000.
Available from the LANL preprint archive [71]: quant-ph/9908043.
[76] Hoi-Kwong Lo and Sandu Popescu. Concentrating entanglement by local actions—
beyond mean values. 1997. Available from the LANL preprint archive [71]:
quant-ph/9707038.
[77] Eugene M. Luks. Isomorphism of graphs of bounded valence can be tested in poly-
nomial time. Journal of Computer and System Sciences, 25:42–65, 1982.
[78] Ju. V. Matijasevič. Enumerable sets are Diophantine. Soviet Mathematics—Doklady,
11(2):354–358, 1970. Translation from the original Russian, in Doklady Akademii
Nauk SSSR, 191(2):279–282, 1970.
[79] James D. Meindl, Qiang Chen, and Jeffrey A. Davis. Limits on silicon nanoelectron-
ics for terascale integration. Science, 293:2044–2049, 2001.
[80] Gary Miller. Isomorphism testing for graphs of bounded genus. In 12th Annual ACM
Symposium on Theory of Computing [1], pages 225–235.
[81] Gary L. Miller. Graph isomorphism, general remarks. Journal of Computer and
System Sciences, 18:128–142, 1979.
[82] M. A. Nielsen. Computable functions, quantum measurements, and quantum dy-
namics. Physical Review Letters, 79(15):2915–2918, 1997. Available from the LANL
preprint archive [71]: quant-ph/9706006.
[83] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum In-
formation. Cambridge University Press, 2000.
[84] Yuri Ozhigov. Quantum computer can not speed up iterated applications of a
black box. In Williams [104], pages 152–159. Available from the LANL preprint
archive [71]: quant-ph/9712051.
[85] Christos H. Papadimitriou. Computational Complexity. Addison Wesley, 1994.
[86] Asher Peres. Quantum Theory: Concepts and Methods. Kluwer Academic, 1995.
121
[87] Sandu Popescu and Lev Vaidman. Causality constraints on nonlocal quantum mea-
surements. Physical Review A, 49(6):4331–4338, 1994.
[88] John Preskill. Quantum information and computation. 1997–1998. Lecture notes for
Caltech Physics 229.
[89] Eric M. Rains. Entanglement purification via separable superoperators. 1997. Avail-
able from the LANL preprint archive [71]: quant-ph/9707002.
[90] Eric M. Rains. Nonbinary quantum codes. 1997. Available from the LANL preprint
archive [71]: quant-ph/9703048.
[91] Peter W. Shor. Algorithms for quantum computation: Discrete logarithms and fac-
toring. In 35th Annual Symposium on Foundations of Computer Science, pages 124–
134. IEEE, IEEE Computer Society Press, 1994. Available from the LANL preprint
archive [71]: quant-ph/9508027.
[92] Peter W. Shor. Polynomial-time algorithms for prime factorization and discrete
logarithms on a quantum computer. SIAM Journal on Computing, 26:1484–1509,
1997. Expanded version of [91]; Available from the LANL preprint archive [71]:
quant-ph/9508027.
[93] Daniel R. Simon. On the power of quantum computation. SIAM Journal on Com-
puting, 26(5):1474–1483, 1997.
[94] A. M. Steane. Introduction to quantum error correction. In Ekert et al. [41], pages
1739–1758.
[95] A. M. Steane. A quantum computer only needs one universe. In Studies in History
and Philosophy of Science Part B: Studies in History and Philosophy of Modern
Physics, chapter 8, pages 469–478. Elsevier Science, 2003. Available from the LANL
preprint archive [71]: quant-ph/0003084.
[96] Wu-Ki Tung. Group Theory in Physics. World Scientific, 1985.
122
[97] A. M. Turing. On computable numbers, with an application to the Entschei-
dungsproblem. Proceedings, London Mathematical Society, series 2, 42:230–265,
1936. Corrections and clarifications in 43:544–546. Included in [33].
[98] Umesh Vazirani. On the power of quantum computation. In Ekert et al. [41], pages
1759–1768.
[99] V. Vedral and M. B. Plenio. Entanglement measures and purification procedures.
Physical Review A, 57(3):1619–1633, 1998. Available from the LANL preprint
archive [71]: quant-ph/9707035.
[100] John von Neumann. Computers and information. Collected in [8, Part IV, Ch. 11
(pp.433–490)] and in [27], 1949. Lectures at the University of Illinois.
[101] John Watrous. Succinct quantum proofs for properties of finite groups. In 41st
Annual Symposium on Foundations of Computer Science, pages 537–546. IEEE,
IEEE Computer Society Press, 2000. Available from the LANL preprint archive [71]:
cs.CC/0009002.
[102] Steven Weinberg. The Quantum Theory of Fields; Volume I: Foundations. Cam-
bridge University Press, 1995.
[103] E. P. Wigner. Die Messung quantenmechanischer Operatoren. Zeitschrift für Physik,
133:101–108, 1952. Especially useful if you can read German.
[104] C. P. Williams, editor. Quantum Computing and Quantum Communications: First
NASA International Conference, QCQC’98, volume 1509. Springer (Lecture Notes
in Computer Science), 1999.
[105] W. K. Wootters and W. H. Zurek. A single quantum cannot be cloned. Nature,
299:802–803, 1982.
