Kraus Representation Theorem
Formal statement
Let \(\mathcal H_A\) and \(\mathcal H_B\) be finite-dimensional complex Hilbert spaces, and let
\[ \mathcal N:L(\mathcal H_A)\to L(\mathcal H_B) \]
be a linear map from operators on \(A\) to operators on \(B\). In quantum information theory, \(\mathcal N\) is called a quantum channel when it is completely positive and trace preserving. The Kraus representation theorem says that \(\mathcal N\) is a quantum channel if and only if there exist linear operators
\[ E_k:\mathcal H_A\to\mathcal H_B \]
such that
\[ \mathcal N(\rho)=\sum_k E_k\rho E_k^\dagger \]
for every input operator \(\rho\), and
\[ \sum_k E_k^\dagger E_k=I_A. \]
The operators \(E_k\) are called Kraus operators, operation elements, or noise operators. The condition
\[ \sum_k E_k^\dagger E_k=I_A \]
is exactly the trace-preservation condition. If instead
\[ \sum_k E_k^\dagger E_k\leq I_A, \]
then the same formula defines a completely positive trace-nonincreasing map, which represents a probabilistic branch of a quantum operation, such as one outcome of a measurement.
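The two conditions can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation; the helper names `kraus_sum`, `is_trace_preserving`, and `is_trace_nonincreasing` are hypothetical, and the operators used (identity and Pauli \(X\) weighted by \(\sqrt{1-p}\) and \(\sqrt p\)) are chosen only for illustration.

```python
import numpy as np

def kraus_sum(kraus_ops):
    """Return sum_k E_k^dagger E_k, an operator on the input space."""
    return sum(E.conj().T @ E for E in kraus_ops)

def is_trace_preserving(kraus_ops, tol=1e-12):
    """True when sum_k E_k^dagger E_k equals the identity."""
    S = kraus_sum(kraus_ops)
    return np.allclose(S, np.eye(S.shape[0]), atol=tol)

def is_trace_nonincreasing(kraus_ops, tol=1e-12):
    """True when I - sum_k E_k^dagger E_k is positive semidefinite."""
    S = kraus_sum(kraus_ops)
    return np.linalg.eigvalsh(np.eye(S.shape[0]) - S).min() >= -tol

# Illustrative operators (an assumption for this sketch).
p = 0.25
X = np.array([[0.0, 1.0], [1.0, 0.0]])
E0 = np.sqrt(1 - p) * np.eye(2)
E1 = np.sqrt(p) * X

full_set_tp = is_trace_preserving([E0, E1])      # complete set of branches
one_branch_tp = is_trace_preserving([E1])        # a single branch alone
one_branch_tni = is_trace_nonincreasing([E1])    # E1^dagger E1 = p I <= I
```

Here the full set passes the trace-preservation check, while the single branch `E1` is only trace nonincreasing.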
The finite-dimensional theorem is closely related to Kraus's work on general state changes in quantum theory, Choi's characterization of completely positive maps, and the Stinespring dilation theorem. In modern quantum information, the Kraus representation is one of the standard equivalent descriptions of a quantum channel.
Why this theorem matters
A closed quantum system evolves by a unitary rule,
\[ \rho\mapsto U\rho U^\dagger. \]
But realistic quantum systems are rarely closed. A qubit may decay, dephase, leak photons, interact with a detector, or become correlated with an environment. The state transformation of the subsystem alone is usually not unitary. The Kraus theorem gives the most important algebraic form of such open-system dynamics.
The formula
\[ \mathcal N(\rho)=\sum_k E_k\rho E_k^\dagger \]
should be read as follows. Each operator \(E_k\) describes one coherent branch of the system-environment interaction. The output state is obtained by adding the branch contributions after the environment label has been ignored. The index \(k\) is not necessarily a classical random variable available to the observer. It becomes classical only if the environment or apparatus is actually measured in a way that reveals that label.
This is the operational mental image: a quantum channel is what remains when a larger coherent process is viewed only through the system we keep. Kraus operators are the algebraic footprints of the larger environment.
First direction: Kraus form implies a quantum channel
Suppose a map is given by
\[ \mathcal N(\rho)=\sum_k E_k\rho E_k^\dagger \]
with
\[ \sum_k E_k^\dagger E_k=I_A. \]
We first show that \(\mathcal N\) is positive. If \(\rho\geq0\), then for any vector \(|\varphi\rangle\in\mathcal H_B\),
\[ \langle\varphi|\mathcal N(\rho)|\varphi\rangle = \sum_k \langle\varphi|E_k\rho E_k^\dagger|\varphi\rangle. \]
Each term can be rewritten as
\[ \langle\varphi|E_k\rho E_k^\dagger|\varphi\rangle = \langle E_k^\dagger\varphi|\rho|E_k^\dagger\varphi\rangle. \]
Since \(\rho\geq0\), every term is nonnegative. Therefore
\[ \mathcal N(\rho)\geq0. \]
But quantum theory requires more than positivity. If \(\mathcal N\) acts on one half of a larger entangled system, then
\[ I_R\otimes\mathcal N \]
must also preserve positivity for every reference system \(R\). This is complete positivity. The Kraus form gives this automatically because
\[ (I_R\otimes\mathcal N)(X) =\sum_k (I_R\otimes E_k)X(I_R\otimes E_k^\dagger). \]
If \(X\geq0\), then each term
\[ (I_R\otimes E_k)X(I_R\otimes E_k^\dagger) \]
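is positive, because it has the form \(AXA^\dagger\) with \(A=I_R\otimes E_k\), and \(\langle\varphi|AXA^\dagger|\varphi\rangle=\langle A^\dagger\varphi|X|A^\dagger\varphi\rangle\geq0\) for every vector \(|\varphi\rangle\). A sum of positive operators is positive. Hence \(\mathcal N\) is completely positive.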
Now we prove trace preservation. For any \(\rho\),
\[ \begin{aligned} \operatorname{Tr}\mathcal N(\rho) &=\operatorname{Tr}\left(\sum_kE_k\rho E_k^\dagger\right)\\ &=\sum_k\operatorname{Tr}(E_k\rho E_k^\dagger)\\ &=\sum_k\operatorname{Tr}(E_k^\dagger E_k\rho)\\ &=\operatorname{Tr}\left[\left(\sum_kE_k^\dagger E_k\right)\rho\right]\\ &=\operatorname{Tr}(I_A\rho)\\ &=\operatorname{Tr}\rho. \end{aligned} \]
Thus the Kraus formula with \(\sum_kE_k^\dagger E_k=I_A\) defines a completely positive trace-preserving map.
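Both properties can be spot-checked numerically. The sketch below is one way to do this under stated assumptions: the Kraus operators are obtained by slicing a random isometry (a convenience of this example, not part of the proof), the reference system is a qubit, and the input \(X\) is a random positive operator on \(R\otimes A\). A numerical check of this kind illustrates the argument; it does not replace it.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, d_ref, n_kraus = 2, 3, 2, 4

# Random isometry with orthonormal columns; slicing it into d_out-row blocks
# yields operators E_k with sum_k E_k^dagger E_k = I.
G = rng.normal(size=(n_kraus * d_out, d_in)) + 1j * rng.normal(size=(n_kraus * d_out, d_in))
V, _ = np.linalg.qr(G)
kraus_ops = [V[k * d_out:(k + 1) * d_out, :] for k in range(n_kraus)]

# Random positive semidefinite operator X on R (x) A, generally not a product.
M = rng.normal(size=(d_ref * d_in, d_ref * d_in)) + 1j * rng.normal(size=(d_ref * d_in, d_ref * d_in))
X = M @ M.conj().T

# Apply I_R (x) N via the extended Kraus operators I_R (x) E_k.
extended = [np.kron(np.eye(d_ref), E) for E in kraus_ops]
out = sum(A @ X @ A.conj().T for A in extended)

min_eig = np.linalg.eigvalsh(out).min()    # nonnegative up to rounding
trace_in, trace_out = np.trace(X).real, np.trace(out).real
completeness = sum(E.conj().T @ E for E in kraus_ops)
```

The smallest eigenvalue of the output stays nonnegative up to floating-point error, and the trace of the output matches the trace of the input.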
Second direction: every finite-dimensional channel has a Kraus form
Now suppose \(\mathcal N\) is completely positive and trace preserving. We show that it admits Kraus operators.
Choose an orthonormal basis \(\{|i\rangle_A\}\) for \(\mathcal H_A\), and introduce a copy \(A'\) of the input space. Define the unnormalized maximally entangled vector
\[ |\Omega\rangle_{A'A} =\sum_i |i\rangle_{A'}|i\rangle_A. \]
The Choi operator of \(\mathcal N\) is
\[ J(\mathcal N) = (I_{A'}\otimes\mathcal N)(|\Omega\rangle\langle\Omega|). \]
Written out explicitly,
\[ J(\mathcal N) = \sum_{i,j}|i\rangle\langle j|_{A'}\otimes\mathcal N(|i\rangle\langle j|)_B. \]
Because \(\mathcal N\) is completely positive and \(|\Omega\rangle\langle\Omega|\geq0\), the Choi operator is positive semidefinite:
\[ J(\mathcal N)\geq0. \]
By the spectral theorem, \(J(\mathcal N)\) can be decomposed as
\[ J(\mathcal N)=\sum_k |v_k\rangle\langle v_k|_{A'B}, \]
where the vectors \(|v_k\rangle\) are chosen to absorb the positive eigenvalues. That is, if
\[ J(\mathcal N)=\sum_k \lambda_k |w_k\rangle\langle w_k|, \]
with \(\lambda_k>0\), then we set
\[ |v_k\rangle=\sqrt{\lambda_k}|w_k\rangle. \]
Each vector \(|v_k\rangle\in\mathcal H_{A'}\otimes\mathcal H_B\) defines a linear operator
\[ E_k:\mathcal H_A\to\mathcal H_B \]
by the rule
\[ E_k|i\rangle_A=(\langle i|_{A'}\otimes I_B)|v_k\rangle_{A'B}. \]
Equivalently,
\[ |v_k\rangle=(I_{A'}\otimes E_k)|\Omega\rangle_{A'A}. \]
Substituting this into the Choi decomposition gives
\[ J(\mathcal N) = \sum_k (I\otimes E_k)|\Omega\rangle\langle\Omega|(I\otimes E_k^\dagger). \]
But the Choi operator uniquely determines the map. Comparing the coefficient of \(|i\rangle\langle j|_{A'}\), we obtain
\[ \mathcal N(|i\rangle\langle j|) = \sum_k E_k|i\rangle\langle j|E_k^\dagger. \]
Since every operator \(\rho\in L(\mathcal H_A)\) is a linear combination of the matrix units \(|i\rangle\langle j|\), linearity gives
\[ \mathcal N(\rho)=\sum_k E_k\rho E_k^\dagger. \]
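Finally, the trace computation from the first direction shows that \(\operatorname{Tr}\mathcal N(\rho)=\operatorname{Tr}\left[\left(\sum_kE_k^\dagger E_k\right)\rho\right]\) for every \(\rho\). Because \(\mathcal N\) is trace preserving, this equals \(\operatorname{Tr}\rho\) for every \(\rho\), which forces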
\[ \sum_k E_k^\dagger E_k=I_A. \]
Thus every finite-dimensional completely positive trace-preserving map has a Kraus representation.
Relation to the Choi theorem
The proof above is really a constructive version of Choi's theorem. In finite dimensions, a linear map is completely positive exactly when its Choi operator is positive semidefinite. Once the Choi operator is positive, the spectral theorem allows us to split it into rank-one positive pieces. Each rank-one piece becomes one Kraus operator.
This gives an important computational recipe. To find Kraus operators from a channel, form the Choi matrix, diagonalize it, scale each eigenvector with a nonzero eigenvalue by the square root of that eigenvalue, and reshape it into a Kraus operator. The number of nonzero eigenvalues of the Choi matrix, counted with multiplicity, is the minimal number of Kraus operators required. This number is called the Choi rank or Kraus rank of the channel.
Thus the theorem is not only existential. It gives a practical algorithm:
\[ \text{positive Choi matrix} \quad\longrightarrow\quad \text{spectral decomposition} \quad\longrightarrow\quad \text{Kraus operators}. \]
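The recipe can be sketched in NumPy as follows. This is a minimal illustration, assuming the channel is available as a Python function on dense matrices and that the reference copy \(A'\) is the first tensor factor, as in the proof above. The function names `choi_matrix` and `kraus_from_choi` and the tolerance `tol` are hypothetical choices for this example; the test channel is amplitude damping with \(\gamma=0.3\).

```python
import numpy as np

def choi_matrix(channel, d_in):
    """J = sum_{i,j} |i><j| (x) N(|i><j|), with the reference copy A' as the first factor."""
    blocks = []
    for i in range(d_in):
        row = []
        for j in range(d_in):
            unit = np.zeros((d_in, d_in), dtype=complex)
            unit[i, j] = 1.0
            row.append(channel(unit))
        blocks.append(row)
    return np.block(blocks)

def kraus_from_choi(J, d_in, d_out, tol=1e-10):
    """Turn each eigenvector with positive eigenvalue into one Kraus operator."""
    vals, vecs = np.linalg.eigh(J)
    kraus_ops = []
    for lam, w in zip(vals, vecs.T):
        if lam > tol:
            v = np.sqrt(lam) * w
            # v = sum_i |i> (x) E|i>, so row i of the reshaped array is E|i>.
            kraus_ops.append(v.reshape(d_in, d_out).T)
    return kraus_ops

# Illustrative black-box channel: amplitude damping with gamma = 0.3.
gamma = 0.3
A0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
A1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])

def channel(rho):
    return A0 @ rho @ A0.conj().T + A1 @ rho @ A1.conj().T

J = choi_matrix(channel, d_in=2)
recovered = kraus_from_choi(J, d_in=2, d_out=2)

rho = np.array([[0.6, 0.2 - 0.1j], [0.2 + 0.1j, 0.4]])
rebuilt = sum(E @ rho @ E.conj().T for E in recovered)
```

The recovered operators need not coincide with `A0` and `A1` individually, since eigenvectors are fixed only up to phase, but they reproduce the same channel.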
Relation to Stinespring dilation
The Kraus representation is also the algebraic shadow of Stinespring dilation. Given
\[ \mathcal N(\rho)=\sum_k E_k\rho E_k^\dagger, \]
introduce an environment with orthonormal basis \(\{|k\rangle_E\}\), and define a linear map \(V:\mathcal H_A\to\mathcal H_B\otimes\mathcal H_E\) by
\[ V|\psi\rangle_A = \sum_k \left(E_k|\psi\rangle_A\right)\otimes|k\rangle_E. \]
The trace-preserving condition implies
\[ V^\dagger V=I_A, \]
so \(V\) is an isometry. Then
\[ \operatorname{Tr}_E(V\rho V^\dagger) =\sum_k E_k\rho E_k^\dagger =\mathcal N(\rho). \]
This is exactly the Stinespring form. The Kraus index \(k\) is the environment label. If the environment is discarded, the branch label disappears and the channel becomes the sum over \(k\). If the environment is measured in the \(|k\rangle_E\) basis, one can interpret the terms as conditional outcomes.
This is the right way to think about Kraus operators. They are not necessarily classical random events. They are basis-dependent components of a coherent system-environment isometry.
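The construction of \(V\) from a Kraus set can be sketched numerically. The code below is an illustration under stated assumptions: the output space is ordered as \(B\) first and \(E\) second, the helper names `stinespring_isometry` and `trace_out_environment` are hypothetical, and the two Kraus operators (identity and Pauli \(X\) with weights \(\sqrt{1-p}\), \(\sqrt p\)) are chosen only as a small test case.

```python
import numpy as np

def stinespring_isometry(kraus_ops):
    """V = sum_k E_k (x) |k>_E, mapping H_A into H_B (x) H_E (B first, E second)."""
    n = len(kraus_ops)
    basis = np.eye(n)
    return sum(np.kron(E, basis[:, [k]]) for k, E in enumerate(kraus_ops))

def trace_out_environment(op, d_out, n_env):
    """Partial trace over the second tensor factor (the environment)."""
    return np.einsum('aebe->ab', op.reshape(d_out, n_env, d_out, n_env))

p = 0.2
X = np.array([[0.0, 1.0], [1.0, 0.0]])
kraus_ops = [np.sqrt(1 - p) * np.eye(2), np.sqrt(p) * X]

V = stinespring_isometry(kraus_ops)    # shape (4, 2)
rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])
via_dilation = trace_out_environment(V @ rho @ V.conj().T, d_out=2, n_env=2)
via_kraus = sum(E @ rho @ E.conj().T for E in kraus_ops)
```

In this sketch `V.conj().T @ V` is the identity on the input space, and the two ways of computing the output agree.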
Example 1: a unitary channel
Let
\[ \mathcal N(\rho)=U\rho U^\dagger, \]
where \(U\) is unitary. This is a channel with a single Kraus operator
\[ E_1=U. \]
The completeness condition becomes
\[ E_1^\dagger E_1=U^\dagger U=I. \]
This is the simplest possible case. There is only one branch, so no information leaks into an environment. A unitary channel is the special case of a Kraus representation with Kraus rank one.
Conversely, if a channel has only one Kraus operator \(E\), then trace preservation gives
\[ E^\dagger E=I. \]
So \(E\) is an isometry. If the input and output Hilbert spaces have the same dimension, then \(E\) is unitary. Thus a one-Kraus channel is reversible on its image, while noisy irreversible behavior requires multiple Kraus branches.
Example 2: complete dephasing
The complete dephasing channel in the computational basis is
\[ \Delta(\rho) =|0\rangle\langle0|\rho|0\rangle\langle0| +|1\rangle\langle1|\rho|1\rangle\langle1|. \]
Its Kraus operators are
\[ E_0=|0\rangle\langle0|, \qquad E_1=|1\rangle\langle1|. \]
They satisfy
\[ E_0^\dagger E_0+E_1^\dagger E_1 =|0\rangle\langle0|+|1\rangle\langle1| =I. \]
For an input pure state
\[ |\psi\rangle=\alpha|0\rangle+\beta|1\rangle, \]
we have
\[ |\psi\rangle\langle\psi| = |\alpha|^2|0\rangle\langle0| +\alpha\overline\beta |0\rangle\langle1| +\overline\alpha\beta |1\rangle\langle0| +|\beta|^2|1\rangle\langle1|. \]
After the channel,
\[ \Delta(|\psi\rangle\langle\psi|) =|\alpha|^2|0\rangle\langle0|+|\beta|^2|1\rangle\langle1|. \]
The off-diagonal coherence terms vanish. Operationally, the Kraus operators say that the environment has learned the computational-basis label. If that label is ignored, the phase relation between \(|0\rangle\) and \(|1\rangle\) is no longer visible on the system.
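A small numerical instance makes the loss of coherence concrete. The amplitudes \(\alpha=0.6\) and \(\beta=0.8i\) are arbitrary values chosen for this sketch.

```python
import numpy as np

alpha, beta = 0.6, 0.8j
psi = np.array([alpha, beta])
rho = np.outer(psi, psi.conj())

E0 = np.diag([1.0, 0.0])
E1 = np.diag([0.0, 1.0])
dephased = E0 @ rho @ E0.conj().T + E1 @ rho @ E1.conj().T

coherence_before = rho[0, 1]       # alpha * conj(beta)
coherence_after = dephased[0, 1]   # removed by the channel
```

The populations \(|\alpha|^2=0.36\) and \(|\beta|^2=0.64\) survive on the diagonal, while the off-diagonal entry is set to zero.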
Example 3: bit-flip noise
The bit-flip channel flips a qubit with probability \(p\):
\[ \mathcal N(\rho)=(1-p)\rho+pX\rho X. \]
A Kraus representation is
\[ E_0=\sqrt{1-p}\,I, \qquad E_1=\sqrt p\,X. \]
The completeness condition is
\[ E_0^\dagger E_0+E_1^\dagger E_1 =(1-p)I+pX^\dagger X =I. \]
This example looks like an ordinary classical random process: with probability \(1-p\), nothing happens; with probability \(p\), a bit flip happens. Indeed, this channel can be implemented by sampling a classical random bit and applying either \(I\) or \(X\). But this classical interpretation is special. Not every Kraus representation should be interpreted as a classical lottery. The Kraus representation is generally not unique, and different Kraus sets for the same channel may not correspond to the same apparent branches.
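To illustrate the last point, the sketch below mixes the two bit-flip Kraus operators with a \(2\times2\) unitary. The operators `F_plus` and `F_minus` are names introduced here for illustration, as are \(p=0.1\) and the input state. They describe the same channel, yet neither is a multiple of a unitary, so they do not read as "apply a gate with some probability".

```python
import numpy as np

p = 0.1
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
E0, E1 = np.sqrt(1 - p) * I2, np.sqrt(p) * X

# Mix the two branches with a 2x2 unitary (a Hadamard-type matrix).
F_plus = (E0 + E1) / np.sqrt(2)
F_minus = (E0 - E1) / np.sqrt(2)

def apply(kraus_ops, rho):
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

rho = np.array([[0.8, 0.3], [0.3, 0.2]])
out_E = apply([E0, E1], rho)
out_F = apply([F_plus, F_minus], rho)

# F_plus^dagger F_plus is not a multiple of the identity, so F_plus is not
# a unitary applied with some fixed probability.
weight = F_plus.conj().T @ F_plus
```

Both Kraus sets give the same output state, while `weight` has nonzero off-diagonal entries.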
Example 4: depolarizing noise
The single-qubit depolarizing channel can be written as
\[ \mathcal D_p(\rho) =(1-p)\rho+\frac p3\left(X\rho X+Y\rho Y+Z\rho Z\right). \]
A Kraus representation is
\[ E_0=\sqrt{1-p}\,I, \]
\[ E_1=\sqrt{\frac p3}\,X, \qquad E_2=\sqrt{\frac p3}\,Y, \qquad E_3=\sqrt{\frac p3}\,Z. \]
The completeness condition follows from
\[ I^\dagger I=X^\dagger X=Y^\dagger Y=Z^\dagger Z=I. \]
Therefore
\[ \sum_{k=0}^3 E_k^\dagger E_k =(1-p)I+\frac p3I+\frac p3I+\frac p3I =I. \]
Operationally, this channel leaves the qubit untouched with probability \(1-p\) and applies each of the three Pauli errors \(X\), \(Y\), \(Z\) with probability \(p/3\). It is a basic model of symmetric noise. In quantum error correction, this representation is useful because a code can be designed to detect and correct the error operators \(X\), \(Y\), and \(Z\).
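The Kraus form can be compared numerically with an equivalent closed form. The sketch below uses the standard single-qubit identity \(\rho+X\rho X+Y\rho Y+Z\rho Z=2\operatorname{Tr}(\rho)I\), which is not stated in the text above, to rewrite \(\mathcal D_p(\rho)\) as a mixture of \(\rho\) and the maximally mixed state. The function name `depolarize` and the test values are assumptions of this example.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def depolarize(rho, p):
    kraus_ops = [np.sqrt(1 - p) * I2] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]
    return sum(E @ rho @ E.conj().T for E in kraus_ops)

rho = np.array([[0.9, 0.1 + 0.2j], [0.1 - 0.2j, 0.1]])
p = 0.3
out = depolarize(rho, p)

# Equivalent closed form: shrink rho toward the maximally mixed state I/2.
shrunk = (1 - 4 * p / 3) * rho + (4 * p / 3) * np.trace(rho) * I2 / 2
fully_mixed = depolarize(rho, 0.75)   # p = 3/4 removes all dependence on rho
```

With this parametrization, \(p=3/4\) sends every input state to \(I/2\).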
Example 5: amplitude damping
Amplitude damping models energy relaxation, such as an excited qubit decaying from \(|1\rangle\) to \(|0\rangle\). For damping probability \(\gamma\), one Kraus representation is
\[ E_0= |0\rangle\langle0|+ \sqrt{1-\gamma}|1\rangle\langle1|, \]
\[ E_1= \sqrt\gamma |0\rangle\langle1|. \]
In matrix form,
\[ E_0= \begin{pmatrix} 1&0\\ 0&\sqrt{1-\gamma} \end{pmatrix}, \qquad E_1= \begin{pmatrix} 0&\sqrt\gamma\\ 0&0 \end{pmatrix}. \]
Check trace preservation:
\[ E_0^\dagger E_0 = \begin{pmatrix} 1&0\\ 0&1-\gamma \end{pmatrix}, \qquad E_1^\dagger E_1 = \begin{pmatrix} 0&0\\ 0&\gamma \end{pmatrix}. \]
Thus
\[ E_0^\dagger E_0+E_1^\dagger E_1=I. \]
The action on basis states has a clear physical meaning. If the qubit is in \(|0\rangle\), it remains in \(|0\rangle\). If it is in \(|1\rangle\), part of the amplitude remains excited, and part decays to \(|0\rangle\). In the Stinespring picture,
\[ |1\rangle|0\rangle_E \mapsto \sqrt{1-\gamma}|1\rangle|0\rangle_E + \sqrt\gamma |0\rangle|1\rangle_E. \]
The environment state \(|1\rangle_E\) records that an excitation was emitted. Tracing out the environment gives the irreversible damping channel.
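The action on a general qubit state can be worked out numerically. In the sketch below, the function name `amplitude_damping`, the value \(\gamma=0.3\), and the input state are illustrative assumptions. Applying the channel repeatedly to \(|1\rangle\langle1|\) shows the excited population shrinking by a factor \(1-\gamma\) per application.

```python
import numpy as np

def amplitude_damping(rho, gamma):
    E0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]])
    E1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return E0 @ rho @ E0.conj().T + E1 @ rho @ E1.conj().T

gamma = 0.3
rho = np.array([[0.2, 0.4], [0.4, 0.8]])   # pure state with amplitudes (1, 2)/sqrt(5)
out = amplitude_damping(rho, gamma)
# Population moves from |1> to |0>; the coherence is scaled by sqrt(1 - gamma).

state = np.diag([0.0, 1.0])                # start fully excited
for _ in range(3):
    state = amplitude_damping(state, gamma)
excited_population = state[1, 1]           # (1 - gamma)**3
```

For this input, the excited-state population drops from \(0.8\) to \(0.56\) and the ground-state population rises from \(0.2\) to \(0.44\).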
Example 6: a measurement outcome is not trace preserving
Let
\[ E_0=|0\rangle\langle0|. \]
The map
\[ \Phi_0(\rho)=E_0\rho E_0^\dagger \]
is completely positive, but it is not trace preserving. Its trace is
\[ \operatorname{Tr}\Phi_0(\rho) =\operatorname{Tr}(E_0^\dagger E_0\rho) =\operatorname{Tr}(|0\rangle\langle0|\rho), \]
which is the probability of obtaining measurement outcome \(0\). Here
\[ E_0^\dagger E_0=|0\rangle\langle0|\leq I, \]
not equal to \(I\). This is a trace-nonincreasing completely positive map. It represents one branch of a measurement instrument, not a full deterministic channel.
The full computational-basis measurement instrument has two branches,
\[ \Phi_0(\rho)=|0\rangle\langle0|\rho|0\rangle\langle0|, \]
and
\[ \Phi_1(\rho)=|1\rangle\langle1|\rho|1\rangle\langle1|. \]
The sum
\[ \Phi_0+\Phi_1 \]
is trace preserving. Thus deterministic evolution requires the total set of branches to satisfy the Kraus completeness relation.
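The two branches can be evaluated on a sample state. The following sketch uses an arbitrary input \(\rho\) chosen for illustration; the helper name `branch` is hypothetical.

```python
import numpy as np

P0 = np.diag([1.0, 0.0])
P1 = np.diag([0.0, 1.0])

def branch(E, rho):
    """One trace-nonincreasing branch rho -> E rho E^dagger."""
    return E @ rho @ E.conj().T

rho = np.array([[0.7, 0.2], [0.2, 0.3]])
unnormalized_0 = branch(P0, rho)
unnormalized_1 = branch(P1, rho)

prob_0 = np.trace(unnormalized_0).real   # probability of outcome 0
prob_1 = np.trace(unnormalized_1).real   # probability of outcome 1
post_0 = unnormalized_0 / prob_0         # normalized state given outcome 0
total = unnormalized_0 + unnormalized_1  # output when the outcome is ignored
```

Each branch alone has trace below one, the two probabilities sum to one, and `total` is the completely dephased state.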
Non-uniqueness of Kraus operators
Kraus representations are not unique. If
\[ \mathcal N(\rho)=\sum_k E_k\rho E_k^\dagger, \]
and \(U=(u_{jk})\) is a unitary matrix acting on the Kraus index space, then the operators
\[ F_j=\sum_k u_{jk}E_k \]
define the same channel. Indeed,
\[ \begin{aligned} \sum_jF_j\rho F_j^\dagger &= \sum_j\sum_{k,\ell}u_{jk}\overline{u_{j\ell}}E_k\rho E_\ell^\dagger\\ &= \sum_{k,\ell}\left(\sum_j u_{jk}\overline{u_{j\ell}}\right)E_k\rho E_\ell^\dagger\\ &= \sum_{k,\ell}\delta_{k\ell}E_k\rho E_\ell^\dagger\\ &= \sum_kE_k\rho E_k^\dagger. \end{aligned} \]
This non-uniqueness is not a nuisance; it is a clue. It reflects the fact that the environment basis in a Stinespring dilation is arbitrary. Rotating the environment basis changes the Kraus operators but not the channel seen by the system.
Minimal Kraus representations are unique up to a unitary mixing of the Kraus operators. Non-minimal representations are related by isometries on the Kraus index space. This is the channel analogue of uniqueness of purification.
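The isometric case can be illustrated numerically. The sketch below starts from a two-operator Kraus set (amplitude damping with \(\gamma=0.3\), chosen only as a test case) and mixes it with a random \(3\times2\) isometry to obtain a three-operator set for the same channel. The helper names `apply` and `choi_rank` are hypothetical; `choi_rank` relies on the fact that the Choi rank equals the dimension of the span of the Kraus operators.

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 0.3
E = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
     np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]

# Random 3x2 isometry W (orthonormal columns) on the Kraus index space.
G = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))
W, _ = np.linalg.qr(G)

# F_j = sum_k w_{jk} E_k gives three operators describing the same channel.
F = [sum(W[j, k] * E[k] for k in range(2)) for j in range(3)]

def apply(kraus_ops, rho):
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def choi_rank(kraus_ops, tol=1e-10):
    """Dimension of the span of the Kraus operators."""
    stacked = np.array([K.reshape(-1) for K in kraus_ops])
    return np.linalg.matrix_rank(stacked, tol=tol)

# Comparing on all matrix units |i><j| checks the whole linear map.
units = [np.outer(np.eye(2)[i], np.eye(2)[j]) for i in range(2) for j in range(2)]
same_channel = all(np.allclose(apply(E, u), apply(F, u)) for u in units)
```

The longer set `F` has three operators, but its span is still two-dimensional, so the Choi rank is unchanged.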
Common mistakes
A common mistake is to think that the Kraus operators are uniquely determined by the channel. They are not. The channel is the sum, not the individual terms. Individual Kraus operators depend on a choice of environment basis or measurement realization.
Another common mistake is to confuse trace preservation with unitality. In the Schrödinger picture, where channels act on states, trace preservation is
\[ \sum_k E_k^\dagger E_k=I. \]
Unitality is the different condition
\[ \sum_k E_kE_k^\dagger=I, \]
which means
\[ \mathcal N(I)=I. \]
A channel can be trace preserving without being unital. Amplitude damping is the standard example: it preserves trace, but it does not preserve the maximally mixed state.
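The distinction is easy to check numerically. The sketch below evaluates both sums for amplitude damping with \(\gamma=0.3\); the helper names `completeness` and `image_of_identity` are hypothetical.

```python
import numpy as np

def image_of_identity(kraus_ops):
    """N(I) = sum_k E_k E_k^dagger; equals I exactly when the channel is unital."""
    return sum(E @ E.conj().T for E in kraus_ops)

def completeness(kraus_ops):
    """sum_k E_k^dagger E_k; equals I exactly when the channel is trace preserving."""
    return sum(E.conj().T @ E for E in kraus_ops)

gamma = 0.3
damping = [np.array([[1.0, 0.0], [0.0, np.sqrt(1 - gamma)]]),
           np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])]

tp = np.allclose(completeness(damping), np.eye(2))
unital = np.allclose(image_of_identity(damping), np.eye(2))
n_of_identity = image_of_identity(damping)   # diag(1 + gamma, 1 - gamma)
```

The channel passes the trace-preservation check but maps \(I\) to \(\operatorname{diag}(1+\gamma,\,1-\gamma)\), so it is not unital.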
A third mistake is to think positivity is enough. A merely positive map sends positive operators to positive operators, but it may fail when applied to part of an entangled state. Quantum channels must be completely positive because systems may be entangled with external references.
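The standard example of a positive map that is not completely positive is the matrix transpose. It is not discussed elsewhere in this text and is used here only as an illustration. The sketch below shows that the transpose keeps a single-qubit state positive, but produces a negative eigenvalue when applied to one half of a Bell state.

```python
import numpy as np

# Normalized Bell state (|00> + |11>)/sqrt(2) on reference R and system A.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
bell = np.outer(phi, phi)

# Transpose on a single system preserves eigenvalues, hence positivity.
rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])
single_min = np.linalg.eigvalsh(rho.T).min()

# Apply the transpose to A only: swap the row and column indices of the second factor.
partial_t = bell.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
extended_min = np.linalg.eigvalsh(partial_t).min()
```

The partially transposed Bell state has an eigenvalue of \(-1/2\), so the extended map fails to preserve positivity.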
Final mental image
The Kraus representation theorem says that every finite-dimensional quantum channel can be decomposed into operator branches:
\[ \mathcal N(\rho)=\sum_k E_k\rho E_k^\dagger. \]
The completeness relation
\[ \sum_kE_k^\dagger E_k=I \]
says that total probability is conserved. The individual terms describe possible ways information can flow into an environment or apparatus. If the environment label is ignored, the branch contributions are summed. If the environment label is measured, the same operators may describe conditional outcomes.
Thus the theorem gives the working language of open quantum systems. Stinespring dilation tells us that every channel is unitary dynamics on a larger system followed by discarding the environment. The Kraus representation is what that story looks like after we choose a basis for the environment and write the resulting reduced dynamics directly on the system.
References
Kraus, Karl. “General State Changes in Quantum Theory.” Annals of Physics 64, no. 2 (1971): 311–335. DOI: 10.1016/0003-4916(71)90108-4.
Choi, Man-Duen. “Completely Positive Linear Maps on Complex Matrices.” Linear Algebra and Its Applications 10, no. 3 (1975): 285–290. DOI: 10.1016/0024-3795(75)90075-0.
Stinespring, W. Forrest. “Positive Functions on C*-Algebras.” Proceedings of the American Mathematical Society 6, no. 2 (1955): 211–216.
Nielsen, Michael A., and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 10th anniversary edition, 2010.
Watrous, John. The Theory of Quantum Information. Cambridge University Press, 2018.
Preskill, John. Lecture Notes for Physics 219/Computer Science 219: Quantum Computation, Chapter 3.