Monotonicity of Fidelity
Formal statement
Let \(\rho\) and \(\sigma\) be density operators on a finite-dimensional Hilbert space \(\mathcal H_A\), and let
\[ \mathcal N:L(\mathcal H_A)\to L(\mathcal H_B) \]
be a quantum channel, meaning a completely positive trace-preserving linear map. Define the root fidelity by
\[ F(\rho,\sigma) = \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1 = \operatorname{Tr}\sqrt{\sqrt{\rho}\sigma\sqrt{\rho}}. \]
The monotonicity of fidelity says that
\[ F(\mathcal N(\rho),\mathcal N(\sigma)) \ge F(\rho,\sigma). \]
Thus a quantum channel cannot decrease fidelity. Since fidelity is a closeness measure, this means that applying the same physical process to two states cannot make them less close.
Some authors define fidelity using the squared convention
\[ F_{\mathrm{sq}}(\rho,\sigma) = \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1^2. \]
With that convention, the same theorem is written as
\[ F_{\mathrm{sq}}(\mathcal N(\rho),\mathcal N(\sigma)) \ge F_{\mathrm{sq}}(\rho,\sigma). \]
The theorem is the fidelity-side companion of the contractivity of trace distance. Trace distance measures distinguishability, so it cannot increase under channels. Fidelity measures closeness, so it cannot decrease under channels.
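The two expressions for the root fidelity, and their relation to the squared convention, can be checked numerically. The following is a minimal NumPy sketch; the helper names `psd_sqrt`, `root_fidelity`, and `random_density` are illustrative choices made here, not part of any library, and the random states are only test inputs.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2               # enforce Hermiticity against round-off
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)        # clip tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1, computed as a sum of singular values."""
    product = psd_sqrt(rho) @ psd_sqrt(sigma)
    return float(np.linalg.svd(product, compute_uv=False).sum())

def random_density(dim, rng):
    """Random full-rank density matrix (illustrative sampling only)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
rho, sigma = random_density(3, rng), random_density(3, rng)

f_norm = root_fidelity(rho, sigma)                        # trace-norm formula
s = psd_sqrt(rho)
f_trace = float(np.trace(psd_sqrt(s @ sigma @ s)).real)   # Tr sqrt(sqrt(rho) sigma sqrt(rho))
f_squared = f_norm ** 2                                   # squared convention

print(f_norm, f_trace, f_squared)
```

Since squaring is increasing on \([0,1]\), any inequality between root fidelities carries over unchanged to the squared convention.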
Operational meaning
The theorem says that physical processing cannot make two quantum states more distinguishable in the fidelity sense. A channel may add noise, discard a subsystem, erase coherence, measure and forget outcomes, or couple the system to an environment and then ignore that environment. All of these operations can make two states harder to tell apart. None of them can create a new separation between the states when the same operation is applied to both.
The simplest mental image is this. Fidelity measures how well two states can be aligned as purifications. If two states have purifications with overlap \(F(\rho,\sigma)\), and we send the visible system through the same channel, then those purifications can be sent through a Stinespring dilation of the channel. Their overlap is unchanged at the larger system level. After discarding the environment, Uhlmann's theorem says the output fidelity is at least that overlap. Therefore the output states are at least as close as the input states.
In one sentence:
\[ \text{the same quantum processing can blur distinctions, but it cannot sharpen them.} \]
Proof using Uhlmann's theorem and Stinespring dilation
We prove the theorem in finite dimensions. By Uhlmann's theorem,
\[ F(\rho,\sigma) = \max_{|\psi_\rho\rangle,|\psi_\sigma\rangle} |\langle\psi_\rho|\psi_\sigma\rangle|, \]
where the maximum is over purifications of \(\rho\) and \(\sigma\) on a sufficiently large reference system.
Choose purifications
\[ |\psi_\rho\rangle_{AR}, \qquad |\psi_\sigma\rangle_{AR} \]
that achieve this maximum, so that
\[ |\langle\psi_\rho|\psi_\sigma\rangle| = F(\rho,\sigma). \]
Since \(\mathcal N\) is a quantum channel, Stinespring dilation gives an environment \(E\) and an isometry
\[ V:A\to B E \]
such that
\[ \mathcal N(\tau) = \operatorname{Tr}_E(V\tau V^\dagger) \]
for every input state \(\tau\). Now apply this isometry to the \(A\) part of the purifications:
\[ |\Psi_\rho\rangle_{BER} = (V\otimes I_R)|\psi_\rho\rangle_{AR}, \]
and
\[ |\Psi_\sigma\rangle_{BER} = (V\otimes I_R)|\psi_\sigma\rangle_{AR}. \]
These are purifications of the output states \(\mathcal N(\rho)\) and \(\mathcal N(\sigma)\), because tracing out \(E R\) gives
\[ \operatorname{Tr}_{ER}(|\Psi_\rho\rangle\langle\Psi_\rho|) = \mathcal N(\rho), \]
and
\[ \operatorname{Tr}_{ER}(|\Psi_\sigma\rangle\langle\Psi_\sigma|) = \mathcal N(\sigma). \]
Their overlap is
\[ \begin{aligned} \langle\Psi_\rho|\Psi_\sigma\rangle &= \langle\psi_\rho|(V^\dagger V\otimes I_R)|\psi_\sigma\rangle \\ &= \langle\psi_\rho|\psi_\sigma\rangle, \end{aligned} \]
because \(V^\dagger V=I_A\). Therefore
\[ |\langle\Psi_\rho|\Psi_\sigma\rangle| = F(\rho,\sigma). \]
But Uhlmann's theorem says that the fidelity between the output states is the maximum overlap over all their purifications. The two purifications above are only one allowed pair. Hence
\[ F(\mathcal N(\rho),\mathcal N(\sigma)) \ge |\langle\Psi_\rho|\Psi_\sigma\rangle| = F(\rho,\sigma). \]
This proves the monotonicity of fidelity.
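The construction in the proof can be mirrored numerically as a sanity check, not as a substitute for the argument. The sketch below assumes NumPy, builds a random Stinespring isometry \(V:A\to BE\) from a QR decomposition, defines the channel by tracing out \(E\), and compares input and output fidelities. The dimensions and helper names are arbitrary illustrative choices.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum())

def random_density(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def random_isometry(d_in, d_out, rng):
    """Random V with V^dagger V = I, taken from a reduced QR decomposition."""
    g = rng.normal(size=(d_out, d_in)) + 1j * rng.normal(size=(d_out, d_in))
    q, _ = np.linalg.qr(g)                 # q has orthonormal columns
    return q

def apply_channel(v, tau, d_b, d_e):
    """N(tau) = Tr_E(V tau V^dagger), with the output index ordered as (B, E)."""
    big = (v @ tau @ v.conj().T).reshape(d_b, d_e, d_b, d_e)
    return np.einsum("iaja->ij", big)      # sum over the matching E indices

rng = np.random.default_rng(1)
d_a, d_b, d_e = 3, 2, 4
v = random_isometry(d_a, d_b * d_e, rng)

gaps = []
for _ in range(20):
    rho, sigma = random_density(d_a, rng), random_density(d_a, rng)
    f_in = root_fidelity(rho, sigma)
    f_out = root_fidelity(apply_channel(v, rho, d_b, d_e),
                          apply_channel(v, sigma, d_b, d_e))
    gaps.append(f_out - f_in)

print(min(gaps))  # expected to be nonnegative up to round-off
```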
Proof using measurements
There is another proof that gives a very operational picture. Fidelity also has a measurement characterization. If a POVM \(\{M_y\}\) is applied to both states, it produces classical probability distributions
\[ p_y=\operatorname{Tr}(M_y\rho), \qquad q_y=\operatorname{Tr}(M_y\sigma). \]
The classical fidelity, or Bhattacharyya coefficient, is
\[ B(p,q)=\sum_y\sqrt{p_yq_y}. \]
For quantum states, the root fidelity satisfies
\[ F(\rho,\sigma) = \min_{\{M_y\}}\sum_y \sqrt{\operatorname{Tr}(M_y\rho)\operatorname{Tr}(M_y\sigma)}. \]
Now let Bob measure the output states \(\mathcal N(\rho)\) and \(\mathcal N(\sigma)\) with an arbitrary POVM \(\{M_y\}\). The outcome probabilities are
\[ p_y'=\operatorname{Tr}(M_y\mathcal N(\rho)), \qquad q_y'=\operatorname{Tr}(M_y\mathcal N(\sigma)). \]
Using the adjoint channel \(\mathcal N^\dagger\), these can be rewritten as
\[ p_y'=\operatorname{Tr}(\mathcal N^\dagger(M_y)\rho), \qquad q_y'=\operatorname{Tr}(\mathcal N^\dagger(M_y)\sigma). \]
Because \(\mathcal N\) is trace preserving, \(\mathcal N^\dagger\) is unital:
\[ \mathcal N^\dagger(I_B)=I_A. \]
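Because \(\mathcal N\) is completely positive, and in particular positive, \(\mathcal N^\dagger\) maps positive operators to positive operators. By linearity and unitality, \(\sum_y\mathcal N^\dagger(M_y)=\mathcal N^\dagger(I_B)=I_A\). Therefore the operators
\[ \widetilde M_y=\mathcal N^\dagger(M_y) \]
form a POVM on the input system. Thus every measurement after the channel is equivalent to some measurement before the channel.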
The fidelity after the channel is the minimum classical overlap over output measurements. But output measurements correspond only to a subset of all possible input measurements. Minimizing over a smaller set cannot produce a smaller value than minimizing over the full set. Therefore
\[ F(\mathcal N(\rho),\mathcal N(\sigma)) \ge F(\rho,\sigma). \]
This proof makes the data-processing meaning transparent. Measuring after a channel cannot reveal a smaller classical overlap than the best measurement that could already have been performed before the channel.
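The pull-back of an output measurement through the adjoint channel can be illustrated on a small case. The sketch below assumes NumPy and picks, purely for illustration, an amplitude damping channel with \(\gamma=0.3\), the \(\{|+\rangle,|-\rangle\}\) measurement on the output, and the input states \(|0\rangle\langle0|\) and \(|+\rangle\langle+|\). It checks that the pulled-back operators form a POVM, that they reproduce the output statistics, and that the resulting classical overlap is no smaller than the input fidelity.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum())

gamma = 0.3                                # illustrative damping probability
kraus = [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
         np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

def channel(tau):
    """N(tau) = sum_k E_k tau E_k^dagger."""
    return sum(k @ tau @ k.conj().T for k in kraus)

def adjoint_channel(m):
    """N^dagger(M) = sum_k E_k^dagger M E_k."""
    return sum(k.conj().T @ m @ k for k in kraus)

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
minus = np.array([1, -1], dtype=complex) / np.sqrt(2)
povm = [np.outer(plus, plus.conj()), np.outer(minus, minus.conj())]

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
sigma = np.outer(plus, plus.conj())               # |+><+|

pulled_back = [adjoint_channel(m) for m in povm]

# Outcome probabilities computed after the channel and before the channel.
p_out = [np.trace(m @ channel(rho)).real for m in povm]
q_out = [np.trace(m @ channel(sigma)).real for m in povm]
p_in = [np.trace(m @ rho).real for m in pulled_back]
q_in = [np.trace(m @ sigma).real for m in pulled_back]

overlap = sum(np.sqrt(p * q) for p, q in zip(p_out, q_out))
f_in = root_fidelity(rho, sigma)
print(overlap, f_in)  # the classical overlap should not fall below f_in
```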
Example: unitary channels preserve fidelity exactly
Let
\[ \mathcal U(\rho)=U\rho U^\dagger \]
for a unitary \(U\). Then
\[ F(U\rho U^\dagger,U\sigma U^\dagger)=F(\rho,\sigma). \]
This follows immediately from the trace-norm formula:
\[ \sqrt{U\rho U^\dagger}=U\sqrt\rho U^\dagger, \]
so
\[ \sqrt{U\rho U^\dagger}\sqrt{U\sigma U^\dagger} = U\sqrt\rho\sqrt\sigma U^\dagger. \]
The trace norm is invariant under unitary conjugation, hence the fidelity is unchanged.
Operationally, a unitary transformation is reversible. It may rotate the coordinate system in which states are written, but it does not erase or create distinguishability. Therefore the closeness between the two states is exactly preserved.
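Unitary invariance is easy to confirm numerically. The following NumPy sketch draws a random unitary from a QR decomposition (an illustrative choice, not a specific distribution) and compares the fidelity before and after conjugation.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum())

def random_density(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(2)
g = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
u, _ = np.linalg.qr(g)                     # the Q factor of a square matrix is unitary

rho, sigma = random_density(3, rng), random_density(3, rng)
f_before = root_fidelity(rho, sigma)
f_after = root_fidelity(u @ rho @ u.conj().T, u @ sigma @ u.conj().T)

print(f_before, f_after)  # the two values should agree up to round-off
```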
Example: dephasing can increase fidelity
Consider the states
\[ |+\rangle=\frac{|0\rangle+|1\rangle}{\sqrt2}, \qquad |-\rangle=\frac{|0\rangle-|1\rangle}{\sqrt2}. \]
They are orthogonal, so
\[ F(|+\rangle\langle+|,|-\rangle\langle-|)=0. \]
Now apply the complete dephasing channel in the computational basis:
\[ \Delta_Z(\rho) = |0\rangle\langle0|\rho|0\rangle\langle0| + |1\rangle\langle1|\rho|1\rangle\langle1|. \]
Both states lose their off-diagonal entries and are mapped to the maximally mixed state:
\[ \Delta_Z(|+\rangle\langle+|)=\frac I2, \qquad \Delta_Z(|-\rangle\langle-|)=\frac I2. \]
Therefore
\[ F\left(\Delta_Z(|+\rangle\langle+|),\Delta_Z(|-\rangle\langle-|)\right) = F\left(\frac I2,\frac I2\right) =1. \]
The channel has erased exactly the phase information that distinguished \(|+\rangle\) from \(|-\rangle\). The states were perfectly distinguishable before dephasing and identical afterward. Fidelity increased from \(0\) to \(1\).
This example captures the operational meaning of the theorem. Noise can make states more alike.
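The dephasing example can be reproduced in a few lines. This is a minimal NumPy sketch; `dephase` is a hypothetical helper that implements \(\Delta_Z\) by keeping only the diagonal of the density matrix. As an extra check, it also confirms that a pair already diagonal in the dephasing basis, \(|0\rangle\) and \(|1\rangle\), is left unchanged.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum())

def dephase(tau):
    """Complete dephasing in the computational basis: keep only the diagonal."""
    return np.diag(np.diag(tau))

def projector(vec):
    return np.outer(vec, vec.conj())

plus = projector(np.array([1, 1], dtype=complex) / np.sqrt(2))
minus = projector(np.array([1, -1], dtype=complex) / np.sqrt(2))
zero = projector(np.array([1, 0], dtype=complex))
one = projector(np.array([0, 1], dtype=complex))

f_before = root_fidelity(plus, minus)                    # orthogonal inputs
f_after = root_fidelity(dephase(plus), dephase(minus))   # both outputs equal I/2
f_01_after = root_fidelity(dephase(zero), dephase(one))  # basis states are unaffected

print(f_before, f_after, f_01_after)
```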
Example: partial trace can increase fidelity
Consider two Bell states:
\[ |\Phi^+\rangle_{AB} = \frac{|00\rangle+|11\rangle}{\sqrt2}, \]
and
\[ |\Phi^-\rangle_{AB} = \frac{|00\rangle-|11\rangle}{\sqrt2}. \]
They are orthogonal pure states, so
\[ F(|\Phi^+\rangle\langle\Phi^+|,|\Phi^-\rangle\langle\Phi^-|)=0. \]
Now discard system \(B\). Both reduced states on \(A\) are
\[ \operatorname{Tr}_B(|\Phi^+\rangle\langle\Phi^+|)=\frac I2, \]
and
\[ \operatorname{Tr}_B(|\Phi^-\rangle\langle\Phi^-|)=\frac I2. \]
Thus
\[ F\left( \operatorname{Tr}_B|\Phi^+\rangle\langle\Phi^+|, \operatorname{Tr}_B|\Phi^-\rangle\langle\Phi^-| \right) =1. \]
The two global states differ by a relative phase stored in the joint correlation between \(A\) and \(B\). If we keep only \(A\), that difference disappears. Partial trace is a quantum channel, so fidelity cannot decrease; here it increases maximally.
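The Bell-state calculation can be checked the same way. The sketch below assumes NumPy and the index ordering \(|ab\rangle\mapsto 2a+b\); `trace_out_b` is an illustrative helper for the partial trace over the second qubit.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum())

def trace_out_b(rho_ab):
    """Partial trace over the second qubit of a two-qubit density matrix."""
    return np.einsum("iaja->ij", rho_ab.reshape(2, 2, 2, 2))

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
phi_minus = np.array([1, 0, 0, -1], dtype=complex) / np.sqrt(2)
rho_plus = np.outer(phi_plus, phi_plus.conj())
rho_minus = np.outer(phi_minus, phi_minus.conj())

f_global = root_fidelity(rho_plus, rho_minus)            # orthogonal Bell states
reduced_plus = trace_out_b(rho_plus)
reduced_minus = trace_out_b(rho_minus)
f_local = root_fidelity(reduced_plus, reduced_minus)     # both reductions equal I/2

print(f_global, f_local)
```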
Example: depolarizing noise
For a qubit, define the depolarizing channel
\[ \mathcal D_p(\rho)=(1-p)\rho+p\frac I2, \qquad 0\le p\le1. \]
Take the two orthogonal states
\[ \rho=|0\rangle\langle0|, \qquad \sigma=|1\rangle\langle1|. \]
Before the channel,
\[ F(\rho,\sigma)=0. \]
After the channel,
\[ \mathcal D_p(\rho)= \begin{pmatrix} 1-p/2&0\\ 0&p/2 \end{pmatrix}, \]
and
\[ \mathcal D_p(\sigma)= \begin{pmatrix} p/2&0\\ 0&1-p/2 \end{pmatrix}. \]
These states commute, so their fidelity is the classical Bhattacharyya coefficient:
\[ F(\mathcal D_p(\rho),\mathcal D_p(\sigma)) = 2\sqrt{\left(1-\frac p2\right)\frac p2}. \]
For \(p=0\), this is \(0\), as expected. For \(p=1\), both states become \(I/2\), and the fidelity becomes \(1\). Depolarizing noise continuously pushes distinct states toward the same maximally mixed state.
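The closed form \(2\sqrt{(1-p/2)(p/2)}\) can be compared against a direct matrix computation. This is a small NumPy sketch; the sample values of \(p\) are arbitrary.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum())

def depolarize(tau, p):
    """Qubit depolarizing channel D_p(tau) = (1 - p) tau + p I/2."""
    return (1 - p) * tau + p * np.eye(2) / 2

rho = np.array([[1, 0], [0, 0]], dtype=complex)    # |0><0|
sigma = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|

ps = [0.0, 0.1, 0.5, 0.9, 1.0]
numeric = [root_fidelity(depolarize(rho, p), depolarize(sigma, p)) for p in ps]
closed = [2 * np.sqrt((1 - p / 2) * (p / 2)) for p in ps]

for p, f_num, f_formula in zip(ps, numeric, closed):
    print(p, f_num, f_formula)  # the two columns should agree up to round-off
```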
Example: amplitude damping
Amplitude damping with damping probability \(\gamma\) has Kraus operators
\[ E_0=|0\rangle\langle0|+\sqrt{1-\gamma}|1\rangle\langle1|, \]
and
\[ E_1=\sqrt\gamma |0\rangle\langle1|. \]
Apply this channel to
\[ \rho=|0\rangle\langle0|, \qquad \sigma=|1\rangle\langle1|. \]
The ground state remains fixed:
\[ \mathcal A_\gamma(|0\rangle\langle0|)=|0\rangle\langle0|. \]
The excited state becomes
\[ \mathcal A_\gamma(|1\rangle\langle1|) = (1-\gamma)|1\rangle\langle1|+ \gamma |0\rangle\langle0|. \]
The fidelity between a pure state \(|0\rangle\langle0|\) and a state \(\tau\) is
\[ F(|0\rangle\langle0|,\tau)=\sqrt{\langle0|\tau|0\rangle}. \]
Therefore
\[ F(\mathcal A_\gamma(|0\rangle\langle0|), \mathcal A_\gamma(|1\rangle\langle1|)) = \sqrt\gamma. \]
Before damping, the fidelity was \(0\). After damping, the states become less distinguishable because the excited state has partially decayed into the ground state. At \(\gamma=1\), both inputs end as \(|0\rangle\), and the fidelity becomes \(1\).
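The value \(\sqrt\gamma\) can be confirmed by applying the Kraus operators directly. The following NumPy sketch uses a hypothetical helper `amplitude_damp` and a few arbitrary values of \(\gamma\).

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum())

def amplitude_damp(tau, gamma):
    """Amplitude damping with Kraus operators E_0 and E_1 as defined in the text."""
    e0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)
    e1 = np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)
    return e0 @ tau @ e0.conj().T + e1 @ tau @ e1.conj().T

rho = np.array([[1, 0], [0, 0]], dtype=complex)    # |0><0|
sigma = np.array([[0, 0], [0, 1]], dtype=complex)  # |1><1|

gammas = [0.0, 0.25, 0.5, 1.0]
numeric = [root_fidelity(amplitude_damp(rho, g), amplitude_damp(sigma, g)) for g in gammas]
closed = [np.sqrt(g) for g in gammas]

for g, f_num, f_formula in zip(gammas, numeric, closed):
    print(g, f_num, f_formula)  # numeric fidelity next to sqrt(gamma)
```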
Classical stochastic maps as a special case
Suppose \(\rho\) and \(\sigma\) are diagonal in the same basis:
\[ \rho=\sum_x p_x|x\rangle\langle x|, \qquad \sigma=\sum_x q_x|x\rangle\langle x|. \]
Then
\[ F(\rho,\sigma)=\sum_x\sqrt{p_xq_x}. \]
A classical stochastic map \(T(y|x)\) sends these distributions to
\[ p'_y=\sum_xT(y|x)p_x, \qquad q'_y=\sum_xT(y|x)q_x. \]
The quantum monotonicity theorem reduces to the classical inequality
\[ \sum_y\sqrt{p'_yq'_y} \ge \sum_x\sqrt{p_xq_x}. \]
Classical random processing cannot make two probability distributions less overlapping. It can merge outcomes, add noise, or forget information. All of these operations can only increase their classical overlap. The quantum theorem is the noncommutative version of this fact.
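The classical inequality needs nothing beyond the standard library. The sketch below uses two made-up distributions and a made-up stochastic map that merges the last two outcomes; the convention `t[y][x]` for \(T(y|x)\) is an assumption of this example.

```python
import math

def bhattacharyya(p, q):
    """Classical overlap sum_x sqrt(p_x q_x)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def push_forward(t, p):
    """Apply a stochastic map with entries t[y][x] = T(y|x) to a distribution p."""
    return [sum(t[y][x] * p[x] for x in range(len(p))) for y in range(len(t))]

p = [0.7, 0.2, 0.1]
q = [0.1, 0.3, 0.6]

# Each column sums to 1; outcomes 1 and 2 are merged into a single output.
t = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]

overlap_before = bhattacharyya(p, q)
overlap_after = bhattacharyya(push_forward(t, p), push_forward(t, q))

print(overlap_before, overlap_after)  # the second value should not be smaller
```

Merging outcomes is the classical analogue of discarding a subsystem: the merged outcome can no longer tell which of the original outcomes occurred.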
Relation to trace distance and the Fuchs-van de Graaf inequalities
The monotonicity of fidelity and the contractivity of trace distance are two sides of the same data-processing principle. Trace distance is large when states are easy to distinguish, so it cannot increase under channels:
\[ D(\mathcal N(\rho),\mathcal N(\sigma)) \le D(\rho,\sigma). \]
Fidelity is large when states are close, so it cannot decrease under channels:
\[ F(\mathcal N(\rho),\mathcal N(\sigma)) \ge F(\rho,\sigma). \]
The Fuchs-van de Graaf inequalities connect the two quantities:
\[ 1-F(\rho,\sigma) \le D(\rho,\sigma) \le \sqrt{1-F(\rho,\sigma)^2}. \]
Thus one can translate a fidelity guarantee into a trace-distance guarantee, and vice versa. In applications, this is extremely useful. Fidelity is often easier to prove using purifications and Uhlmann's theorem. Trace distance is often easier to interpret operationally because it controls measurement distinguishability.
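The Fuchs-van de Graaf bounds can be spot-checked on random states. The sketch below assumes NumPy and the normalization \(D(\rho,\sigma)=\tfrac12\|\rho-\sigma\|_1\), which the text does not spell out but which is the convention under which the inequalities above hold for the root fidelity. The helper names and sampling are illustrative.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a Hermitian positive semidefinite matrix."""
    a = (a + a.conj().T) / 2
    vals, vecs = np.linalg.eigh(a)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T

def root_fidelity(rho, sigma):
    """F = || sqrt(rho) sqrt(sigma) ||_1."""
    return float(np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum())

def trace_distance(rho, sigma):
    """D = (1/2) || rho - sigma ||_1 (assumed normalization), via eigenvalues."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho - sigma)).sum())

def random_density(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(3)
tol = 1e-9
checks = []
for _ in range(50):
    rho, sigma = random_density(3, rng), random_density(3, rng)
    f = root_fidelity(rho, sigma)
    d = trace_distance(rho, sigma)
    upper = np.sqrt(max(0.0, 1 - f ** 2))  # guard against round-off when f is near 1
    checks.append(1 - f <= d + tol and d <= upper + tol)

print(all(checks))  # True if both bounds hold for every sampled pair
```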
Equality and strict increase
Equality holds for unitary channels and, more generally, for isometric channels \(\rho\mapsto V\rho V^\dagger\) with \(V^\dagger V=I\). In those cases, no information is lost, so the fidelity is exactly preserved.
The fidelity can strictly increase when the channel discards information that helped distinguish the states. Dephasing can erase phase differences. Partial trace can erase correlations. Depolarization can push states toward the maximally mixed state. Amplitude damping can send different energy states toward the same ground state.
A useful way to remember this is:
\[ \text{reversible processing preserves fidelity; irreversible processing may increase it.} \]
The theorem says only that fidelity cannot go down. It does not say that it must go up.
Common mistakes
A common mistake is to reverse the inequality. Fidelity is a closeness measure, not a distinguishability measure. Therefore channels satisfy
\[ F(\mathcal N(\rho),\mathcal N(\sigma)) \ge F(\rho,\sigma), \]
not the opposite.
A second mistake is to forget the convention. If fidelity is defined as the squared quantity, the same monotonicity holds, but all formulas involving Fuchs-van de Graaf inequalities and pure-state overlaps must be adjusted accordingly.
A third mistake is to think monotonicity means noise always brings states strictly closer. Noise may increase fidelity, but it can also preserve fidelity for some pairs of states. For example, dephasing preserves the fidelity between \(|0\rangle\) and \(|1\rangle\), because those states are already perfectly distinguished in the dephasing basis.
A fourth mistake is to apply the theorem when different channels act on the two states. The theorem assumes the same channel \(\mathcal N\) is applied to both \(\rho\) and \(\sigma\). If different maps are applied, distinguishability can be artificially changed by the maps themselves.
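A minimal worked example of the fourth point, chosen here only for illustration: take \(\rho=\sigma=|0\rangle\langle0|\), apply the identity channel to \(\rho\), and apply the bit-flip unitary \(X\) to \(\sigma\). Then
\[ F(\rho,\sigma)=1, \qquad F\left(\rho,X\sigma X^\dagger\right) = F(|0\rangle\langle0|,|1\rangle\langle1|) = 0. \]
Both maps are valid channels, yet the fidelity drops from \(1\) to \(0\), because the difference between the outputs was introduced by the maps and was not present in the inputs.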
Final mental image
Fidelity measures how close two quantum states are. A quantum channel is a physical processing step applied equally to both states. Since every channel can be represented by an isometric interaction with an environment followed by discarding that environment, and since discarding information cannot reveal a new difference between the states, the output states must be at least as close as the input states:
\[ F(\mathcal N(\rho),\mathcal N(\sigma)) \ge F(\rho,\sigma). \]
So the theorem can be remembered as follows:
\[ \text{quantum processing cannot reduce closeness.} \]
Or, equivalently:
\[ \text{noise may hide distinctions, but it cannot create them.} \]
This is why monotonicity of fidelity is a basic data-processing theorem in quantum information. It makes fidelity stable under later physical operations, which is exactly what one needs in quantum communication, error correction, cryptography, and approximation theory.
References
Uhlmann, Armin. “The ‘transition probability’ in the state space of a *-algebra.” Reports on Mathematical Physics 9, no. 2 (1976): 273–279.
Jozsa, Richard. “Fidelity for Mixed Quantum States.” Journal of Modern Optics 41, no. 12 (1994): 2315–2323.
Fuchs, Christopher A., and Carlton M. Caves. “Ensemble-Dependent Bounds for Accessible Information in Quantum Mechanics.” Physical Review Letters 73, no. 23 (1994): 3047–3050.
Fuchs, Christopher A., and Jeroen van de Graaf. “Cryptographic Distinguishability Measures for Quantum-Mechanical States.” IEEE Transactions on Information Theory 45, no. 4 (1999): 1216–1227.
Nielsen, Michael A., and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 10th anniversary edition, 2010.
Watrous, John. The Theory of Quantum Information. Cambridge University Press, 2018.
Wilde, Mark M. Quantum Information Theory. Cambridge University Press, 2nd edition, 2017.