Quantum Neyman-Pearson Lemma

Formal statement

Let \(\rho\) and \(\sigma\) be density operators on a finite-dimensional Hilbert space \(\mathcal H\). We want to decide between two hypotheses:

\[ H_0:\text{ the state is }\rho, \qquad H_1:\text{ the state is }\sigma. \]

A binary quantum test is described by an operator \(T\) satisfying

\[ 0\le T\le I. \]

We interpret \(T\) as the POVM effect for accepting \(H_0\), namely for deciding that the state was \(\rho\). The complementary effect \(I-T\) corresponds to accepting \(H_1\), namely deciding that the state was \(\sigma\).

The type-I error is the probability of rejecting \(H_0\) when \(H_0\) is true:

\[ \alpha(T)=\operatorname{Tr}[(I-T)\rho] =1-\operatorname{Tr}(T\rho). \]

The type-II error is the probability of accepting \(H_0\) when \(H_1\) is true:

\[ \beta(T)=\operatorname{Tr}(T\sigma). \]

For a fixed allowed type-I error \(\varepsilon\), the asymmetric hypothesis-testing problem is

\[ \beta_\varepsilon(\rho\|\sigma) = \min_{0\le T\le I} \left\{ \operatorname{Tr}(T\sigma): \operatorname{Tr}(T\rho)\ge 1-\varepsilon \right\}. \]

The quantum Neyman-Pearson lemma says that an optimal test is obtained from the positive spectral subspace of a weighted operator difference

\[ \rho-\lambda\sigma \]

for a suitable threshold parameter \(\lambda\ge0\). More precisely, for each \(\lambda\ge0\), define

\[ \Delta_\lambda=\rho-\lambda\sigma. \]

Let \(\Pi_+^\lambda\), \(\Pi_-^\lambda\), and \(\Pi_0^\lambda\) denote the projectors onto the positive, negative, and zero eigenspaces of \(\Delta_\lambda\), respectively. Then every test of the form

\[ T_\lambda = \Pi_+^\lambda+Q_0, \qquad 0\le Q_0\le \Pi_0^\lambda, \]

is optimal for the corresponding Lagrange problem

\[ \max_{0\le T\le I}\operatorname{Tr}\!\left[T(\rho-\lambda\sigma)\right]. \]

If \(\lambda>0\) and \(Q_0\) is chosen so that

\[ \operatorname{Tr}(T_\lambda\rho)=1-\varepsilon, \]

then \(T_\lambda\) is optimal for the constrained Neyman-Pearson problem \(\beta_\varepsilon(\rho\|\sigma)\). When exact equality requires a partial choice of \(Q_0\), neither \(0\) nor \(\Pi_0^\lambda\), this freedom is precisely the quantum analogue of randomizing on the likelihood-ratio boundary.
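The threshold test can be built directly from an eigendecomposition. The following is a minimal sketch in Python with NumPy, assuming the hypothetical helper names `threshold_test` and `errors`, a numerical tolerance `tol` for deciding which eigenvalues count as zero, and the simple choice \(Q_0=q\,\Pi_0^\lambda\) with a scalar \(q\in[0,1]\) (one admissible choice of boundary operator among many). It forms \(\rho-\lambda\sigma\), collects its positive and zero eigenspaces, and reports \(\alpha(T_\lambda)\) and \(\beta(T_\lambda)\).

```python
import numpy as np


def threshold_test(rho, sigma, lam, q=0.0, tol=1e-12):
    """Threshold test T = Pi_+ + q * Pi_0 built from Delta = rho - lam * sigma.

    q in [0, 1] selects Q_0 = q * Pi_0, one admissible boundary operator;
    tol decides which eigenvalues are treated as exactly zero.
    """
    vals, vecs = np.linalg.eigh(rho - lam * sigma)   # Hermitian eigendecomposition
    pos = vecs[:, vals > tol]                        # eigenvectors with positive eigenvalue
    zero = vecs[:, np.abs(vals) <= tol]              # eigenvectors with (numerically) zero eigenvalue
    pi_plus = pos @ pos.conj().T
    pi_zero = zero @ zero.conj().T
    return pi_plus + q * pi_zero


def errors(T, rho, sigma):
    """Type-I error alpha = 1 - Tr(T rho) and type-II error beta = Tr(T sigma)."""
    alpha = 1.0 - np.trace(T @ rho).real
    beta = np.trace(T @ sigma).real
    return alpha, beta


rho = np.diag([0.8, 0.2])
sigma = np.diag([0.3, 0.7])
T = threshold_test(rho, sigma, lam=1.0)
alpha, beta = errors(T, rho, sigma)
print(np.round(T, 6))
print(alpha, beta)
```

With the diagonal states of the commuting example below and \(\lambda=1\), this returns \(T=|0\rangle\langle0|\) with \(\alpha\approx0.2\) and \(\beta\approx0.3\).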

This is the asymmetric version of binary quantum state discrimination. The symmetric Bayesian version with priors is the Helstrom-Holevo theorem. The same positive-part idea appears there with the weighted operator \(p\rho-(1-p)\sigma\).

What problem the theorem solves

The Helstrom theorem answers the symmetric Bayesian question: if \(\rho\) is prepared with prior probability \(p\) and \(\sigma\) with prior probability \(1-p\), which measurement minimizes the average error probability? The Neyman-Pearson version answers a different question. Suppose we care more about one kind of error than the other: we are willing to tolerate a type-I error of at most \(\varepsilon\), and among all such tests we want the smallest possible type-II error. Which measurement should we use?

This asymmetric formulation is extremely important in statistics and information theory. In many scientific or engineering decisions, the two mistakes are not equally costly. In quantum communication, quantum cryptography, and quantum Shannon theory, one often fixes the probability of rejecting the true state \(\rho\) and then studies how fast the probability of mistaking \(\sigma\) for \(\rho\) can decay over many copies.

The operational mental image is this. The operator

\[ \Delta_\lambda=\rho-\lambda\sigma \]

is a quantum likelihood-ratio comparison. On subspaces where \(\Delta_\lambda\) is positive, the weighted evidence favors \(\rho\); on subspaces where it is negative, the weighted evidence favors \(\sigma\). The parameter \(\lambda\) controls how conservative the test is about accepting \(\rho\). A larger \(\lambda\) makes the test demand stronger evidence before accepting \(\rho\): as \(\lambda\) increases, the type-II error of the optimal threshold test never increases and its type-I error never decreases.
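This monotonicity follows by comparing the Lagrange optimality of two threshold tests at \(\lambda_1<\lambda_2\), each evaluated against the other's objective. A minimal numerical sketch, assuming NumPy, the noncommuting pair \(\rho=|0\rangle\langle0|\), \(\sigma=|+\rangle\langle+|\) treated later in this document, the choice \(Q_0=0\), and a hypothetical helper `positive_projector`, sweeps \(\lambda\) over a grid and records both errors:

```python
import numpy as np


def positive_projector(delta, tol=1e-12):
    """Projector onto the eigenspace of a Hermitian matrix with eigenvalues > tol."""
    vals, vecs = np.linalg.eigh(delta)
    pos = vecs[:, vals > tol]
    return pos @ pos.conj().T


rho = np.array([[1.0, 0.0], [0.0, 0.0]])           # |0><0|
sigma = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])   # |+><+|

lams = np.linspace(0.0, 4.0, 41)
alphas, betas = [], []
for lam in lams:
    T = positive_projector(rho - lam * sigma)      # threshold test with Q_0 = 0
    alphas.append(1.0 - np.trace(T @ rho))         # type-I error
    betas.append(np.trace(T @ sigma))              # type-II error

for lam, a, b in zip(lams[::10], alphas[::10], betas[::10]):
    print(f"lambda = {lam:.1f}   alpha = {a:.4f}   beta = {b:.4f}")
```

Along the grid the printed type-I errors rise from \(0\) while the type-II errors fall from \(1/2\), as the exchange argument predicts.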

Proof of the positive-part optimization

The core of the theorem is a simple operator inequality. Fix \(\lambda\ge0\), and write the Jordan decomposition of the Hermitian operator \(\Delta_\lambda=\rho-\lambda\sigma\) as

\[ \Delta_\lambda=(\Delta_\lambda)_+-(\Delta_\lambda)_-, \]

where

\[ (\Delta_\lambda)_+\ge0, \qquad (\Delta_\lambda)_-\ge0, \]

and the supports of the two positive operators are orthogonal. The operator \((\Delta_\lambda)_+\) is the positive part of \(\Delta_\lambda\), and \((\Delta_\lambda)_-\) is the negative part with its sign removed.

Let \(T\) be any test, so \(0\le T\le I\). Then

\[ \operatorname{Tr}(T\Delta_\lambda) = \operatorname{Tr}\bigl(T(\Delta_\lambda)_+\bigr) - \operatorname{Tr}\bigl(T(\Delta_\lambda)_-\bigr). \]

Because \(T\ge0\) and \((\Delta_\lambda)_-\ge0\), we have

\[ \operatorname{Tr}\bigl(T(\Delta_\lambda)_-\bigr)\ge0. \]

Therefore

\[ \operatorname{Tr}(T\Delta_\lambda) \le \operatorname{Tr}\bigl(T(\Delta_\lambda)_+\bigr). \]

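Since \(I-T\ge0\) and \((\Delta_\lambda)_+\ge0\), the trace \(\operatorname{Tr}\bigl((I-T)(\Delta_\lambda)_+\bigr)\) is also nonnegative, so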

\[ \operatorname{Tr}\bigl(T(\Delta_\lambda)_+\bigr) \le \operatorname{Tr}(\Delta_\lambda)_+. \]

Combining the inequalities gives

\[ \operatorname{Tr}(T\Delta_\lambda) \le \operatorname{Tr}(\Delta_\lambda)_+. \]

This upper bound is achieved by choosing

\[ T=\Pi_+^\lambda, \]

the projector onto the positive eigenspace of \(\Delta_\lambda\). More generally, any choice

\[ T=\Pi_+^\lambda+Q_0, \qquad 0\le Q_0\le \Pi_0^\lambda, \]

also achieves the same value, because \(Q_0\) is supported on the zero eigenspace, where \(\Delta_\lambda\) vanishes, so \(\operatorname{Tr}(Q_0\Delta_\lambda)=0\). Thus

\[ \max_{0\le T\le I}\operatorname{Tr}(T\Delta_\lambda) = \operatorname{Tr}(\Delta_\lambda)_+. \]

This proves the Lagrange form of the lemma.
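The bound \(\operatorname{Tr}(T\Delta_\lambda)\le\operatorname{Tr}(\Delta_\lambda)_+\) and its attainment at \(\Pi_+^\lambda\) can be spot-checked numerically. This is a minimal sketch, assuming NumPy, random states of the form \(AA^\dagger/\operatorname{Tr}(AA^\dagger)\), and random tests with a unitary eigenbasis (from a QR step) and eigenvalues drawn uniformly from \([0,1]\). The helper names, the dimension, the seed, and \(\lambda=0.7\) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_state(d):
    """Random density matrix: A A^dagger normalized to unit trace."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = a @ a.conj().T
    return m / np.trace(m).real


def random_test(d):
    """Random test 0 <= T <= I: unitary eigenbasis from a QR step, eigenvalues in [0, 1]."""
    u, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return u @ np.diag(rng.uniform(0.0, 1.0, size=d)) @ u.conj().T


d, lam = 4, 0.7
rho, sigma = random_state(d), random_state(d)
delta = rho - lam * sigma
vals, vecs = np.linalg.eigh(delta)
pos = vecs[:, vals > 0]
pi_plus = pos @ pos.conj().T
bound = vals[vals > 0].sum()                        # Tr (Delta_lambda)_+
value_at_pi_plus = np.trace(pi_plus @ delta).real   # Tr(Pi_+ Delta_lambda)
best_random = max(np.trace(random_test(d) @ delta).real for _ in range(2000))
print(f"Tr(Delta)_+ = {bound:.6f}, Tr(Pi_+ Delta) = {value_at_pi_plus:.6f}, "
      f"best random test = {best_random:.6f}")
```

A sampling check of this kind only illustrates the inequality; the operator argument above is what proves it.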

From the Lagrange form to the constrained test

Now suppose \(T_\lambda\) is a positive-part test satisfying

\[ \operatorname{Tr}(T_\lambda\rho)=1-\varepsilon. \]

We show that it minimizes the type-II error among all tests whose type-I error is at most \(\varepsilon\).

Let \(T\) be any feasible test, so

\[ \operatorname{Tr}(T\rho) \ge 1-\varepsilon = \operatorname{Tr}(T_\lambda\rho). \]

Because \(T_\lambda\) maximizes \(\operatorname{Tr}(T\Delta_\lambda)\), we have

\[ \operatorname{Tr}(T_\lambda\Delta_\lambda) \ge \operatorname{Tr}(T\Delta_\lambda). \]

Substituting \(\Delta_\lambda=\rho-\lambda\sigma\), this becomes

\[ \operatorname{Tr}(T_\lambda\rho)-\lambda\operatorname{Tr}(T_\lambda\sigma) \ge \operatorname{Tr}(T\rho)-\lambda\operatorname{Tr}(T\sigma). \]

Rearranging gives

\[ \lambda\left[\operatorname{Tr}(T\sigma)-\operatorname{Tr}(T_\lambda\sigma)\right] \ge \operatorname{Tr}(T\rho)-\operatorname{Tr}(T_\lambda\rho). \]

The right-hand side is nonnegative because \(T\) is feasible. Therefore, when \(\lambda>0\),

\[ \operatorname{Tr}(T\sigma) \ge \operatorname{Tr}(T_\lambda\sigma). \]

This says

\[ \beta(T) \ge \beta(T_\lambda). \]

Thus \(T_\lambda\) has the smallest possible type-II error among all tests satisfying the type-I constraint. This proves the constrained Neyman-Pearson statement.

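The argument above requires \(\lambda>0\). The boundary cases \(\lambda=0\), nontrivial zero eigenspaces, and inactive constraints are handled by the same positive-part principle together with the freedom to randomize on the zero eigenspace. In finite dimension the feasible set \(\{T:0\le T\le I,\ \operatorname{Tr}(T\rho)\ge1-\varepsilon\}\) is nonempty (it contains \(I\)), compact, and convex, and the objective \(\operatorname{Tr}(T\sigma)\) is linear, so an optimal test exists; the supporting-threshold argument above identifies its structure.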
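The constrained statement can be spot-checked the same way. A minimal sketch, assuming NumPy, random states, \(\lambda=1\), and \(Q_0=0\) (zero eigenvalues of \(\rho-\sigma\) are non-generic for random states). Here the level \(1-\varepsilon\) is defined as \(\operatorname{Tr}(T_\lambda\rho)\), so \(T_\lambda\) saturates the constraint by construction, and every sampled test that is feasible at this level should have type-II error at least \(\beta(T_\lambda)\). Helper names and the seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)


def random_state(d):
    """Random density matrix: A A^dagger normalized to unit trace."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = a @ a.conj().T
    return m / np.trace(m).real


def random_test(d):
    """Random test 0 <= T <= I: unitary eigenbasis from a QR step, eigenvalues in [0, 1]."""
    u, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return u @ np.diag(rng.uniform(0.0, 1.0, size=d)) @ u.conj().T


d, lam = 3, 1.0
rho, sigma = random_state(d), random_state(d)
vals, vecs = np.linalg.eigh(rho - lam * sigma)
pos = vecs[:, vals > 0]
T_lam = pos @ pos.conj().T                  # threshold test with Q_0 = 0
level = np.trace(T_lam @ rho).real          # plays the role of 1 - eps
beta_lam = np.trace(T_lam @ sigma).real

feasible = violations = 0
for _ in range(5000):
    T = random_test(d)
    if np.trace(T @ rho).real >= level:     # feasible at the same type-I level
        feasible += 1
        if np.trace(T @ sigma).real < beta_lam - 1e-12:
            violations += 1

print(f"eps = {1 - level:.4f}, beta(T_lam) = {beta_lam:.4f}, "
      f"feasible samples = {feasible}, violations = {violations}")
```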

Connection with the classical Neyman-Pearson lemma

The classical Neyman-Pearson lemma says that the most powerful test at a fixed significance level is a likelihood-ratio test. If the classical distributions are \(r(x)\) under \(H_0\) and \(s(x)\) under \(H_1\), then one accepts \(H_0\) when

\[ r(x)>\lambda s(x), \]

rejects \(H_0\) when

\[ r(x)<\lambda s(x), \]

and possibly randomizes when equality holds.

The quantum theorem is exactly the noncommutative version of this rule. If \(\rho\) and \(\sigma\) commute, they can be diagonalized in the same basis:

\[ \rho=\sum_x r_x|x\rangle\langle x|, \qquad \sigma=\sum_x s_x|x\rangle\langle x|. \]

Then

\[ \rho-\lambda\sigma = \sum_x(r_x-\lambda s_x)|x\rangle\langle x|. \]

The positive projector \(\Pi_+^\lambda\) is exactly the projector onto those classical outcomes \(x\) for which

\[ r_x>\lambda s_x. \]

So the quantum test reduces to the classical likelihood-ratio test.
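In the commuting case the whole construction, including the choice of \(\lambda\) and the boundary randomization, reduces to sorting outcomes by likelihood ratio. The following is a minimal pure-Python sketch of the classical Neyman-Pearson test, assuming a hypothetical function `classical_np` and a small tolerance `tol`. It accepts outcomes in decreasing order of \(r_x/s_x\) until the acceptance probability under \(H_0\) reaches \(1-\varepsilon\), and accepts the last outcome it needs only partially.

```python
import math


def classical_np(r, s, eps, tol=1e-12):
    """Most powerful classical test at type-I level eps (sketch).

    r, s: outcome probabilities under H_0 and H_1.
    Returns phi (phi[x] = probability of accepting H_0 on outcome x) and beta.
    """
    def ratio(x):
        # Likelihood ratio r/s; outcomes impossible under H_1 come first.
        return math.inf if s[x] == 0 else r[x] / s[x]

    target = 1.0 - eps
    phi = [0.0] * len(r)
    mass = 0.0                                  # current acceptance probability under H_0
    for x in sorted(range(len(r)), key=ratio, reverse=True):
        if mass >= target - tol:
            break
        if r[x] == 0:                           # would add type-II error and no H_0 mass
            continue
        phi[x] = min(1.0, (target - mass) / r[x])   # partial acceptance on the boundary
        mass += phi[x] * r[x]
    beta = sum(p * sx for p, sx in zip(phi, s))
    return phi, beta


print(classical_np([0.8, 0.2], [0.3, 0.7], eps=0.2))
print(classical_np([0.8, 0.2], [0.3, 0.7], eps=0.1))
```

For the diagonal states of the commuting example below, \(\varepsilon=0.2\) gives the deterministic test \(|0\rangle\langle0|\) with \(\beta=0.3\), while \(\varepsilon=0.1\) accepts the second outcome with probability \(1/2\) and gives \(\beta=0.3+0.7\cdot0.5=0.65\).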

The only new difficulty in quantum theory is that \(\rho\) and \(\sigma\) may not commute. Then there is no single classical outcome space on which both states are simultaneously diagonal. The theorem says that the correct replacement for the likelihood ratio is the spectral sign of the Hermitian operator \(\rho-\lambda\sigma\).

Example: a classical commuting test

Let

\[ \rho= \begin{pmatrix} 0.8&0\\ 0&0.2 \end{pmatrix}, \qquad \sigma= \begin{pmatrix} 0.3&0\\ 0&0.7 \end{pmatrix}. \]

These are diagonal states, so this is a classical problem in quantum notation. Suppose we want type-I error at most

\[ \varepsilon=0.2. \]

The test that accepts \(H_0\) only on the first outcome is

\[ T=|0\rangle\langle0|. \]

Its type-I error is

\[ \alpha(T)=1-\operatorname{Tr}(T\rho)=1-0.8=0.2. \]

Its type-II error is

\[ \beta(T)=\operatorname{Tr}(T\sigma)=0.3. \]

This test is produced by the positive part of \(\rho-\lambda\sigma\). For example, take \(\lambda=1\). Then

\[ \rho-\sigma= \begin{pmatrix} 0.5&0\\ 0&-0.5 \end{pmatrix}. \]

The positive subspace is spanned by \(|0\rangle\), so the Neyman-Pearson test is

\[ T=|0\rangle\langle0|. \]

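Operationally, this says: accept \(H_0\) exactly on the outcomes whose likelihood ratio \(r_x/s_x\) exceeds the threshold \(\lambda\). Here only the first outcome qualifies, since \(0.8/0.3\approx2.67>1\) while \(0.2/0.7\approx0.29<1\); any \(\lambda\) strictly between these two ratios gives the same test.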

Example: the noncommuting two-state test

Let

\[ \rho=|0\rangle\langle0|, \qquad \sigma=|+\rangle\langle+|, \qquad |+\rangle=\frac{|0\rangle+|1\rangle}{\sqrt2}. \]

These states do not commute. There is no basis in which both are simultaneously classical. Take \(\lambda=1\). Then

\[ \Delta=\rho-\sigma. \]

In the computational basis,

\[ \rho= \begin{pmatrix} 1&0\\ 0&0 \end{pmatrix}, \qquad \sigma= \frac12 \begin{pmatrix} 1&1\\ 1&1 \end{pmatrix}, \]

\[ \Delta= \begin{pmatrix} 1/2&-1/2\\ -1/2&-1/2 \end{pmatrix}. \]

The eigenvalues of \(\Delta\) are

\[ +\frac1{\sqrt2} \qquad\text{and}\qquad -\frac1{\sqrt2}. \]

The positive eigenvector is proportional to

\[ |0\rangle+(1-\sqrt2)|1\rangle. \]

Therefore the optimal threshold test for \(\lambda=1\) is not a measurement in the \(|0\rangle,|1\rangle\) basis and not a measurement in the \(|+\rangle,|-\rangle\) basis. It is the projective measurement onto the positive and negative eigenspaces of \(\rho-\sigma\).

This is the first genuinely quantum lesson of the theorem. The best asymmetric test is not generally obtained by measuring either state in its own eigenbasis. It is obtained by diagonalizing the weighted difference operator.
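The numbers in this example can be checked with a few lines of NumPy. This minimal sketch relies on NumPy's `eigh` returning eigenvalues in ascending order; it recovers the eigenvalues \(\pm1/\sqrt2\), the positive eigenvector \(|0\rangle+(1-\sqrt2)|1\rangle\), and the two error probabilities of the \(\lambda=1\) test.

```python
import numpy as np

rho = np.array([[1.0, 0.0], [0.0, 0.0]])            # |0><0|
sigma = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])    # |+><+|
delta = rho - sigma                                  # lambda = 1

vals, vecs = np.linalg.eigh(delta)                   # eigenvalues in ascending order
v = vecs[:, -1] / vecs[0, -1]                        # positive eigenvector, scaled so <0|v> = 1
pi_plus = np.outer(vecs[:, -1], vecs[:, -1].conj())  # projector onto the positive eigenspace

alpha = 1.0 - np.trace(pi_plus @ rho)                # type-I error
beta = np.trace(pi_plus @ sigma)                     # type-II error
print(vals)          # approx [-0.7071  0.7071]
print(v)             # approx [ 1.     -0.4142]
print(alpha, beta)   # both approx 0.1464
```

Both errors equal \(\sin^2(\pi/8)=(1-1/\sqrt2)/2\approx0.146\).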

Example: relation to the Helstrom theorem

If the goal is not to constrain one error but to minimize the average error with priors \(p\) and \(q=1-p\), then the success probability for a test \(T\) that accepts \(\rho\) is

\[ P_{\mathrm{succ}}(T) = p\operatorname{Tr}(T\rho)+q\operatorname{Tr}((I-T)\sigma). \]

This can be rewritten as

\[ P_{\mathrm{succ}}(T) = q+ \operatorname{Tr}[T(p\rho-q\sigma)]. \]

Therefore the optimal Bayesian test is the positive projector of

\[ p\rho-q\sigma. \]

This is the Helstrom measurement. Thus the Helstrom theorem is the Bayesian version of the same positive-part principle, while the Neyman-Pearson lemma is the constrained asymmetric version.

For equal priors, \(p=q=1/2\), the relevant operator is proportional to

\[ \rho-\sigma. \]

The optimal success probability becomes

\[ P_{\mathrm{succ}}^{\mathrm{opt}} = \frac12\left(1+\frac12\|\rho-\sigma\|_1\right). \]

This is the usual trace-distance form of binary quantum state discrimination.
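Combining the rewrite of \(P_{\mathrm{succ}}(T)\) with the positive-part maximization gives \(P_{\mathrm{succ}}^{\mathrm{opt}}=q+\operatorname{Tr}(p\rho-q\sigma)_+\). A minimal NumPy sketch, assuming a hypothetical helper `helstrom_success`, evaluates this for the noncommuting pair \(|0\rangle\langle0|\), \(|+\rangle\langle+|\) with equal priors and compares it with the trace-distance formula:

```python
import numpy as np


def helstrom_success(rho, sigma, p):
    """Optimal success probability q + Tr(p rho - q sigma)_+ with q = 1 - p."""
    q = 1.0 - p
    vals = np.linalg.eigvalsh(p * rho - q * sigma)
    return q + vals[vals > 0].sum()


rho = np.array([[1.0, 0.0], [0.0, 0.0]])            # |0><0|
sigma = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])    # |+><+|

p_opt = helstrom_success(rho, sigma, 0.5)
trace_norm = np.abs(np.linalg.eigvalsh(rho - sigma)).sum()   # ||rho - sigma||_1
print(p_opt, 0.5 * (1 + 0.5 * trace_norm))
```

Both values equal \(\cos^2(\pi/8)\approx0.854\). This is also \(1-\alpha\) for the \(\lambda=1\) threshold test of the noncommuting example, consistent with the equal-prior Helstrom operator being proportional to \(\rho-\sigma\).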

Boundary randomization

In classical hypothesis testing with discrete outcomes, the likelihood-ratio test sometimes needs randomization at the threshold in order to hit a prescribed type-I error exactly. The same thing happens quantum mechanically.

If \(\rho-\lambda\sigma\) has a nontrivial zero eigenspace, then every vector \(|v\rangle\) in that eigenspace satisfies \(\langle v|\rho|v\rangle=\lambda\langle v|\sigma|v\rangle\), an exact weighted balance between the two hypotheses. Including or excluding that subspace does not change the Lagrange objective

\[ \operatorname{Tr}[T(\rho-\lambda\sigma)]. \]

Therefore the theorem allows

\[ T=\Pi_+^\lambda+Q_0, \qquad 0\le Q_0\le\Pi_0^\lambda. \]

The operator \(Q_0\) is a randomized or partial decision on the boundary subspace. It is chosen to make

\[ \operatorname{Tr}(T\rho)=1-\varepsilon \]

when exact saturation of the type-I constraint is desired.

Operationally, the zero eigenspace is the region where the test is exactly indifferent between the two weighted hypotheses. Randomization there changes the error probabilities without changing the optimality of the threshold rule.
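For the commuting example with the stricter level \(\varepsilon=0.1\), no deterministic threshold test attains \(\alpha=0.1\) exactly: accepting only the first outcome gives \(\alpha=0.2\), and accepting both gives \(\alpha=0\). The following minimal NumPy sketch places the threshold at the likelihood ratio of the second outcome, \(\lambda=0.2/0.7\), so that the second outcome spans the zero eigenspace, and then chooses \(Q_0=q\,\Pi_0^\lambda\) to saturate the constraint. The value of \(\lambda\) is picked by hand here (an assumption of the sketch, not a general search procedure), and `tol` is an assumed numerical tolerance for zero eigenvalues.

```python
import numpy as np

rho = np.diag([0.8, 0.2])
sigma = np.diag([0.3, 0.7])
eps = 0.1
lam = 0.2 / 0.7          # threshold equal to the likelihood ratio of outcome 1
tol = 1e-12              # eigenvalues with |value| <= tol are treated as zero

vals, vecs = np.linalg.eigh(rho - lam * sigma)
pos, zero = vecs[:, vals > tol], vecs[:, np.abs(vals) <= tol]
pi_plus, pi_zero = pos @ pos.conj().T, zero @ zero.conj().T

# Choose Q_0 = q * Pi_0 so that Tr(T rho) = 1 - eps exactly.
q = (1.0 - eps - np.trace(pi_plus @ rho)) / np.trace(pi_zero @ rho)
assert 0.0 <= q <= 1.0, "this lambda cannot reach the target level"
T = pi_plus + q * pi_zero
alpha = 1.0 - np.trace(T @ rho)
beta = np.trace(T @ sigma)
print(q, alpha, beta)
```

It gives \(q=1/2\), \(\alpha=0.1\), and \(\beta=0.3+0.7\cdot\tfrac12=0.65\) up to rounding.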

How this theorem is used in quantum information

The single-shot quantity

\[ \beta_\varepsilon(\rho\|\sigma) = \min_{0\le T\le I} \left\{ \operatorname{Tr}(T\sigma): \operatorname{Tr}(T\rho)\ge1-\varepsilon \right\} \]

is one of the basic objects of one-shot quantum information theory. Its logarithmic version is the hypothesis-testing relative entropy,

\[ D_H^\varepsilon(\rho\|\sigma) = -\log \beta_\varepsilon(\rho\|\sigma). \]

The Neyman-Pearson lemma tells us the structure of the optimizing test. For many-copy states, one applies the same idea to

\[ \rho^{\otimes n} \qquad\text{and}\qquad \sigma^{\otimes n}. \]

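The asymptotic behavior of the optimal type-II error under a fixed type-I constraint leads to quantum Stein's lemma: for every fixed \(\varepsilon\in(0,1)\), the optimal type-II error decays exponentially in \(n\) with exponent equal to the quantum relative entropy \(D(\rho\|\sigma)\). Quantum Stein's lemma and its strong converse were developed by Hiai-Petz and Ogawa-Nagaoka. Later error-exponent refinements include the quantum Hoeffding bound, which describes the trade-off between the two error exponents, and the quantum Chernoff bound, which governs the symmetric Bayesian error.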
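For commuting states the many-copy problem is again classical: \(\rho^{\otimes n}\) and \(\sigma^{\otimes n}\) are product distributions. The following minimal pure-Python sketch assumes the diagonal states of the commuting example, \(\varepsilon=0.2\), natural logarithms, and a hypothetical helper `classical_beta` that implements the likelihood-ratio test by sorting outcomes. It computes the optimal \(\beta_\varepsilon\) for \(n=1,\dots,10\) copies and prints \(-\tfrac1n\log\beta_\varepsilon\) next to \(D(\rho\|\sigma)\). Because a test on \(n\) copies can be reused on \(n+1\) copies by ignoring the last copy, the optimal type-II error cannot increase with \(n\); convergence to the Stein exponent is slow at such small \(n\), so the printout illustrates the trend rather than the limit.

```python
import itertools
import math


def classical_beta(r, s, eps, tol=1e-12):
    """Optimal type-II error at type-I level eps for commuting (classical) states."""
    target, mass, beta = 1.0 - eps, 0.0, 0.0
    # Accept outcomes in decreasing likelihood ratio r/s (s = 0 counts as infinite).
    order = sorted(range(len(r)),
                   key=lambda x: math.inf if s[x] == 0 else r[x] / s[x],
                   reverse=True)
    for x in order:
        if mass >= target - tol:
            break
        if r[x] > 0:
            phi = min(1.0, (target - mass) / r[x])   # partial acceptance on the boundary
            mass += phi * r[x]
            beta += phi * s[x]
    return beta


r1, s1, eps = [0.8, 0.2], [0.3, 0.7], 0.2
relent = sum(a * math.log(a / b) for a, b in zip(r1, s1))   # D(rho||sigma) in nats

betas = []
for n in range(1, 11):
    # Outcome probabilities of the n-fold product states on bit strings of length n.
    strings = list(itertools.product(range(2), repeat=n))
    r = [math.prod(r1[i] for i in w) for w in strings]
    s = [math.prod(s1[i] for i in w) for w in strings]
    betas.append(classical_beta(r, s, eps))
    print(f"n = {n:2d}   -log(beta)/n = {-math.log(betas[-1]) / n:.4f}   D = {relent:.4f}")
```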

Thus the Neyman-Pearson lemma is not only a one-shot measurement rule. It is the basic building block behind asymptotic quantum hypothesis testing, one-shot information theory, converse bounds, coding theorems, and many resource-theoretic distinguishability arguments.

Common mistakes

A common mistake is to confuse the Neyman-Pearson problem with the Helstrom problem. The Helstrom problem minimizes average error with fixed priors. The Neyman-Pearson problem fixes or bounds one kind of error and minimizes the other. Both are solved by positive parts of weighted differences, but the interpretation of the weight is different.

Another common mistake is to forget which POVM element means which decision. In this presentation, \(T\) means “accept \(H_0\),” so

\[ \alpha(T)=1-\operatorname{Tr}(T\rho) \]

and

\[ \beta(T)=\operatorname{Tr}(T\sigma). \]

Some books use the opposite convention, where the test operator means “reject \(H_0\).” The formulas then have \(T\) and \(I-T\) interchanged, but the mathematics is identical.

A third mistake is to assume that the optimal test is a measurement in the eigenbasis of \(\rho\) or \(\sigma\). This is generally false when the states do not commute. The correct operator to diagonalize is

\[ \rho-\lambda\sigma. \]

A fourth mistake is to ignore randomization on the zero eigenspace. Boundary randomization is not an artificial trick. It is required when the allowed type-I error lies between two discrete acceptance probabilities.

Final mental image

The quantum Neyman-Pearson lemma is the asymmetric decision rule for two quantum states. It says that the optimal test is obtained by forming a weighted evidence operator

\[ \rho-\lambda\sigma, \]

diagonalizing it, accepting \(\rho\) on the positive subspace, accepting \(\sigma\) on the negative subspace, and possibly randomizing on the zero subspace.

In one sentence:

\[ \text{the optimal asymmetric quantum test is the sign measurement of a weighted state difference.} \]

The theorem is useful because it converts an optimization over all binary POVMs into a spectral decomposition of one Hermitian operator. It is the quantum analogue of the classical likelihood-ratio test, and it is the foundation of asymmetric quantum hypothesis testing.

References

Neyman, Jerzy, and Egon S. Pearson. “On the Problem of the Most Efficient Tests of Statistical Hypotheses.” Philosophical Transactions of the Royal Society of London. Series A 231 (1933): 289–337.

Helstrom, Carl W. Quantum Detection and Estimation Theory. Academic Press, 1976.

Holevo, Alexander S. Probabilistic and Statistical Aspects of Quantum Theory. North-Holland, 1982; Springer reprint, 2011.

Watrous, John. The Theory of Quantum Information. Cambridge University Press, 2018.

Ogawa, Tomohiro, and Hiroshi Nagaoka. “Strong Converse and Stein’s Lemma in Quantum Hypothesis Testing.” IEEE Transactions on Information Theory 46, no. 7 (2000): 2428–2433.

Hayashi, Masahito. Quantum Information Theory: Mathematical Foundation. Springer, 2017.

Audenaert, K. M. R., M. Nussbaum, A. Szkoła, and F. Verstraete. “Asymptotic Error Rates in Quantum Hypothesis Testing.” Communications in Mathematical Physics 279 (2008): 251–283.
