Noise-induced shallow circuits and the absence of barren plateaus

Main

Understanding the impact of noise is one of the central questions for today’s quantum computers1. A key issue is whether noisy devices can already provide an advantage, either for practically relevant problems2,3,4,5 or as proof-of-principle demonstrations6, or whether error-corrected logical qubits are ultimately required4,7. Recent years have seen a tussle between demonstrations of quantum advantage3,6,8,9,10 and subsequent efficient classical simulation11,12,13,14,15,16,17,18,19,20.

Noise plays a multifaceted role across near-term quantum computing. In quantum machine learning, certain noise models can induce barren plateaus, flattening optimization landscapes and suppressing quantum signals21,22. In random circuit sampling10, noise can render the dynamics efficiently classically simulable23. Yet, most prior work assumes local, unital and primitive noise (for example, depolarizing), whereas in many physical platforms it is more realistic to consider non-unital noise6,24,25,26, which can decrease entropy and make depolarizing models misleading, as already observed in fault tolerance27 and random circuit sampling28.

In this work, we develop a unified picture of how possibly non-unital noise affects typical quantum circuits. We assume only that the noise is local and incoherent, meaning it has a tensor-product structure and is non-unitary. Our main results provide guidance on the impact of quantum noise and are as follows.

Effective depth

We show that arbitrarily deep random quantum circuits, under any uncorrected, possibly non-unital noise, effectively get ‘truncated’: the influence of gates on observable expectation values decreases exponentially with their distance from the last layer, so only the last layers contribute significantly. In particular, for typical circuits this implies that, within the task of estimating expectation values, non-unital and unital noise lead to essentially the same effective complexity.
It is well known that, under certain unital noise models, such as depolarizing noise, meaningful computation must be confined to logarithmic depth29. We prove that the same logarithmic-depth limitation holds for typical circuits under arbitrary local noise, even non-unital: all but the last \(\log n\) layers can be discarded without affecting predicted expectation values.

Lack of barren plateaus

Under non-unital noise, we prove a lack of barren plateaus for cost functions made out of local observables at any depth; that is, the cost landscape is never flat and gradients do not vanish. This also implies that local expectation values of arbitrarily deep random circuits with non-unital noise are not too concentrated towards a fixed value, in stark contrast to the unital-noise scenario21. This phenomenon, however, is not good news for variational quantum algorithms22, as we show that such circuits behave like shallow circuits, which have limited computational power.

Classical simulation

Furthermore, exploiting this effective shallowness, we show how to efficiently classically simulate, on average over the circuit ensemble, expectation values of any local observable up to constant additive precision, at any depth and in any circuit architecture.

In this work, we focus on the problem of estimating expectation values of observables, rather than sampling tasks. This focus is motivated by two considerations. First, in condensed-matter physics and quantum simulation, the physically relevant quantities that diagnose phases of matter, reveal order parameters or determine response functions are expectation values of local or few-body observables. Second, in variational quantum algorithms and quantum machine learning, the central task is to estimate, on noisy near-term devices, cost functions built from expectation values of observables.
For these reasons, expectation values are the natural figure of merit for the questions we target, and form the basis of our analysis.

In summary, our results show that most quantum circuits with non-unital noise at any depth behave qualitatively as (noisy) shallow circuits for estimating observable expectation values. Beyond this task, we further establish that the majority of noisy quantum circuits Φ with depth at least linear in the number of qubits become independent of the initial state: for any two states ρ and σ, the trace distance between Φ(ρ) and Φ(σ) vanishes exponentially in the number of qubits. Although our noise model is significantly more general than in much prior work, our results hold only on average over a well-motivated class of circuits and do not apply to every circuit. However, this limitation is necessary; specifically, it reflects the fact that not every quantum circuit without access to fresh auxiliary systems becomes computationally trivial after a certain number of operations under more general noise, unlike circuits subjected to depolarizing noise30. For instance, ref. 27 has shown that it is possible to perform exponentially long quantum computations under non-unital noise, with specially constructed circuits. Because of this, we cannot expect to prove our statements for all quantum circuits with non-unital noise.

From a technical perspective, our results rely on bounding various second moments of observable expectation values under noisy random quantum circuits. In particular, we show how combining a normal form of qubit channels31 with a reduction to ensembles of random Clifford circuits renders most computations tractable.
The only assumption we require is satisfied for any architecture in which the local gates form 2-designs32,33, making our results widely applicable.

Taken together, our findings substantially advance the understanding of noise in near-term quantum computation and indicate that, unless circuits are carefully engineered to exploit non-unital effects (for example, as in ref. 27), a quantum computer with non-unital noise is unlikely to outperform one with depolarizing noise.

Noise-induced effective shallow circuits

We now present our main results on the effective depth of noisy random quantum circuits with respect to estimating observable expectation values. Our findings show that the influence of gates decays exponentially with their distance from the final layer. We formalize this phenomenon as follows.

Theorem 1 (effective logarithmic depth)

Let O be an observable, ρ0 an arbitrary initial state, L the circuit depth and \(m\in {\mathbb{N}}\). Assume the noise is local and decomposes into single-qubit non-unitary channels. Then

$${{\mathbb{E}}}_{{\varPhi }_{[L-m,L]}}| {\rm{T}}{\rm{r}}\,(O\varPhi ({\rho }_{0}))-{\rm{T}}{\rm{r}}\,(O{\varPhi }_{[L-m,L]}({\sigma }_{0}))| \le \parallel O{\parallel }_{\infty }{{\rm{e}}}^{-\alpha m}.$$ (1)

Here, σ0 is any fixed reference state, α > 0 depends only on the noise parameters, and Φ[L−m, L] denotes the noisy circuit obtained by restricting Φ to its last m layers, that is, discarding all gates and noise operations preceding layer L − m. The expectation \({{\mathbb{E}}}_{{\varPhi }_{[L-m,L]}}\) is taken over the randomness of the two-qubit gates in these last m layers, assumed to form a local 2-design.

Thus, with high probability over the circuit ensemble, only the last \(\varTheta (\log n)\) layers influence observable expectation values up to inverse-polynomial precision. In this sense, deep noisy random circuits behave effectively as shallow ones.
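
As a rough illustration of how the bound in equation (1) translates into a depth budget, the sketch below combines it with Markov's inequality: if the average deviation is at most ‖O‖∞ e^{−αm}, then the deviation exceeds ε with probability at most ‖O‖∞ e^{−αm}/ε. The function name `layers_needed` and the numerical value chosen for α are assumptions made for illustration only; the text states only that α > 0 depends on the noise parameters.

```python
import math


def layers_needed(op_norm: float, eps: float, delta: float, alpha: float) -> int:
    """Smallest m with op_norm * exp(-alpha * m) / eps <= delta.

    Hypothetical helper: combines the averaged bound of equation (1) with
    Markov's inequality. `alpha` must be supplied by the user, since the
    text does not give its explicit value.
    """
    if op_norm <= 0 or eps <= 0 or alpha <= 0 or not (0 < delta <= 1):
        raise ValueError("require op_norm, eps, alpha > 0 and 0 < delta <= 1")
    m = math.log(op_norm / (eps * delta)) / alpha
    return max(0, math.ceil(m))


if __name__ == "__main__":
    alpha = 0.5  # assumed value, for illustration only
    for n in (10, 100, 1000, 10000):
        # inverse-polynomial precision eps = 1/n, failure probability 1%
        print(n, layers_needed(op_norm=1.0, eps=1.0 / n, delta=0.01, alpha=alpha))
```

With ε = 1/n the required number of layers grows only logarithmically in n, which is the sense in which the last Θ(log n) layers suffice.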
The previous theorem actually follows from a stronger second-moment estimate.

Theorem 2 (general scaling)

Let ρ, σ be arbitrary states and P ∈ {I, X, Y, Z}⊗n a Pauli operator of weight ∣P∣. For circuit depth m

$${{\mathbb{E}}}_{\varPhi }[{({\rm{T}}{\rm{r}}(P\varPhi (\rho -\sigma )))}^{2}]\le {{\rm{e}}}^{-\varOmega (m+| P| )}.$$ (2)

The proof proceeds in the Heisenberg picture by iteratively applying adjoint noisy layers to P, showing that Pauli expectations contract exponentially in depth. By Jensen’s inequality

$${{\mathbb{E}}}_{\varPhi }[| {\rm{T}}{\rm{r}}(P\varPhi (\rho ))-{\rm{T}}{\rm{r}}(P\varPhi (\sigma ))| ]\le {{\rm{e}}}^{-\varOmega (m+| P| )}.$$ (3)

As a consequence, for linear depth m = Ω(n)

$${{\mathbb{E}}}_{\varPhi }[\parallel \varPhi (\rho )-\varPhi (\sigma ){\parallel }_{1}]\le {{\rm{e}}}^{-\varOmega (m)}.$$ (4)

This implies that applying the same linear-depth random circuit, affected by any amount of noise, to two different input states renders them effectively indistinguishable (because of the Holevo–Helstrom theorem34). To our knowledge, no result of this kind was previously known, except for that of ref. 35, which applies only to exponential depths but holds for worst-case circuits, whereas our statement holds on average. We remark that it is in principle not possible to prove our result for worst-case non-unital noisy circuits, because there are some special classes of circuits27 that would violate a worst-case version of our inequality (that is, equation (4) without the expectation value). However, in a sufficiently high-noise regime, we can also prove a worst-case contraction bound in trace distance (that is, without averaging over circuits).
Concretely, there exists an explicit constant \(b=b({\mathcal{N}})\in (0,1)\), depending only on the single-qubit noise channel \({\mathcal{N}}\), such that for any depth-m noisy circuit Φ and any two states ρ, σ, we find

$$\parallel \varPhi (\rho )-\varPhi (\sigma ){\parallel }_{1}\,\le \,n\,{b}^{m}\,\parallel \rho -\sigma {\parallel }_{1}.$$ (5)

In particular, for any ε > 0, if \(m=\varOmega (\log (n/\varepsilon ))\), then ∥Φ(ρ) − Φ(σ)∥1 ≤ ε.

The proof relies on contraction properties of the quantum Wasserstein distance of order 1 (ref. 36). Bounds of this form are often referred to as reverse threshold theorems, as they show that, above a certain noise level, long computations (and, hence, error correction) become impossible37,38,39. Here, we extend this type of statement to non-unital noise.

Classical simulation of random quantum circuits with possibly non-unital noise

We consider the classical task of estimating expectation values produced by noisy random quantum circuits. Given a depth-L noisy circuit instance Φ, whose two-qubit gates are sampled uniformly at random from a fixed architecture, together with an initial state ρ0 and an observable O, the goal is to estimate \({\rm{T}}{\rm{r}}(O\varPhi ({\rho }_{0}))\) to additive accuracy ε with high probability over the choice of Φ. We assume that O is a linear combination of M = poly(n) local Pauli operators. Because the runtime scales linearly in M, it suffices to treat the case M = 1, that is, estimating the expectation value of a single local Pauli operator P. High-Pauli-weight components are exponentially suppressed and can be handled separately.

The effective-depth picture yields an average-case truncation guarantee. Let

$$C(\varPhi ):={\rm{T}}{\rm{r}}\,(P\,\varPhi ({\rho }_{0})),$$ (6)

$${C}_{m}(\varPhi ):={\rm{T}}{\rm{r}}\,(P\,{\varPhi }_{[L-m,L]}(| {0}^{n}\rangle \langle {0}^{n}| )),$$ (7)

where Φ[L−m, L] denotes the noisy circuit obtained by restricting Φ to its last m layers.
Then, theorem 2 implies

$${{\mathbb{E}}}_{{\varPhi }_{[L-m,L]}}[| C(\varPhi )-{C}_{m}(\varPhi )| ]\le {{\rm{e}}}^{-\varOmega (m+| P| )}.$$

This suggests the following estimator. In the Heisenberg picture, compute \({P}_{m}:={\varPhi }_{[L-m,L]}^{* }(P)\), and output

$$\widehat{C}:={\rm{T}}{\rm{r}}\,({P}_{m}\,| {0}^{n}\rangle \langle {0}^{n}| ).$$ (8)

The runtime is governed by the size of the light cone of P under \({\varPhi }_{[L-m,L]}^{* }\).

Proposition 3 (average classical simulation of local expectation values)

Let ε, δ > 0. Fix a local Pauli operator P and an arbitrary initial state ρ0. Let Φ be a noisy circuit of depth L drawn from the described random-circuit ensemble. Then there exists a classical algorithm that outputs a value \(\widehat{C}\) such that

$$| \widehat{C}-{\rm{T}}{\rm{r}}(P\varPhi ({\rho }_{0}))| \le \varepsilon$$ (9)

with probability at least 1 − δ over the choice of the random circuit. One valid output is the estimator (8) with

$$m=\left\lceil \frac{1}{\log ({c}^{-1})}\log \,\left(\frac{4}{\delta {\varepsilon }^{2}}\right)\right\rceil,$$ (10)

where δ denotes the allowed failure probability, so that the estimator succeeds with probability at least 1 − δ and c 0 and an arbitrary noise channel with ∥t∥2 < 1, that is, \({\tilde{\mathcal{N}}}:={\mathcal{N}}\circ {{\mathcal{N}}}_{p}^{({\rm{d}}{\rm{e}}{\rm{p}})}\). Thus, we obtain

$${\rm{T}}{\rm{r}}[\varPhi (\rho ){\varPhi }^{{\prime} }(\rho )]\le {2}^{n({\delta }_{L}-1)},$$ (28)

where \({\delta }_{L}:={(1-p)}^{2L}+\parallel {\bf{t}}{\parallel }_{2}\frac{1-{(1-p)}^{2L}}{2p-{p}^{2}}\). We emphasize that, when the noise is purely depolarizing, that is, ∥t∥2 = 0, this bound predicts exponential concentration at any depth, thus improving on a previous result given in ref. 47, which predicts exponential concentration at linear depth.
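
To get a feel for the quantity δ_L appearing in equation (28), the following minimal sketch evaluates it numerically, together with the base-2 logarithm of the right-hand side of (28). The function names `delta_L` and `log2_overlap_bound` and the parameter values are hypothetical and chosen for illustration; only the formula for δ_L is taken from the text.

```python
def delta_L(p: float, t_norm: float, depth: int) -> float:
    """Evaluate delta_L = (1-p)^(2L) + ||t||_2 * (1 - (1-p)^(2L)) / (2p - p^2).

    p      : depolarizing strength, assumed here to satisfy 0 < p <= 1
    t_norm : the norm ||t||_2 of the non-unital part of the noise (>= 0)
    depth  : circuit depth L (>= 0)
    """
    if not (0.0 < p <= 1.0):
        raise ValueError("p must lie in (0, 1]")
    if t_norm < 0.0 or depth < 0:
        raise ValueError("t_norm and depth must be non-negative")
    decay = (1.0 - p) ** (2 * depth)
    return decay + t_norm * (1.0 - decay) / (2.0 * p - p * p)


def log2_overlap_bound(n_qubits: int, p: float, t_norm: float, depth: int) -> float:
    """Base-2 logarithm of the right-hand side of equation (28)."""
    return n_qubits * (delta_L(p, t_norm, depth) - 1.0)


if __name__ == "__main__":
    for depth in (1, 2, 5, 20):
        print(depth,
              delta_L(0.1, 0.0, depth),    # purely depolarizing case
              delta_L(0.1, 0.05, depth))   # small non-unital component
```

For ∥t∥2 = 0 the value δ_L = (1 − p)^{2L} is below 1 at every depth L ≥ 1, so the bound decays exponentially in n. For ∥t∥2 > 0 the formula tends to ∥t∥2/(2p − p²) as L grows, so the bound is informative only when that limit is below 1.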
