Researchers Use Reinforcement Learning to Prepare Quantum States Efficiently

Quantum Zeitgeist

A new method for preparing quantum states advances both quantum simulation and quantum computing. Xiaotian Nie and colleagues at Intelligent Quantum Inception Co, in collaboration with Tsinghua University, present an adaptive measurement-feedback protocol driven by reinforcement learning that operates with limited information. The method avoids the need for complete knowledge of the quantum state and instead uses only the history of measurement outcomes to guide the process. Using a new stochastic reward, the team prepared ground states of the Bose-Hubbard model and generated GHZ states, demonstrating a scalable and experimentally viable route to robust quantum state preparation.

Adaptive reinforcement learning accelerates ground state preparation and entanglement generation

The team reports a six-fold improvement in energy convergence speed for ground-state preparation, with the required measurement time falling from γT > 3 to γT = 1.2 in non-interacting systems. The learned protocol outperforms earlier fixed protocols, which struggled to reach comparably low energies, particularly in strongly interacting and near-critical regimes where control is typically harder.

Intelligent Quantum Inception Co and Tsinghua University developed an adaptive measurement-feedback protocol using reinforcement learning, enabling scalable quantum state preparation without requiring full knowledge of the quantum system.

The team also prepared GHZ states, a key resource for quantum information processing, using only single-qubit measurements and feedback rotations, reaching energies close to the ground-state energy of −4. However, the current results do not demonstrate performance on larger systems or address the challenge of maintaining coherence in real-world hardware.

Discretised weak measurement and feedback control implementation

Time is discretised into short intervals of duration $\delta t$. During each interval, the system undergoes a weak measurement of the observable $c_t$, followed by a feedback unitary generated by $F_t$. A weak measurement of the Hermitian observable $c_t$ with measurement strength $\gamma$ is represented by the Kraus operator

$$M_t(c_{m,t}) = \left(\frac{4\gamma\,\delta t}{\pi}\right)^{1/4} \exp\!\left[-2\gamma\,\delta t\,\left(c_t - c_{m,t}\right)^2\right],$$

where $c_{m,t}$ is the corresponding noisy measurement outcome, which follows a normal distribution:

$$P(c_{m,t}) \sim \mathcal{N}\!\left(\mu = \langle c_t\rangle,\ \sigma^2 = \frac{1}{8\gamma\,\delta t}\right).$$

The parameter $\gamma$ controls the trade-off between information gain and measurement backaction: a larger value improves measurement precision but also disturbs the quantum state more strongly. Based on the measurement result $c_{m,t}$, the feedback operator $F_t$ is chosen according to the policy and applied to modify the system's evolution. The time-evolution unitary is

$$U_t = e^{-i\,(H + F_t)\,\delta t},$$

where $H$ is the original system Hamiltonian. Over one time step, the state $|\psi(t)\rangle$ therefore evolves to

$$|\psi(t+\delta t)\rangle \propto U_t\, M_t(c_{m,t})\, |\psi(t)\rangle.$$

The aim is to start from an experimentally accessible initial state, such as a product state or a fully polarised configuration, and drive the system toward its ground state through a measurement-feedback control process guided by a learned policy. This task is cast as a partially observable Markov decision process (POMDP), in which the controller receives only the noisy stream of measurement outcomes, not full knowledge of the quantum state.

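The discretised measurement-feedback step described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the single-qubit Hamiltonian `H = -X`, the observable `Z`, the time step, and the proportional feedback rule `0.05 * c_m * Y` are all assumptions chosen for the demo (the paper uses a learned policy instead). The outcome is drawn from the normal distribution quoted in the text.

```python
import numpy as np

def evolve(A, dt):
    """Return exp(-i * A * dt) for a Hermitian matrix A via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

def kraus(c, c_m, gamma, dt):
    """Weak-measurement Kraus operator M_t(c_m) for a Hermitian observable c."""
    w, V = np.linalg.eigh(c)
    d = (4 * gamma * dt / np.pi) ** 0.25 * np.exp(-2 * gamma * dt * (w - c_m) ** 2)
    return (V * d) @ V.conj().T

def step(psi, H, c, feedback, gamma, dt, rng):
    """One interval: sample c_m, apply M_t, then U_t = exp(-i (H + F_t) dt)."""
    mean_c = np.vdot(psi, c @ psi).real
    sigma = np.sqrt(1.0 / (8.0 * gamma * dt))
    c_m = rng.normal(mean_c, sigma)      # outcome model quoted in the text
    F = feedback(c_m)                    # placeholder policy: outcome -> F_t
    psi = evolve(H + F, dt) @ (kraus(c, c_m, gamma, dt) @ psi)
    return psi / np.linalg.norm(psi), c_m

# Toy single-qubit setup (every choice below is an illustrative assumption).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = -X                                   # ground state |+>, ground energy -1
gamma, dt = 0.3, 0.01
rng = np.random.default_rng(0)
psi = np.array([1, 0], dtype=complex)    # start in |0>
outcomes = []
for _ in range(200):
    psi, c_m = step(psi, H, Z, lambda cm: 0.05 * cm * Y, gamma, dt, rng)
    outcomes.append(c_m)
energy = np.vdot(psi, H @ psi).real
print(f"final energy: {energy:.3f}")
```

The state is renormalised after every step because the Kraus operator is not unitary; a learned policy would replace the hypothetical `feedback` callable.
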
The measurement observable $c_t$ and feedback operator $F_t$ are parameterised in fixed operator bases:

$$c_t = \sum_i \alpha_{t,i}\, c^{(i)}, \qquad F_t = \sum_i \beta_{t,i}\, F^{(i)},$$

where $\{c^{(i)}\}$ and $\{F^{(i)}\}$ are fixed basis operators. The weight vectors $\alpha_t$ and $\beta_t$ therefore fully specify the measurement and feedback actions at step $t$. A GRU recurrent network takes the measurement weights $\alpha_t$ and the measurement outcome $c_{m,t}$ as input, then outputs the feedback weights $\beta_t$ and the next measurement weights $\alpha_{t+1}$. Because the feedback evolution is deterministic once $c_{m,t}$ is registered, $\alpha_{t+1}$ can be produced in the same forward pass.

To maintain experimental compatibility, the reward must also be experimentally accessible. Ideally, the reward would be the negative energy expectation $\langle -H\rangle$ in the final state, but this cannot be obtained from a single trajectory, because expectation values require averaging over many trajectories. Furthermore, non-commuting Hamiltonian terms require incompatible measurement settings; in the Bose-Hubbard model, the hopping term $H_{\mathrm{kin}}$ is measured via time-of-flight imaging while the interaction term $H_{\mathrm{int}}$ uses in-situ imaging. The terminal reward is therefore constructed from a single randomly sampled term. Writing $H = \sum_k H_k$, one term $H_k$ is chosen with probability $p_k$ and measured, yielding an eigenvalue $E_{k,i}$. The importance-weighted reward

$$R = -\frac{1}{p_k}\, E_{k,i}$$

is an unbiased estimator of the negative total energy, since $\mathbb{E}[R] = -\sum_k \langle H_k\rangle = -\langle H\rangle$.

To improve training stability, each term is centred at its target ground-state expectation, defining $\tilde{H}_k = H_k - \langle H_k\rangle_0$, with $\langle\cdots\rangle_0$ the expectation in the target ground state. The reward $R = -(1/p_k)\,\tilde{E}_{k,i}$, with $\tilde{E}_{k,i}$ the measured eigenvalue of $\tilde{H}_k$, then has zero mean at the ground state for every sampled term, and the objective changes only by a constant. Centring and optimal term sampling substantially suppress the variance while keeping the reward unbiased and experimentally compatible.

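The unbiasedness of the single-term reward can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's code: the two-qubit Hamiltonian, the uniform sampling probabilities, and the helper names `term_outcomes`, `sample_reward`, and `mean_reward` are all hypothetical. It samples one term, simulates a projective measurement of that term, and compares the exact mean of the reward with $-\langle H\rangle$.

```python
import numpy as np

def term_outcomes(H_k, psi):
    """Eigenvalues of H_k and their Born probabilities in state psi."""
    evals, evecs = np.linalg.eigh(H_k)
    born = np.abs(evecs.conj().T @ psi) ** 2
    return evals, born / born.sum()

def sample_reward(terms, probs, psi, rng, offsets=None):
    """Draw one terminal reward R = -(E_{k,i} - offset_k) / p_k."""
    offsets = np.zeros(len(terms)) if offsets is None else offsets
    k = rng.choice(len(terms), p=probs)       # pick one term H_k
    evals, born = term_outcomes(terms[k], psi)
    i = rng.choice(len(evals), p=born)        # simulated outcome E_{k,i}
    return -(evals[i] - offsets[k]) / probs[k]

def mean_reward(terms, probs, psi, offsets=None):
    """Exact expectation of sample_reward over both sampling stages."""
    offsets = np.zeros(len(terms)) if offsets is None else offsets
    total = 0.0
    for H_k, p_k, off in zip(terms, probs, offsets):
        evals, born = term_outcomes(H_k, psi)
        total += p_k * np.sum(born * (-(evals - off) / p_k))
    return total

# Toy two-qubit Hamiltonian with two non-commuting terms (illustrative only).
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
terms = [-np.kron(Z, Z), -(np.kron(X, I2) + np.kron(I2, X))]
probs = np.array([0.5, 0.5])
H = terms[0] + terms[1]

psi = np.array([1.0, 2.0, 3.0, 4.0])
psi /= np.linalg.norm(psi)
print("E[R]  =", mean_reward(terms, probs, psi))
print("-<H>  =", -(psi @ H @ psi))

# Centring: subtract each term's ground-state expectation <H_k>_0.
ground = np.linalg.eigh(H)[1][:, 0]
offsets = np.array([ground @ T @ ground for T in terms])
rng = np.random.default_rng(1)
samples = [sample_reward(terms, probs, ground, rng, offsets) for _ in range(2000)]
print("centred E[R] at ground state =", mean_reward(terms, probs, ground, offsets))
```

Dividing by `p_k` is what cancels the sampling probability in the average, so any strictly positive choice of `probs` leaves the estimator unbiased and only changes its variance.
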
With this reward design, training of the measurement-feedback control policy is no longer confined to simulation environments that rely on privileged access to the full quantum state and are ultimately limited by the exponential growth of Hilbert space. Instead, the same training framework becomes compatible with experimental trajectories. The parameters of the recurrent policy are optimised using proximal policy optimisation (PPO, implemented with the PureJaxRL library), a stable policy-gradient method that limits excessively large updates between successive iterations. In each training round, the agent interacts with the measurement-feedback loop to collect trajectories, estimates the corresponding returns and advantages from the stochastic terminal reward, and updates the policy accordingly. Repeating this procedure yields a closed-loop measurement-feedback strategy that progressively drives the system toward the target low-energy state.

Numerical demonstrations illustrate the proposed framework on two representative tasks: ground-state preparation in the Bose-Hubbard model (BHM) and GHZ-state preparation. The ground state of the one-dimensional four-site Bose-Hubbard model at unit filling is considered, with a Hamiltonian containing only hopping and on-site interaction terms:

$$H_{\mathrm{BHM}} = -J \sum_i \left(a_i^\dagger a_{i+1} + \mathrm{H.c.}\right) + \frac{U}{2} \sum_i n_i (n_i - 1).$$

Following the operator choice of Wu et al., the measured observable and feedback operator are

$$c_t = \sum_j \alpha_{t,j}\, n_j, \qquad F_t = (\beta_{t,1} + i\beta_{t,2}) \sum_j a_j^\dagger a_{j+1} + \mathrm{H.c.},$$

so that the policy adaptively chooses the density-weighted measurement profile and the complex hopping-feedback amplitude. Three regimes (non-interacting, strongly interacting, and near-critical) are studied at fixed measurement strength $\gamma/J = 0.3$, initialising in the unit-filling product state $|1, 1, 1, 1\rangle$ with a 10% admixture of single particle-hole excitations to model perturbations.

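For readers who want to reproduce the target states, the four-site Hamiltonian above can be built explicitly in the Fock basis. This is a minimal sketch under assumptions the article does not state: open boundary conditions are assumed, and the function names `fock_basis` and `bose_hubbard` are hypothetical. The particle number is fixed at four, matching unit filling.

```python
import itertools
import numpy as np

def fock_basis(n_sites, n_particles):
    """All occupation tuples (n_1, ..., n_L) with a fixed total particle number."""
    return [occ
            for occ in itertools.product(range(n_particles + 1), repeat=n_sites)
            if sum(occ) == n_particles]

def bose_hubbard(n_sites, n_particles, J, U):
    """Dense Bose-Hubbard matrix on an open chain (boundary choice is an assumption)."""
    basis = fock_basis(n_sites, n_particles)
    index = {occ: i for i, occ in enumerate(basis)}
    H = np.zeros((len(basis), len(basis)))
    for col, occ in enumerate(basis):
        # On-site interaction (U/2) * sum_i n_i (n_i - 1) is diagonal.
        H[col, col] = 0.5 * U * sum(n * (n - 1) for n in occ)
        for j in range(n_sites - 1):
            # a_j^dagger a_{j+1}: move one boson from site j+1 to site j.
            if occ[j + 1] > 0:
                new = list(occ)
                new[j + 1] -= 1
                new[j] += 1
                amp = np.sqrt(occ[j + 1] * (occ[j] + 1))
                row = index[tuple(new)]
                H[row, col] += -J * amp
                H[col, row] += -J * amp   # Hermitian-conjugate hop
    return H, basis

# Non-interacting and strongly interacting examples (parameter values are illustrative).
H_free, basis = bose_hubbard(4, 4, J=1.0, U=0.0)
H_strong, _ = bose_hubbard(4, 4, J=1.0, U=20.0)
e_free = np.linalg.eigvalsh(H_free)[0]
e_strong, vecs = np.linalg.eigh(H_strong)
mott_weight = vecs[basis.index((1, 1, 1, 1)), 0] ** 2
print("basis size:", len(basis))
print("ground energy at U = 0:", e_free)
print("weight of |1,1,1,1> in the U = 20 ground state:", mott_weight)
```

At U = 0 the ground energy should equal four times the lowest single-particle energy of the open chain, which gives a quick consistency check on the matrix elements.
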
Quantum state preparation from routine measurements bypasses detailed system characterisation

Preparing quantum states is a vital step both for simulating complex systems and for building future quantum computers. The practical advantage of this work is that it does not require exhaustive knowledge of the quantum state, which is often inaccessible and has been a long-standing obstacle in the field. Although competing quantum control techniques, such as shortcuts to adiabaticity and reinforcement learning, have proliferated, this method prepares quantum states using only measurement results routinely collected during experiments.

The researchers demonstrated the approach by preparing ground states of the Bose-Hubbard model and generating GHZ states. This matters because existing techniques often require complete knowledge of a system's quantum state, which is impractical for larger, more complex systems. According to the authors, the method avoids reconstructing the full quantum state while still accurately estimating the target energy, establishing a scalable approach to quantum state preparation.

More information: Experiment-compatible measurement–feedback quantum state preparation with reinforcement learning. ArXiv: https://arxiv.org/abs/2606.13005

Source Information

Source: Quantum Zeitgeist

Website: https://quantumzeitgeist.com/feed/