Grover’s Search Advances Massive MIMO User Scheduling for 5G and B5G

Summarize this article with:
Researchers are tackling the considerable computational challenges of 5G and Beyond 5G (B5G) massive MIMO user scheduling, a problem complicated by its scale and the need for efficient Channel State Information (CSI) management. Ruining Fan from University College Dublin, alongside Xingyu Huang and Mouli Chakraborty et al., present a novel framework integrating Grover’s search algorithm with Quantum Reinforcement Learning (QRL) to navigate the exponentially large scheduling space more effectively. Their work is significant because it demonstrates a practical implementation using a quantum-gate-based circuit mirroring reinforcement learning layers, and crucially, simulation results show substantial performance gains over both classical Convolutional Networks and existing Quantum Deep Learning benchmarks. Quantum reinforcement learning for massive MIMO scheduling The model is implemented using a quantum gate-based circuit, imitating the layered architecture of reinforcement learning, where quantum operations act as policy updates and decision-making units . Efficient user scheduling remains a significant algorithmic challenge in massive MIMO systems . Existing works address this problem using various methods, including traditional machine learning (ML) and some quantum computation-assisted proposals . Due to the constrained computational capabilities of base stations (BSs), traditional approaches are restricted to forecasting CSI for upcoming time intervals based on existing datasets . Previous studies often require supplementary methods, such as long short-term memory networks, for support . Powered by quantum-based algorithms, CSI database traversal becomes more achievable, resulting in a higher average sum rate for hybrid quantum communication systems compared to those using traditional ML methods . The scheduling task can be viewed as a combinatorial search for optimal user, antenna allocations, where Grover’s amplitude amplification could highlight high-reward policies effectively . Additionally, the oracle can ensure the exploration, exploitation trade-off in reinforcement learning, making it a natural foundation for the proposed QRL scheme . Grover’s Search Algorithm is a QRL-based searching algorithm that establishes a quantum search technique capable of amplifying the probability of finding optimal solutions within an unsorted database quadratically faster than classical algorithms . The oracle identifies feasible solutions, and amplitude amplification directs the learning path towards nearly optimal policies . The implementation involves a quantum gate-based circuit which imitates the function of the layered structure of a reinforcement learning architecture, where one-to-one quantum operations play the role of decision maker and policy updates . This integration reduces the computational burden of CSI traversal and enables faster convergence toward high-throughput scheduling strategies, overcoming the scalability barrier faced by classical methods . It also lays the foundation for a Grover’s search-inspired quantum communication algorithm, potentially playing a significant role in 5G and B5G . Simultaneously, the system achieves near-optimal scheduling with higher throughput than classical methods . Furthermore, the model’s evaluation and scalability are illustrated, comparing the proposed QRL model with conventional CNN and Quantum Neural Networks (QNN) benchmarks under various scenarios . The scalability of the method is demonstrated by consistently achieving higher average sum rates as the user-antenna configurations are scaled up, potentially essential for B5G networks . The remaining sections present the detailed design of the proposed algorithm, followed by numerical analysis and comparative evaluation against existing methods . A single-cell massive MIMO downlink system is considered, containing A antennas at the BS, serving T terminal users with a single antenna . For each user, the downlink channel is denoted as nt ∈CA . The antenna configurations at the BS have flexibility, but a rectangular configuration is adopted: the BS contains X rows of antenna strings and Y antennas in each row . The number of users served by the BS simultaneously is T ≤ A = X · Y . The received signal at each terminal user t can be presented as yt = nt · x + gt, where x = PT t=1 √pt · st are the signals transmitted through all the antennas in the BS, with E[|st|2] = 1 being the data symbols of user t . The noise component gt follows the pattern of Additive White Gaussian Noise (AWGN) that gt ∼CN(0, σ2 t) . For fairness, each user receives the same amount of energy P = √pt = p Ptotal/T, ∀t . The ergodic sum rate can be presented as Ssum = T X t=St, where St is the ergodic rate, deduced by the Shannon capacity and the corresponding concept of Signal-to-interference-plus-noise Ratio (SINR): St ≈log2[1 + Et(SINRt)] . Due to the large-scale Grover’s search problem, the approximation form of the ergodic rate for each user t can be used without affecting the final training outcome of the system . Compared to Rayleigh fading channel conditions used widely in large-scale MIMO systems, the Rician fading channel model has a Rician-K factor which balances channel conditions between pure Line of Sight (LoS) and full scattering, giving more robustness to the proposed algorithm by yielding a flexible feedback under structured channels . The stacked channel of the system is written as N = [n1, ., nu] ∈CA×T, known as conventional CSI . In the correlated Rician fading channel, the channel vector is defined as Nr t = K · nLoS t + K · nNLoS t p Mt, where K and K are coefficients deducted from the Rician K-factor, nLoS t ∈CT ×1 denotes the LoS channel vector components, with nNLoS t representing the Non Line-of-sight (NLoS) mean-centred components with normalized variance . Moreover, Mt ∈CT ×T is the channel correlation matrix of the NLoS components . With the assistance of the Rician fading channel, equation (1) can be represented in a detailed and beam-forming manner: yt = T X t=1 PNr tftSt + gt = PNr tftSt + T X x=1,x=t PNr xfxSx . Note that the time (t + 1) represents the long-term achievable ergodic sum rate, and ω ∈(0, 1) is the forgetting factor for the historical rate, and a is a small number that prevents division by zero . Grover’s search is an algorithm that utilises quantum superposition and parallelism, traversing the database quadratically faster than classical computers, assisted by the quantum circuit . The abstract architecture is illustrated in Fig0.1, with solid lines representing its three main layers . The initial superposition layer, also known as the Hadamard layer, is responsible for initialising the state of each input qubit . Hadamard gates create a uniform superposition over all possible states, allowing the Grover’s agent to search the entire solution space in parallel . With T users, the same number of Hadamard gates and wires are required . Initially, N = T qubits are used to encode all the users, |ψ0⟩= |0⟩ N N, which are then fed to the Hadamard gates with the corresponding dimension, resulting in |ψ1⟩= H N N |ψ0⟩= A 2N−1X i=0 · |θ⟩, where A = 1 √ 2N, is the uniformed probability amplitude for each superposed states . |θ⟩= |θ1θ2 . θN⟩ is the computational basis states, converted from the scheduling vector in equation (8) . Furthermore, in the second oracle layer, some positive action in the iteration will be marked, flipping the phase of the states using Pauli-X pre-processing and a followed multi-controlled Z gate . The marked states are stored in a space M ⊆{0, 1}N, and the oracle marking action obeys the rule OM |θ⟩= ( −|θ⟩, θ ∈M, |θ⟩, θ /∈M . Moreover, the diffusion operator layer mirrors and reflects the oracles from the last layer, but centred around the uniform state . This layer relies on a diffusion operator, defined as Diff = 2 · |U⟩⟨U| −I . Note that |U⟩ is the uniform state from equation (12), and I denotes the identity matrix, ensuring the core amplification steps in Grover’s search, which perform inversion about the mean . The detailed design of the proposed Grover’s search quantum circuit is illustrated in Fig0.2, using only 5 qubits for simplicity and can be extended to higher dimensions . Grover’s search enhances quantum reinforcement learning scheduling efficiency Results demonstrate a 51% performance improvement over CNN and a 43% improvement over QDL, establishing a substantial advancement in mMIMO user scheduling capabilities. The research focused on a single-cell massive MIMO downlink system containing A antennas at the base station, serving T terminal users each equipped with a single antenna. The system adopted a rectangular antenna configuration with X rows of antenna strings and Y antennas per row, where the number of served users, T, is less than or equal to the total number of antennas, A, defined as X multiplied by Y. The study measured the received signal at each terminal user, represented as yt = nt · x + gt, where x denotes the signals transmitted through all antennas and gt represents Additive White Gaussian Noise following a complex normal distribution with zero mean and variance σ2 t. Scientists calculated the ergodic sum rate to evaluate system performance, considering the impact of channel conditions and transmit power. Tests prove the scalability of the method, consistently achieving higher average sum rates as the user-antenna configurations are scaled up, which is potentially essential for future B5G networks. The breakthrough delivers a Grover’s search-inspired quantum communication algorithm that could play a significant role in enhancing the efficiency and capacity of wireless communication systems. This work establishes a foundation for more efficient and scalable mMIMO systems, paving the way for improved performance in next-generation wireless technologies. Quantum reinforcement learning boosts mMIMO scheduling performance Specifically, the QRL model exhibits a more sensitive response to high Signal-to-Noise Ratio (SNR) environments, achieving a higher average sum rate, over 5 dB improvement in certain configurations, and maintaining robustness across varying user and antenna scaling scenarios. The authors acknowledge that the current validation is limited to a configuration of six users and eight antennas, which may not fully represent all potential mMIMO deployments. Future research could focus on extending the model to larger and more complex network configurations and exploring its application in future quantum communication networks. 👉 More information 🗞 Grover’s Search-Inspired Quantum Reinforcement Learning for Massive MIMO User Scheduling 🧠 ArXiv: https://arxiv.org/abs/2601.20688 Tags:
