New Software Accelerates Complex Calculations by up to 1000 Times

Scientists have developed a new software package, lrux, to accelerate a key computational step in Monte Carlo methods. Ao Chen, of the Division of Chemistry and Chemical Engineering at the California Institute of Technology and the Center for Computational Quantum Physics at the Flatiron Institute, together with Christopher Roth of the Flatiron Institute, presents a JAX-based solution for fast low-rank updates of determinants and Pfaffians.
This research significantly reduces the computational cost of wavefunction evaluations, enabling scalable, high-performance simulation of the antisymmetric wavefunctions used to model complex quantum systems. The core of the achievement is the implementation of low-rank updates for both determinants and Pfaffians, together with delayed-update strategies that optimise performance on modern accelerator hardware. Rather than recalculating a full determinant after each proposed move, lrux applies the matrix determinant lemma to obtain the determinant ratio from a small k × k matrix, Rt, where k is the rank of the update.

lrux integrates natively with JAX transformations, including just-in-time compilation, vectorisation, and automatic differentiation, and supports both real and complex data types, positioning it as a versatile component for a broad range of quantum Monte Carlo (QMC) workflows. Benchmarks on graphics processing units (GPUs) demonstrate speedups of up to 1000× at large matrix sizes, a substantial leap in computational efficiency. The approach also applies to fermionic neural quantum states, provided the orbital transformations admit a low-rank representation. Illustrative examples and recommended environment settings are provided to ease adoption, with a strong recommendation to enable double-precision arithmetic for improved numerical stability in large-scale simulations.
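The determinant-lemma step can be sketched in a few lines of JAX. The snippet below is purely illustrative and does not use the lrux API; `det_ratio` and its arguments are hypothetical names. Given the cached inverse of an n × n matrix A and a rank-k update A + U Vᵀ, the ratio of determinants reduces to the determinant of the k × k matrix Rt = I_k + Vᵀ A⁻¹ U:

```python
# Illustrative sketch (not the lrux API) of the matrix determinant lemma.
# For an invertible n x n matrix A with cached inverse A_inv and a rank-k
# update A' = A + U @ V.T, the ratio det(A') / det(A) equals the determinant
# of the small k x k matrix Rt = I_k + V.T @ A_inv @ U.
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # double precision, as recommended

def det_ratio(A_inv, U, V):
    """det(A + U V^T) / det(A) via a k x k determinant (hypothetical helper)."""
    k = U.shape[1]
    Rt = jnp.eye(k) + V.T @ A_inv @ U
    return jnp.linalg.det(Rt)

n, k = 64, 2
A = jax.random.normal(jax.random.PRNGKey(0), (n, n))
U = jax.random.normal(jax.random.PRNGKey(1), (n, k))
V = jax.random.normal(jax.random.PRNGKey(2), (n, k))

ratio = det_ratio(jnp.linalg.inv(A), U, V)                # O(n^2 k) given A_inv
direct = jnp.linalg.det(A + U @ V.T) / jnp.linalg.det(A)  # O(n^3) reference
```

Because the ratio function is pure, it composes directly with `jax.jit` and `jax.vmap`, which is how batched evaluations such as many parallel walkers are handled in JAX.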
Low-rank updates deliver substantial speedups in determinant and Pfaffian calculations

Benchmarks on an A100-80GB GPU, run as parallel computations of 1024 determinants and Pfaffians, show that at a matrix size of 1024 lrux accelerates determinant calculations by approximately 200× and Pfaffian calculations by approximately 1000×, gains achieved through parallelisation and optimised linear algebra operations. The JAX implementation attains O(n²k) scaling for successive updates, where n is the matrix size and k is the update rank, compared with the O(n³) cost of direct recomputation; when k is smaller than n, this reduces the cost of wavefunction evaluations from O(n³) to effectively O(n²). In the time-cost analysis, direct computation scales as O(n³) while lrux scales as O(n²) at large matrix sizes, and the time for both determinants and Pfaffians stays below 10⁻² seconds even as the matrix size grows to 1024.

The package also supports delayed-update strategies, which trade increased floating-point operations for reduced memory traffic and let users tailor performance to their hardware. Delayed updates provide an additional speedup of 20% to 40%; they were tested with parallel computations of 16384 determinants and Pfaffians at a matrix size of 128, with the optimal delay parameter found to be 16 for determinants and 4 for Pfaffians. The study establishes that the efficiency of delayed updates is comparable to that of direct updates, offering a pathway to maximise performance when the low-rank update is the primary bottleneck.
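The O(n²k) successive-update scaling comes from refreshing the cached inverse with the Woodbury identity rather than recomputing it from scratch. The sketch below illustrates that idea under that assumption; it is not the lrux implementation, and `woodbury_update` is a hypothetical name:

```python
# Illustrative sketch (not the lrux implementation): after a proposed move is
# accepted, the Woodbury identity
#   (A + U V^T)^{-1} = A^{-1} - A^{-1} U (I_k + V^T A^{-1} U)^{-1} V^T A^{-1}
# refreshes the cached inverse in O(n^2 k) operations instead of the O(n^3)
# cost of recomputing it, which is what makes successive updates cheap.
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)

def woodbury_update(A_inv, U, V):
    """Inverse of A + U @ V.T, given A_inv, in O(n^2 k) time (hypothetical helper)."""
    AiU = A_inv @ U                           # n x k
    Rt = jnp.eye(U.shape[1]) + V.T @ AiU      # the small k x k matrix Rt
    return A_inv - AiU @ jnp.linalg.solve(Rt, V.T @ A_inv)

n, k = 128, 2
A = jax.random.normal(jax.random.PRNGKey(0), (n, n))
U = jax.random.normal(jax.random.PRNGKey(1), (n, k))
V = jax.random.normal(jax.random.PRNGKey(2), (n, k))

updated = woodbury_update(jnp.linalg.inv(A), U, V)  # matches inv(A + U @ V.T)
```

A delayed-update scheme in this spirit would accumulate several such rank-k factors and apply them together in one larger matrix multiplication, which is the flops-for-memory-traffic trade-off described above.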
In summary, lrux supports both real and complex data types, integrates natively with JAX transformations such as JIT compilation, vectorisation, and automatic differentiation, and reduces the cost of successive wavefunction evaluations from O(n³) to O(n²k) whenever the update rank k is smaller than the matrix dimension n. Both determinant and Pfaffian updates are provided, alongside delayed-update options that balance computational effort against data transfer on modern processors. For users who prefer to avoid parameter tuning, direct matrix-inverse updates remain a more reliable fallback; either way, lrux offers a flexible and robust foundation for large-scale simulations, enabling efficient evaluation of antisymmetric wavefunctions and paving the way for next-generation sampling algorithms.

More information: "lrux: Fast low-rank updates of determinants and Pfaffians in JAX", arXiv: https://arxiv.org/abs/2602.05255
