Developer Insights Now Guide Software Testing to Unearth Hidden Flaws
Researchers are addressing the challenge of efficiently identifying critical vulnerabilities in large software systems, where conventional fuzzing techniques often struggle to reach deeply embedded, security-sensitive states.
Viet Hoang Luu, Amirmohammad Pasdar, and Wachiraphan Charoenwet from The University of Melbourne, working with Shaanan Cohney from cohney.info and colleagues including Toby Murray and Van-Thuan Pham, also of The University of Melbourne, present a novel approach that integrates developer expertise from code review into the fuzzing process. Their system, EyeQ, extracts security insights from review discussions, pinpoints relevant code sections, and translates these into guidance for fuzzing, significantly enhancing vulnerability discovery.
This research is important because it demonstrates a substantial improvement over standard fuzzing configurations, revealing over 40 previously unknown bugs within the PHP codebase and offering a promising pathway to leverage human intelligence for more effective automated testing.

Modern fuzzers excel at scaling to large, real-world software projects, yet frequently struggle to explore the program states developers deem most vulnerable or security-critical. These critical states often reside deep within the execution flow, are protected by necessary preconditions, or are obscured by less important execution paths that consume valuable fuzzing time. EyeQ harnesses developer intelligence gleaned from code reviews to guide the fuzzing process: it identifies security-relevant reviews, localises the implicated regions of code, and converts review insights into program annotations that direct the fuzzer's exploration. The approach integrates seamlessly with existing annotation-aware fuzzing infrastructures, requiring no alterations to the program's underlying semantics or established developer workflows.

Initial validation involved a human-guided feasibility study using a security-focused dataset of PHP code reviews, establishing a strong foundation for review-guided fuzzing techniques. The workflow was then automated using a large language model, carefully prompted to extract and apply developer insights. The results demonstrate that EyeQ significantly enhances vulnerability discovery compared to standard fuzzing configurations.
A human-guided proof-of-concept study, leveraging a dataset of PHP code reviews, identified 41 previously unknown bugs, validating the initial hypothesis and establishing a baseline for comparison with the automated approach. The study localised developer concerns expressed in the reviews to specific program regions and translated these insights into annotation-based guidance for the AFL++ fuzzer with IJON-guidance enabled.

Automating the workflow with a large language model sustained this performance while minimising the human effort needed to extract and apply developer intelligence, making the approach more efficient and scalable. The automated system discovered an additional six bugs, bringing the total number of previously unknown vulnerabilities to 46, all found within a large, security-critical PHP project.

The research identifies code reviews as a previously untapped resource for enhancing fuzzing effectiveness. By transforming review insights into annotation-based guidance, EyeQ overcomes limitations of traditional coverage-guided fuzzing, reaching deep program states that are gated by preconditions or obscured by less valuable execution paths. The discovery of 46 previously unknown bugs underscores the value of augmenting automated testing with human intelligence.
The work builds upon annotation-aware fuzzing, which augments coverage-guided fuzzing with semantic feedback to overcome the limits of edge coverage in reflecting progress toward deep, vulnerability-prone program states. Instead of relying solely on edge coverage, the system exposes user-defined program signals via lightweight instrumentation macros, such as IJON_SET(x) for numeric program state and IJON_STATE() for discrete execution phases. These signals are treated as first-class feedback, allowing the fuzzer to prioritise semantically meaningful executions even when coverage does not increase. The approach operates atop existing annotation-aware fuzzing infrastructure, requiring no changes to program semantics or developer workflows.

The relentless pursuit of software security often feels like a game of diminishing returns. Modern fuzzing techniques, while impressively scalable, frequently stumble over the most deeply buried and subtly protected vulnerabilities. This work offers a compelling shift in strategy, recognising that developers themselves possess a wealth of knowledge about where the weaknesses lie. EyeQ does not attempt to reinvent fuzzing, but rather to guide it intelligently by mining the insights embedded within code review discussions. This is a deceptively simple idea, yet remarkably effective, as demonstrated by the discovery of over 40 previously unknown bugs in PHP. The use of large language models to automate the extraction of these signals is particularly noteworthy, bridging the gap between human expertise and automated testing. However, reliance on the quality of code review is a clear limitation: sparse or superficial reviews will yield limited benefit, and the system may struggle with issues not explicitly discussed.
Furthermore, scaling this approach beyond PHP, an ecosystem with a strong open-source review culture, remains an open question. Future work could explore methods for proactively eliciting more detailed security rationales during review, or for combining review-based guidance with other fuzzing techniques. Ultimately, the true power of EyeQ lies not just in finding bugs, but in demonstrating the value of incorporating human intelligence into the automated security-testing lifecycle.

More information: "Following Dragons: Code Review-Guided Fuzzing", arXiv: https://arxiv.org/abs/2602.10487
