AI Knowledge Can Now Be Reliably Erased Even with Data Compression Techniques

Researchers are addressing a critical challenge in large language model (LLM) deployment: removing specific knowledge without compromising performance, particularly when post-training quantization (PTQ) is used for efficient inference. João Vitor Boer Abitante, Joana Meneguzzo Pasquali, and Luan Fonseca Garcia, working at MALTA, the Machine Learning Theory and Applications Lab, School of Technology, Pontifícia Universidade Católica do Rio Grande do Sul, in collaboration with Ewerton de Oliveira and Thomas da Silva Paula from Brazil R&D, HP Inc., and Rodrigo C. Barros from Kunumi Institute, Brazil, demonstrate that standard unlearning methods often fail under aggressive 4-bit quantization, causing models to regain unwanted information. Their work introduces a novel approach, quantization-robust unlearning via low-rank adaptation (LoRA), which concentrates unlearning updates into trainable adapters so that they are preserved through the quantization process. This technique significantly improves utility and reduces privacy leakage in quantized LLMs, a substantial advance for practical unlearning in resource-constrained environments and a key step towards responsible, adaptable artificial intelligence systems. Current machine unlearning techniques can be undermined by PTQ, a process that reduces the precision of the model’s parameters to make inference more efficient.
This research demonstrates that standard unlearning methods often induce parameter changes too small to survive the precision loss inherent in 4-bit quantization, effectively allowing the model to ‘relearn’ the data it was meant to forget. To overcome this limitation, the researchers developed a new approach, quantization-robust unlearning via low-rank adaptation (LoRA). LoRA concentrates the unlearning process into a small set of trainable parameters, termed ‘adapters’, while keeping the bulk of the LLM frozen. This strategy ensures that the updates made during unlearning are substantial enough to remain visible even after the drastic reduction in precision caused by 4-bit quantization. Evaluations using the Llama-2-7B model and the MUSE dataset, comprising texts from books and news articles, reveal significant improvements in maintaining unlearning efficacy. Specifically, LoRA boosts 4-bit utility by up to 7.93 points on the BOOKS portion of the dataset when using a combination of NPO and GDR techniques, increasing the score from 50.17 to 58.10. Furthermore, the study highlights LoRA’s ability to substantially reduce privacy leakage following quantization, with the PrivLeak metric improving from -25.68 to -5.86 on the BOOKS dataset, bringing it closer to the ideal value of zero. Importantly, these gains in privacy and utility are achieved while maintaining strong forgetting, as evidenced by low scores on the VerMem and KnowMem metrics, indicating minimal retention of unwanted knowledge. This work establishes LoRA as a practical pathway for machine unlearning in scenarios where quantization is essential for model deployment.

Quantization-robust unlearning via LoRA preserves model utility after post-training quantization

Experiments demonstrate that standard full-parameter fine-tuning induces parameter changes too small to survive 4-bit quantization.
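This failure mode can be illustrated with a back-of-envelope numerical sketch. The example below is not the paper’s actual quantization scheme: it uses a naive symmetric round-to-nearest 4-bit quantizer (real PTQ methods such as GPTQ use calibrated, blockwise scales), an assumed per-layer absmax of 0.1, and hand-picked weight shifts standing in for diffuse full-parameter versus concentrated LoRA-style unlearning updates.

```python
import numpy as np

def quantize_4bit(w, scale):
    """Symmetric round-to-nearest 4-bit quantizer (illustrative only)."""
    return np.clip(np.round(w / scale), -8, 7) * scale   # 16 signed levels

# Assumed layer statistics: an absmax weight of 0.1 fixes the 4-bit step size.
scale = 0.1 / 7                    # step size ~= 0.0143

w = 0.042                          # one base weight
small_shift = w - 0.003            # diffuse full-parameter nudge (~0.2 steps)
large_shift = w - 0.012            # concentrated LoRA-scale update (~0.8 steps)

# The small shift rounds back to the original 4-bit level here...
print(quantize_4bit(small_shift, scale) == quantize_4bit(w, scale))   # True
# ...while the larger shift crosses a rounding boundary and survives.
print(quantize_4bit(large_shift, scale) == quantize_4bit(w, scale))   # False

# Typical full-parameter unlearning budget (illustrative numbers):
# 100 steps at learning rate 1e-5 with unit-scale gradients.
cumulative_update = 1e-5 * 100
print(cumulative_update < scale / 2)   # True: well below half a step
```

Under these assumed numbers, the cumulative full-parameter shift stays an order of magnitude below half a quantization step, so round-to-nearest maps the "unlearned" weight back onto its original 4-bit value.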
This research reveals that aggressive low-bit PTQ can mask or erase unlearning updates, causing models to revert to pre-unlearning behaviour. Consequently, quantization-robust unlearning via low-rank adaptation (LoRA) was developed, concentrating unlearning into trainable adapters while freezing the base model. On the Llama-2-7B model evaluated with the MUSE dataset (BOOKS and NEWS), LoRA improves 4-bit utility by up to 7.93 points: NPO+GDR on BOOKS increases from 50.17 to 58.10, a substantial gain in performance after unlearning and quantization. LoRA also yields higher 4-bit utility on NEWS for GA+GDR, increasing from 40.06 to 44.82, a 4.76-point improvement. These results confirm LoRA’s effectiveness in preserving model utility through both unlearning and quantization. LoRA likewise reduces privacy leakage under 4-bit PTQ: PrivLeak moves from -25.68 to -5.86 for GA+KLR on BOOKS, bringing the value closer to 0 and signifying far less leakage of supposedly forgotten information. Simultaneously, forgetting remains strong, with VerMem and KnowMem staying near 0, demonstrating that the model effectively unlearns the targeted knowledge. The research highlights that the step size of 4-bit quantization is often too large to capture the subtle weight shifts induced by conventional unlearning methods. LoRA’s focused adaptation strategy, however, generates updates large enough to be preserved even after quantization, preventing the recovery of forgotten knowledge. This is particularly important given that unlearning benchmarks typically employ small learning rates to avoid catastrophic forgetting of the retain set.

Quantization-resilient unlearning using low-rank adaptation
Researchers established a clear distinction between the forget set and the retain set within the Llama-2-7B language model, a separation crucial for evaluating the efficacy of unlearning. The forget set comprises the data points targeted for removal, while the retain set encompasses the remaining data essential for preserving general model capabilities. This separation allowed precise measurement of both forgetting (the elimination of the forget set’s influence) and utility preservation (maintaining performance on the retain set). To address the challenge posed by post-training quantization (PTQ), which can erase unlearning updates, LoRA was implemented: the pre-trained base model is frozen and unlearning is concentrated into trainable, low-rank adapter layers. This design contrasts with full-parameter fine-tuning, which distributes updates diffusely across the entire network and often produces changes too small to survive 4-bit quantization. By focusing optimisation within the low-rank subspace, the resulting weight updates were hypothesised to be sufficiently large and structurally robust to withstand the precision loss inherent in quantization. The MUSE dataset, comprising both BOOKS and NEWS corpora, served as the primary evaluation benchmark. Gradient Ascent (GA) and Negative Preference Optimisation (NPO) were employed as baseline unlearning algorithms, and their performance was compared against the LoRA-based approach under 4-bit PTQ. Optimisation dynamics were carefully monitored, enabling higher learning rates without compromising general utility, a common issue with full-parameter fine-tuning. Finally, the trained LoRA adapters were explicitly merged into the base weights prior to quantization to ensure that unlearning effects persisted even in aggressive 4-bit formats, thereby mitigating the risk of knowledge reversion.
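As a rough sketch of why the merge-then-quantize pipeline helps, the toy example below merges a low-rank update B @ A into a frozen weight matrix and checks how much of the update survives a naive absmax 4-bit quantizer. Everything here is assumed for illustration: the dimensions, rank, and scaling factor are made up, the "trained" adapters are random stand-ins rather than the output of a real unlearning objective, and the quantizer is far simpler than production PTQ schemes.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 32            # hidden size, LoRA rank, scaling (assumed)

W = rng.normal(0.0, 0.02, (d, d))  # frozen base weight matrix

# Stand-ins for LoRA adapters trained with an unlearning objective
# (in real training B is initialised to zero and learned; these are random).
A = rng.normal(0.0, 0.05, (r, d))
B = rng.normal(0.0, 0.05, (d, r))

# Merge the adapters into the base weights *before* quantization.
W_merged = W + (alpha / r) * (B @ A)

# Naive symmetric absmax 4-bit quantizer; one shared scale so the two
# matrices are directly comparable bin by bin.
scale = np.abs(W_merged).max() / 7
def quantize_4bit(w):
    return np.clip(np.round(w / scale), -8, 7) * scale

survived = (quantize_4bit(W_merged) != quantize_4bit(W)).mean()
print(f"fraction of weights whose 4-bit value changed: {survived:.2f}")
```

Because the low-rank product concentrates the update, the per-weight shifts are comparable to or larger than the quantization step, so most weights land in a different 4-bit bin; a diffuse full-parameter update of similar total norm would, by the same arithmetic, mostly round back to the original values.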
The Bigger Picture

The relentless drive towards smaller, faster artificial intelligence models has created a critical bottleneck. While large language models demonstrate impressive capabilities, deploying them on everyday devices demands aggressive quantization, reducing the precision of the numbers they use. This new work reveals a troubling side effect of that compression: it can effectively erase attempts to ‘unlearn’ specific information from these models, leaving them vulnerable to revealing data they should have forgotten. The problem isn’t that unlearning fails, but that the very process of quantization undoes it, a frustrating setback for privacy and responsible AI. For years, machine unlearning has been a theoretical exercise, largely confined to research papers. The promise of removing specific knowledge from a trained model, perhaps to comply with data privacy requests or correct misinformation, is compelling, but achieving it without crippling performance has proved elusive.
This research highlights that the practical realities of model deployment, specifically the need for quantization, introduce a new layer of complexity. The clever application of LoRA, low-rank adaptation, offers a potential solution by isolating the unlearning process into a smaller, more resilient set of parameters. However, LoRA is not a panacea. While it demonstrably improves unlearning under quantization, the extent to which this translates to real-world privacy guarantees remains an open question. The metrics used, measuring both forgetting and privacy leakage, are valuable, but require careful interpretation. Future work must focus on robustly evaluating these techniques against more sophisticated attacks and diverse datasets. Moreover, the focus on Llama-2 needs to be broadened to assess the generalizability of LoRA across different model architectures and training paradigms. The next step isn’t refining LoRA, but exploring whether similar ‘isolation’ strategies can be applied to other aspects of model editing and adaptation, paving the way for truly flexible and trustworthy AI systems.

More information: Quantization-Robust LLM Unlearning via Low-Rank Adaptation, arXiv: https://arxiv.org/abs/2602.13151
