
Researchers Reduce Data Exposure with Privacy Methods

Quantum Zeitgeist
9 min read
⚡ Quantum Brief
Researchers from Umeå and Curtin Universities developed KD-UFSL, a framework combining k-anonymity and differential privacy to secure federated split learning against data reconstruction attacks. The study reveals vulnerabilities in the intermediate data shared during decentralised AI training, which can enable adversaries to reconstruct private client information, a critical risk in healthcare and financial applications. KD-UFSL increases the reconstruction error by up to 50% in mean squared error and reduces structural similarity by up to 40% across four datasets, while maintaining model accuracy within 2-2.5% of traditional methods. Unlike prior approaches, the method applies noise to the raw data rather than to intermediate representations, enhancing privacy without sacrificing utility in large-scale distributed systems. The work advances trustworthy AI by balancing privacy and performance, paving the way for secure edge computing and sensitive-data applications like personalised medicine.


Scientists are increasingly focused on securing sensitive data within decentralised machine learning systems, and a new study led by Obaidullah Zaland and Sajib Mistry, from Umeå University and Curtin University respectively, addresses vulnerabilities in federated split learning. Working with Monowar Bhuyan, also of Umeå University, the researchers demonstrate that intermediate data shared during this process can be exploited to reconstruct private client information. Their work proposes a novel framework, k-anonymous differentially private UFSL (KD-UFSL), which employs techniques like microaggregation and differential privacy to significantly reduce data leakage while maintaining model accuracy.

This research is significant because it offers a practical solution for balancing privacy preservation and utility in large-scale, distributed machine learning applications, thereby enabling more secure and reliable big data analysis. Protecting sensitive data during collaborative analysis is becoming ever more important for applications like personalised medicine and financial modelling. The new technique enhances federated split learning, allowing powerful insights to be drawn from distributed datasets without compromising individual privacy. By cleverly masking shared information, it offers a practical route to secure, decentralised artificial intelligence.

Scientists are increasingly focused on methods for training machine learning models using data distributed across numerous devices. Federated learning has emerged as a prominent technique, enabling decentralised training without requiring data to be transferred to a central location. However, this decentralised approach introduces computational demands on individual client devices. U-shaped federated split learning addresses this by offloading some computation to a server while, crucially, keeping both data and labels locally on each client. Despite these benefits, the intermediate data shared between clients and the server remains vulnerable to privacy breaches.
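The U-shaped split described above can be sketched in a few lines. This is a minimal toy, not the paper's implementation: the weights are random stand-ins for trained layers, but the data flow matches the description, with the first and last layers on the client and only the intermediate activation (the "smashed data") sent to the server.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer weights; in practice these are trained network layers.
W_head = rng.normal(size=(16, 8))   # client-side "head" (first layers)
W_body = rng.normal(size=(8, 8))    # server-side "body" (middle layers)
W_tail = rng.normal(size=(8, 3))    # client-side "tail" (final layers)

def client_head(x):
    # The client computes the first layers locally and sends only this
    # intermediate activation ("smashed data") to the server.
    return np.tanh(x @ W_head)

def server_body(smashed):
    # The server runs the heavy middle layers; it never sees raw data or labels.
    return np.tanh(smashed @ W_body)

def client_tail(server_out):
    # The client finishes the forward pass and computes the loss against
    # its local labels, which never leave the device.
    return server_out @ W_tail

x = rng.normal(size=(4, 16))        # a private local batch
logits = client_tail(server_body(client_head(x)))
print(logits.shape)                 # (4, 3)
```

The point of the U shape is visible in the call chain: both ends of the model stay on the client, so neither inputs nor labels are transmitted, only the smashed data in between.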

Scientists have developed k-anonymous differentially private UFSL, or KD-UFSL, a system designed to minimise data leakage from this intermediate data. This new approach combines microaggregation and differential privacy, established privacy-enhancing technologies, to protect sensitive client information. Initial investigations revealed that an adversary could reconstruct private client data from these intermediate representations, highlighting the need for stronger safeguards. KD-UFSL directly tackles this vulnerability by altering the data before it is transmitted, making reconstruction considerably more difficult. KD-UFSL not only increases the discrepancy between original and reconstructed images, by as much as 50% in certain instances, but also reduces the structural similarity between them by up to 40% across four different datasets. This balance between privacy and utility makes KD-UFSL particularly well-suited for large-scale applications where both data protection and model performance are essential. Unlike traditional centralized machine learning, federated learning relies on clients retaining local control of their data. At the same time, split learning further divides model training between clients and servers, creating a need for secure communication of intermediate results. By addressing the specific risks associated with these intermediate representations, KD-UFSL offers a step towards more secure and practical decentralized machine learning systems.
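The reconstruction risk can be made concrete with a toy linear example (an illustration, not an attack from the paper: real adversaries typically train a decoder network on the smashed data). If the head were a single linear layer known to an honest-but-curious server, a least-squares inversion already recovers part of the private input.

```python
import numpy as np

rng = np.random.default_rng(1)

W_head = rng.normal(size=(16, 8))   # client head, here a single linear layer
x_private = rng.normal(size=(4, 16))
smashed = x_private @ W_head        # what the server actually receives

# An honest-but-curious server that knows (or estimates) the head mapping
# can attempt a least-squares inversion of the smashed data.
x_reconstructed = smashed @ np.linalg.pinv(W_head)

# Reconstruction is lossy (16 dims compressed to 8), but the recovered
# signal is still correlated with the private input.
leak = np.mean((x_private - x_reconstructed) ** 2)
print(f"reconstruction MSE: {leak:.4f}")
```

With nonlinear heads the inversion is harder but, as the study shows, still feasible enough to demand explicit defences.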

The team focused on mitigating data reconstruction attacks, as a curious server could potentially rebuild a client’s original data from the transmitted “smashed data”. To counter this, KD-UFSL employs k-anonymity through microaggregation, obscuring individual values by grouping similar data points together. Simultaneously, differential privacy adds carefully calibrated noise to the raw data before processing, further hindering reconstruction efforts.
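The two defences can be sketched as follows. This is a simplified, projection-based microaggregation plus Gaussian input noise, applied here to a generic data matrix for illustration; the paper's exact grouping algorithm and noise calibration are not reproduced.

```python
import numpy as np

def microaggregate(X, k):
    """k-anonymity via simple microaggregation: order records along their
    leading principal direction, form groups of k similar records, and
    replace each record with its group centroid so no individual value
    is released."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    order = np.argsort(Xc @ Vt[0])        # sort similar records together
    out = np.empty_like(X)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        out[group] = X[group].mean(axis=0)  # every member gets the centroid
    return out

def add_gaussian_noise(X, sigma, rng):
    # Differential-privacy-style perturbation of the *raw* data,
    # matching KD-UFSL's choice of noising inputs before processing.
    return X + rng.normal(scale=sigma, size=X.shape)

rng = np.random.default_rng(0)
X = rng.normal(size=(9, 4))
X_anon = microaggregate(add_gaussian_noise(X, sigma=0.3, rng=rng), k=3)

# Each released record now coincides with at least k-1 others in its group.
print(len(np.unique(X_anon.round(6), axis=0)))  # 3
```

Production microaggregation usually uses algorithms such as MDAV rather than this one-dimensional ordering, but the privacy property is the same: every released value is shared by at least k records.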

The team explored how best to integrate these techniques within the UFSL framework. While existing studies often focus on adding differential privacy to the smashed data itself, KD-UFSL instead applies noise to the original data, combined with microaggregation of the intermediate representations. This approach aims to provide a stronger privacy guarantee without unduly sacrificing the accuracy of the final model. By carefully balancing these factors, the team sought to create a system that is both secure and effective.

The effectiveness of KD-UFSL was tested through rigorous experimentation across four benchmarking datasets. The scientists measured the extent to which reconstructed images differed from the originals, using metrics such as mean squared error (MSE) and structural similarity (SSIM). The results showed a clear trade-off between privacy and utility, with KD-UFSL demonstrably increasing the difficulty of reconstruction while still maintaining acceptable model performance. This effort highlights the importance of considering data privacy throughout the entire machine learning pipeline. By addressing the vulnerabilities inherent in federated split learning, KD-UFSL contributes to a growing body of research aimed at building more trustworthy and responsible AI systems. This is particularly relevant in edge computing scenarios, where data is generated and processed on a multitude of devices, demanding strong privacy protections.

Performance of KD-UFSL against baseline methods across diverse datasets and network architectures

Table III presents the averaged per-image mean squared error (MSE) and structural similarity index measure (SSIM) between actual and reconstructed test data for KD-UFSL and baseline methods across different networks. KD-UFSL consistently outperforms all baselines in minimising the structural similarity between original and reconstructed images.
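The two metrics used throughout can be sketched directly. MSE is standard; the SSIM shown here is a simplified single-window version of the usual metric (which averages the same luminance/contrast/structure terms over sliding local windows), kept global for brevity.

```python
import numpy as np

def mse(a, b):
    # Mean squared error: higher means reconstruction is further from
    # the original, i.e. better privacy against the attacker.
    return float(np.mean((a - b) ** 2))

def global_ssim(a, b, data_range=1.0):
    """Simplified SSIM computed over the whole image as one window.
    Lower values mean the reconstruction is structurally less similar
    to the original, again indicating better privacy."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return float(((2 * mu_a * mu_b + c1) * (2 * cov + c2)) /
                 ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)))

rng = np.random.default_rng(0)
img = rng.random((8, 8))
noisy = np.clip(img + rng.normal(scale=0.2, size=img.shape), 0, 1)

print(mse(img, img))                     # 0.0 for identical images
print(round(global_ssim(img, img), 6))   # 1.0 for identical images
print(mse(img, noisy) > 0, global_ssim(img, noisy) < 1)
```

In practice one would use a windowed implementation such as scikit-image's `structural_similarity`, but the intuition carries over: privacy improves as MSE rises and SSIM falls.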
Specifically, using the ResNet18 network, KD-UFSL achieved a mean squared error increase of up to 50% on the CIFAR10 dataset compared to traditional UFSL, while structural similarity decreased by up to 40% on the FashionMNIST dataset, indicating a stronger level of privacy preservation. K-anonymised UFSL demonstrates performance comparable to, and occasionally exceeding, differentially private UFSL. In certain scenarios with the ConvNet architecture, UFSL with k-anonymity alone achieved a lower MSE loss than KD-UFSL; still, KD-UFSL generally improves both SSIM and MSE scores on the SVHN and EMNIST datasets, while maintaining a superior SSIM value on CIFAR10 and a better MSE value on FashionMNIST. The ResNet50 architecture showed mixed results, with either k-anonymity or differential privacy alone sometimes matching KD-UFSL's performance. Figure 4 displays actual and reconstructed images from client smashed data for each model, produced by randomly selecting client head models and images from the test set to reduce bias and ensure the images were not used during training. KD-UFSL maintains a utility level comparable to vanilla federated split learning: the utility, measured by the accuracy of the final model, drops by only 2-2.5% compared to the absolute performance of traditional FSL. Varying the noise level (σ²) in differential privacy had a noticeable impact on privacy metrics: changing σ² between 0.1 and 0.5 improved both SSIM and MSE scores on the SVHN dataset, though the effect on CIFAR10 was minimal. Increasing the group size (k) generally enhanced privacy under k-anonymity, although the differences were often negligible, except for a noticeable decline in SSIM on the SVHN dataset when k=3.

Mitigating Data Leakage in Federated Split Learning via k-Anonymity and Differential Privacy

Scientists are increasingly focused on enabling machine learning without centralising sensitive data. Federated learning offers a potential solution, allowing models to be trained across numerous devices while keeping personal information local. However, a weakness emerged in these systems: the intermediate data shared during the training process could still reveal details about individual contributions. This vulnerability is addressed by a new approach to federated split learning, termed KD-UFSL, which actively works to obscure private data within those shared representations. Protecting data during decentralised machine learning is now yielding to practical solutions. Rather than relying simply on keeping data on devices, KD-UFSL introduces techniques borrowed from the field of differential privacy, adding carefully calibrated noise to the shared information. By employing microaggregation alongside differential privacy, the system diminishes the possibility of reconstructing private data from the intermediate results, even under attack. Achieving this privacy comes at a cost, with some reduction in the accuracy of the final model.

The significance of this effort extends beyond the specific performance metrics reported. For applications in healthcare, finance, or any area dealing with personal data, the balance between utility and privacy is central. KD-UFSL demonstrably improves privacy levels while maintaining a functional global model, a critical step towards real-world deployment. The level of privacy offered requires a trade-off with model accuracy, and further investigation is needed to refine this balance. The challenge now shifts towards optimising these privacy-preserving techniques for diverse data types and computational constraints.
We can anticipate a surge in research combining federated learning with differential privacy and other privacy-enhancing technologies. Expect to see more sophisticated methods for dynamically adjusting privacy levels based on data sensitivity and application requirements, and potentially the development of hardware designed to accelerate these computations. As the demand for data-driven insights continues to grow, securing those insights without compromising individual privacy will remain a defining challenge for the field.

👉 More information
🗞 Guarding the Middle: Protecting Intermediate Representations in Federated Split Learning
🧠 ArXiv: https://arxiv.org/abs/2602.17614



Source Information

Source: Quantum Zeitgeist