Privacy-enhanced Vision Transformers on the Edge Address Data Vulnerabilities with Distributed Framework

Quantum Zeitgeist

Modern visual intelligence systems offer remarkable convenience, but their intensive computational demands often exceed the capabilities of everyday mobile and wearable devices. Zihao Ding, Mufeng Zhu, and Zhongze Tang, from Rutgers University, alongside Sheng Wei and Yao Liu, present a new framework that tackles this challenge while simultaneously safeguarding user privacy. Their research introduces a distributed system for Vision Transformers, which intelligently partitions visual data and distributes it across multiple cloud servers, ensuring no single server holds complete image information. This innovative approach performs final data merging locally on a trusted edge device, like a smartphone, substantially reducing the risk of data exposure and maintaining near-baseline performance on demanding tasks such as image segmentation.

The team’s work delivers a scalable and privacy-preserving solution, paving the way for more secure and accessible visual computing in an increasingly connected world.

Privacy Risks in Augmented and Egocentric Vision

A substantial body of prior research investigates privacy and security within computer vision, particularly in augmented reality (AR) and egocentric vision, which captures a first-person perspective. That research highlights key concerns surrounding data collection and surveillance by devices constantly capturing visual information, raising questions about the potential misuse of collected data and the unintentional recording of sensitive details. Several studies explore techniques to protect privacy within images and videos, including methods to blur faces, remove sensitive objects, or distort images while still enabling useful computer vision tasks. Researchers have also demonstrated the risk of reconstructing scenes or inferring private information from visual data, even when the data appears anonymized.

The technical approaches to privacy preservation include methods inspired by differential privacy and adversarial learning, while secure multi-party computation and homomorphic encryption enable computations on visual data without revealing it to any single party. Protective perturbation adds noise to images to conceal sensitive information, and feature masking removes sensitive elements before processing. These studies span a wide range of deep learning architectures used in computer vision, including convolutional neural networks, transformers, object detection, and semantic segmentation, and address applications such as augmented and virtual reality, egocentric vision, financial security, and location-based services. In essence, this body of work represents a growing effort to balance the benefits of computer vision with the need to protect individual privacy in an increasingly visual world, spanning fundamental privacy-preserving techniques to specific challenges in emerging technologies.

Privacy-Preserving Distributed Vision Transformer Inference

Researchers developed a distributed framework to address privacy vulnerabilities when offloading complex visual intelligence tasks, such as those powered by Vision Transformers (ViTs), to cloud servers. Recognizing the limitations of mobile and wearable devices in processing these computationally intensive tasks, the study pioneered a method for partitioning visual data into smaller portions and distributing them across multiple independent cloud servers. This design prevents any single server from possessing the complete image, mitigating the risk of comprehensive data reconstruction and protecting user privacy. A local trusted edge device functions as an orchestrator, managing the data partitioning and distribution process. Final data merging and aggregation computations occur exclusively on the user’s trusted edge device, further enhancing privacy by preventing external servers from accessing the complete, reconstructed image. Scientists applied this framework to the Segment Anything Model (SAM), and evaluations demonstrate that it maintains near-baseline segmentation performance while substantially reducing the risk of content reconstruction and user data exposure. This method provides a scalable and privacy-preserving solution for vision tasks in the edge-cloud continuum, addressing concerns about data breaches during transmission and server-side computations.
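To make the orchestration concrete, the following is a minimal Python sketch of the edge-side workflow described above. It is not the authors’ implementation: the 5x5 grid, the round-robin assignment, the `remote_vit_stub` placeholder, and the dummy per-tile features are all assumptions introduced here for illustration.

```python
import numpy as np

GRID = 5  # hypothetical 5x5 grid; the paper reports tiles as small as one twenty-fifth of the image

def partition(image, grid=GRID):
    """Split an HxWxC image into grid*grid non-overlapping tiles, keeping full pixel fidelity per tile."""
    h, w = image.shape[:2]
    return [image[i * h // grid:(i + 1) * h // grid,
                  j * w // grid:(j + 1) * w // grid]
            for i in range(grid) for j in range(grid)]

def remote_vit_stub(server_id, tile):
    """Placeholder for the per-server ViT call; each server only ever receives its own tile."""
    return tile.mean(axis=(0, 1))  # dummy per-tile feature vector

def edge_orchestrate(image, num_servers):
    """Edge-side orchestration: partition, distribute round-robin, then merge locally."""
    tiles = partition(image)
    per_tile = [remote_vit_stub(k % num_servers, t) for k, t in enumerate(tiles)]
    # Final merging/aggregation happens only on the trusted edge device.
    return np.stack(per_tile)

if __name__ == "__main__":
    img = np.random.rand(512, 512, 3)
    merged = edge_orchestrate(img, num_servers=25)
    print(merged.shape)  # (25, 3)
```

The property the sketch preserves is the one the paper emphasizes: each stub call receives exactly one tile, and the merging step, standing in for the real aggregation, runs only on the edge device.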

Privacy-Preserving Vision Transformers with Partitioning

Scientists developed a distributed computing framework that enhances privacy for Vision Transformer (ViT) based visual intelligence models. The work leverages local trusted edge devices to orchestrate computations and distribute visual data, split into smaller portions, across multiple independent cloud servers. This partitioning ensures no single server possesses the complete image, preventing comprehensive data reconstruction and bolstering user privacy. Experiments demonstrate the framework maintains high task accuracy while substantially reducing privacy risks. The system partitions images, sending only limited data, as little as one twenty-fifth of the total image, to each external cloud resource. This limits the amount of recoverable pixel-level and object-level information, even when subjected to state-of-the-art reconstruction adversaries. Results show the framework achieves vision task performance comparable to conventional, non-privacy-enhanced approaches.
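As a quick sanity check of that one-twenty-fifth figure, the snippet below computes an upper bound on the fraction of image pixels any single server can observe, assuming a square grid with a fixed number of distinct tiles assigned per server. The grid size and assignment policy are illustrative, not taken from the paper.

```python
def max_fraction_seen(grid_side: int, tiles_per_server: int = 1) -> float:
    """Upper bound on the fraction of image pixels a single server can observe."""
    return tiles_per_server / (grid_side * grid_side)

print(max_fraction_seen(5))     # 0.04 -> one twenty-fifth of the image
print(max_fraction_seen(5, 2))  # 0.08 if a server were assigned two tiles
```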

The team implemented and deployed the framework using the Segment Anything Model (SAM), utilizing Docker and gRPC for communication between edge and cloud resources. By preserving full pixel fidelity within each partition and limiting the view of any single server, the framework avoids the utility loss associated with input perturbation techniques and circumvents the heavy cryptographic overhead of methods like Homomorphic Encryption and Secure Multi-Party Computation.
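The article does not describe the gRPC message schema, so the sketch below uses hypothetical field names to illustrate how a single tile might be packed into a bytes payload on the edge and unpacked by the one server that receives it; it is a sketch of the general pattern, not the authors’ interface.

```python
import numpy as np

def encode_tile(tile: np.ndarray) -> dict:
    """Edge side: pack one tile as raw bytes plus shape/dtype metadata (illustrative message fields)."""
    return {"shape": list(tile.shape), "dtype": str(tile.dtype), "pixels": tile.tobytes()}

def decode_tile(msg: dict) -> np.ndarray:
    """Server side: reconstruct only the single tile that was sent to this server."""
    return np.frombuffer(msg["pixels"], dtype=msg["dtype"]).reshape(msg["shape"])

tile = (np.random.rand(103, 103, 3) * 255).astype(np.uint8)
assert np.array_equal(decode_tile(encode_tile(tile)), tile)
```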

Privacy-Preserving Image Processing with Transformers

This research presents a novel distributed framework for processing images using Vision Transformers, designed to enhance user privacy in edge-cloud computing environments.

The team demonstrated that, by partitioning images and distributing these portions across multiple cloud servers, the framework prevents complete image reconstruction even in the event of a data breach. Evaluations confirm that this approach maintains performance comparable to traditional methods while significantly reducing the risk of private content exposure, making it suitable for integration into mobile and wearable devices. The framework achieves this by performing the final data merging and aggregation exclusively on a trusted local device, ensuring that complete images are never stored or processed externally. While the current implementation exhibits a minor performance reduction compared to unpartitioned processing, the authors acknowledge this and propose that fine-tuning the Vision Transformer on the partitioned data could recover the lost utility. Future work will also investigate temporal-aware partitioning schemes to address potential risks associated with assembling information across multiple frames of video.

👉 More information
🗞 A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge
🧠 ArXiv: https://arxiv.org/abs/2512.09309
