FastPose-ViT: Vision Transformer Achieves Real-Time 6DoF Spacecraft Pose Estimation from Single Images

Quantum Zeitgeist

Estimating a spacecraft’s position and orientation in space, known as its six-degrees-of-freedom pose, is fundamental to enabling autonomous missions such as in-orbit servicing and the removal of space debris. Pierre Ancey, Andrew Price, Saqib Javed, and Mathieu Salzmann from EPFL and the Swiss Data Science Center present a new method, FastPose-ViT, that directly determines this pose from single images, offering a significant advance over existing techniques. Current state-of-the-art methods typically rely on complex iterative calculations, making them too slow for the limited hardware found on spacecraft. FastPose-ViT employs a Vision Transformer architecture to rapidly and accurately predict a spacecraft’s pose, and the team demonstrates that its performance exceeds other direct regression approaches while matching the accuracy of established iterative methods on benchmark datasets. Crucially, the researchers also validated the system’s practicality for real-world space applications by deploying a streamlined version on low-power hardware, achieving frame rates suitable for real-time use.

Vision Transformers for Efficient Spacecraft Pose Estimation

The researchers set out to determine a spacecraft’s orientation and position accurately and efficiently, evaluating their approach on the SPEED benchmark dataset. Traditional methods often struggle with reliability and speed, so the team explored using Vision Transformers, a type of deep learning model, to overcome these limitations. They focused on finding the right balance between accuracy and computational demands, making the system suitable for onboard processing. The study demonstrates that carefully selecting the Vision Transformer architecture and employing a novel technique to enhance robustness to variations in lighting and viewpoint significantly improves performance.

The team analysed different Vision Transformer designs, identifying those that offer the best trade-off between computational cost and accuracy on the SPEED dataset. They then introduced a new method for augmenting the training images, rotating them and adjusting the camera parameters to make the system more resilient to changes in viewing conditions.
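The paper’s exact augmentation is not spelled out here, but the core idea, rotating the image while updating the pose label so the two stay geometrically consistent, can be sketched as follows. The helper below assumes a pinhole camera with an intrinsics matrix K and a ground-truth pose given as a rotation matrix and translation vector; the function name and sign conventions are illustrative, not the authors’ implementation.

```python
import cv2
import numpy as np

def rotate_sample(image, K, R_obj2cam, t_obj2cam, angle_deg):
    """In-plane rotation augmentation with a consistent 6DoF label.

    Rotating the image about the principal point is equivalent to rolling
    the camera about its optical axis, so the pose label is rotated by the
    same in-plane rotation. Sign conventions depend on the image axes
    (y pointing down), so verify against your own data.
    """
    h, w = image.shape[:2]
    cx, cy = float(K[0, 2]), float(K[1, 2])

    # Warp the image about the principal point (not the image centre).
    M = cv2.getRotationMatrix2D((cx, cy), angle_deg, 1.0)
    warped = cv2.warpAffine(image, M, (w, h))

    # Roll of the camera frame about its optical (z) axis.
    theta = np.deg2rad(angle_deg)
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])

    # Express the object pose in the rolled camera frame.
    R_new = Rz @ R_obj2cam
    t_new = Rz @ t_obj2cam
    return warped, R_new, t_new
```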

This research provides valuable insights for future development of autonomous spacecraft systems.

Direct Spacecraft Pose Estimation via Vision Transformers

Scientists developed FastPose-ViT, a new architecture based on Vision Transformers, to directly estimate a spacecraft’s orientation and position from single images. This approach addresses the computational limitations of existing methods by eliminating the need for slow, iterative calculations. Instead of first detecting key features, the team engineered a system that directly predicts the spacecraft’s pose from cropped images. A key innovation involves reformulating the prediction targets using geometric principles, allowing the system to accurately estimate both translation and attitude in a single step. The researchers rigorously evaluated their method on the SPEED and SPEED+ datasets, achieving state-of-the-art performance among techniques that do not rely on iterative solvers. To demonstrate its suitability for space missions, they optimised the model for deployment on a power-constrained NVIDIA Jetson Orin Nano, achieving a latency of 75 milliseconds per frame and a throughput of up to 33 frames per second. This work demonstrates that direct regression, combined with geometric insights and efficient hardware deployment, offers a viable alternative for spacecraft pose estimation.
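As a concrete illustration of what such a direct-regression design could look like, the sketch below pairs a generic Vision Transformer backbone (a timm model standing in for the paper’s architecture) with two small heads: one regressing a unit quaternion for attitude, the other a normalised crop offset plus log-depth from which metric translation is decoded through the pinhole model. The parametrisation, names such as DirectPoseRegressor, and the decoding helper are assumptions for illustration, not the authors’ code.

```python
import torch
import torch.nn as nn
import timm  # assumption: a timm ViT stands in for the paper's backbone

class DirectPoseRegressor(nn.Module):
    """Sketch of single-shot 6DoF pose regression from a cropped image."""

    def __init__(self, backbone="vit_small_patch16_224"):
        super().__init__()
        self.backbone = timm.create_model(backbone, pretrained=True, num_classes=0)
        d = self.backbone.num_features
        self.quat_head = nn.Linear(d, 4)   # attitude as a quaternion
        self.trans_head = nn.Linear(d, 3)  # (u, v) crop offset + log-depth

    def forward(self, crop):
        f = self.backbone(crop)
        q = nn.functional.normalize(self.quat_head(f), dim=-1)  # unit quaternion
        uvz = self.trans_head(f)
        return q, uvz

def decode_translation(uvz, K, crop_box):
    """Recover metric translation from the normalised prediction.

    Assumes the network predicts the target's centre (u, v) in normalised
    crop coordinates and the log of its depth z; the pinhole model then
    gives x = (u_px - cx) * z / fx and y = (v_px - cy) * z / fy.
    """
    x0, y0, w, h = crop_box  # crop origin and size in the full image (pixels)
    u_px = x0 + uvz[..., 0] * w
    v_px = y0 + uvz[..., 1] * h
    z = torch.exp(uvz[..., 2])
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u_px - cx) * z / fx
    y = (v_px - cy) * z / fy
    return torch.stack([x, y, z], dim=-1)
```

Predicting an offset within the crop rather than absolute image coordinates is what lets a network of this shape work on cropped inputs while translation is still recovered in the full camera frame.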

Direct Pose Regression with Vision Transformers

Scientists developed FastPose-ViT, a new method for estimating a spacecraft’s six-degree-of-freedom pose from a single image, matching state-of-the-art accuracy on benchmark datasets while maintaining real-time processing speeds on edge hardware. The work addresses a critical need for autonomous spacecraft operations, such as in-orbit servicing and space debris removal, by eliminating the need for computationally intensive iterative algorithms traditionally used for pose estimation. FastPose-ViT directly predicts the spacecraft’s pose using a Vision Transformer architecture, processing cropped images and employing a novel mathematical formalism based on projective geometry and apparent rotation. Experiments on the SPEED dataset demonstrate that the team’s direct regression approach matches the performance of state-of-the-art methods relying on iterative solvers, while achieving a latency of approximately 75 milliseconds per frame and a throughput of up to 33 frames per second on a Jetson Orin Nano. Ablation studies validate the key design choices, demonstrating the effectiveness of the proposed approach and its potential for deployment in realistic space mission scenarios. The research highlights that direct regression methods can achieve competitive performance without relying on computationally expensive iterative solvers, paving the way for more efficient and autonomous spacecraft operations.
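Latency and throughput figures like those above can be sanity-checked with a simple timing harness such as the one below. This is only a rough sketch: actual deployment on a Jetson Orin Nano would typically go through an optimised runtime such as TensorRT and careful batching, neither of which is shown here, so measured numbers will differ.

```python
import time
import torch

@torch.inference_mode()
def benchmark(model, input_shape=(1, 3, 224, 224), warmup=20, iters=200,
              device="cuda"):
    """Rough single-frame latency/throughput measurement.

    Illustrative harness only; the authors' measurement protocol may
    differ. device="cuda" assumes a GPU (the Orin Nano's integrated GPU
    is visible to PyTorch as a CUDA device).
    """
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)

    for _ in range(warmup):  # warm-up so timings exclude one-off setup cost
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()

    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / iters

    print(f"latency: {dt * 1e3:.1f} ms/frame, throughput: {1.0 / dt:.1f} FPS")
```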

Fast Pose Estimation From Single Images

FastPose-ViT represents a significant advance in spacecraft pose estimation, delivering a method for determining a spacecraft’s six-degree-of-freedom pose, both position and orientation, from single images.

The team developed a Vision Transformer-based architecture that directly predicts pose, bypassing the computationally expensive iterative processes commonly used in this field. A key innovation lies in the reformulation of pose regression using the concept of apparent rotation, enabling accurate pose determination through geometric principles. The research demonstrates performance competitive with state-of-the-art techniques on standard datasets, while also achieving real-time performance on embedded hardware, specifically the Jetson Orin Nano, with throughput of up to 33 frames per second. This capability is crucial for practical deployment on resource-constrained spacecraft.

Ablation studies confirm the importance of pre-training, image cropping, and the use of geometric targets in achieving optimal results. The authors acknowledge that performance can be affected by significant differences between training data and real-world imagery, suggesting that future work should focus on training with datasets more representative of actual space environments to further improve generalisation.

👉 More information
🗞 FastPose-ViT: A Vision Transformer for Real-Time Spacecraft Pose Estimation
🧠 ArXiv: https://arxiv.org/abs/2512.09792

Source: Quantum Zeitgeist