
EVOLVE-VLA Achieves Continuous Robotic Adaptation with Zero Task-Specific Demonstrations

Quantum Zeitgeist

The pursuit of genuinely adaptive robots demands systems that learn through experience, much like humans refine skills through practice, rather than simply replicating pre-programmed actions. Zechen Bai, Chen Gao, and Mike Zheng Shou, all from the National University of Singapore, address this challenge with their new framework, EVOLVE-VLA, which enables vision-language-action models to continuously improve through interaction with their environment, even without specific task demonstrations.

This research overcomes a fundamental limitation of current robotic systems, which rely on extensive pre-recorded data and struggle to adapt to changing conditions.

The team achieves significant gains in performance, demonstrating an 8.6% improvement on complex, long-horizon tasks, a 22.0% increase in one-shot learning, and, crucially, the ability to generalise to entirely new tasks without any additional training data, a feat previously unattainable with standard methods. Qualitative analysis further reveals that EVOLVE-VLA fosters emergent behaviours, such as error recovery and innovative strategies, marking a critical advance towards robots capable of true learning and self-improvement.

Recent advances in robotic manipulation leverage large language models, yet these systems remain fundamentally limited by supervised finetuning. This approach requires extensive demonstrations for each task, rigidly memorises trajectories, and struggles to adapt when deployed in new environments. EVOLVE-VLA is a test-time training framework that enables vision-language-action models to continuously adapt through interaction with their environment, requiring minimal or no task-specific demonstrations. The key challenge lies in replacing external reward signals, which are unavailable during deployment, with autonomous feedback; this is addressed through a learned progress estimator that provides dense feedback, within a framework designed to facilitate continuous adaptation.

Reinforcement Learning for Continuous VLA Adaptation

The paper's core idea is to let VLAs learn from experience rather than merely imitate demonstrations, moving beyond static, pre-trained models. Key contributions include addressing the limitations of supervised finetuning, which restricts adaptability to new situations, by using reinforcement learning during deployment, and incorporating a learned progress estimator that provides feedback to the VLA in place of impractical external rewards. Two technical innovations underpin this approach: accumulative progress estimation, which handles noisy reward signals by accumulating feedback over time, and progressive horizon extension, which gradually increases the planning horizon of the VLA, allowing it to learn more complex, long-horizon tasks.

Experiments on the LIBERO benchmark demonstrate significant improvements: an 8.6% gain on long-horizon tasks, a 22.0% increase in one-shot learning, and cross-task generalization that raises success rates from 0% to 20.8% without task-specific demonstration training. The framework also allows capabilities such as error recovery to emerge through autonomous exploration. In essence, EVOLVE-VLA shifts VLAs from rigid trajectory memorization to genuine adaptive learning, allowing them to continuously improve through interaction with their environment. Future research directions include developing more accurate reward models, achieving zero-shot capability, addressing challenges related to training time and safety, and improving sample efficiency for more complex tasks.
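The article does not give implementation details for accumulative progress estimation, so the following is a minimal, illustrative sketch only. It assumes the learned progress estimator maps an observation and a language instruction to a scalar in [0, 1], and that noisy per-step estimates are smoothed and accumulated so the reward signal only credits genuine new progress; the class and parameter names here (AccumulativeProgress, window) are placeholders, not taken from the paper.

# Minimal sketch of accumulative progress estimation (illustrative only).
# Assumption not from the article: the progress estimator is a callable
# mapping (observation, instruction) -> scalar in [0, 1].

from collections import deque
from typing import Callable

import numpy as np


class AccumulativeProgress:
    """Turns noisy per-step progress estimates into a dense, stable reward."""

    def __init__(self, estimator: Callable[[np.ndarray, str], float], window: int = 5):
        self.estimator = estimator          # learned progress model: (obs, instruction) -> [0, 1]
        self.window = deque(maxlen=window)  # recent raw estimates, kept for smoothing
        self.best = 0.0                     # accumulated (monotone) progress so far

    def reset(self) -> None:
        self.window.clear()
        self.best = 0.0

    def step_reward(self, obs: np.ndarray, instruction: str) -> float:
        raw = float(self.estimator(obs, instruction))   # noisy estimate for this frame
        self.window.append(raw)
        smoothed = float(np.mean(self.window))          # average out frame-level noise
        gain = max(0.0, smoothed - self.best)           # reward only genuinely new progress
        self.best = max(self.best, smoothed)
        return gain


# Example usage with a placeholder estimator; a real system would plug in a trained model.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_estimator = lambda obs, instr: min(1.0, float(obs.mean()) + rng.normal(0, 0.05))
    tracker = AccumulativeProgress(fake_estimator)
    for t in range(10):
        obs = np.full(4, t / 10.0)          # stand-in observation that "improves" over time
        print(t, round(tracker.step_reward(obs, "put the bowl in the sink"), 3))

The design point this sketch illustrates is that a noisy, self-generated progress signal can still drive learning if it is smoothed and only its increase is rewarded, giving the policy dense feedback without any external reward function.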

Robotic Learning Adapts Without Task Guidance

Scientists have developed EVOLVE-VLA, a new framework that enables robotic agents to continuously learn and adapt during operation, moving beyond reliance on pre-programmed demonstrations. This work addresses a critical limitation of existing vision-language-action models, which typically require extensive examples and struggle when faced with unexpected situations.

The team achieved this breakthrough by implementing a test-time training approach, allowing the VLA to improve through interaction with its environment without needing specific task guidance. A key innovation lies in replacing traditional external reward signals, which are unavailable during real-world deployment, with a learned progress estimator. This estimator provides dense feedback; the researchers tackled the resulting noisy signals with an accumulative progress estimation technique that smooths out fluctuations, and enabled gradual policy evolution with a progressive horizon extension strategy, as sketched below.

Experiments on the LIBERO benchmark demonstrate substantial gains, with an 8.6% improvement on long-horizon tasks and a 22.0% increase in one-shot learning performance. Notably, the framework achieved pioneering zero-shot cross-task generalization, increasing success rates on unseen tasks from 0% to 20.8% through autonomous adaptation alone. Qualitative analysis revealed emergent capabilities, including error recovery and the development of novel strategies absent from the original demonstration data. These results validate that test-time training represents a paradigm shift toward adaptive embodied agents, paving the way for truly general-purpose VLA systems capable of continuous self-improvement.
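The paper's exact training schedule is not described in the article, so the following is a minimal sketch of how a test-time adaptation loop with progressive horizon extension might look. Everything here is an assumption for illustration: the horizon grows on a fixed step whenever recent episodes clear a progress threshold, the policy update is abstracted behind a generic update callable, and names such as test_time_adapt, grow_by, and patience are hypothetical.

# Minimal sketch of test-time adaptation with progressive horizon extension
# (illustrative only; the published method may differ in every detail).

from typing import Callable, List


def test_time_adapt(
    rollout: Callable[[int], float],   # runs one episode up to `horizon` steps, returns accumulated progress in [0, 1]
    update: Callable[[], None],        # one policy-improvement step from the collected experience (e.g. an RL update)
    episodes: int = 200,
    start_horizon: int = 50,
    max_horizon: int = 400,
    grow_by: int = 50,
    progress_threshold: float = 0.8,
    patience: int = 10,
) -> List[float]:
    horizon = start_horizon
    recent: List[float] = []
    history: List[float] = []

    for _ in range(episodes):
        progress = rollout(horizon)    # interact with the environment; no demonstrations needed
        update()                       # adapt the VLA policy from its own experience
        recent.append(progress)
        history.append(progress)

        # Extend the planning horizon once the policy handles the current one reliably.
        if len(recent) >= patience and sum(recent[-patience:]) / patience >= progress_threshold:
            horizon = min(max_horizon, horizon + grow_by)
            recent.clear()

    return history


# Toy usage with stand-in rollout and update functions.
if __name__ == "__main__":
    import random

    horizons_used: List[int] = []

    def fake_rollout(h: int) -> float:
        horizons_used.append(h)
        return min(1.0, random.random() + 0.3)   # stand-in for accumulated progress

    test_time_adapt(fake_rollout, update=lambda: None, episodes=40)
    print("horizons visited:", sorted(set(horizons_used)))

The intent of the curriculum-style schedule is that the policy first masters short rollouts, where the self-generated progress signal is most reliable, before the horizon is lengthened toward full long-horizon tasks.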

The team commits to releasing their codebase to foster further research in this direction.

Continuous Adaptation for Robotic Manipulation Tasks

This work introduces EVOLVE-VLA, a novel test-time training framework that enables vision-language-action models to adapt continuously through interaction with their environment, moving beyond the limitations of traditional supervised finetuning. Inspired by human learning, the system learns from trial and error rather than relying solely on pre-programmed demonstrations, achieving substantial gains in robotic manipulation tasks. Experiments on a benchmark suite demonstrate an 8.6% improvement on long-horizon tasks and a 22.0% increase in one-shot learning performance, alongside a significant ability to generalize to unseen tasks without specific training data. Notably, the research team observed emergent capabilities, such as error recovery, arising from the autonomous exploration facilitated by the framework, indicating a move towards more robust and adaptable robotic systems. The core of this achievement lies in replacing the need for external reward signals with a learned progress estimator, coupled with mechanisms to manage the inherent noise in self-generated feedback and to enable gradual policy evolution.

The authors acknowledge that uncontrolled policy behavior during early training stages presents a potential safety concern, and future work should focus on developing safety mechanisms to mitigate this risk. Further improvements in sample efficiency and adaptation to more complex tasks could be achieved through more sophisticated exploration strategies and curriculum designs.

👉 More information

🗞 EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models

🧠 ArXiv: https://arxiv.org/abs/2512.14666
