The advent of Generative Pre-trained Transformers (GPTs) marked a pivotal advancement in modern AI. By learning from extensive internet-scale data, GPTs greatly enhanced the versatility of AI models. Foundation models are neural networks "pre-trained" on massive amounts of data without specific use cases in mind; they have transformed AI, powering large language models (LLMs) such as ChatGPT. Robotics is emerging as a leading frontier of this evolution, poised to bring to the physical world the kind of efficiency gains that digital transformation brought to the virtual one. Robotics foundation models, trained on diverse datasets that combine internet-derived data with real-world physical interactions, represent a major step towards AI models that can navigate the complexities of real-world environments. Momentum behind robotics foundation models is growing rapidly, supported by access to extensive and varied robot data from real-world production settings.
This project aims to use state-of-the-art generative AI approaches, such as language and vision models, to build robotics foundation models for general-purpose manipulation skills in robotic applications. The main tasks of the project are:
• Building a running environment and software platform on the Alvis computing resource;
• Fine-tuning multimodal LLMs for robotic manipulation, aligned with autonomous mobile robotic systems, using high-performance GPUs such as NVIDIA A100/H100 (an illustrative sketch of this kind of workload follows the list);
• Pre-training foundation models on massive datasets comprising text, images and videos, with the help of GPUs;
• Training a robotics foundation model for general-purpose manipulation skills across a wide range of robotic applications, where the robotics foundation model is built on top of a state-of-the-art open language model.
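To indicate how these tasks map onto the requested GPU resources, the following is a minimal sketch of a parameter-efficient fine-tuning loop of the kind we plan to run on Alvis. The model name, LoRA hyper-parameters and data handling are illustrative placeholders rather than final project choices.

```python
# Minimal sketch (placeholders only): LoRA fine-tuning of an open language
# backbone on (instruction, action) text pairs, the kind of workload we plan
# to run on Alvis A100/H100 nodes. Model name and hyper-parameters are
# illustrative, not final project choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "mistralai/Mistral-7B-v0.1"   # placeholder open language model
device = "cuda"                             # single A100/H100 GPU on Alvis

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batched padding

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16
).to(device)

# LoRA adapters keep the number of trainable parameters small enough to
# fine-tune a 7B-parameter backbone on a single 40/80 GB GPU.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

def train_step(batch_texts):
    """One causal-LM update on a batch of instruction/action strings."""
    enc = tokenizer(batch_texts, return_tensors="pt",
                    padding=True, truncation=True, max_length=512).to(device)
    # A full implementation would mask padding tokens in the labels.
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

In practice this loop would be wrapped in distributed multi-GPU training (e.g. torchrun across several A100/H100 GPUs) and extended with image and video inputs through a vision-language backbone; the sketch only illustrates the scale and type of computation requested.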
The models and algorithms that we plan to develop or apply rely on GPU computing. This project combines robotics and AI research. We will acknowledge the support from NAISS SUPR in our research work if our proposal is approved.