The advent of Generative Pre-trained Transformers (GPTs) marked a pivotal advancement in modern AI. By learning from extensive internet-scale data, GPTs greatly enhanced the versatility of AI models. Foundation models are neural networks "pre-trained" on massive amounts of data without specific use cases in mind; they have transformed AI, powering large language models (LLMs) such as ChatGPT. Robotics is emerging as a leading frontier of this evolution, poised to bring to the physical world the kind of efficiency gains that digital transformation brought to the virtual one. Robotics foundation models, trained on diverse datasets that combine internet-derived data with real-world physical interactions, represent a major step towards AI models that can navigate the complexities of real-world environments. Momentum behind robotics foundation models is growing rapidly, supported by access to extensive and varied robot data from real-world production settings.
This project aims to use state-of-the-art generative AI approaches, such as language and vision models, to build robotics foundation models for general-purpose manipulation skills in robotic applications. The main tasks of the project are:
• Building a running environment and software platform on the Alvis computing resource;
• Fine-tuning multimodal LLMs for robotic manipulation, aligned with autonomous mobile robotic systems, using high-performance GPUs such as NVIDIA A100/H100 (an illustrative sketch of this kind of workload follows the list);
• Pre-training foundation models on massive datasets comprising text, images and videos, with the help of GPUs;
• Training a robotics foundation model for general-purpose manipulation skills across a wide range of robotic applications, where the robotics foundation model is built on top of a state-of-the-art open language model.
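To indicate how these tasks map onto the requested GPU resources, the following is a minimal sketch of a parameter-efficient fine-tuning loop of the kind we plan to run on Alvis. The model name, LoRA hyper-parameters and data handling are illustrative placeholders rather than final project choices.

```python
# Minimal sketch (placeholders only): LoRA fine-tuning of an open language
# backbone on (instruction, action) text pairs, the kind of workload we plan
# to run on Alvis A100/H100 nodes. Model name and hyper-parameters are
# illustrative, not final project choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "mistralai/Mistral-7B-v0.1"   # placeholder open language model
device = "cuda"                             # single A100/H100 GPU on Alvis

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batched padding

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16
).to(device)

# LoRA adapters keep the number of trainable parameters small enough to
# fine-tune a 7B-parameter backbone on a single 40/80 GB GPU.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

def train_step(batch_texts):
    """One causal-LM update on a batch of instruction/action strings."""
    enc = tokenizer(batch_texts, return_tensors="pt",
                    padding=True, truncation=True, max_length=512).to(device)
    # A full implementation would mask padding tokens in the labels.
    out = model(**enc, labels=enc["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

In practice this loop would be wrapped in distributed multi-GPU training (e.g. torchrun across several A100/H100 GPUs) and extended with image and video inputs through a vision-language backbone; the sketch only illustrates the scale and type of computation requested.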
The models and algorithms that we plan to develop or apply rely on GPU computing. This project combines robotics and AI research. We will acknowledge the support from NAISS SUPR in our research work if our proposal is approved.