Flexible foundation models are critical for delivering advanced artificial intelligence services, such as autonomous driving, smart drones, and other mobile applications, within tight resource budgets. In this project, we propose to develop flexible foundation models that leverage both large language models and vision transformers. These models will be designed for scalability, enabling efficient resource allocation across mobile platforms with widely varying computational capacities.
Our approach begins by creating elastic models through the integration of several pretrained models of varying scales, allowing model capacity to be adjusted dynamically to match system requirements. We will apply advanced techniques, such as model stitching, to combine models of different complexities. After integration, the models will undergo fine-tuning for task adaptation, ensuring strong performance across diverse application domains such as real-time object detection, natural language understanding, and autonomous navigation.
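To make the stitching step concrete, the following is a minimal PyTorch sketch. The `StitchedEncoder` class, the block widths, and the use of generic `nn.TransformerEncoderLayer` stacks as stand-ins for pretrained checkpoints (e.g., a small and a large vision transformer) are all illustrative assumptions, not the project's final implementation.

```python
# Minimal model-stitching sketch (illustrative; real pretrained blocks
# would be loaded from checkpoints rather than built from scratch).
import torch
import torch.nn as nn

class StitchedEncoder(nn.Module):
    """Runs the first `k` blocks of a small encoder, projects features
    into the large encoder's width, then runs the large encoder's
    remaining blocks. The linear `stitch` layer is the only new module
    trained during integration; the pretrained blocks can stay frozen."""
    def __init__(self, small_blocks, large_blocks, k, d_small, d_large):
        super().__init__()
        self.front = nn.ModuleList(small_blocks[:k])   # cheap early blocks
        self.stitch = nn.Linear(d_small, d_large)      # learned adapter
        self.back = nn.ModuleList(large_blocks[k:])    # capable late blocks

    def forward(self, x):
        for blk in self.front:
            x = blk(x)
        x = self.stitch(x)                             # match feature widths
        for blk in self.back:
            x = blk(x)
        return x

# Stand-ins for pretrained models of two scales (384-d vs. 768-d).
d_s, d_l, depth = 384, 768, 12
small = [nn.TransformerEncoderLayer(d_s, 6, 4 * d_s, batch_first=True)
         for _ in range(depth)]
large = [nn.TransformerEncoderLayer(d_l, 12, 4 * d_l, batch_first=True)
         for _ in range(depth)]

model = StitchedEncoder(small, large, k=6, d_small=d_s, d_large=d_l)
tokens = torch.randn(2, 197, d_s)                      # (batch, seq, dim)
print(model(tokens).shape)                             # torch.Size([2, 197, 768])
```

Varying the stitching point `k` yields a family of intermediate models spanning the accuracy/latency range between the two parents, which is what would allow capacity to be adjusted dynamically to a device's compute budget.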
Ultimately, this project aims to deliver a highly efficient, scalable solution that can be deployed across a wide range of mobile devices, improving both resource efficiency and task-specific performance.