Probabilistic multimodal synthesis with conditional flow matching

Dnr: NAISS 2024/23-72

Type: NAISS Small Storage

Principal Investigator: Shivam Mehta

Affiliation: Kungliga Tekniska högskolan

Start Date: 2024-02-02

End Date: 2025-02-01

Primary Classification: 10209: Media and Communication Technology

Webpage: https://shivammehta25.github.io/Matcha-TTS/

Allocation

Mimer at C3SE: 5000 GiB

Abstract

Matcha-TTS is a fast text-to-speech (TTS) architecture based on conditional flow matching. Conditional flow matching can be viewed as a faster alternative to diffusion models, which makes it applicable wherever diffusion has been successful, including image synthesis and multimodal synthesis. The project aims to build on these advances in probabilistic synthesis to create a larger and stronger multimodal model.
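
To illustrate why conditional flow matching can be sampled in few steps, the following is a minimal sketch of the straight-line (optimal-transport) conditional path commonly used in the flow matching literature. It is not the Matcha-TTS code: the function names, the value of sigma_min, and the toy velocity predictor are all assumptions made for illustration.

```python
import random

SIGMA_MIN = 1e-4  # assumed small constant; illustrative value only


def cfm_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return (x_t, u_t) for the straight-line conditional path.

    x0 is a noise sample, x1 a data sample, t a time in [0, 1].
    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    u_t = x1 - (1 - sigma_min) * x0   (the time derivative of x_t)
    """
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(predict, x1, rng):
    """Mean squared error between a predicted velocity and the target u_t.

    `predict(x_t, t)` is a hypothetical stand-in for a neural network.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # noise sample
    t = rng.random()                        # uniform time in [0, 1)
    x_t, u_t = cfm_pair(x0, x1, t)
    v = predict(x_t, t)
    return sum((p - q) ** 2 for p, q in zip(v, u_t)) / len(x1)


def euler_sample(predict, x0, steps=10):
    """Integrate dx/dt = predict(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict(x, i * dt)
        x = [a + dt * b for a, b in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    zero_model = lambda x_t, t: [0.0 for _ in x_t]  # untrained placeholder
    print(cfm_loss(zero_model, [3.0, 5.0], rng))
```

Because the target velocity along this path does not depend on t, a well-trained predictor yields nearly straight trajectories, which is one reason a small number of Euler steps can suffice at sampling time.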