Controllable Expressive Speech with Adversarially Trained Orthogonal Neural Codec.
Dnr: NAISS 2025/22-1716
Type: NAISS Small Compute
Principal Investigator: Juliana Francis
Affiliation: Kungliga Tekniska högskolan
Start Date: 2025-12-10
End Date: 2026-12-01
Primary Classification: 10210: Artificial Intelligence

Webpage:

Allocation

Abstract

The purpose of this project is to train a new neural codec for speech. The goal is to enable more expressive control in speech synthesis by using multiple layers of codebooks that operate at different levels of speech, together with training methods that ensure orthogonality between the codebooks at these different scales. Speech systems built on this codec could potentially use it for both local and global control and transfer of expressiveness. We will also use a secondary predictive model to enforce that only certain speech features are encoded by given codebooks, and adversarially prevent the other codebooks from learning these features. Through this work, we aim to enable more controllable expressive speech synthesis that can be used in a streaming manner for real-time applications. The codec will initially be trained on English and then in a multilingual setting that includes Swedish, a language that many current models do not cover.
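As a rough illustration of what orthogonality between codebooks could mean in practice, the sketch below computes a simple penalty: the mean squared cosine similarity between every vector in one codebook and every vector in another. This is an assumption made for illustration only, not the project's actual training objective. The names `cosine`, `orthogonality_penalty`, `global_codes`, `local_codes`, and `overlapping` are hypothetical, and the toy vectors are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def orthogonality_penalty(codebook_a, codebook_b):
    """Mean squared cosine similarity over all cross-codebook vector pairs.

    Returns 0.0 when every vector in codebook_a is orthogonal to every
    vector in codebook_b, and 1.0 when all pairs are parallel.
    Assumes both codebooks are non-empty and contain no zero vectors.
    """
    sims = [cosine(u, v) ** 2 for u in codebook_a for v in codebook_b]
    return sum(sims) / len(sims)


# Hypothetical toy codebooks: "global" codes use axes 0-1, "local" codes axes 2-3.
global_codes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
local_codes = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# A second codebook that shares one direction with the global codes.
overlapping = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

print(orthogonality_penalty(global_codes, local_codes))
print(orthogonality_penalty(global_codes, overlapping))
```

In a training setup, a term like this could be added to the loss so that codebooks at different scales are discouraged from encoding the same directions. The adversarial predictor described in the abstract would be a separate component and is not sketched here.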