Quality Diversity Optimization on Simulated Vocal Models
Dnr:

NAISS 2025/5-450

Type:

NAISS Medium Compute

Principal Investigator:

Bobby Lee Townsend Sturm

Affiliation:

Kungliga Tekniska högskolan

Start Date:

2025-09-01

End Date:

2026-09-01

Primary Classification:

10210: Artificial Intelligence

Secondary Classification:

10208: Natural Language Processing

Tertiary Classification:

60412: Music

Abstract

Determining which sounds a simulated vocal model can produce is not trivial. Mapping this space is valuable for applications that draw on the extended capabilities of the voice, such as singing and voice acting. In previous work (Grouwels et al. 2025) we developed, implemented and validated a method to explore and steer the expressive capabilities of a state-of-the-art articulatory vocal model using a recent Quality-Diversity optimization algorithm with multimodal embeddings. Now that we have a basic working implementation of the method, we intend to: 1) test and validate it more extensively, 2) calibrate its hyperparameters, 3) improve various aspects of it, 4) extend it with new functions, and 5) apply it to questions in vocal science, music and the performing arts.

This research is meant to result in several conference submissions. We are targeting conferences on AI and music (e.g. EvoMusArt, AIMC) and on speech and voice science (e.g. Interspeech). Eventually we also envisage a journal paper that ties the different aspects of this research together, before it becomes part of the dissertation of co-investigator JG.

Reference: Joris Grouwels, Nicolas Jonason, and Bob L. T. Sturm. 2025. Exploring the Expressive Space of an Articulatory Vocal Model using Quality-Diversity Optimization with Multimodal Embeddings. In Genetic and Evolutionary Computation Conference (GECCO ’25), July 14–18, 2025, Malaga, Spain. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3712256.3726313