Knowing which sounds a simulated vocal model can produce is not trivial. Mapping this out is valuable for applications that exploit the extended capabilities of the voice, e.g. singing and voice acting. In previous work, submitted to the Genetic and Evolutionary Computation Conference (GECCO) at the end of January, we developed, implemented and validated a method to explore and steer the expressive capabilities of a state-of-the-art articulatory vocal model using recent Quality-Diversity optimization combined with multimodal embeddings. Now that a basic implementation of the method is working, we intend to: 1) test and validate the current method more extensively; 2) calibrate its hyperparameters; 3) improve various aspects of the current method; 4) extend it with new functionality; and 5) apply it to questions in vocal science, music and the performing arts.
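For concreteness, the following is a minimal, self-contained sketch of the kind of Quality-Diversity loop involved (here a basic MAP-Elites variant). The `synthesize`, `embed` and `descriptor` functions are toy placeholders standing in for the articulatory vocal model, the multimodal embedding and its projection to a behaviour descriptor; they are illustrative assumptions, not the actual implementation.

```python
# Minimal MAP-Elites sketch. All domain-specific functions are placeholders
# (assumptions for illustration), not the project's actual vocal model or embedding.
import numpy as np

rng = np.random.default_rng(0)

N_PARAMS = 10        # dimensionality of the (hypothetical) articulatory control vector
GRID = (20, 20)      # resolution of the behaviour-descriptor grid
ITERATIONS = 5000

def synthesize(params):
    """Placeholder for the vocal model: maps control parameters to 'audio'."""
    t = np.linspace(0, 1, 256)
    return np.sin(2 * np.pi * (1 + 5 * params[0]) * t) * params[1]

def embed(audio):
    """Placeholder for a multimodal audio embedding (e.g. an audio-text model)."""
    spectrum = np.abs(np.fft.rfft(audio))
    return spectrum / (np.linalg.norm(spectrum) + 1e-9)

def descriptor(embedding):
    """Project the embedding to a low-dimensional behaviour descriptor in [0, 1]^2."""
    return np.clip([embedding[:4].sum(), embedding[4:8].sum()], 0, 1)

def objective(embedding, target):
    """Steering signal: similarity of the sound's embedding to a target embedding."""
    return float(embedding @ target)

# Archive: one elite (params, fitness) per descriptor cell.
archive = {}
target = embed(synthesize(rng.random(N_PARAMS)))  # stand-in steering target

def insert(params):
    emb = embed(synthesize(params))
    cell = tuple((descriptor(emb) * (np.array(GRID) - 1)).astype(int))
    fit = objective(emb, target)
    if cell not in archive or fit > archive[cell][1]:
        archive[cell] = (params, fit)

# Initialise with random solutions, then repeatedly mutate existing elites.
for _ in range(100):
    insert(rng.random(N_PARAMS))
for _ in range(ITERATIONS):
    parent, _ = archive[list(archive)[rng.integers(len(archive))]]
    insert(np.clip(parent + rng.normal(0, 0.1, N_PARAMS), 0, 1))

print(f"Coverage: {len(archive)} / {GRID[0] * GRID[1]} cells filled")
```

In the actual method the descriptor and objective would be derived from the vocal model's audio passed through the multimodal embedding, so the filled archive maps out which regions of the embedding space the model can reach and which parameter settings reach them.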
This research is meant to result in several conference papers, the first of which is aimed at the AI Music Creativity Conference (AIMC), with a submission deadline of April 11. Eventually we also envisage a journal paper that ties the different aspects of this research together before it is incorporated into the dissertation of co-investigator JG.