SUPR
OpenAI Whisper transcription evaluation on Uppmax Bianca
Dnr:

sens2023632

Type:

NAISS SENS

Principal Investigator:

Ola Åberg

Affiliation:

Uppsala universitet

Start Date:

2023-12-20

End Date:

2025-01-01

Primary Classification:

10208: Language Technology (Computational Linguistics)

Webpage:


Abstract

Researchers at Uppsala University have a great need for legally compliant and efficient transcription services. Traditionally, transcription has been tedious manual work with large associated costs. In addition, outsourcing the transcription of sensitive data is complicated from a legal point of view, so a common workaround has been to employ a person temporarily to do the transcription. This requires a lot of administration and takes time and resources. Recent developments in AI have paved the way for automated machine transcription. Until recently, however, this has mainly been a service offered by companies outside Sweden, which again leads us into a legal gray zone regarding the handling and storage of sensitive data. An on-premises installation of the open-source speech recognition software OpenAI Whisper would solve many of the problems above. Whisper requires substantial compute power to run smoothly, typically a T4 GPU. The main advantages are:

- No legal problems: the installation is on-premises and data never leaves UU
- No compute capacity problems
- No costly internal billing administration
- Open-source, free software, so there are no procurement issues
- Whisper is the best-trained transcription software to date

The outcome of this project will be a step-by-step guide on how to use Whisper effectively at Uppmax.