Representation Learning for Conversational AI [More Storage Allocation]
Dnr:

NAISS 2024/23-657

Type:

NAISS Small Storage

Principal Investigator:

Mehrdad Farahani

Affiliation:

Chalmers tekniska högskola

Start Date:

2024-11-11

End Date:

2025-11-01

Primary Classification:

10208: Language Technology (Computational Linguistics)

Abstract

I am actively working on Retrieval-Augmented Generation (RAG) models. These models have considerable storage demands, particularly for document passages, indices, and model weights. To elaborate, each RAG model requires at least 200GB of space to store its index. In addition, about 50GB is needed to store model weights during evaluation (in the cache). For the training phase, the storage requirement is roughly triple these quantities, since at least two checkpoints must be saved alongside the working copy. Given these requirements, and the need to run our experiments on at least two such RAG models, the current storage capacity is insufficient.
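
A minimal back-of-the-envelope sketch of the resulting storage estimate, assuming only the figures quoted above (200GB per index, 50GB for cached weights, a roughly 3x factor during training, and two models); the variable names and the GB-based rounding are illustrative, not measured values.

# Storage estimate derived from the figures in the abstract (assumptions, not measurements).
INDEX_GB_PER_MODEL = 200      # document-passage index per RAG model
WEIGHTS_GB_PER_MODEL = 50     # cached model weights during evaluation
TRAINING_FACTOR = 3           # ~3x during training (at least two extra checkpoints)
NUM_MODELS = 2                # experiments cover at least two RAG models

per_model_eval_gb = INDEX_GB_PER_MODEL + WEIGHTS_GB_PER_MODEL   # 250 GB per model for evaluation
per_model_train_gb = per_model_eval_gb * TRAINING_FACTOR        # ~750 GB per model during training
total_gb = per_model_train_gb * NUM_MODELS                      # ~1500 GB across both models

print(f"Per model (evaluation): {per_model_eval_gb} GB")
print(f"Per model (training):   {per_model_train_gb} GB")
print(f"Total (two models):     {total_gb} GB (~{total_gb / 1024:.1f} TiB)")

Under these assumptions the request comes to roughly 1.5TB of project storage, which is the basis for the figure above.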