SUPR
Probabilistic Word Embeddings with TensorFlow and Stan
Dnr (registration number): NAISS 2024/22-1419
Type: NAISS Small Compute
Principal Investigator: Isac Boström
Affiliation: Chalmers tekniska högskola
Start Date: 2024-11-06
End Date: 2025-11-01
Primary Classification: 10106: Probability Theory and Statistics
Webpage:

Abstract

Quantifying uncertainty in word embeddings is crucial for reliable inference from textual data, yet existing methods have clear drawbacks: the bootstrap, which refits the model many times, is computationally intensive, while mean-field variational inference assumes a fully factorized posterior and therefore ignores dependence between parameters. We explore alternative approaches, with Gibbs sampling via Pólya-Gamma augmentation as our key contribution, alongside the Laplace approximation and Hamiltonian Monte Carlo. We also address the challenge of non-identifiability in word embeddings. We evaluate the effectiveness and accuracy of our methods in simulation studies with known ground truth, and we use the MovieLens dataset to study their feasibility and accuracy on real data.
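As a rough illustration of the Pólya-Gamma idea, here is a minimal NumPy sketch of a Gibbs sampler for Bayesian logistic regression. It assumes a logistic likelihood on dot products, p(y = 1) = σ(u_i · v_j), as in skip-gram-style models. Under that assumption, holding the context vectors fixed turns the full conditional of a single word vector into exactly this kind of regression, with the context vectors as design rows. The function names, the Gaussian prior variance `prior_var`, and the truncation level `n_terms` are hypothetical choices, not taken from the project. PG draws use a truncated version of the infinite sum-of-gammas definition of the PG distribution rather than an exact sampler.

```python
import numpy as np


def sample_pg(b, c, rng, n_terms=100):
    """Approximate draws from PG(b, c), one per entry of c.

    Truncates the defining series
        omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1), after n_terms terms (slightly underestimates omega).
    """
    c = np.asarray(c, dtype=float)
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=b, scale=1.0, size=c.shape + (n_terms,))
    denom = (k - 0.5) ** 2 + c[..., None] ** 2 / (4 * np.pi ** 2)
    return (g / denom).sum(axis=-1) / (2 * np.pi ** 2)


def pg_gibbs_logistic(X, y, n_iter, rng, prior_var=10.0):
    """Gibbs sampler for y_i ~ Bernoulli(sigmoid(x_i @ beta)), beta ~ N(0, prior_var * I)."""
    n, d = X.shape
    kappa = y - 0.5                      # kappa_i = y_i - 1/2
    prior_prec = np.eye(d) / prior_var
    beta = np.zeros(d)
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        # 1) omega_i | beta ~ PG(1, x_i @ beta)
        omega = sample_pg(1.0, X @ beta, rng)
        # 2) beta | omega, y ~ N(m, P^{-1}), P = X^T diag(omega) X + prior precision
        prec = X.T @ (omega[:, None] * X) + prior_prec
        mean = np.linalg.solve(prec, X.T @ kappa)
        L = np.linalg.cholesky(prec)     # prec = L @ L.T
        # mean + L^{-T} z has covariance prec^{-1}
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(d))
        draws[t] = beta
    return draws


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, true_beta = 300, np.array([1.5, -1.0])
    X = rng.standard_normal((n, 2))
    y = (rng.random(n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)
    draws = pg_gibbs_logistic(X, y, n_iter=500, rng=rng)
    post = draws[100:]                   # discard burn-in
    print("posterior mean:", post.mean(axis=0))
    print("posterior sd:  ", post.std(axis=0))
```

The appeal of the augmentation is visible in step 2. Given the PG variables, the conditional for the coefficients is Gaussian, so each sweep is two exact conditional draws with no step size or proposal to tune.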
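To make the non-identifiability concrete, here is one standard remedy, offered as a sketch that is not necessarily the approach taken in this project. Assume the likelihood depends on word vectors U and context vectors V only through the dot products U Vᵀ, as in skip-gram-style models. Then (UQ, VQ) fits equally well for any orthogonal Q, so posterior draws can drift across rotations and coordinate-wise summaries become meaningless. Aligning each draw to a reference by orthogonal Procrustes before summarizing removes this ambiguity. The names `procrustes_rotation` and `align_draw` are hypothetical.

```python
import numpy as np


def procrustes_rotation(A, B):
    """Orthogonal Q minimizing ||A @ Q - B||_F (orthogonal Procrustes problem)."""
    W, _, Zt = np.linalg.svd(A.T @ B)
    return W @ Zt


def align_draw(U, V, U_ref):
    """Rotate one posterior draw (U, V) so that U best matches a reference U_ref.

    The same Q is applied to U and V, so every dot product U[i] @ V[j] is unchanged.
    """
    Q = procrustes_rotation(U, U_ref)
    return U @ Q, V @ Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U_ref = rng.standard_normal((50, 3))   # word vectors
    V_ref = rng.standard_normal((40, 3))   # context vectors
    # A random orthogonal matrix from the QR decomposition of a Gaussian matrix
    Q_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    U_rot, V_rot = U_ref @ Q_true, V_ref @ Q_true
    # Same dot products (hence same likelihood), different coordinates
    print(np.allclose(U_rot @ V_rot.T, U_ref @ V_ref.T))        # True
    U_al, V_al = align_draw(U_rot, V_rot, U_ref)
    print(np.allclose(U_al, U_ref), np.allclose(V_al, V_ref))  # True True
```

Rotation is not the only symmetry of a dot-product model. For any c ≠ 0, (cU, V/c) also gives the same dot products, and this sketch does not address that; a prior on the vector norms is one way to pin the scale down.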