Probabilistic programming frameworks (PPFs), such as Stan, PyMC, TensorFlow Probability, and Pyro, are becoming increasingly popular for probabilistic data analysis and predictive modelling. Many of these frameworks rely on general-purpose inference algorithms such as Hamiltonian Monte Carlo or variational inference. The recent development of new inference algorithms, adaptation methods, and implementations has sparked interest in formally evaluating and benchmarking PPFs and inference algorithms.
We propose posteriordb, a database of posteriors spanning many different models and data sets, designed to make evaluating and testing inference algorithms efficient. An easily available set of diverse posteriors with reference results makes comparisons of algorithms and PPFs more consistent and trustworthy.
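To illustrate the intended workflow, the sketch below loads one posterior and its reference draws via the posteriordb Python package. This is a minimal sketch, not the paper's canonical usage: the local path and the posterior name `eight_schools-eight_schools_centered` are illustrative assumptions, and the accessors shown follow the package's documented interface.

```python
# Minimal sketch: retrieving a posterior from a local clone of the
# posteriordb repository with the posteriordb Python package.
# The path and posterior name below are illustrative assumptions.
import os

from posteriordb import PosteriorDatabase

# Point to the `posterior_database` directory of a local posteriordb clone.
pdb_path = os.path.join(os.getcwd(), "posterior_database")
my_pdb = PosteriorDatabase(pdb_path)

# Each posterior bundles a model, a data set, and (where available)
# reference posterior draws for validating inference results.
posterior = my_pdb.posterior("eight_schools-eight_schools_centered")

print(posterior.model.code("stan"))      # model code (Stan implementation)
print(posterior.data.values())           # data as a Python dictionary
reference = posterior.reference_draws()  # gold-standard draws, if available
```

Because the model code, data, and reference draws live behind one interface, a benchmark can iterate over many posteriors and score any inference algorithm against the same gold-standard results.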