Bayesian Optimization for automatic tuning of resource-intensive black-box systems

Dnr:

NAISS 2024/5-228

Type:

NAISS Medium Compute

Principal Investigator:

Luigi Nardi

Affiliation:

Lunds universitet

Start Date:

2024-05-01

End Date:

2025-05-01

Primary Classification:

10201: Computer Sciences

Webpage:

https://cs.lth.se/luigi-nardi/

Allocation

Alvis at C3SE: 10000 GPU-h/month
Centre Storage at NSC: 500 GiB
Mimer at C3SE: 500 GiB
Tetralith at NSC: 50 x 1000 core-h/month

Abstract

We apply to extend our Medium SNIC/NAISS allocation for one year and to increase the compute budget to 400 000 hours/month in order to continue developing state-of-the-art optimization methods in AI. We have used the previous compute budget to great effect, with three publications at top (CORE A*) machine learning venues over the last nine months [Hvarfner et al., 2022a,b; Papenmeier et al., 2022]. We will focus on research and development in adaptive experimentation using Bayesian optimization (BO) and apply it to a wide range of applications.

BO has become an established framework and a popular tool for hyperparameter optimization in machine learning [Snoek et al., 2012], as well as a go-to procedure for tuning complex systems in hardware design [Nardi et al., 2019], robotics [Calandra et al., 2014], chemistry [Griffiths and Hernández-Lobato, 2020] and beyond [Chen et al., 2018]. BO relies on a statistical model of the (unknown) objective function, whose beliefs guide the algorithm in making informed decisions. These models are sophisticated and expensive to build, but they yield unparalleled sample efficiency.

In the optimization field, the goal is to develop statistically efficient methods that find good solutions quickly. To show that our methods do outperform the baselines, we run our algorithms alongside competing methods on a number of benchmark problems. Because the outcome of a single optimization run is highly stochastic, each method must be run many times on each problem to obtain statistically significant results. While such experiments are computationally expensive, they are an essential part of fair, high-quality research.

The increasing computing demand in this project continuation is driven by:
1) A focus on real-world benchmark problems. Recently, for example, we have been working on optimizing compiler autotuners, which is compute intensive. Real-world applications are nevertheless necessary because they make for a stronger research case.
2) Recent developments in BO that rely on Monte Carlo (MC) methods. While computationally expensive, these methods tend to yield superior performance, so it is crucial to employ MC to explore new directions.
3) Ablation studies. Running all the required experiments on the 50k hours currently allocated would be infeasible.

Regarding our usage of the compute budget over the past year: team members have frequently had to restrict their experimental setups to comply with the current allocation. Compromises include caps on the number of repetitions and on the number of baselines evaluated, which severely limits the statistical significance of the results. Team members have also been overly cautious in empirically testing new algorithms under the current constraints. Moreover, we had to establish a policy of always keeping 5-10k hours of computing in reserve in case a team member urgently needs them for experiments related to a deadline or rebuttal. The requested allocation of 400 000 CPU hours will allow us to pursue future research more freely, without having to restrict the number of projects, stagger project deadlines or prioritize between researchers.
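
The repeated-run protocol described in the abstract (each method run many times per problem, because a single run is highly stochastic) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the toy `objective`, the two stand-in methods `random_search` and `local_search`, and the `benchmark` helper are hypothetical and are not the project's BO algorithms or benchmark suite. The sketch only shows how a mean and standard error over independent seeds would be collected.

```python
import random
import statistics


def objective(x):
    # Hypothetical stand-in for an expensive black-box function; minimum 0 at x = 0.3.
    return (x - 0.3) ** 2


def random_search(rng, budget):
    # Baseline: sample uniformly in [0, 1] and keep the best value seen.
    return min(objective(rng.random()) for _ in range(budget))


def local_search(rng, budget, step=0.1):
    # Simple competitor: perturb the incumbent and accept improvements only.
    x = rng.random()
    best = objective(x)
    for _ in range(budget - 1):
        candidate = min(1.0, max(0.0, x + rng.uniform(-step, step)))
        value = objective(candidate)
        if value < best:
            x, best = candidate, value
    return best


def benchmark(method, n_repeats, budget):
    # One independent seed per repetition, so every run is reproducible.
    results = [method(random.Random(seed), budget) for seed in range(n_repeats)]
    mean = statistics.mean(results)
    # Standard error of the mean over repetitions (requires n_repeats >= 2).
    std_err = statistics.stdev(results) / n_repeats ** 0.5
    return results, mean, std_err


if __name__ == "__main__":
    methods = [("random_search", random_search), ("local_search", local_search)]
    for name, method in methods:
        _, mean, std_err = benchmark(method, n_repeats=50, budget=20)
        print(f"{name}: mean best = {mean:.5f} +/- {std_err:.5f}")
```

The total cost scales as methods x problems x repetitions x evaluation budget, which is why capping the number of repetitions or baselines directly trades compute for statistical confidence.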