Solving Word Equations by Combining Symbolic Reasoning with Machine Learning

Dnr:

NAISS 2024/5-154

Type:

NAISS Medium Compute

Principal Investigator:

Philipp Ruemmer

Affiliation:

Uppsala universitet

Start Date:

2024-03-27

End Date:

2025-04-01

Primary Classification:

10201: Computer Sciences

Secondary Classification:

10206: Computer Engineering

Webpage:

Resources:

- Alvis at C3SE: 15000 GPU-h/month
- Klemming at PDC: 500 GiB
- Mimer at C3SE: 500 GiB
- Crex 1 at UPPMAX: 128 GiB
- Dardel at PDC: 20 x 1000 core-h/month
- Rackham at UPPMAX: 20 x 1000 core-h/month

String solving is a dynamic research field within formal methods and program analysis, focusing on checking the satisfiability of first-order constraints over string variables. The field has attracted attention from many research groups, in particular during the last ten years, due to its importance for checking security properties of systems. Our research group at Uppsala University is one of the leading groups in the field; for instance, our string solver Ostrich won the QF_S track (quantifier-free string constraints) of the international solver competition SMT-COMP in 2023.
At the core of string solvers are algorithms for solving word equations, i.e., equations between terms that may contain characters, string variables, and the concatenation function. Despite their apparent simplicity, word equations give rise to an extremely complex theory and represent one of the main bottlenecks in solvers; even after more than half a century of research, it is still easy to find short word equations that no state-of-the-art solver can solve.
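As a concrete illustration of the tree-shaped search spaces such algorithms explore, the sketch below applies Nielsen-style splitting rules to a word equation (uppercase letters stand for string variables, lowercase letters for characters). This is a minimal, illustrative implementation under our own simplifying conventions, not the procedure used in Ostrich or in the project; all function names are hypothetical.

```python
def subst(word, var, repl):
    """Apply the substitution var -> repl to every occurrence in word."""
    return word.replace(var, repl)

def branches(lhs, rhs):
    """One Nielsen-style split of the equation lhs = rhs.

    Returns the list of child equations; a satisfiable equation has
    some path of children ending in ("", "").
    """
    if not lhs or not rhs:
        rest = lhs or rhs
        # one side is empty: solvable iff the rest is variables only
        # (map every remaining variable to the empty string)
        return [("", "")] if all(c.isupper() for c in rest) else []
    a, b = lhs[0], rhs[0]
    if a == b:                        # equal first symbols: cancel them
        return [(lhs[1:], rhs[1:])]
    if a.islower() and b.islower():   # two distinct characters: unsatisfiable
        return []
    if a.islower():                   # normalize so a variable comes first
        lhs, rhs, a, b = rhs, lhs, b, a
    if b.islower():                   # X u = b v: branch X -> eps or X -> b X
        return [(subst(lhs, a, ""), subst(rhs, a, "")),
                (subst(lhs, a, b + a), subst(rhs, a, b + a))]
    # X u = Y v: branch X -> Y, X -> Y X, or Y -> X Y
    return [(subst(lhs, a, b), subst(rhs, a, b)),
            (subst(lhs, a, b + a), subst(rhs, a, b + a)),
            (subst(lhs, b, a + b), subst(rhs, b, a + b))]

def solvable(lhs, rhs, depth=20):
    """Depth-bounded search: the full tree is infinite, so we may give up."""
    if lhs == "" and rhs == "":
        return True
    if depth == 0:
        return False
    return any(solvable(l, r, depth - 1) for l, r in branches(lhs, rhs))
```

Note how the character/variable case yields two children and the variable/variable case three, matching the "two or three offspring" branching discussed below; the depth bound is needed because substitutions such as X -> b X can regenerate the same equation forever.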
In this research project, we study the use of Graph Neural Networks (GNNs) for intelligently guiding and speeding up solvers for word equations. To this end, we have developed the first GNN-guided solver for word equations, which is essentially a procedure searching for satisfying assignments of equations in an infinite tree-shaped search space. We employ GNNs to learn decision strategies that are applied at the branching points of the search space, and conceptualize the process of making decisions as a multi-class classification task, wherein the input consists of one parent word equation and two or three offspring word equations, all encoded as graphs. We are in particular studying different ways to represent word equations as graphs, different ways to train GNN models, and different ways to integrate predictions produced by a GNN during the search. Tackling this data-intensive task necessitates substantial computational resources.
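To make "word equations encoded as graphs" concrete, one simple possible encoding turns each symbol occurrence into a node, links consecutive symbols, and attaches both sides to a root "=" node; such a graph can then be fed to a GNN for the classification task. The scheme below is purely illustrative and is not necessarily the representation studied in the project; the function name is hypothetical.

```python
def equation_to_graph(lhs, rhs):
    """Encode the word equation lhs = rhs as (node_labels, edge_list).

    Node 0 is a root '=' node; every symbol occurrence becomes a node
    labelled ('var', c) for uppercase variables or ('char', c) for
    characters. Edges link the root to the first symbol of each side,
    and consecutive symbols within a side.
    """
    labels = ["="]
    edges = []
    for word in (lhs, rhs):
        prev = 0                      # start each side at the root node
        for ch in word:
            idx = len(labels)
            kind = "var" if ch.isupper() else "char"
            labels.append((kind, ch))
            edges.append((prev, idx))
            prev = idx
    return labels, edges

# Example: the equation Xa = aX yields a 5-node graph
labels, edges = equation_to_graph("Xa", "aX")
```

In a GNN-guided solver, the parent equation and its two or three offspring would each be encoded this way, and the network trained to predict which child branch to explore first.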