Safe reinforcement learning for heavy machine decision making

Dnr:

NAISS 2026/4-590

Type:

NAISS Small

Principal Investigator:

Han Wang

Affiliation:

Umeå universitet

Start Date:

2026-03-24

End Date:

2027-04-01

Primary Classification:

10210: Artificial Intelligence

Webpage:

Allocation

Mimer at C3SE: 500 GiB
Alvis at C3SE: 250 GPU-h/month
Arrhenius Disk at NAISS: 250 GiB

Abstract

This project develops a machine learning framework for controlling heavy machinery (such as timber harvesting equipment) that simultaneously achieves two critical objectives: rapid learning from demonstrations and guaranteed safety during deployment. The core innovation separates concerns between task performance optimization and safety constraint monitoring, enabling the system to learn efficiently while maintaining verifiable safety guarantees that can be audited and validated before real-world deployment.
The technical approach integrates three key components: causal reward shaping to accelerate learning from human demonstrations, preference-based learning to accurately estimate safety costs from human feedback, and formal safety constraints to ensure that real-world physical limits (such as maximum force thresholds on robotic arms) are never exceeded. Unlike conventional reinforcement learning methods that treat safety as a secondary concern, this framework treats safety as a first-class constraint embedded in the learning process itself, ensuring that safety remains verifiable throughout training and operation.
The expected impact includes three concrete improvements. First, the learning algorithm requires significantly fewer training samples compared to standard approaches, reducing computational cost and time to deployment. Second, the safety verification is transparent and auditable: regulators and operators can track exactly how the system's learned behaviors map to real physical safety limits, rather than treating the system as a black box. Third, the framework provides formal mathematical guarantees that the deployed system will satisfy safety constraints with high confidence, enabling safer deployment of autonomous machinery in high-risk environments.
Successful completion of this project will demonstrate a practical path toward deploying intelligent machinery control systems that are both sample-efficient and provably safe, with applications extending beyond timber harvesting to other safety-critical domains such as heavy construction and industrial automation.
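
The separation between task performance optimization and safety constraint monitoring described in the abstract can be illustrated with a minimal sketch. This is not the project's method: the names SafetyMonitor, safe_step and max_force are hypothetical, the policy is a placeholder, and simple clipping stands in for the formal safety constraints, so the sketch carries none of the formal guarantees the abstract refers to. It only shows how a monitor that knows a physical force limit can be kept independent of the learned policy and audited on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SafetyMonitor:
    """Hypothetical monitor holding one physical limit: a maximum force."""

    max_force: float  # assumed unit: newtons

    def is_safe(self, forces: Sequence[float]) -> bool:
        # A command is safe when every force magnitude is within the limit.
        return all(abs(f) <= self.max_force for f in forces)

    def project(self, forces: Sequence[float]) -> List[float]:
        # Clip each commanded force into [-max_force, +max_force].
        return [max(-self.max_force, min(self.max_force, f)) for f in forces]


def safe_step(
    policy: Callable[[Sequence[float]], Sequence[float]],
    monitor: SafetyMonitor,
    observation: Sequence[float],
) -> Tuple[List[float], bool]:
    """Return the action to execute and whether the monitor intervened."""
    proposed = policy(observation)  # the task policy knows nothing about limits
    if monitor.is_safe(proposed):
        return list(proposed), False
    return monitor.project(proposed), True


# Placeholder policy (an assumption for illustration): scale the observation.
def toy_policy(observation: Sequence[float]) -> List[float]:
    return [2.0 * x for x in observation]


monitor = SafetyMonitor(max_force=100.0)
action, intervened = safe_step(toy_policy, monitor, [30.0, -80.0])
```

Because the monitor is the only component that reads the force limit, the record of when it intervened can be inspected separately from the policy, which is one simple way to make the link between learned behavior and physical limits traceable.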