Technical AI Safety Podcast


Jan 5, 2021

With Bettina Könighofer and Rüdiger Ehlers


Safe Reinforcement Learning via Shielding

Mohammed Alshiekh, Roderick Bloem, Ruediger Ehlers, Bettina Könighofer, Scott Niekum, Ufuk Topcu
Reinforcement learning algorithms discover policies that maximize reward, but do not necessarily guarantee safety during learning or execution phases. We introduce a new approach to learn optimal policies while enforcing properties expressed in temporal logic. To this end, given the temporal logic specification that is to be obeyed by the learning system, we propose to synthesize a reactive system called a shield. The shield is introduced in the traditional learning process in two alternative ways, depending on the location at which the shield is implemented. In the first one, the shield acts each time the learning agent is about to make a decision and provides a list of safe actions. In the second way, the shield is introduced after the learning agent. The shield monitors the actions from the learner and corrects them only if the chosen action causes a violation of the specification. We discuss which requirements a shield must meet to preserve the convergence guarantees of the learner. Finally, we demonstrate the versatility of our approach on several challenging reinforcement learning scenarios.
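
The two shield placements described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: in the paper the shield is synthesized from a temporal logic specification, whereas here safety is a hand-written predicate on a toy one-dimensional corridor. The `Shield` class, `next_pos`, `is_safe`, and the corridor itself are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence

# Toy environment (an assumption for illustration): positions 0..3 are safe,
# and stepping outside that range violates the safety specification.
MOVES = {"left": -1, "stay": 0, "right": 1}


def next_pos(pos: int, action: str) -> int:
    """Deterministic transition of the toy corridor."""
    return pos + MOVES[action]


def is_safe(pos: int, action: str) -> bool:
    """Hand-written stand-in for a synthesized safety specification."""
    return 0 <= next_pos(pos, action) <= 3


class Shield:
    """Hypothetical shield wrapping a safety predicate over (state, action)."""

    def __init__(self, actions: Sequence[str],
                 safe: Callable[[int, str], bool]) -> None:
        self.actions = list(actions)
        self.safe = safe

    def safe_actions(self, state: int) -> List[str]:
        """Shield before the learner: list the actions the agent may choose from."""
        return [a for a in self.actions if self.safe(state, a)]

    def correct(self, state: int, proposed: str) -> str:
        """Shield after the learner: override the action only if it is unsafe."""
        if self.safe(state, proposed):
            return proposed
        allowed = self.safe_actions(state)
        if not allowed:
            raise RuntimeError("no safe action available in this state")
        # Fallback choice is arbitrary here: the first safe action in list order.
        return allowed[0]


shield = Shield(list(MOVES), is_safe)
print(shield.safe_actions(3))      # actions offered to the learner at the edge
print(shield.correct(3, "right"))  # unsafe proposal is replaced
print(shield.correct(1, "right"))  # safe proposal passes through unchanged
```

In the first mode the learner only ever picks from `safe_actions(state)`. In the second it picks freely and the environment receives `correct(state, proposed)`, so the learner's choice is changed only when it would violate the specification.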
Continued Work:

Stefan Pranger, Bettina Könighofer, Martin Tappler, Martin Deixelberger, Nils Jansen, Roderick Bloem: Adaptive Shielding under Uncertainty. CoRR abs/2010.03842 (2020)

Nils Jansen, Bettina Könighofer, Sebastian Junges, Alex Serban, Roderick Bloem: Safe Reinforcement Learning Using Probabilistic Shields (Invited Paper). CONCUR 2020: 3:1-3:16

Bettina Könighofer, Julian Rudolf, Alexander Palmisano, Martin Tappler, Roderick Bloem: Online Shielding for Stochastic Systems. CoRR abs/2012.09539 (2020)

Bettina Könighofer, Florian Lorber, Nils Jansen, Roderick Bloem: Shield Synthesis for Reinforcement Learning. ISoLA (1) 2020: 290-306