Mar 11, 2021
With Alex Turner
by Alexander Matt Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli
Abstract: "Some researchers have speculated that capable reinforcement learning agents are often incentivized to seek resources and power in pursuit of their objectives. While seeking power in order to optimize a misspecified objective, agents might be incentivized to behave in undesirable ways, including rationally preventing deactivation and correction. Others have voiced skepticism: human power-seeking instincts seem idiosyncratic, and these urges need not be present in reinforcement learning agents. We formalize a notion of power within the context of Markov decision processes. With respect to a class of neutral reward function distributions, we provide sufficient conditions for when optimal policies tend to seek power over the environment."