Value Alignment and Multi-agent Inverse Reinforcement Learning

Amount recommended
Grant program
Primary investigator
Stefano Ermon, Stanford University
Technical abstract

Reward specification, a key challenge in value alignment, is particularly difficult in environments with multiple agents, since the designer has to balance individual gain against overall social utility. Instead of designing rewards by hand, we consider inverse reinforcement learning (IRL), an imitation learning technique in which agents learn directly from human demonstrations. These techniques are well developed for the single-agent case and, while they have limitations, are often considered a key component for addressing the value alignment problem. Multi-agent settings, however, remain relatively unexplored.

We propose to fill this gap by developing imitation learning and inverse reinforcement learning algorithms specifically designed for multi-agent settings. Our objectives are to: 1) develop techniques to imitate observed human behavior and interactions, 2) explicitly recover rewards that can explain complex strategic behaviors in multi-agent systems, enabling agents to reason about human behavior and safely co-exist, 3) develop interpretable techniques, and 4) deal with irrational agents to maximize safety. These methods will significantly improve our ability to understand and reason about the interactions among multiple agents in complex environments.

Published by the Future of Life Institute on 1 February, 2023
