Value Alignment and Multi-agent Inverse Reinforcement Learning
Reward specification, a key challenge in value alignment, is particularly difficult in environments with multiple agents, since the designer must balance individual gain against overall social utility. Instead of designing rewards by hand, we consider inverse reinforcement learning (IRL), an imitation learning technique in which agents infer reward functions directly from human demonstrations. These techniques are well developed for the single-agent case and, despite their limitations, are often considered a key component of any approach to the value alignment problem. Yet multi-agent settings remain relatively unexplored.
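To make the single-agent starting point concrete, the sketch below illustrates maximum-entropy IRL on a toy chain MDP: a linear reward is adjusted until the state-visitation counts induced by the resulting soft-optimal policy match those of the demonstrations. The environment, feature map, demonstrations, and hyperparameters are all illustrative assumptions, not part of the proposed work.

```python
import numpy as np

# Minimal maximum-entropy IRL sketch on a toy 5-state chain MDP.
# All quantities below (environment, features, demos) are illustrative assumptions.

N_STATES, N_ACTIONS, GAMMA, HORIZON = 5, 2, 0.9, 4

def step(s, a):
    """Deterministic chain dynamics: action 0 moves left, action 1 moves right."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)

features = np.eye(N_STATES)  # one-hot state features; learned reward is linear: r(s) = theta[s]

def soft_value_iteration(r, iters=50):
    """Soft Bellman backups; returns a stochastic policy pi[s, a] = exp(Q - V)."""
    v = np.zeros(N_STATES)
    for _ in range(iters):
        q = np.array([[r[s] + GAMMA * v[step(s, a)] for a in range(N_ACTIONS)]
                      for s in range(N_STATES)])
        q_max = q.max(axis=1, keepdims=True)
        v = (q_max + np.log(np.exp(q - q_max).sum(axis=1, keepdims=True))).ravel()
    return np.exp(q - v[:, None])

def expected_visitation(pi, start_dist):
    """Expected state-visitation counts when rolling pi forward for HORIZON steps."""
    d = start_dist.copy()
    total = d.copy()
    for _ in range(HORIZON):
        d_next = np.zeros(N_STATES)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                d_next[step(s, a)] += d[s] * pi[s, a]
        d = d_next
        total += d
    return total

# Demonstrations: state trajectories that head toward the right end of the chain.
demos = [[0, 1, 2, 3, 4], [1, 2, 3, 4, 4]]
demo_counts = sum(features[s] for traj in demos for s in traj) / len(demos)
start_dist = np.bincount([traj[0] for traj in demos], minlength=N_STATES) / len(demos)

# Gradient ascent on the max-ent likelihood:
# gradient = expert feature counts - feature counts induced by the current reward.
theta = np.zeros(N_STATES)
for _ in range(200):
    pi = soft_value_iteration(features @ theta)
    model_counts = expected_visitation(pi, start_dist) @ features
    theta += 0.05 * (demo_counts - model_counts)

print("recovered reward weights:", np.round(theta, 2))
```

The recovered weights increase toward the goal end of the chain, which is the sense in which IRL "explains" the demonstrated behavior; the multi-agent algorithms we propose must recover such explanations when the demonstrations reflect strategic interaction rather than a single decision maker.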
We propose to fill this gap by developing imitation learning and inverse reinforcement learning algorithms specifically designed for multi-agent settings. Our objectives are to: 1) develop techniques to imitate observed human behavior and interactions, 2) explicitly recover rewards that explain complex strategic behaviors in multi-agent systems, enabling agents to reason about human behavior and safely coexist, 3) develop interpretable techniques, and 4) account for irrational agents to maximize safety. These methods will significantly improve our ability to understand and reason about interactions among multiple agents in complex environments.
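As a minimal baseline for objective 1, the sketch below fits per-agent tabular policies to joint demonstrations by simple behavioral cloning. The demonstration format, state names, and actions are hypothetical; the proposed algorithms would go well beyond this counting-based baseline, which ignores rewards and strategic structure entirely.

```python
from collections import defaultdict

# Hypothetical joint demonstrations: each step records the shared state and the
# action taken by each of two agents. Names ("s0", "left", ...) are illustrative.
demonstrations = [
    [("s0", ("left", "right")), ("s1", ("left", "left")), ("s2", ("stay", "stay"))],
    [("s0", ("left", "right")), ("s1", ("right", "left")), ("s2", ("stay", "stay"))],
]

def fit_behavioral_cloning(demos, n_agents=2):
    """Estimate, for each agent, a conditional action distribution given the joint state."""
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(n_agents)]
    for traj in demos:
        for state, joint_action in traj:
            for i, action in enumerate(joint_action):
                counts[i][state][action] += 1
    policies = []
    for agent_counts in counts:
        policy = {}
        for state, action_counts in agent_counts.items():
            total = sum(action_counts.values())
            policy[state] = {a: c / total for a, c in action_counts.items()}
        policies.append(policy)
    return policies

policies = fit_behavioral_cloning(demonstrations)
print(policies[0]["s1"])  # agent 0's estimated action distribution in state "s1"
```

Because such a baseline only reproduces observed action frequencies, it cannot explain why agents behave as they do or predict behavior in new situations, which is precisely what the reward-recovery and interpretability objectives above are meant to address.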