Stability of Neuromorphic Motivational Systems
We are investigating the safety of possible future advanced AI that takes the same basic approach to motivated behavior as the human brain. Neuroscience has given us a rough blueprint of how the brain directs its behavior based on its innate motivations and its learned goals and values. This blueprint may be used to guide advances in artificial intelligence to produce AI that is as intelligent and capable as humans and, soon after, more intelligent. It is impossible to predict how long this progress will take, which means it could also arrive sooner than expected. Rapid progress in practical applications is producing rapid increases in funding from commercial and governmental sources. Thus, it seems critical to understand the potential risks of brain-style artificial intelligence before it is actually achieved. We are testing our model of brain-style motivational systems in a highly simplified environment to investigate how its behavior may change as it learns and becomes more intelligent. While our system is not capable of performing useful tasks, it serves to investigate the stability of such systems when they are integrated with the powerful learning systems currently being developed and deployed.
We apply a neural network model of human motivated decision-making to an investigation of the risks involved in creating artificial intelligence with a brain-style motivational system. This model uses relatively simple principles to produce complex, goal-directed behavior. Because of the potential utility of such a system, we believe that this approach may see widespread adoption, and that it carries significant risks. Such a system could provide the motivational core of efforts to create artificial general intelligence (AGI). It also has the advantage of leveraging the wealth of knowledge, already available and rapidly accumulating, on the neuroscience of mammalian motivation and self-directed learning. We employ this model, and non-biological variations on it, to investigate the risks of combining such systems with the powerful learning mechanisms that are currently being developed. We investigate two issues: motivational drift and representational drift. Motivational drift refers to how a system's motivations may change from those it is initially given and trained on. Representational drift refers to the possibility that sensory and conceptual representations will change over the course of training. We investigate whether learning in these systems can be used to produce a system that remains stable and safe for humans as it develops greater intelligence.