AI Safety Research

Kaj Sotala

Researcher

Foundational Research Institute

kaj.sotala@intelligence.org

Project: Teaching AI Systems Human Values Through Human-Like Concept Learning

Amount Recommended: $20,000

Project Summary

AI systems will need to understand human values in order to respect them. This requires that they have concepts similar to those humans have. We will research whether AI systems can be made to learn their concepts in the same way that humans learn theirs. This will involve a literature review of the relevant fields, as well as experimental work.

We are particularly interested in a branch of machine learning called deep learning. The concepts learned by deep learning agents seem to be similar to the ones that have been documented in psychology. We will attempt to apply existing deep learning methods to learning what we call moral concepts, the concepts through which moral values are defined. In addition, we will investigate a particular hypothesis about how we develop our concepts and values in the first place.

Technical Abstract

Autonomous AI systems will need to understand human values in order to respect them. This requires that they have concepts similar to those humans have. We will research whether AI systems can be made to learn their concepts in the same way that humans learn theirs. This will involve a literature review of the relevant fields, as well as experimental work.

Both human concepts and the representations learned by deep learning models seem to have a hierarchical structure, among other similarities. For this reason, we will attempt to apply existing deep learning methods to learning what we call moral concepts, the concepts through which moral values are defined. In addition, we will investigate the extent to which reinforcement learning affects the development of human concepts and values.