
Dan Mackinlay
Why do you care about AI Existential Safety?
I care about AI existential safety not least because I myself wish to continue existing. Bigger picture, I think no more significant a step has been taken in history than ushering in new intelligences comparable to or greater than our own. We see in evolutionary history that each time more cognitively powerful organisms arise, they have the potential to reshape ecosystems radically. Bringing about organisms with new cognitive capacities in the most interconnected and mass-scale world system we have ever witnessed will likely be the greatest disruption of all. I think we must be very careful if we do not wish to go the way of the megafauna when humans arrived in their biome.
Please give at least one example of your research interests related to AI existential safety:
I’m currently interested in strategic/game-theoretic models via reinforcement learning; a mini literature review I wrote is here: https://www.lesswrong.com/posts/cqvqHcpnxSXGfFLwQ/opponent-shaping-as-a-model-for-manipulation-and-cooperation
Generally, I am interested in discovering how powerful models might be encouraged to “co-evolve” with humans. It seems to me that as long as we can play iterated games against each other, we can find arbitrarily good Nash equilibria (in the spirit of the folk theorem); but if we are playing one-shot games, there is no reason for a dominant agent not to play “defect”.
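To make that intuition concrete, here is a minimal sketch of my own (a toy, not drawn from the linked review): in a single round of the prisoner's dilemma, defection pays against a cooperator, but over repeated rounds a conditional strategy such as tit-for-tat sustains mutual cooperation. The payoff values and strategies are the standard textbook choices, not anything specific to the opponent-shaping literature.

# Toy iterated prisoner's dilemma: one-shot defection vs. repeated cooperation.

# Payoffs to (row, column) for actions C (cooperate) and D (defect).
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def always_defect(my_history, their_history):
    return "D"

def tit_for_tat(my_history, their_history):
    # Cooperate first, then mirror the opponent's previous move.
    return their_history[-1] if their_history else "C"

def play(strategy_a, strategy_b, rounds):
    """Play `rounds` repetitions and return cumulative payoffs (a, b)."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        payoff_a, payoff_b = PAYOFFS[(move_a, move_b)]
        total_a += payoff_a
        total_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a, total_b

if __name__ == "__main__":
    # One-shot: defecting against a cooperator is the dominant move.
    print("one round, TFT vs defector:", play(tit_for_tat, always_defect, 1))
    # Iterated: conditional cooperators do far better than mutual defectors.
    print("100 rounds, TFT vs TFT:", play(tit_for_tat, tit_for_tat, 100))
    print("100 rounds, defector vs defector:", play(always_defect, always_defect, 100))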
More broadly, game-theoretic models seem to have some alpha in them as a research strategy: they are somewhat agnostic to the implementation of the inference algorithm, and they explicitly incorporate action. I see them as complementing technical results such as singular learning theory; not the only strategy to pursue, but one that is somewhat under-explored and in which I may have positive alpha, as part of a larger strategy.
