
Qiyao (Chi-yao) Wei
Why do you care about AI Existential Safety?
During my PhD, I find myself thinking a lot about where AI is headed and how academia, industry, and other communities can each contribute. First, the field of AI is changing so fast that almost no one can reliably predict what the next wave will look like: researchers disagree widely about what the next AI revolution will be, when AI will reach superhuman intelligence, and even about P(doom). Second, AI differs from most other scientific fields in that the enormous revenue at stake means frontier research is often shaped by industry labs rather than academia, and there is ongoing debate about how academic and industrial research agendas should differ. I believe these are very important problems, I find them genuinely intriguing to think about, and that is why I care deeply about AI safety.
Please give at least one example of your research interests related to AI existential safety:
I am currently researching how to monitor Chain-of-Thought (CoT) faithfulness, which falls broadly under the umbrella of oversight and control. My interest in this direction began at a hackathon hosted by Anthropic, where my team discovered that preference models can reward unfaithful CoT reasoning traces. This sparked my interest in investigating whether unfaithfulness arises during training, and whether there is a systematic way to characterize the relationship between training signals and CoT faithfulness. For example, I find it very interesting how various RL rewards, or the pre-RL biases instilled via SFT, affect faithfulness. This work is directly relevant to the AI safety community: a better understanding of how training shapes model faithfulness could plausibly enable better methods to monitor and control frontier LLMs.
