Research
AI Safety Research
I’m actively conducting research on AI safety and alignment, focusing on making AI systems more transparent and controllable.
Current & Recent Projects
Vulnerability in Trusted Monitoring and Mitigations
Research conducted as part of AI Safety Camp, exploring vulnerabilities in AI monitoring systems and developing robust mitigation strategies.
Learn more about this research →
SAE-driven Auto Steering
Independent research on using Sparse Autoencoders (SAEs) for automated steering of language model behavior, contributing to interpretability and model control. A minimal sketch of the core idea appears below.
View the auto steering research →
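For readers unfamiliar with SAE-based steering, here is a minimal, hypothetical sketch of the general idea: nudging a model's residual-stream activation along an SAE decoder (feature) direction. The dimensions, feature index, and scale below are illustrative placeholders, not details from the project itself.

```python
# Illustrative sketch of SAE-based activation steering.
# All values (d_model, d_sae, feature_idx, scale) are toy assumptions.
import torch

d_model, d_sae = 16, 64  # toy dimensions, not from any real model

# Toy SAE decoder: each row is a unit-norm feature direction in model space.
W_dec = torch.randn(d_sae, d_model)
W_dec = W_dec / W_dec.norm(dim=-1, keepdim=True)

def steer(resid: torch.Tensor, feature_idx: int, scale: float) -> torch.Tensor:
    """Add a scaled SAE feature direction to a residual-stream activation."""
    return resid + scale * W_dec[feature_idx]

# Example: nudge a (batch, seq, d_model) activation along feature 3.
acts = torch.zeros(1, 4, d_model)
steered = steer(acts, feature_idx=3, scale=2.0)
print(steered.shape)  # torch.Size([1, 4, 16])
```

In practice the steering vector would be applied via a forward hook at a chosen layer of a real model; the snippet above only illustrates the vector arithmetic.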
Efficient Sparse Autoencoder Feature Splitting
Research on improving the efficiency of sparse autoencoder feature splitting methods, aimed at better understanding and decomposing neural network representations.
Read about efficient feature splitting →
Current Focus
I’m currently part of the ML Alignment and Theory Scholars (MATS) program, conducting AI Control research on Chain-of-Thought monitorability with David Lindner, Scott Emmons, and Erik Jenner.
Contact
Interested in discussing AI safety research? Feel free to reach out at wen.xing.us@gmail.com.