AI Safety Research

I’m actively conducting research on AI safety and alignment, focusing on making AI systems more transparent and controllable.

Current & Recent Projects

Vulnerability in Trusted Monitoring and Mitigations

Research conducted as part of AI Safety Camp, exploring vulnerabilities in AI monitoring systems and developing robust mitigation strategies.

Learn more about this research →
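For readers unfamiliar with the setting: in a typical trusted-monitoring setup from the AI Control literature, a weaker trusted model scores the outputs of a stronger untrusted model for suspiciousness, and outputs above a threshold are escalated for review. The sketch below is only a generic illustration of that pattern; the scoring heuristic, names, and threshold are placeholder assumptions, not the specific protocol or vulnerabilities studied in this project.

```python
# Minimal, illustrative trusted-monitoring loop (generic AI Control pattern,
# not this project's actual protocol). The scoring heuristic and threshold
# below are placeholder assumptions.

SUSPICION_THRESHOLD = 0.8  # assumed calibration point

def trusted_suspicion_score(task: str, output: str) -> float:
    """Stand-in for a trusted (weaker) monitor model rating an output in [0, 1]."""
    red_flags = ["ignore previous instructions", "exfiltrate", "disable the monitor"]
    hits = sum(flag in output.lower() for flag in red_flags)
    return min(1.0, 0.5 * hits)

def run_with_monitoring(task: str, untrusted_output: str) -> str:
    """Pass the untrusted model's output through unless the monitor flags it."""
    score = trusted_suspicion_score(task, untrusted_output)
    if score >= SUSPICION_THRESHOLD:
        # Escalate: audit the output or fall back to a trusted model's answer.
        return f"[escalated for review, suspicion={score:.2f}]"
    return untrusted_output

if __name__ == "__main__":
    print(run_with_monitoring("summarize the report", "Here is a short summary..."))
```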

SAE-driven Auto Steering

Independent research on using Sparse Autoencoders (SAEs) to automatically steer language model behavior, contributing to interpretability and control methods.

View the auto steering research →
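As background on the general technique (not this project's specific method): SAE-based steering typically adds a scaled copy of a chosen SAE feature's decoder direction to a model's residual-stream activation, nudging the model toward the behavior that feature represents. Below is a minimal NumPy sketch of that idea; the dimensions, weights, feature index, and scale are arbitrary assumptions.

```python
# Minimal sketch of SAE-based activation steering (generic idea, not the
# project's implementation). Dimensions, weights, and the chosen feature are
# random placeholders for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 64, 512                       # assumed residual-stream / dictionary sizes

W_dec = rng.standard_normal((d_sae, d_model))  # SAE decoder: one direction per feature
W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)

def steer(activation: np.ndarray, feature_idx: int, scale: float) -> np.ndarray:
    """Add a scaled copy of one SAE feature's decoder direction to the activation."""
    return activation + scale * W_dec[feature_idx]

h = rng.standard_normal(d_model)               # a residual-stream activation at some layer
h_steered = steer(h, feature_idx=42, scale=4.0)
```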

Efficient Sparse Autoencoder Feature Splitting

Research on improving the efficiency of sparse autoencoder feature splitting methods, aimed at better understanding and decomposing neural network representations.

Read about efficient feature splitting →
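For context on the term (not a description of this project's method): feature splitting refers to the observation that a single feature learned by a narrow SAE often decomposes into several finer-grained features in a wider SAE. One common way to surface this is to compare decoder directions across the two dictionaries, as in the illustrative sketch below; the dictionary sizes and random weights are placeholder assumptions.

```python
# Illustrative sketch of detecting feature splitting by comparing decoder
# directions of a narrow and a wide SAE (placeholder random weights, not the
# project's actual models or method).
import numpy as np

rng = np.random.default_rng(0)
d_model = 64
W_dec_small = rng.standard_normal((256, d_model))    # narrow SAE decoder
W_dec_large = rng.standard_normal((2048, d_model))   # wide SAE decoder

W_dec_small /= np.linalg.norm(W_dec_small, axis=1, keepdims=True)
W_dec_large /= np.linalg.norm(W_dec_large, axis=1, keepdims=True)

def split_candidates(coarse_idx: int, top_k: int = 5) -> np.ndarray:
    """Return wide-SAE features whose decoder directions best match one narrow-SAE feature."""
    sims = W_dec_large @ W_dec_small[coarse_idx]      # cosine similarities (rows are unit-norm)
    return np.argsort(sims)[-top_k:][::-1]

print(split_candidates(coarse_idx=7))
```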

Current Focus

I’m currently part of the ML Alignment and Theory Scholars (MATS) program, conducting AI Control research on Chain-of-Thought monitorability with David Lindner, Scott Emmons, and Erik Jenner.

Contact

Interested in discussing AI safety research? Feel free to reach out at wen.xing.us@gmail.com.