Can Reasoning Models Obfuscate Reasoning?
We stress-test the monitorability of chain-of-thought reasoning in language models, investigating whether reasoning models can learn to obfuscate their reasoning processes.
Paper accepted at the NeurIPS 2025 FoRLM workshop. Our findings were used by Anthropic in Claude Opus 4.6 safety evaluations and cited in OpenAI's Monitoring Monitorability paper.