Since 2024, Anthropic’s performance optimization team has given job applicants a take-home test to make sure they know their stuff. But as AI coding tools have improved, the test has had to change repeatedly to stay ahead of AI-assisted cheating.
In a blog post published Wednesday, team lead Tristan Hume describes the history of the challenge. “Each new Claude model has forced us to redesign the test,” Hume writes. “Given the same time frame, Claude Opus 4 outperformed most human applicants. That still let us isolate the strongest candidates, but now Claude Opus 4.5 matches them as well.”
The result is a serious candidate-evaluation problem. Without in-person proctoring, there’s no way to be sure a candidate isn’t using AI on the test, and anyone who does will quickly rise to the top. As Hume puts it, “Under the constraints of the take-home test, we no longer had any way to distinguish between the output of our top candidates and that of our most capable models.”
AI-assisted cheating is already wreaking havoc in schools and universities around the world, so there is a certain irony in AI labs now having to grapple with it themselves. But Anthropic is also uniquely equipped to tackle the problem.
Ultimately, Hume designed a new test unrelated to hardware optimization, novel enough to stump contemporary AI tools. As part of the post, though, he shared the original test to see whether any readers could come up with a better solution.
“If you can do better than Opus 4.5, we’d love to hear from you,” the post reads.