Maybe AI agents can be lawyers after all

Last month, I wrote about Mercor’s new benchmark measuring the capabilities of AI agents on professional tasks such as law and corporate analysis. At the time, the scores were quite disappointing, with every major lab scoring below 25%, so we concluded that lawyers were safe from AI displacement, at least for now.

But AI capabilities can change a lot in a matter of weeks.

Opus 4.6, released this week, has shaken up the leaderboard. Anthropic’s new model scored nearly 30% in one-shot tests, and an average of 45% when given a few more cracks at the problem. Notably, the release included a bunch of new agentic features, including “Agent Swarms”, which may have helped with this kind of multi-step problem-solving.

Either way, the score is a huge leap from the previous state of the art, and a sign that progress on foundation models is not slowing down. Mercor CEO Brendan Foody, who was particularly impressed, said, “To jump from 18.4% to 29.8% in a few months is crazy.”

apex-agent leaderboard

Thirty percent is still a long way from 100%, so it’s not like lawyers need to be worried about being replaced by machines next week. But they should be a lot less confident than they were last month!
