Anthropic has released an upgraded Claude 3.5 Sonnet model with significantly improved coding skills and a brand new capability: computer use. With this new feature, developers can direct Claude to use a computer the way people do, by looking at the screen, moving the cursor, clicking and typing. Let’s get into the details!
The Upgraded Claude 3.5 Sonnet
The upgraded Sonnet model improves substantially on its predecessor in coding benchmarks. On SWE-bench Verified, an evaluation of software engineering skills, its score rises from 33.4% to 49.0%, a gain of 15.6 percentage points that puts it ahead of all publicly available models. Similarly, on TAU-bench, which tests agentic tool use, Claude 3.5 Sonnet improves its scores in both the retail and airline domains. Early customers have also noticed substantially stronger reasoning abilities with the upgraded model.
Performance Evaluation of New Claude 3.5 Sonnet
Early customer feedback highlights the leap taken by Claude 3.5 Sonnet. GitLab, which tested it on DevSecOps tasks, observed up to 10% stronger reasoning across use cases with no added latency. Cognition, which uses the model for autonomous AI evaluations, reported major improvements in coding, planning and problem-solving compared to the prior version. The Browser Company, which used the model to automate web-based workflows, noted that it outperformed every model the company had tested before.
Teaching Claude Computer Usage
Anthropic is also teaching Claude general “computer skills” rather than building tool-specific automations. Through the new ‘computer use’ API, developers can direct Claude to perceive and interact with computer interfaces the way people do, translating instructions into sequences of clicks, keystrokes and navigation steps across standard software. On OSWorld, a benchmark that evaluates how well AI models use computers, Claude 3.5 Sonnet scored 14.9% in the screenshot-only category, ahead of competing systems, and 22.0% when allowed more steps to complete each task.
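To make the "instructions into clicks and keystrokes" idea concrete, here is a minimal Python sketch of the client side of such a loop. It is an illustration under stated assumptions, not Anthropic's implementation: the tool definition follows the shape documented for the October 2024 beta and may have changed since, while `FakeDesktop`, `execute_action` and the sample action list are hypothetical names invented for this example. In a real integration the tool definition would be sent with a Messages API request that enables the computer use beta, and the actions would come back from the model instead of a hard-coded list.

```python
from dataclasses import dataclass, field

# Tool definition in the shape documented for the October 2024 beta.
# Assumption: field names and values may differ in current API versions.
COMPUTER_TOOL = {
    "type": "computer_20241022",
    "name": "computer",
    "display_width_px": 1024,
    "display_height_px": 768,
}


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen; it only records requests."""
    log: list = field(default_factory=list)

    def screenshot(self):
        self.log.append("screenshot")
        return "<image data would go here>"

    def mouse_move(self, x, y):
        self.log.append(f"mouse_move({x}, {y})")

    def left_click(self):
        self.log.append("left_click")

    def type_text(self, text):
        self.log.append(f"type({text!r})")

    def key(self, combo):
        self.log.append(f"key({combo!r})")


def execute_action(desktop, action):
    """Translate one model-requested action (a dict) into a desktop call."""
    kind = action["action"]
    if kind == "screenshot":
        return desktop.screenshot()
    if kind == "mouse_move":
        x, y = action["coordinate"]
        return desktop.mouse_move(x, y)
    if kind == "left_click":
        return desktop.left_click()
    if kind == "type":
        return desktop.type_text(action["text"])
    if kind == "key":
        return desktop.key(action["text"])
    raise ValueError(f"unsupported action: {kind}")


# Hypothetical sequence a model might request for "search for the weather".
requested_actions = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": [512, 40]},
    {"action": "left_click"},
    {"action": "type", "text": "weather today"},
    {"action": "key", "text": "Return"},
]

desktop = FakeDesktop()
for action in requested_actions:
    execute_action(desktop, action)

for entry in desktop.log:
    print(entry)
```

Keeping the dispatcher separate from the desktop object means the same loop could drive a real input library or a sandboxed virtual machine; rejecting unknown actions with an error, rather than ignoring them, is a deliberate choice so that unexpected model output is noticed.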
The New Claude 3.5 Haiku
Claude 3.5 Haiku is the next generation of Anthropic’s fastest model. At the same price point and speed as Claude 3 Haiku, it exceeds the capabilities of even the powerful Claude 3 Opus model in numerous categories. Notably, Claude 3.5 Haiku scores 40.6% on SWE-bench Verified, outperforming many agents built on publicly available state-of-the-art models. With low latency, better instruction following and more accurate tool use, Claude 3.5 Haiku is well-suited for user-facing applications, specialized sub-agent tasks and generating personalized experiences from large volumes of data.
Developing Computer Use Responsibly
While computer use enables powerful skills, Anthropic acknowledges its limitations and has put safeguards in place to promote responsible adoption. The company conducted in-depth pre-release reviews of the new Sonnet with safety experts. The US AI Safety Institute and the UK AI Safety Institute jointly tested the model before release. Anthropic also evaluated the upgraded Claude 3.5 Sonnet for catastrophic risks and found that the ASL-2 Standard outlined in its Responsible Scaling Policy remains appropriate for the model.
Availability and Accessibility
The computer use public beta, along with the upgraded Claude 3.5 Sonnet and new Claude 3.5 Haiku models, is available through the Anthropic API, Amazon Bedrock and Google Cloud’s Vertex AI. Anthropic invites everyone to explore the new models and capabilities and to share feedback so the technology can progress responsibly.