Anthropic’s new Claude 3 artificial intelligence (AI) models beat competitors in many areas, experts told PYMNTS.
The company, which released the models on Monday (March 4), claims that Claude 3 Opus — the most advanced among the new models — surpassed both OpenAI’s GPT-4 and Google’s Gemini Ultra in industry benchmark assessments. The evaluations covered areas such as undergraduate-level knowledge, graduate-level reasoning and basic mathematics.
The new models signify the intensifying competition among AI companies to advance their technologies in an increasingly heated sector.
“Claude surpasses GPT-4 in almost every area,” Richard Gardner, the CEO of tech consulting firm Modulus, told PYMNTS in an interview.
“However, we feel Claude’s alignment layer is overly restrictive. With that said, GPT-4’s alignment layer is also becoming too restrictive,” he said, adding that he prefers using open source models.
Anthropic’s new AI models within the Claude 3 family are called Opus, Sonnet and Haiku. Sonnet and Haiku are smaller and cheaper than Opus. Sonnet and Opus are available in 159 countries, and Haiku will be released soon, Anthropic said. The company hasn’t shared how long Claude 3 took to develop or how much it cost, but mentioned that companies like Airtable and Asana helped test the models.
Sonnet is also available on Amazon Bedrock, with plans for Opus and Haiku to be available on the platform in a few weeks.
For the first time, Anthropic is letting users analyze various kinds of data, including images, charts and documents, through the models’ new multimodal support.
Tests show that Claude 3 is better at creating source code compared to other models, Caleb Moore, the co-founder and chief technology officer at software company Darwinium, told PYMNTS in an interview.
“Other common factors are comparing reasoning (the ability to draw a logical conclusion from interrelated information given to it) as well as the depth of the knowledge already encoded in the system that it can use,” he added.
Comparing AI models can be tricky, Ilia Badeev, the head of data science at Trevolution Group, a travel services company that uses AI, told PYMNTS in an interview.
“People often rely on public tests for comparison, but these tests are pretty abstract and might not always reflect real-world scenarios,” Badeev said. “Just because a model excels in some tests doesn’t mean it will be perfect for your unique tasks.”
An important point to consider when choosing an AI model is the cost, Badeev pointed out. For instance, Claude 3 Opus will set you back $75 per million output tokens — significantly more than GPT-4 Turbo, priced at $30 for the same volume.
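Those per-token price differences compound quickly at scale. As a rough illustration using the output-token prices quoted above (actual vendor pricing may change, and input tokens are billed separately at lower rates):

```python
# Rough monthly cost comparison at the quoted per-million-output-token prices.
# Figures are the article's; check current vendor pricing before relying on them.
PRICE_PER_MILLION = {
    "claude-3-opus": 75.00,
    "gpt-4-turbo": 30.00,
}

def cost(model: str, tokens: int) -> float:
    """Dollar cost of generating `tokens` output tokens with `model`."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000

# Example workload: 10 million output tokens per month.
monthly_tokens = 10_000_000
print(cost("claude-3-opus", monthly_tokens))  # 750.0
print(cost("gpt-4-turbo", monthly_tokens))    # 300.0
```

At that volume, the gap is $450 a month before input-token charges are counted.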
Gardner said almost any model can be fine-tuned to support a specific business use case. Some models may be better than others for particular tasks, but that’s primarily due to fine-tuning, he noted, citing apps designed specifically to manage clinical notes or aid healthcare workers.
Businesses should choose an AI model based on accuracy, speed, privacy, ease of deployment or maintenance, and cost, Gardner said, adding that open source models provide users with more privacy.
For creative writers, GPT-4’s capabilities in generating text might be more useful, Michal Oglodek, the chief technology officer at Ivy.ai, told PYMNTS in an interview. On the other hand, if a user is aiming for accuracy and maintaining brand consistency, Gemini 1, with its focus on truthfulness and safety, could be the preferable choice. And for users who need to handle complex inquiries accurately, Claude 3 could offer advantages.
“Whenever possible, test models directly in your application,” Oglodek said. “Benchmarks are informative, but real-world use gives the most accurate picture.”
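One lightweight way to follow that advice is a small in-house evaluation harness that scores candidate models on examples drawn from your own application rather than public benchmarks. A minimal sketch, with a hypothetical stub standing in for the real vendor API call (`fake_model`, the example prompts and the substring-match scoring are all illustrative assumptions, not any vendor’s method):

```python
from typing import Callable

# Hypothetical task examples drawn from your own application,
# not from a public benchmark.
EXAMPLES = [
    {"prompt": "What is the refund window for damaged items?", "expected": "30 days"},
    {"prompt": "What is the support email address?", "expected": "help@example.com"},
]

def evaluate(call_model: Callable[[str], str]) -> float:
    """Fraction of examples where the model's answer contains the expected text."""
    hits = sum(
        ex["expected"].lower() in call_model(ex["prompt"]).lower()
        for ex in EXAMPLES
    )
    return hits / len(EXAMPLES)

# Stub in place of a real API call; in practice this would wrap
# each vendor's SDK so every model is scored on the same examples.
def fake_model(prompt: str) -> str:
    return "Returns are accepted within 30 days of delivery."

print(f"accuracy: {evaluate(fake_model):.0%}")  # accuracy: 50%
```

Swapping a different model’s wrapper into `evaluate` gives a like-for-like comparison on the tasks that actually matter to the business.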