PYMNTS-MonitorEdge-May-2024

OpenAI Unveils New ChatGPT AI Models With Enhanced Reasoning

OpenAI has introduced its new “o1” series of reasoning models, which the company touts as a major advancement in artificial intelligence to tackle complex problems in science, coding, and mathematics. 

The company announced today that the first model in the series, “OpenAI o1-preview,” is now available in ChatGPT and through its API, marking a significant leap in AI’s problem-solving capabilities. Unlike previous models, the o1 series is designed to think more before responding, mimicking a human’s reasoning process. 

OpenAI claims that this refined approach allows the model to solve tougher tasks. For example, in tests, the o1 model performed at levels comparable to PhD students on challenging benchmarks across physics, chemistry, and biology. In coding, the o1 model reached the 89th percentile in Codeforces competitions. On a qualifying exam for the International Mathematics Olympiad, it correctly solved 83% of problems, compared to just 13% for its predecessor, GPT-4o.

“We trained these models to spend more time thinking through problems before they respond, much like a person would,” OpenAI said in a statement. “Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.”

A New Approach to Problem Solving

Rumors have been circulating about the upcoming launch of a new OpenAI LLM called Strawberry. The “o1” series appears to be that model.

“OpenAI’s ‘Strawberry’ project signals a significant stride in AI capabilities, potentially revolutionizing how we interact with genAI technology and how it solves complex problems,” Alon Yamin, co-founder and CEO of Copyleaks, an AI-based text analysis platform, told PYMNTS. “The implications for research, software development, and even scientific discovery are immense. Nevertheless, as we embrace this frontier, we must continue to prioritize the implementation of comprehensive guardrails. These guardrails will ensure that AI advancements like ‘Strawberry’ are harnessed responsibly, mitigating potential risks and maximizing their positive impact on society.”

Lars Nyman, CMO of CUDO Compute, previously told PYMNTS that the main strength of a reasoning-focused AI like ‘Strawberry’ lies in its ability to handle complex problem-solving, which could significantly impact industries like legal tech, healthcare, and scientific research. However, he noted that a potential downside is slower response times, as this AI engages in more deliberate, ‘System 2’ thinking. This slower processing could present a challenge in a fast-paced world that demands instant results.

OpenAI CEO Sam Altman wrote on X that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. but also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning.”

The release also highlights advancements in safety, a growing concern in AI development. OpenAI claims that the o1 series incorporates a new safety training approach that allows the model to reason about and follow safety rules more effectively. The o1-preview model scored 84 out of 100 on OpenAI’s most difficult jailbreaking tests, where GPT-4o managed just 22 points.

To complement the launch of o1-preview, OpenAI is also introducing a lighter, more cost-effective version dubbed “o1-mini,” explicitly aimed at developers for coding tasks. This smaller model is 80% cheaper than its larger counterpart, providing a balance between efficiency and power.

Where to Get It

ChatGPT Plus and Team users can begin accessing the o1 models on Thursday (Sept. 12), while enterprise and educational users will gain access next week. Developers can also experiment with both models via OpenAI’s API, though certain features like function calling and streaming are still being developed.

As OpenAI continues to roll out new models, it plans to add browsing, file uploads, and other enhancements to make the o1 series more capable. 

