Top artificial intelligence (AI) systems are struggling with advanced, research-level math problems, revealing an unexpected limitation in the technology’s current capabilities.
Recent testing with the FrontierMath benchmark has identified gaps in advanced AI models’ mathematical reasoning, prompting a fresh look at development priorities even as the technology transforms other business operations. The tech industry continues to invest heavily in AI, weighing its potential against the need to address its limitations.
“I’d argue that the performance on FrontierMath is actually more encouraging than concerning — it demonstrates that we’re developing better ways to rigorously evaluate AI capabilities while also showing that current models are quite transparent about their limitations,” Dev Nag, the CEO of QueryPal, told PYMNTS.
“Rather than dampening investor confidence, this kind of clear-eyed assessment of AI’s strengths and limitations should reinforce trust in the technology,” Nag added. “Companies aren’t investing in AI because they need help solving abstract mathematical theorems; they’re investing because AI can dramatically accelerate and improve the kind of analytical work that businesses do every day.”
The new benchmark, FrontierMath, was created by Epoch AI in collaboration with more than 60 leading mathematicians and consists of hundreds of original, research-level math problems that require sophisticated reasoning and creativity. Even top AI models such as GPT-4o and Gemini 1.5 Pro solve less than 2% of them.
FrontierMath’s problems are deliberately crafted to be “guessproof” and require deep mathematical understanding spanning fields from computational number theory to abstract algebraic geometry. Notable mathematicians, including Fields Medalists Terence Tao and Timothy Gowers, have validated the benchmark’s difficulty, noting that these problems often take human experts hours or days to solve.
Experts say AI’s struggle with advanced mathematics could slow its adoption in math-heavy sectors such as finance and engineering, while most businesses focused on routine analytical tasks and customer service will likely continue their AI investments unaffected. Industry analysts noted that companies care more about AI’s proven ability to streamline everyday operations than about its performance on research-level problems.
“Since only a small number of people can solve complex mathematical problems, it is unrealistic to expect AI models to excel at this stage,” Oleh Komenchuk, a machine learning engineer at Uptech, told PYMNTS. “These challenges represent a normal phase in the evolution of technology, and the capabilities of AI will inevitably grow over time.”
Tech giants and regulators are reassessing AI’s limits in business. OpenAI’s updates, while noteworthy, have sparked debate over whether the company’s latest models represent meaningful innovation or incremental change. Meanwhile, Microsoft is focusing on enhancing existing AI tools like Copilot, signaling a more cautious approach to development.
Government agencies are also weighing in: the U.S. Patent and Trademark Office recently barred internal use of generative AI tools, citing concerns over security and bias. As companies continue integrating AI, these developments underscore the challenge of balancing innovation with practical and ethical considerations in business applications.
Despite recent scrutiny of AI’s limitations, investor confidence in the technology remains strong. Sam Altman, CEO of OpenAI, is reportedly seeking $150 million for Rain AI, a semiconductor venture to challenge Nvidia’s dominance in AI chips.
Meanwhile, Microsoft-backed d-Matrix unveiled its first AI chip, with full shipments planned for next year, and initial customers have already signed on. These developments illustrate that while questions about AI’s immediate scalability persist, the sector continues to attract significant investment, signaling faith in its long-term potential. For many investors, AI’s challenges appear to be hurdles rather than roadblocks.
“Although there are mathematical limitations right now, I strongly believe that investor confidence in AI companies remains robust,” Philip Gjørup, co-founder of Nord Comms, told PYMNTS. “For instance, the recent $5 billion funding round for Elon Musk’s xAI, which nearly doubled its valuation to $45 billion, illustrates that significant investment interest persists in the AI sector. While the mathematical shortcomings identified by FrontierMath may temper some of the exuberance surrounding AI’s capabilities, investors are likely to view this as a temporary challenge rather than a fundamental flaw in the technology.”