Google DeepMind introduced two new artificial intelligence (AI) models and said they correctly answered four out of six questions in a math competition that has become a benchmark for measuring the capabilities of AI systems.
The new AI models are a reinforcement-learning-based system for formal math reasoning called AlphaProof and a new version of the company’s geometry-solving system called AlphaGeometry 2, Google DeepMind said in a Thursday (July 25) press release.
“Together, these systems solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving the same level as a silver medalist in the competition for the first time,” the company said in the release.
The IMO, a competition for elite pre-college mathematicians, has become a benchmark for an AI system’s advanced mathematical reasoning capabilities, according to the release.
After the problems for this year’s competition were manually translated into formal mathematical language for the systems to understand, AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 proved the geometry problem, the release said.
The two combinatorics problems included in the competition remained unsolved, per the release.
The systems earned a perfect score on each of the four problems they solved, for a final total of 28 points. That is equivalent to the top end of the silver-medal category and one point below the gold-medal threshold of 29, which 58 of 609 contestants reached at the official competition, according to the release.
“We’re excited for a future in which mathematicians work with AI tools to explore hypotheses, try bold new approaches to solving long-standing problems and quickly complete time-consuming elements of proofs — and where AI systems like Gemini become more capable at math and broader reasoning,” Google DeepMind said in the release.
Bloomberg reported Thursday that solving math problems has become a key proof point in the AI industry, where it’s difficult to compare different models. Large language models tend to have greater linguistic intelligence than mathematical intelligence, per the report.
PYMNTS reported in November that an AI model capable of doing math reliably is an enticing concept because math represents a foundation of learning for other, more abstract tasks.