AI

Google's AI Models Achieve Major Milestone in Mathematical Problem Solving

26 July 2024 | Zaker Adham

Summary

Google DeepMind has announced a significant breakthrough with its AI systems, AlphaProof and AlphaGeometry 2, which successfully solved four out of six problems from this year's International Mathematical Olympiad (IMO). This achievement marks the first time an AI system has reached such a high level of performance in this prestigious competition, earning a score equivalent to a silver medal.

Advanced AI Techniques in Action

AlphaProof employs reinforcement learning to tackle mathematical proofs written in the Lean formal language. By generating and verifying millions of proofs, the system gradually learns to handle increasingly complex problems. AlphaGeometry 2, meanwhile, is an enhanced version of Google's previous geometry-solving AI and utilizes a Gemini-based language model trained on extensive data sets.
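To make the idea of a "formal language" concrete, the sketch below shows a deliberately trivial theorem written in Lean 4. It is not one of the IMO problems (whose formalizations run to many lines), and the theorem name is our own; it simply illustrates the machine-checkable form of statement and proof that AlphaProof's reinforcement-learning loop generates and verifies.

    -- Illustrative only: a toy theorem and proof in Lean 4.
    -- Every step is checked by Lean's kernel, which is what lets a
    -- system like AlphaProof verify candidate proofs automatically.
    theorem toy_add_comm (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b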

Impressive Results with Human Oversight

Prominent mathematicians Sir Timothy Gowers and Dr. Joseph Myers evaluated the AI's solutions according to official IMO standards. The combined system scored 28 out of 42 points: each of the six problems is worth 7 points, and it earned full marks on the four it solved, narrowly missing the 29-point threshold for a gold medal. Remarkably, it achieved a perfect score on the most challenging problem, a feat matched by only five human contestants this year.

A Unique Mathematical Competition

The IMO, held annually since 1959, challenges elite pre-college mathematicians with difficult problems in algebra, combinatorics, geometry, and number theory, and its problems are widely regarded as a benchmark for assessing the mathematical reasoning capabilities of AI systems. Google reported that AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 handled the geometry question. However, the AI failed to solve either of the two combinatorics problems.

Efficiency and Limitations

While some problems were solved within minutes, others required up to three days. Google first translated the IMO problems into formal mathematical language for the AI to process, a step that departs from the competition's standard format, in which human contestants work directly with the problem statements during two 4.5-hour sessions.
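As a rough illustration of what this translation step involves, the sketch below formalizes the informal claim "the sum of two even natural numbers is even" as a Lean 4 theorem. The statement and proof are our own toy example, not one of the competition problems, and the proof assumes a recent Lean 4 toolchain in which the built-in omega arithmetic tactic is available; the actual IMO formalizations are considerably more involved.

    -- Hypothetical sketch of formalization: an informal claim rendered
    -- as a precise Lean 4 statement, then proved. Not a 2024 IMO problem.
    theorem sum_of_evens_is_even (m n : Nat)
        (hm : ∃ k, m = 2 * k) (hn : ∃ k, n = 2 * k) :
        ∃ k, m + n = 2 * k :=
      match hm, hn with
      -- Unpack the witnesses for m and n, then supply a + b as the
      -- witness for m + n; omega discharges the linear arithmetic.
      | ⟨a, ha⟩, ⟨b, hb⟩ => ⟨a + b, by omega⟩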

Prior to this year's IMO, AlphaGeometry 2 had already solved 83% of historical IMO geometry problems from the past 25 years, a significant improvement over its predecessor's 53% success rate. This year, it solved the geometry problem in just 19 seconds once given the formalized version.

Nuanced Perspectives and Future Implications

Despite the impressive achievements, Sir Timothy Gowers offered a nuanced view, noting that the AI took considerably longer than human competitors on some problems. He also emphasized that humans translated the problems into Lean before the AI began its work: the core mathematical reasoning was performed by the AI, but the initial "autoformalization" step was human-led.

Gowers speculated on the broader implications for mathematical research, suggesting that while these AI systems are not yet capable of making mathematicians redundant, they could become valuable research tools. He noted the potential for AI to assist in solving a wide range of questions, provided they are not overly complex.