DeepMind claims its AI outperforms International Mathematical Olympiad gold medalists
An AI system developed by Google DeepMind, Google's flagship AI research lab, appears to outperform the average gold medalist at solving geometry problems in an international mathematics competition.
The system, called AlphaGeometry2, is an improved version of AlphaGeometry, which DeepMind released last January. In newly published research, the DeepMind researchers behind AlphaGeometry2 claim their AI can solve 84% of all geometry problems posed over the past 25 years at the International Mathematical Olympiad (IMO), a math contest for high school students.
Why is DeepMind interested in a high-school-level math competition? Well, the lab believes the key to more capable AI might lie in discovering new ways to solve challenging geometry problems, specifically Euclidean geometry problems.
Proving a mathematical theorem, or logically explaining why a theorem (e.g. the Pythagorean theorem) is true, requires both the ability to reason and the ability to choose from a range of possible steps toward a solution. These problem-solving skills could, if DeepMind is right, turn out to be a useful component of future general-purpose AI models.
Indeed, this past summer, DeepMind demonstrated a system that combined AlphaGeometry2 with AlphaProof, an AI model for formal mathematical reasoning, to solve four of the six problems from the 2024 IMO. Beyond geometry problems, approaches like these could be extended to other areas of math and science, for example to aid with complex engineering calculations.
AlphaGeometry2 has several core elements, including a language model from Google's Gemini family of AI models and a "symbolic engine." The Gemini model helps the symbolic engine, which uses mathematical rules to infer solutions to problems, arrive at feasible proofs for a given geometry theorem.
Olympiad geometry problems are based on diagrams that need "constructs," such as points, lines, or circles, to be added before they can be solved. AlphaGeometry2's Gemini model predicts which constructs might be useful to add to a diagram.
Essentially, AlphaGeometry2's Gemini model suggests steps and constructs in a formal mathematical language to the engine. A search algorithm allows AlphaGeometry2 to conduct multiple searches for a solution in parallel and to store potentially useful findings in a common knowledge base.
AlphaGeometry2 considers a problem "solved" when it arrives at a proof that combines the Gemini model's suggestions with the symbolic engine's known principles.
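To make that division of labor concrete, here is a minimal, hypothetical sketch in Python of the kind of propose-and-check loop described above: a language model suggests constructs, a symbolic engine deduces new facts from them, and useful findings accumulate in a shared knowledge base. Every name here (language_model_propose, symbolic_engine_deduce, KnowledgeBase) is an illustrative placeholder, not DeepMind's actual code or API.

```python
# Hypothetical sketch of the neuro-symbolic loop described above.
# None of these names correspond to DeepMind's actual code; they only
# illustrate how a language model and a symbolic engine might cooperate.

from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Shared store of facts found by parallel searches."""
    facts: set = field(default_factory=set)

    def add(self, new_facts):
        self.facts |= set(new_facts)


def language_model_propose(problem, kb):
    """Stand-in for the Gemini model: suggest constructs (points, lines,
    circles) in a formal language that might unlock the proof."""
    return [f"construct_{len(kb.facts)}"]  # placeholder suggestion


def symbolic_engine_deduce(problem, kb, construct):
    """Stand-in for the symbolic engine: apply geometry rules to the
    diagram plus the suggested construct, returning new facts and a flag
    indicating whether the goal statement was reached."""
    new_facts = {f"fact_from_{construct}"}       # placeholder deduction
    goal_reached = problem["goal"] in new_facts  # trivially False in this toy
    return new_facts, goal_reached


def solve(problem, max_steps=100):
    kb = KnowledgeBase()
    for _ in range(max_steps):
        for construct in language_model_propose(problem, kb):
            new_facts, done = symbolic_engine_deduce(problem, kb, construct)
            kb.add(new_facts)            # useful findings are shared
            if done:
                return kb.facts          # a proof: constructs plus deductions
    return None                          # no proof within the step budget


if __name__ == "__main__":
    print(solve({"statement": "toy problem", "goal": "target_fact"}))
```

In AlphaGeometry2 itself the suggestions are written in a formal geometry language and the deductions follow mathematical rules; the sketch only captures the overall control flow of proposing, deducing, and sharing results.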
There is a shortage of usable geometry training data, owing to the complexity of translating proofs into a format AI can understand. So DeepMind created its own synthetic data to train AlphaGeometry2's language model, generating over 300 million theorems and proofs of varying complexity.
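The paper's synthetic-data pipeline is not described here, but the general idea of programmatically producing statement-and-proof pairs for language-model training can be shown with a toy example. The sketch below uses trivial linear equations rather than Olympiad geometry, and none of it reflects DeepMind's actual method; it only illustrates how valid theorem/proof text pairs can be generated at scale.

```python
# Toy illustration of synthetic theorem/proof generation for language-model
# training. This is NOT DeepMind's pipeline; it only shows the general idea
# of producing (statement, proof) text pairs programmatically.

import random


def make_example(rng: random.Random) -> dict:
    """Build one tiny 'theorem' (a linear equation) with a step-by-step proof."""
    a = rng.randint(2, 9)    # coefficient
    x = rng.randint(1, 20)   # hidden solution
    b = rng.randint(1, 50)   # constant term
    c = a * x + b            # right-hand side chosen so the statement is true
    theorem = f"If {a}*x + {b} = {c}, then x = {x}."
    proof = (
        f"Subtract {b} from both sides: {a}*x = {c - b}. "
        f"Divide both sides by {a}: x = {(c - b) // a}. QED."
    )
    return {"theorem": theorem, "proof": proof}


def make_dataset(n: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


if __name__ == "__main__":
    for ex in make_dataset(3):
        print(ex["theorem"], "--", ex["proof"])
```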
The DeepMind team selected 45 geometry problems from IMO competitions over the past 25 years (2000 to 2024), including linear equations and equations that require moving geometric objects around a plane. They then "translated" these into a larger set of 50 problems. (For technical reasons, some problems had to be split in two.)
According to the paper, AlphaGeometry2 solved 42 of the 50 problems, surpassing the average gold medalist's score of 40.9.
To be sure, there are limitations. Owing to technical quirks, AlphaGeometry2 can't solve problems with a variable number of points, nonlinear equations, or inequalities. And AlphaGeometry2 isn't technically the first AI system to reach gold-medal-level performance in geometry, though it is the first to achieve it with a problem set of this size.
AlphaGeometry2 also did worse on another set of harder IMO problems. For an added challenge, the DeepMind team selected problems (29 in total) that had been nominated for IMO exams by math experts but haven't yet appeared in a competition. AlphaGeometry2 could solve only 20 of these.
Still, the results are likely to fuel the debate over whether AI systems should be built on symbol manipulation (that is, manipulating symbols that represent knowledge using rules) or on the ostensibly more brain-like neural networks.
AlphaGeometry2 takes a hybrid approach: its Gemini model has a neural network architecture, while its symbolic engine is rules-based.
Proponents of neural network techniques argue that intelligent behavior, from speech recognition to image generation, can emerge from nothing more than massive amounts of data and computing power. Unlike symbolic systems, which solve tasks by defining sets of symbol-manipulating rules dedicated to particular jobs (like editing a line in word processor software), neural networks try to solve tasks through statistical approximation, learning from examples.
Neural networks are the cornerstone of powerful AI systems like OpenAI's o1 "reasoning" model. But, supporters of symbolic AI argue, neural networks aren't the end-all be-all; symbolic AI might be better positioned to efficiently encode the world's knowledge, reason its way through complex scenarios, and "explain" how it arrived at an answer, these supporters contend.
"It is striking to see the contrast between these kinds of benchmarks and what we are seeing from language models at the same time, including more recent ones with 'reasoning,'" Vince Conitzer, a computer science professor at Carnegie Mellon University specializing in AI, told TechCrunch. "I don't think it's all smoke and mirrors, but it shows that we still don't really know what behavior to expect from the next system. These systems are likely to be very impactful, so we urgently need to understand them, and the risks they pose, much better."
AlphaGeometry2 perhaps demonstrates that the two approaches, symbol manipulation and neural networks, combined are a promising path forward in the search for generalizable AI. Indeed, according to the DeepMind paper, o1, which also has a neural network architecture, couldn't solve the IMO problems that AlphaGeometry2 was able to answer.
That may not be the case forever. In the paper, the DeepMind team said it found preliminary evidence that AlphaGeometry2's language model was capable of generating partial solutions to problems without the help of the symbolic engine.
"[The] results support the idea that large language models can be self-sufficient without depending on external tools [such as symbolic engines]," the DeepMind team wrote in the paper, "but until hallucinations are completely resolved, the tools will stay essential for math applications."