In a moment that felt ripped straight from the pages of a science-fiction novel, the mathematics world was thrown into turmoil this July. OpenAI and Google each claimed to have built a model capable of gold-medal performance at the International Mathematical Olympiad (IMO). Critics called the claim controversial at best and blasphemous at worst, while others erupted in cheers, hailing it as the beginning of a new era in maths. Was this a ‘moon landing’ moment for AI? Is this the end of professional human mathematicians?
The IMO, in essence the world cup of maths, is the most prestigious and difficult mathematics competition for school students in the world. Every year hundreds of the brightest high school students from all over the globe face 6 questions so difficult they’d make most university mathematics students sweat. The contest is split over two days; on each day contestants get four and a half hours to attempt 3 questions, each carrying 7 marks, for a maximum of 42. This year the 66th IMO took place from 10th to 20th July on the Sunshine Coast in Queensland, Australia, with 630 of the brightest high schoolers from 110 nations competing, of whom 67, or roughly 11%, secured a gold medal. A popular misconception is that the IMO crowns a single world champion like the Olympics; in fact, medals are awarded by rank, with roughly the top 1/12 of scorers taking home gold. So while China dominates the standings like Argentina at the 2022 World Cup, there is no real ‘world champion of maths’.
This year’s contest marked a turning point in the history of both AI and the contest itself, as two separate AI models, Google’s Gemini Deep Think (basically Gemini 2.5 Pro on steroids) and an experimental model from OpenAI, achieved gold-medal-level performance. The results from Gemini Deep Think were even graded by the same jury that grades the human contestants. Both models scored 35 out of 42 points, solving 5 of the 6 questions perfectly.
IMO problems are designed to be extremely difficult, demanding creativity, deep insight and careful reasoning. They are also drawn from different areas of maths, so each one calls for a different approach. That makes the IMO a useful benchmark for comparing an AI model’s reasoning with human reasoning. On top of that, the models read the problems and wrote their answers in natural language (English), which is a ‘huge leap’ for AI, as previous models could solve such problems if and only if (no math pun intended) they were first written in the correct computer syntax.
As of writing, OpenAI won’t spill the tea about how its model actually works. All we know so far is that it is a general-purpose LLM (Large Language Model) rather than one dedicated to maths. It works in natural language like a human mathematician, and it seems to know its own limits: it will not produce an answer if it is unsure that the answer is correct. However, unlike Gemini’s, its solutions were graded by a group of 3 former IMO medalists rather than by the official jury.
Gemini Deep Think works by incorporating parallel thinking, and it was trained with novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. Gemini was also given access to a curated corpus of high-quality solutions to mathematics problems, and some general hints and tips on how to approach IMO problems were added to its instructions. The model is an improvement on last year’s, and its solutions were described as “astonishing in many respects” and “clear, precise and most of them easy to follow.” Moreover, both companies provided their submitted solutions.
However, many mathematicians are wary of the hype around such models solving the IMO. Terence Tao, an IMO gold medalist, Fields Medal laureate and perhaps the G.O.A.T. of the IMO, voiced his concerns on Mastodon, saying that such claims depend heavily on the testing methodology. The IMO president, Gregor Dolinar, although he praised the solutions produced by the models, said that the IMO “cannot validate the methods [used by the AI models], including the amount of compute used or whether there was any human involvement, or whether the results can be reproduced.” The models appear to employ a best-of-n strategy: they generate many candidate solutions through parallel thinking, grade them internally and submit only the best one. This is akin to having several students work independently and handing in only the best script. Under such a format, a team that never even reached bronze standard could reliably reach gold standard, which, in essence, throws the format and rules of the IMO out of the window.
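To see why a best-of-n strategy matters so much, here is a rough back-of-the-envelope sketch in Python. It is not how either company's model works; it is a toy calculation under two loudly stated assumptions: every attempt at a problem succeeds independently with the same probability, and the internal grader always picks a correct attempt whenever one exists. The function names and the 20% figure are made up purely for illustration.

```python
# Toy model of best-of-n sampling (illustrative assumptions only):
#   1. each attempt solves a problem independently with probability p_single
#   2. the internal grader is perfect: if any attempt is correct, it is submitted


def best_of_n_success(p_single: float, n: int) -> float:
    """Probability that at least one of n independent attempts is correct."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Complement of "all n attempts fail".
    return 1.0 - (1.0 - p_single) ** n


def expected_score(p_single: float, n: int,
                   problems: int = 6, marks_per_problem: int = 7) -> float:
    """Expected total marks if every problem is attacked with best-of-n."""
    return problems * marks_per_problem * best_of_n_success(p_single, n)


if __name__ == "__main__":
    # Hypothetical solver that cracks any given problem 20% of the time.
    for n in (1, 4, 16, 64):
        print(f"n = {n:>2}: expected score = {expected_score(0.2, n):.1f} / 42")
```

With a single attempt, this hypothetical solver averages 8.4 marks out of 42; the more attempts it is allowed to pick from, the closer its expected score creeps towards full marks, even though no individual attempt got any better. Real graders are imperfect and real attempts are not independent, so the true effect would be smaller, but the direction is the same.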
On the other hand, such AI models are still quite handy. They can help mathematicians explore hypotheses and try new approaches to long-standing unsolved problems in significantly less time. They can aid in writing proofs by checking whether the logical argument actually supports the claim. They can suggest useful lemmas drawn from large online libraries of proofs, and they can auto-formalize proofs, leading to new proofs and conjectures that enrich those libraries and mathematics as a whole. They can also guide students through proofs interactively, helping them learn complex mathematics.
Nonetheless, all of this seems a bit spine-chilling. It feels like only yesterday that these AI models struggled with basic geometry problems, and now they can solve all but one question in the most difficult mathematics competition in the world. It also feels like AI has crossed a boundary: we used to believe that maths this complex could only be solved through human abstract reasoning and creativity (or perhaps that was just copium). This suggests a shift in AI from merely predicting results to actually grasping complex topics, outperforming humans in the purest form of human reasoning. The implications are blood-curdling, suggesting that AI is overtaking its creators in intelligence and reasoning and blurring the line between tool and thinker. What began as an innocent desire for a computer system that could solve complex maths may well end with AI discovering its sense of self, and perhaps with a Terminator-style global takeover.
To conclude, I would like to say that AI has come a long way since ChatGPT launched in November 2022. The fact that it can solve IMO-level mathematics problems is a major milestone on the path toward Artificial General Intelligence, that is, an AI system able to understand, learn and perform any intellectual task a human can. Yet an AI system can never truly sit the IMO under exam conditions, so no matter how many points it scores, it can never truly win an IMO gold medal (and I’m not saying that out of fear for my life at the hands of the mathematics community). However, it cannot be denied that this will go down in the history books.
Raahim Nadeem
Team Writer (2025-2026)

