AI keeps getting more powerful, and it is becoming harder to tell what a smart model actually looks like
How do you tell whether AI models have already surpassed humans? That is the challenge facing researchers like Russell Wald, executive director of the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
“As of 2024, there are few task categories where human ability surpasses AI. Even in these areas, the gap between AI and human performance is rapidly shrinking,” Wald said in a presentation last week at the Fortune Brainstorm AI Singapore conference. “AI is exceeding human capabilities, and benchmarking is becoming more and more difficult.”
HAI releases its AI Index every year. It aims to provide a comprehensive, data-driven snapshot of where AI stands today. At Fortune Brainstorm AI Singapore, Wald shared some highlights from the 2025 edition of the AI Index, including the rising power of today’s models, growing industry control of the AI frontier, and how China is poised to overtake the US.
The following transcript has been lightly edited for brevity and clarity.
I’m Russell Wald, executive director of the Stanford Institute for Human-Centered Artificial Intelligence, or what we call “HAI.”
We are Stanford University’s globally recognized interdisciplinary research institute, at the forefront of shaping AI development for the public good. HAI was founded in 2019 with the goal of advancing AI research, education, policy and practice. Through our convening role and rigorous research, we have become a trusted partner on AI governance for decision makers in industry, government and civil society.
I’ll talk about one of HAI’s products, the AI Index. This is an annual, data-driven analysis of AI trends that tracks the research, development, deployment and socioeconomic impact of AI across academia, government and industry.
AI performance is improving consistently year over year. Take a prompt asking Midjourney, a text-to-image generator, for a hyper-realistic image of Harry Potter. From February 2022 to July 2024, the quality of the generated images improved rapidly.
In 2022, the model produced an inaccurate, cartoonish rendering of Harry Potter, but by 2024 it could create a strikingly realistic portrayal. We’ve gone from something resembling a Picasso painting to an uncanny rendering of Daniel Radcliffe, the actor who played Harry Potter in the films.
Because of this consistent performance growth, benchmarking these models is increasingly challenging. As of 2024, there are few task categories where human ability surpasses AI. Even in these areas, the gap between AI and human performance is rapidly shrinking. From image recognition to competition-level mathematics to PhD-level science questions, AI is exceeding human capabilities, which makes benchmarking more difficult.
From healthcare to transportation, AI is rapidly moving from the lab into our daily lives. In 2023, the US Food and Drug Administration approved 223 AI-enabled medical devices, up from just six in 2015.
On the roads, self-driving cars are no longer experimental. For example, Waymo, which I take regularly since I live in San Francisco, is one of the largest operators in the United States, providing over 150,000 autonomous rides every week, while Baidu’s affordable Apollo Go robotaxi fleet serves many cities across China.
Business use of AI has increased significantly after stagnating from 2017 to 2023. In the latest McKinsey report, 78% of respondents said their organizations have begun using AI in at least one business function, a significant increase from 55% in 2023.
Driven by increasingly capable small models, the inference cost for a system performing at the level of GPT-3.5 fell more than 280-fold between November 2022 and October 2024. Hardware costs have declined by 30% each year, while energy efficiency has improved by 40% each year.
Open-weight models are also closing the gap with closed models, with the performance gap dropping from 8% to just 1.7% on some benchmarks over the course of a year. Together, these trends are rapidly lowering the barriers to advanced AI.
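To give a feel for the scale of the cost figures quoted above, here is a minimal Python sketch that converts them into per-year terms. It is an illustration under stated assumptions, not a calculation from the AI Index: November 2022 to October 2024 is treated as 23 months, the decline is assumed to be smooth, and the function names are hypothetical.

```python
import math


def annualized_factor(total_factor: float, months: float) -> float:
    """Convert a total fold-change over `months` into an equivalent per-year factor."""
    return total_factor ** (12.0 / months)


def remaining_fraction(annual_decline: float, years: float) -> float:
    """Fraction of the original cost left after compounding a yearly decline."""
    return (1.0 - annual_decline) ** years


# Assumption: November 2022 to October 2024 is counted as 23 months.
MONTHS = 23
TOTAL_DROP = 280.0  # "more than 280-fold" is treated as exactly 280 here

# Equivalent yearly drop if the decline had been perfectly smooth.
per_year = annualized_factor(TOTAL_DROP, MONTHS)

# Months needed for the cost to halve under the same smooth-decline assumption.
halving_months = MONTHS * math.log(2) / math.log(TOTAL_DROP)

# Hardware cost left after two years of a 30% yearly decline.
hardware_after_two_years = remaining_fraction(0.30, 2)

print(f"Inference cost: roughly {per_year:.0f}x cheaper per year")
print(f"Inference cost halves about every {halving_months:.1f} months")
print(f"Hardware cost after two years: {hardware_after_two_years:.0%} of the original")
```

Under these assumptions, the inference-cost drop works out to a far steeper yearly decline than the 30% figure for hardware, which is why the two numbers should not be read as the same kind of trend.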
However, even with lower inference and hardware costs, training costs remain out of reach for academia and most small players. Almost 90% of the notable AI models in 2024 came from industry, up from 60% in 2023. Academia remains the top source of highly cited research, but at this point it is struggling to advance at the frontier level.
Model scale continues to grow rapidly. Training compute doubles every five months, dataset sizes double every eight months, and power use doubles every year. However, performance gaps have been narrowing. The score difference between the top-ranked model and the 10th-ranked model fell from 11.9% to 5.4% in a year, and the top two models are separated by just 0.7%. The frontier is increasingly competitive and increasingly crowded.
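The doubling times above are easier to compare once they are expressed as growth per year. The following is a minimal sketch, assuming steady exponential growth and reading "training" as training compute; the function name and the dictionary are hypothetical, chosen only for illustration.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Yearly growth factor implied by a doubling time given in months."""
    return 2.0 ** (12.0 / doubling_months)


# Doubling times as quoted in the talk (in months).
DOUBLING_MONTHS = {
    "training compute": 5,
    "dataset size": 8,
    "power use": 12,
}

growth = {name: annual_growth_factor(m) for name, m in DOUBLING_MONTHS.items()}

for name, factor in growth.items():
    print(f"{name}: about {factor:.1f}x per year")
```

Read this way, a five-month doubling time implies growth of a little over fivefold per year, against twofold per year for power use.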
In recent years, AI models have converged in performance at the frontier, with multiple providers offering highly capable models. This marks a shift from late 2022, when the launch of ChatGPT was widely seen as AI’s breakthrough into public consciousness.
One of the most important things to note is that in 2017 it cost Google $930 to train the Transformer. This is the “T” in GPT.
Last year’s AI Index was one of the first publications to highlight the lack of standardized benchmarks for evaluating AI safety and responsibility. The Index also analyzes global public opinion. If you are from a non-Western industrialized country, you are more likely to view AI positively than negatively. In China, 83% hold a positive view; in Indonesia, 80%; and in Thailand, 77%. Meanwhile, the figure is 40% in Canada, 39% in the US, and 36% in the Netherlands.
We conclude with the geopolitical situation. The US remains in the lead on AI, followed closely by China, but this gap is tightening. My intention is not to feed the idea of an AI arms race between China and the US, but to highlight the different approaches among the most advanced frontier AI model developers.
Over the past few years, the US has relied on a few proprietary model providers. Meanwhile, China is investing heavily in its talent base and, more importantly, its open-source environment. If this trend continues, and I am on this stage again next year, then at this rate China will be outperforming the US in terms of model performance.