Ultra-Large AI Models Are Over
I don’t mean “over” as in “you won’t see a new large AI model ever again,” but as in “AI companies have reasons not to pursue them as a core research goal — indefinitely.”
Don’t get me wrong. This article isn’t a critique of the past years — even if I don’t buy the “scale is all you need” argument, I acknowledge just how far scaling has advanced the field.
A parallel can be drawn between the 2020–2022 scaling race and, keeping due distance, the space race of the 1950s–70s. Both advanced science significantly as a byproduct of other intentions.
But there’s a key distinction.
While space exploration was innovative in nature, that quest for novelty is absent from the “bigger is better” AI trend: to conquer space, the US and USSR had to design novel paths toward a clear goal. In contrast, AI companies have blindly followed a predefined path without knowing why, or whether it would lead anywhere.
You can’t put the cart before the horse.
That makes all the difference, and it explains why and how we got here.
The scaling laws of large models
Some companies use AI to automate processes, improve efficiency, and reduce costs. Others want to advance scientific understanding or improve people’s lives and well-being. And yet others want to build the “last invention” we’ll ever make — or so they think.
Call it AGI, superintelligence, human-level AI, or true AI.
In any case, it’s been a recurring goal since the field’s birth in 1956. But the idea became tangible in 2012, more so in 2017, and finally exploded in 2020.
The last milestone was OpenAI’s discovery and application of the strongest version of scaling laws for large language models (LLMs).
They accepted, earlier than anyone else, that sheer model size (and with it, data and compute) was the key to advancing the field. OpenAI’s faith in the scaling hypothesis was reflected in the January 2020 empirical paper “Scaling Laws for Neural Language Models.”
In May 2020, OpenAI announced GPT-3, a direct result of applying the scaling…