Meet M6 — 10 Trillion Parameters at 1% GPT-3’s Energy Cost
Smaller players can now enter the game of large AI models

I can confidently say artificial intelligence is advancing fast when a neural network 50 times larger than another can be trained at 100 times lower energy cost — with just one year in between!
On June 25, Alibaba DAMO Academy (the R&D branch of Alibaba) announced they had built M6, a large multimodal, multitasking language model with 1 trillion parameters — already 5x the size of GPT-3, which serves as the standard for measuring progress in large AI models. The model was designed for multimodality and multitasking, going a step further than previous models toward general intelligence.
In terms of abilities, M6 resembles GPT-3 and other similar models like Wu Dao 2.0 or MT-NLG 530B (about which we have very little information). InfoQ, a popular Chinese tech magazine, compiles M6’s main skills: “[It] has cognition and creativity beyond traditional AI, is good at drawing, writing, question and answer, and has broad application prospects in many fields such as e-commerce, manufacturing, literature and art.”
However, the critical aspect Alibaba researchers highlighted was the significant improvement in efficiency and energy cost. They reduced the model’s energy consumption by 80% and increased its efficiency 11x compared with previous 100-billion-parameter language models.
This is extremely important news, in line with green AI principles and objectives.
Green AI to demonopolize large language models
But they didn’t stop there and now, 5 months later, they’ve just achieved not one, but two new striking milestones: They’ve improved M6 to make it the first 10-trillion-parameter large language model — 50x GPT-3’s size. And they’ve bettered their previous marks on efficiency, reducing the energy consumption to 1% of what GPT-3 needed to train.
They used a mere 512 GPUs to train the model in 10 days!
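To put those 5,120 GPU-days in perspective, here is a rough back-of-envelope comparison. The GPT-3 figure is an assumption, not from this article: independent estimates (e.g. Lambda Labs) put GPT-3’s training at roughly 355 V100-GPU-years, and the GPUs involved are different generations, so this only illustrates the order of magnitude:

```python
# Back-of-envelope comparison of training compute, in GPU-days.
# Assumptions (not from the article): GPT-3 is commonly estimated at
# ~355 V100-GPU-years; M6's 512 GPUs for 10 days is from the announcement.

gpt3_gpu_days = 355 * 365   # ~355 GPU-years -> GPU-days (rough estimate)
m6_gpu_days = 512 * 10      # 512 GPUs x 10 days

ratio = m6_gpu_days / gpt3_gpu_days
print(f"GPT-3: ~{gpt3_gpu_days:,} GPU-days (estimated)")
print(f"M6:    {m6_gpu_days:,} GPU-days")
print(f"M6 used roughly {ratio:.1%} of GPT-3's GPU-days")
```

In raw GPU-days this comes out to a few percent; the 1% figure refers to energy, and newer, more efficient accelerators plus the training optimizations account for the remaining gap.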
These achievements will have far-reaching positive consequences for the AI community and the world.
On the one hand, it’s a big leap towards finding common ground between the necessities of large AI models and…