Alberto Romero
Jun 10, 2021

Strictly speaking, more parameters means the neural network is larger: it has more weights (the connections between its processing units, the neurons) to tune. Yet it's precisely a finite number of parameters that forces a neural net to find higher-level representations to adequately encode the patterns in the training data. If a neural network had too many parameters (whatever that number is, which would depend on the nature of the training data), it could overfit: instead of learning the patterns hidden in the data, it would memorize the details, which limits generalization.
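To make that concrete, here's a minimal sketch (not from the original discussion; the polynomial model and numbers are just illustrative assumptions) of how a model with too many free parameters memorizes noise instead of the underlying pattern:

```python
# Minimal overfitting sketch: fit a noisy sine wave with polynomials of
# increasing degree and compare training error vs. error on unseen points.
import numpy as np

rng = np.random.default_rng(0)

# 10 noisy training points drawn from a simple underlying pattern (a sine wave).
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.1, size=x_train.shape)

# A dense grid of unseen points to estimate generalization error.
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (3, 9):  # 4 vs. 10 free parameters for only 10 training points
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")

# Typically the degree-9 fit drives the training error toward zero but does much
# worse on the unseen points: it has encoded the noise ("the details"), not the pattern.
```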

Finding a compromise to avoid overfitting is often required. Yet, as the comment below yours points out, GPT-3 and Wu Dao 2.0 (and other systems) showed that having more parameters can give a neural network qualitatively more power.
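One common way to strike that compromise in practice is explicit regularization, for example an L2 penalty (weight decay). Here's a minimal sketch, again with assumed toy numbers rather than anything from GPT-3 or Wu Dao 2.0:

```python
# Minimal weight-decay sketch: keep an over-parameterized polynomial model,
# but penalize large coefficients so the extra capacity isn't spent on noise.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.shape)

# Degree-9 polynomial features: 10 coefficients for only 15 noisy points.
X = np.vander(x, N=10, increasing=True)

# Plain least squares (no penalty) vs. ridge regression (L2 penalty / weight decay).
w_free, *_ = np.linalg.lstsq(X, y, rcond=None)
lam = 1e-3
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print("largest |coefficient| without penalty:", round(float(np.abs(w_free).max()), 1))
print("largest |coefficient| with penalty:   ", round(float(np.abs(w_ridge).max()), 1))

# The penalty shrinks the coefficients, trading a bit of training error for
# smoother fits that generalize better, one standard compromise against overfitting.
```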

Also, it could be the case that adding more parameters simply doesn't help. Those parameters could be left unused after training, or might not help the neural net optimize further. It probably depends on many factors, one being the number of parameters added. 10x GPT-3 (roughly 1.75 trillion parameters versus GPT-3's 175 billion) is a huge difference, so it's reasonable to expect something different from Wu Dao 2.0.

Taking all of this into account, we could conclude that both statements are true depending on the context: more parameters can imply more power, but not necessarily, which is what Aniket Kumar meant.

Hope it's clearer now. If you have more questions, feel free to ask :)
