NeMo Megatron: NVIDIA’s Large Language Model Framework

Alberto Romero
4 min read · Sep 17, 2022
NVIDIA NeMo Megatron is an end-to-end framework for training and deploying LLMs with billions or even trillions of parameters. Credit: NVIDIA

As hyperscalers and enterprises embrace the new era of massive AI models, particularly large language models (LLMs), they will realize that the most pressing obstacle to taking AI to the next stage isn't the AI itself (the algorithms are fine) but the software and hardware infrastructure that underlies the development, training, and deployment of these models.

We tend to think that AI companies, the ones that build the algorithms (think OpenAI or DeepMind), are the ones leading the field forward. But that's only the tip of the iceberg when it comes to building useful AI. There are key players in the field who aren't getting the credit they deserve: the ones making sure the OpenAIs of the world have adequate frameworks to work with, so they have an easier time training and running inference on their custom LLMs.

NVIDIA’s Solution: NeMo Megatron

The foremost example of this type of company is NVIDIA. Everyone associates NVIDIA with industry-leading GPUs, the hardware that powers the vast majority of AI, especially if we zoom in on state-of-the-art (SOTA) large models. But NVIDIA is also a key player on the software and framework side of LLMs.

NVIDIA (together with Microsoft) is the company behind the second largest dense language model in existence, and one of the most performant —…
