Nemotron-4 15B: A New LLM From NVIDIA, Outperforming LLaMA-2 70B and More

NVIDIA has just announced its new large language model (LLM), Nemotron-4 15B. At 15 billion parameters, Nemotron-4 15B is a significant step up from NVIDIA’s previous offerings while still fitting on a single GPU, making it highly accessible for real-world use cases. Let’s take a deeper look at this powerful new model, delving into its architecture, training, and performance across diverse linguistic and coding tasks.

Development of NVIDIA Nemotron-4 15B 

NVIDIA developed Nemotron-4 15B by training it with an unprecedented amount of data, collecting a total of 8 trillion tokens from diverse sources. They utilized 3,072 of NVIDIA’s powerful H100 GPUs over the course of 13 days to train the model. This immense computing power and data allowed them to maximize parameter and data efficiency.
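
For a rough sense of scale, the common 6 × parameters × tokens approximation for dense Transformer training FLOPs can be applied to these reported figures. The sketch below uses that heuristic together with an assumed per-GPU throughput (an illustrative number, not one NVIDIA has published) and lands close to the reported 13-day run.

```python
# Back-of-the-envelope estimate of the training compute for Nemotron-4 15B,
# using the common ~6 * parameters * tokens approximation for dense
# Transformer training FLOPs.
params = 15e9    # 15B model parameters (reported)
tokens = 8e12    # 8T training tokens (reported)

total_flops = 6 * params * tokens
print(f"Approximate training compute: {total_flops:.1e} FLOPs")  # ~7.2e+23

# Assumed sustained throughput per H100 (illustrative, not a reported figure).
assumed_flops_per_gpu = 2.0e14   # ~200 TFLOP/s sustained
num_gpus = 3072                  # reported H100 count

days = total_flops / (assumed_flops_per_gpu * num_gpus) / 86_400
print(f"Implied wall-clock time: ~{days:.1f} days")  # ~13.6, close to the reported 13 days
```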

The training corpus included 70% English text, 15% multilingual data spanning 53 languages, and 15% source code samples from 43 programming languages. By including such a variety of data types, NVIDIA aimed to produce a general-purpose model that excelled at both natural and programming languages. They also carefully curated the data distributions to optimize for often underrepresented domains and languages.
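
As a rough illustration, the reported blend can be thought of as a set of sampling weights over data sources. The category names and the tiny sampler below are hypothetical; NVIDIA's actual data pipeline is not public.

```python
import random

# Illustrative sketch of the reported pre-training blend as sampling weights.
# The category names and this sampler are hypothetical; they only show how
# such a mixture could drive per-example source selection.
DATA_MIXTURE = {
    "english_text": 0.70,       # 70% English natural-language text
    "multilingual_text": 0.15,  # 15% across 53 natural languages
    "source_code": 0.15,        # 15% across 43 programming languages
}

def sample_source(mixture, rng):
    """Pick the data source for the next training example according to the blend."""
    sources, weights = zip(*mixture.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {name: 0 for name in DATA_MIXTURE}
for _ in range(10_000):
    counts[sample_source(DATA_MIXTURE, rng)] += 1
print(counts)  # roughly 7000 / 1500 / 1500
```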

The Architecture of Nemotron-4 15B

Nemotron-4 15B uses a standard decoder-only Transformer architecture with a number of optimizations for performance. It contains 32 layers with a hidden size of 6,144 and 48 attention heads. NVIDIA incorporated techniques like Rotary Position Embeddings and Grouped Query Attention to make the model more parameter- and compute-efficient while maintaining high quality. It comprises 3.2 billion embedding parameters and 12.5 billion non-embedding parameters. This combination of a modest parameter count and a massive training corpus lets Nemotron-4 15B achieve impressive abilities while fitting within the memory of a single GPU for broad usability.
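
To make those numbers concrete, here is a hedged sketch of the reported hyperparameters as a config object, together with a back-of-the-envelope parameter count. The KV-head count, MLP width, vocabulary size, and untied-embedding choice are assumptions added for illustration (they are not stated above), but with them the standard dense-Transformer count lands close to the reported 3.2B embedding and 12.5B non-embedding parameters.

```python
from dataclasses import dataclass

@dataclass
class NemotronConfig:
    """Reported Nemotron-4 15B hyperparameters plus illustrative assumptions.

    num_kv_heads, ffn_hidden_size, vocab_size, and untied_embeddings are
    assumptions for this sketch; they are not stated in the article.
    """
    num_layers: int = 32
    hidden_size: int = 6144
    num_attention_heads: int = 48
    num_kv_heads: int = 8            # grouped query attention group count (assumed)
    ffn_hidden_size: int = 24576     # assumed 4x hidden size
    vocab_size: int = 256_000        # assumed tokenizer vocabulary
    untied_embeddings: bool = True   # assumed separate input/output embedding tables

def approx_param_count(cfg):
    """Rough dense-Transformer parameter count (ignores biases and norm weights)."""
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    kv_dim = cfg.num_kv_heads * head_dim
    attn = 2 * cfg.hidden_size * cfg.hidden_size     # query and output projections
    attn += 2 * cfg.hidden_size * kv_dim             # shared key/value projections (GQA)
    mlp = 2 * cfg.hidden_size * cfg.ffn_hidden_size  # up and down projections
    non_embedding = cfg.num_layers * (attn + mlp)
    embedding = cfg.vocab_size * cfg.hidden_size * (2 if cfg.untied_embeddings else 1)
    return embedding, non_embedding

emb, non_emb = approx_param_count(NemotronConfig())
print(f"embedding ≈ {emb / 1e9:.1f}B, non-embedding ≈ {non_emb / 1e9:.1f}B")
# embedding ≈ 3.1B, non-embedding ≈ 12.5B -- close to the reported 3.2B / 12.5B split
```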

Performance Evaluation: Outperforms Meta LLaMA-2 70B and More

In their recent technical report, the researchers at NVIDIA conducted an in-depth evaluation of Nemotron-4 15B’s abilities, benchmarking it against other leading open-source models of similar sizes. Their findings provide valuable insights into how their new LLM handles challenges in multilingual understanding, programming languages, and more.

Let’s take a closer look at some of the key results from the performance evaluation. 

1. Strong Commonsense Reasoning Ability

Commonsense reasoning is a foundational capability for intelligent systems to interact naturally with humans. Nemotron-4 15B was evaluated on standardized benchmarks such as ARC and PIQA, which assess causal, temporal, and spatial relationships. NVIDIA reports the strongest average performance in this category, with Nemotron-4 15B scoring 60.9% against comparably sized models like the LLaMA-2 models, Baichuan-2, QWEN, Mistral, and Gemma, whose individual benchmark scores span 54.5% to 68.4%.

Image Credits: NVIDIA

2. Leading Scores on Aggregate Benchmarks

MMLU and BBH are popular benchmark suites that aggregate a wide range of language understanding and reasoning tasks to measure models comprehensively. Nemotron-4 15B attains state-of-the-art scores of 58.7% on BBH and 64.2% on MMLU, significantly better than LLaMA-2 70B on BBH. This highlights its strong general language abilities.

Image Credits: NVIDIA

3. Strong Math and Code Skills

Mathematical and programming abilities are increasingly important for AI assistants. On the math benchmark GSM8K, Nemotron-4 15B achieves 46% accuracy, on par with Gemma 7B. In coding, it shows broad competency across 11 programming languages, outperforming specialized models such as StarCoder and Mistral, especially on Scala, Julia, and R.

Image Credits: NVIDIA
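
Coding benchmarks of this kind are typically scored with the pass@k metric, which estimates the chance that at least one of k sampled completions passes a problem's unit tests. The snippet below implements the standard unbiased estimator from the Codex paper as a sketch of how such scores are usually computed; the article does not state which exact evaluation harness NVIDIA used.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: completions sampled per problem
    c: completions that passed the unit tests
    k: evaluation budget (e.g. 1 for pass@1)
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Toy example: 20 samples per problem, 5 of which pass the tests.
print(f"pass@1  = {pass_at_k(20, 5, 1):.3f}")   # 0.250
print(f"pass@10 = {pass_at_k(20, 5, 10):.3f}")  # 0.984
```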

4. Unmatched Multilingual Capabilities

Extensive multilingual evaluation results demonstrate that Nemotron-4 15B establishes a new state of the art among general-purpose models of its size class across a variety of multilingual understanding benchmarks.

Image Credits: NVIDIA

It achieves 12% higher accuracy than the specialized multilingual models XGLM and mGPT on the multilingual reasoning benchmark XCOPA. On the question-answering benchmark TyDiQA-GoldP, its 50.5% accuracy is significantly better than that of all compared models. Remarkably, on the challenging multilingual math benchmark MGSM, Nemotron-4 15B’s 41.3% accuracy surpasses the next-best score by nearly 30%.

For machine translation, Nemotron-4 15B achieves a BLEU score of 23.2 when translating from Chinese into 6 other languages, massively outperforming LLaMA-2 13B and Baichuan-2 13B. Most strikingly, it can translate Chinese directly into lower-resource languages without a large drop in quality.
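
BLEU here is a corpus-level n-gram overlap score between model translations and reference translations. A minimal sketch with the sacrebleu library shows how such a score is typically computed; the sentences below are toy placeholders, and NVIDIA's exact evaluation setup is not described in the article.

```python
# Minimal sketch of a corpus-level BLEU computation with the sacrebleu
# library (pip install sacrebleu). The sentences are toy placeholders, not
# data from NVIDIA's evaluation.
import sacrebleu

hypotheses = [          # model translations
    "The weather is nice today.",
    "She bought three books at the market.",
]
references = [          # human reference translations
    "The weather is very nice today.",
    "She bought three books from the market.",
]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```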

Success Factor Behind Nemotron-4 15B

Nemotron-4 15B owes much of its success to the Chinchilla scaling laws, which reshaped language model pre-training. Whereas earlier approaches emphasized ever-larger models, these laws argue for scaling the training data along with the model size and for allocating compute toward vast amounts of high-quality data. Following this approach, NVIDIA curated an impressive 8 trillion tokens from sources like Common Crawl to train Nemotron-4 15B.
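
The Chinchilla result is often summarized by the rule of thumb that a compute-optimal model sees roughly 20 training tokens per parameter. The quick calculation below uses that approximation (not a figure from NVIDIA's report) to show just how heavily Nemotron-4 15B's recipe weights the data side of that trade-off.

```python
# Rough comparison against the ~20-tokens-per-parameter Chinchilla rule of
# thumb. The rule is an approximation of the compute-optimal frontier, not a
# figure from NVIDIA's report.
params = 15e9
tokens_trained = 8e12

chinchilla_optimal_tokens = 20 * params              # ~300B tokens for a 15B model
ratio = tokens_trained / chinchilla_optimal_tokens
print(f"Chinchilla-optimal budget: ~{chinchilla_optimal_tokens / 1e9:.0f}B tokens")
print(f"Nemotron-4 15B trained on ~{ratio:.0f}x that much data")
```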

Conclusion: A Milestone in Language Modeling

In conclusion, NVIDIA’s Nemotron-4 15B is a game-changer in the field of language modeling. With its compact size and exceptional performance, this model has set new standards in the industry. Nemotron-4 15B surpasses even much larger models, cementing its position as a premier general-purpose multilingual model while maintaining an easily deployable size. As of now, the model is not yet available for public use. However, NVIDIA has shared its training and architectural details in a research paper on arXiv. Make sure to check it out for in-depth details.
