OpenChat has recently introduced its new language model: OpenChat-3.5-0106-Gemma. Built on Google's Gemma 7B base, which was pretrained on an astounding 6T tokens, it has been shown to outperform the original Gemma 7B model on several benchmarks. Let's explore what makes this OpenChat Gemma model so special and how it achieves such impressive results.
OpenChat-3.5-0106: A Foundation to Build Upon
Before we delve into the details of the OpenChat Gemma model, let's first understand the foundation on which it is built. OpenChat-3.5-0106 is a 7B-parameter conversational language model built on the Mistral 7B base and fine-tuned with C-RLFT, a technique for advancing open-source models with mixed-quality data. It demonstrates state-of-the-art performance on benchmarks such as HumanEval and AGIEval. The model code and weights are released under an open-source license for everyone to use freely.
Introduction to OpenChat-3.5-0106-Gemma
OpenChat-3.5-0106-Gemma is a unique model that applies the same C-RLFT training procedure as OpenChat-3.5-0106 but uses Google's Gemma 7B model as the base instead of Mistral 7B. Gemma is released under Google's terms-of-use license, which permits this kind of derivative fine-tuning and lets OpenChat replicate its methodology on the new base. The OpenChat Gemma model achieves results on par with the Mistral-based version, which is no small feat, while clearly outperforming the original Gemma 7B model.
The Secret Recipe: 6T Tokens
The secret lies in the recipe, and the main ingredient is pretraining scale. Gemma 7B was pretrained on 6T (trillion) tokens, far more than the 1T-2T typical of earlier open 7B models, and this leads to significant gains. The extra data helps the model learn richer representations, producing more coherent, consistent, and capable conversations. The OpenChat Gemma model inherits this high-quality pretraining foundation, and the vast pretraining data can thus be considered a key "secret sauce" behind the model's strong zero-shot capabilities.
Performance Evaluation of OpenChat-3.5-0106-Gemma
OpenChat-3.5-0106-Gemma achieves performance similar to the Mistral-based OpenChat model and outperforms Google's original Gemma 7B model on various benchmarks, which we will discuss in detail below. It also outperforms the popular OpenHermes 2.5 7B model on 7 of 8 benchmarks and the widely used OpenAI ChatGPT on 4 of 8 benchmarks.
| Benchmark | OpenChat-3.5-0106 Gemma (7B) | ChatGPT (March) | OpenHermes 2.5 (7B) |
|---|---|---|---|
| Average | 64.4 | 61.5 | 59.3 |
| MT-Bench | 7.83 | 7.94 | 7.54 |
| HumanEval | 67.7 | 48.1 | 48.2 |
| BBH MC | 52.7 | 47.6 | 49.4 |
| AGIEval | 50.2 | 47.1 | 46.5 |
| TruthfulQA | 55.4 | 57.7 | 57.5 |
| MMLU | 65.7 | 67.3 | 63.8 |
| GSM8K | 81.5 | 74.9 | 73.5 |
| BBH CoT | 63.7 | 70.1 | 59.9 |
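The win-rate claims above (7/8 against OpenHermes 2.5, 4/8 against ChatGPT) can be checked directly from the table; a quick sketch tallying per-benchmark wins:

```python
# Per-benchmark scores copied from the table above (Average row excluded).
scores = {
    #              (Gemma, ChatGPT, OpenHermes)
    "MT-Bench":   (7.83, 7.94, 7.54),
    "HumanEval":  (67.7, 48.1, 48.2),
    "BBH MC":     (52.7, 47.6, 49.4),
    "AGIEval":    (50.2, 47.1, 46.5),
    "TruthfulQA": (55.4, 57.7, 57.5),
    "MMLU":       (65.7, 67.3, 63.8),
    "GSM8K":      (81.5, 74.9, 73.5),
    "BBH CoT":    (63.7, 70.1, 59.9),
}

# Count the benchmarks where the Gemma model's score is strictly higher.
wins_vs_chatgpt = sum(g > c for g, c, _ in scores.values())
wins_vs_openhermes = sum(g > h for g, _, h in scores.values())
print(f"vs ChatGPT: {wins_vs_chatgpt}/8, vs OpenHermes: {wins_vs_openhermes}/8")
# → vs ChatGPT: 4/8, vs OpenHermes: 7/8
```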
1. OpenChat-3.5-0106-Gemma vs. OpenChat-3.5-0106 Mistral
As per the benchmarks, the OpenChat Gemma model achieves state-of-the-art performance similar to the Mistral version on most NLP tasks. Averaged across all benchmarks, the Gemma version scores 64.4 compared to 64.5 for Mistral, and both models outperform other baselines like ChatGPT. On the HumanEval code generation benchmark, the Gemma version scores 67.7, while Mistral achieves 71.3. However, on mathematical problem solving under the MATH benchmark, Gemma scores slightly higher at 29.3 vs. 28.6 for Mistral.
| | OpenChat-3.5-0106 Gemma | OpenChat-3.5-0106 Mistral |
|---|---|---|
| # Params | 7B | 7B |
| Average | 64.4 | 64.5 |
| MT-Bench | 7.83 | 7.8 |
| HumanEval | 67.7 | 71.3 |
| BBH MC | 52.7 | 51.5 |
| AGIEval | 50.2 | 49.1 |
| TruthfulQA | 55.4 | 61.0 |
| MMLU | 65.7 | 65.8 |
| GSM8K | 81.5 | 77.4 |
| BBH CoT | 63.7 | 62.2 |
Overall, the benchmark results establish that OpenChat Gemma matches the high-quality performance of the Mistral-based model despite the change in base architecture.
2. OpenChat-3.5-0106-Gemma vs. Gemma-7B
When compared to the original Gemma-7B model, OpenChat Gemma demonstrates clear improvements on all benchmarks where scores are available.
For example, on HumanEval, the OpenChat Gemma model scores 67.7 compared to 32.3 for the original Gemma-7B model. Similarly, on AGIEval, the scores are 50.2 vs. 41.7 respectively. Notably, OpenChat Gemma scores 81.5 on the GSM8K grade-school math benchmark, far surpassing the Gemma-7B score of 46.4.
| Model | # Params | HumanEval | AGIEval | MMLU | GSM8K |
|---|---|---|---|---|---|
| OpenChat-3.5-0106 Gemma | 7B | 67.7 | 50.2 | 65.7 | 81.5 |
| Gemma-7B | 7B | 32.3 | 41.7 | 64.3 | 46.4 |
These results validate that the C-RLFT fine-tuning methodology effectively enhances the capabilities of Gemma-7B.
How to Get Started With OpenChat Gemma Model
To use this model, visit OpenChat's model page on Hugging Face. There you'll find the necessary information and resources to run it, including setup instructions. The model can be served behind an API compatible with the OpenAI ChatCompletion specification, and the OpenChat Web UI offers a user-friendly interface on top of it.
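Because the server speaks the OpenAI ChatCompletion protocol, any HTTP client that can POST the standard JSON payload will work. Below is a minimal sketch; the model name and the localhost URL are illustrative assumptions, not values confirmed by this article, so check the Hugging Face model page for the exact identifiers:

```python
import json

def build_chat_request(user_message, model="openchat-3.5-0106-gemma"):
    """Build an OpenAI ChatCompletion-style payload.

    The model name is an assumed identifier; use the one listed
    on the Hugging Face model page for your deployment.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_chat_request("Explain C-RLFT in one sentence.")
print(json.dumps(payload, indent=2))

# To send it to a locally running OpenChat server (hypothetical URL/port):
# import requests
# resp = requests.post("http://localhost:18888/v1/chat/completions", json=payload)
# print(resp.json()["choices"][0]["message"]["content"])
```

The same payload shape works with any OpenAI-compatible client library by pointing its base URL at your local server.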
Conclusion
OpenChat-3.5-0106-Gemma represents a significant advancement in conversational AI. With its C-RLFT fine-tuning technique, it pushes the boundaries of what is possible in text generation tasks, and its impressive performance points toward even more capable conversational AI models in the future.