In the fast-paced world of AI development, Google has introduced Gemma, a new family of open models. Gemma ships in 2B and 7B sizes that Google reports as state-of-the-art for their size. In this article, we will look at the Gemma family and how to get started using its models. We will also explore how Gemma 2B and 7B compare against Meta Llama 2 7B and Mistral 7B, two other prominent 7B models in the AI landscape. So, let’s dive in!
Table of Contents
- Introduction to Google Gemma Family
- Google Gemma vs Meta Llama 7B and Mistral 7B
- Results
- Gemma 7B Prompt Format
- How to Prompt Gemma 7B
- Role Playing
- Reasoning
- Google Collaboration with HuggingFace
- Customizing Gemma with NVIDIA NeMo Framework
- Gemma Technical Report by Google DeepMind
- How to Get Started with Google Gemma
Introduction to Google Gemma Family
Google recently introduced Gemma, a family of new open AI models focused on natural language. Gemma comes in two sizes: a 2 billion parameter (2B) model for CPU/edge use cases and a 7 billion parameter (7B) model designed for GPU/TPU deployment. Google reports that both the 2B and 7B models achieve state-of-the-art performance for their size across various NLP tasks. Additionally, the model weights are freely available under terms that allow responsible commercial use and distribution across sectors.
Models Architecture
Gemma uses a transformer-based architecture. Both models use rotary positional embeddings and GeGLU activations. They differ in their attention mechanism: the 7B model uses multi-head attention, while the 2B model uses multi-query attention, in which all query heads share a single key/value head to reduce memory use at inference time.
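To make the GeGLU activation mentioned above more concrete, here is a minimal pure-Python sketch. It is not Gemma code: the helper names (`gelu`, `matvec`, `geglu`) and the tiny weight matrices are hypothetical, and it uses the exact erf-based GELU for clarity, whereas real implementations may use a tanh approximation.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(x, w):
    """Multiply a row vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def geglu(x, w_gate, w_up):
    """GeGLU: GELU of one linear projection gates a second linear projection."""
    gate = matvec(x, w_gate)
    up = matvec(x, w_up)
    return [gelu(g) * u for g, u in zip(gate, up)]


# Hypothetical 2x2 identity weights, chosen only to keep the example readable.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(geglu([1.0, 2.0], identity, identity))
```

In a full feed-forward block, the gated output would then pass through a third projection back to the model dimension; that step is left out here.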
Training Methodology
Gemma is trained on up to 6 trillion text tokens, according to the technical report, using Google’s TPU infrastructure. The models were trained on a mixture of web, math, code and other data after filtering for quality and safety. The instruction-tuned variants are further trained with supervised fine-tuning and reinforcement learning from human feedback (RLHF).
Google Gemma vs Meta Llama 7B and Mistral 7B
Now, let’s talk about the performance of Gemma models. In recent benchmarks, the Gemma 7B model has demonstrated remarkable capabilities, outperforming the Meta Llama 2 7B model and Mistral 7B.
1. Performance Evaluation on Key Benchmarks
Gemma models outperform Meta Llama 2 7B on benchmarks run by Google itself (MMLU, reasoning, math and coding). Gemma models demonstrate particularly strong performance on mathematics and coding benchmarks, even outperforming the code-fine-tuned CodeLlama 7B models on MBPP.
In human evaluations, Gemma also received better ratings than Llama 2 for instruction following and safety tasks.
2. Performance Evaluation on LLM Leaderboard
On the LLM Leaderboard test, both Gemma models clearly outperform Llama 2 7B. At the 7B size, Gemma 7B scores 63.75, also outperforming strong models like Mistral 7B and Phi-2. Its score is closer to that of Llama 2 70B Chat, which is a much larger and higher-scoring model.
In the 2B size category, Gemma 2B doesn’t match the top scores, but it still achieves 46.51 points. This indicates the model has good capabilities for its size, even though it trails leaders like Phi-2.

Results
The figure shows that the Gemma 7B model exhibits robust capabilities in mathematics, science, and coding tasks. The scores represent the mean performance across various academic benchmarks, grouped by capability.

Gemma 7B surpasses both Llama 2 7B and Mistral 7B across a range of scholarly benchmarks, demonstrating exceptional capabilities in HumanEval, GSM8K, MATH, and AGIEval. It also shows enhanced abilities in reasoning, dialogue, mathematical tasks, and coding.

The Gemma 7B models, which have been fine-tuned for instruction following, also surpass the Mistral-7B v0.2 Instruct model in terms of safety and adherence to instructions, as assessed by human evaluators.

Gemma has been evaluated on various academic safety benchmarks and compared with Mistral. The technical documentation also discusses debiasing and red-teaming as techniques for addressing common risks of large language models (LLMs). Additional details on ethical development practices with Gemma can be found in the model card and the Responsible Generative AI Toolkit.

Gemma 7B Prompt Format
The base models of Gemma don’t require a specific format for prompts and can be instructed to carry out tasks using zero-shot or few-shot prompting techniques. In contrast, the Gemma Instruct model operates using a designated format.
<start_of_turn>user
Generate a Python function that prints Hello World<end_of_turn>
<start_of_turn>model
Below is a table that outlines the control tokens relevant to formatting in Gemma:
| Context | Relevant Token |
| --- | --- |
| User turn | user |
| Model turn | model |
| Start of conversation turn | <start_of_turn> |
| End of conversation turn | <end_of_turn> |
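The control tokens in the table above can also be assembled programmatically. The following is a minimal sketch, not an official helper: the function name `build_gemma_prompt` and the `(role, text)` tuple layout are assumptions made for illustration.

```python
from typing import Iterable, Tuple

START = "<start_of_turn>"
END = "<end_of_turn>"


def build_gemma_prompt(turns: Iterable[Tuple[str, str]]) -> str:
    """Format (role, text) pairs, then open a model turn for the next reply."""
    parts = []
    for role, text in turns:
        if role not in ("user", "model"):
            raise ValueError(f"unknown role: {role!r}")
        parts.append(f"{START}{role}\n{text.strip()}{END}\n")
    # Leave the final model turn open so generation continues from here.
    parts.append(f"{START}model\n")
    return "".join(parts)


prompt = build_gemma_prompt(
    [("user", "Generate a Python function that prints Hello World")]
)
print(prompt)
```

The same helper accepts a longer conversation history by passing more `(role, text)` pairs in order. Depending on the tooling, a tokenizer may also prepend a beginning-of-sequence token, which this sketch does not handle.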
Additionally, you can utilize these special control tokens within the scope of a multi-turn dialogue with the user, as demonstrated:
<start_of_turn>user
Find hotels in Recoleta in Buenos Aires<end_of_turn>
<start_of_turn>model
Ayres De Recoleta Plaza.<end_of_turn>
<start_of_turn>user
What can I do in Buenos Aires?<end_of_turn>
<start_of_turn>model
How to Prompt Gemma 7B
To prompt Gemma 7B effectively, you need to apply the prompt template correctly. The following examples show how to use Gemma 7B Instruct’s prompt template for a variety of tasks.
Zero-shot Prompting
To utilize Gemma’s zero-shot capabilities, simply employ the following prompt structure:
<start_of_turn>user
Explain why the sky is blue<end_of_turn>
<start_of_turn>model
Zero-shot Prompting with System Prompt
Incorporating a system role or prompt can significantly enhance the guidance provided to large language models (LLMs). Though Gemma lacks an explicit system role feature, it’s possible to embed extra instructions within the prompt as illustrated:
<start_of_turn>user
Answer the following question in a concise and informative manner:
Explain why the Sun appears yellow<end_of_turn>
<start_of_turn>model
In the preceding example, we incorporated "Answer the following question in a concise and informative manner:" as an added instruction or system prompt, aiming to refine the model’s direction.
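Since there is no separate system role, the extra instruction simply goes at the top of the user turn. A minimal sketch of that pattern follows; the function name `user_turn_with_instruction` is hypothetical and not part of any Gemma tooling.

```python
def user_turn_with_instruction(instruction: str, question: str) -> str:
    """Place a system-style instruction on the line before the user question."""
    return (
        "<start_of_turn>user\n"
        f"{instruction.strip()}\n"
        f"{question.strip()}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )


prompt = user_turn_with_instruction(
    "Answer the following question in a concise and informative manner:",
    "Explain why the Sun appears yellow",
)
print(prompt)
```

Keeping the instruction and the question as separate arguments makes it easy to reuse one instruction across many questions.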
Role Playing
Gemma can also be prompted to play a role, which is useful both for personalizing responses and for building more intricate domain-specific LLM agents.
Prompt:
<start_of_turn>user
You are a helpful 8th-grade science teacher. Help a 8th grader to answer questions in a short and clear manner.
Explain why the sun appears yellow<end_of_turn>
<start_of_turn>model
Model response:
Sure, here's why the Sun appears yellow:
The Sun is actually a hot, glowing ball of hot gas. It's made mostly of hydrogen and helium. When hydrogen atoms combine, they release energy in the form of light. This light travels from the Sun to Earth.
The Sun's light is made up of all the colors of the rainbow. However, our eyes are not able to see all of the colors equally. Our eyes are most sensitive to yellow and green light. So, when the Sun's light hits our eyes, it appears yellow.
In addition to the way our eyes see the light, the Sun also appears yellow because of the way light interacts with the Earth's atmosphere. The Earth's atmosphere scatters different colors of light in different ways. This scattering of light makes the Sun appear yellow.
Reasoning
You can explore Gemma’s thought process by initiating a zero-shot chain-of-thought prompt as described below:
Prompt:
<start_of_turn>user
Think and write your step-by-step reasoning before responding.
Explain why the Sun Appears yellow.<end_of_turn>
<start_of_turn>model
Model Response:
Step 1: Understand the Composition of the Sun's Atmosphere:
Step 2: Scattering of Light:
Step 3: Rayleigh Scattering:
Step 4: Blue Sky Effect:
Step 5: Yellow Sun:
(Shortened for the blog)
Conclusion:
The Sun appears yellow due to the scattering of light by particles in its atmosphere, particularly Rayleigh scattering. The scattered blue light is dispersed in the sky, leaving the unscattered yellow light to be seen. This phenomenon is known as the blue sky effect.
Google Collaboration with HuggingFace
Gemma has also garnered attention and support from the AI community. Hugging Face, a renowned platform for AI enthusiasts, has collaborated with Google on the Gemma release. Visit huggingface.co/blog/gemma for full details about Gemma’s integration with transformers, Google Cloud, and Inference Endpoints. You can also learn how to fine-tune Gemma on GPUs of different sizes using Hugging Face TRL.
Gemma Models on HuggingFace
The four open-access Gemma models (two base and two instruct models) available on Hugging Face are as follows:
- google/gemma-7b (7B Base Model)
- google/gemma-7b-it (7B Instruct Model)
- google/gemma-2b (2B Base Model)
- google/gemma-2b-it (2B Instruct Model)
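As a rough sketch of how these checkpoints might be used from Python, the snippet below maps the four model IDs listed above and generates a reply with the transformers library. The names `GEMMA_CHECKPOINTS`, `checkpoint_id`, and `generate_reply` are hypothetical. It also assumes that `transformers` and `torch` are installed, and that you have accepted the Gemma terms on Hugging Face and are logged in, which may be required before the weights can be downloaded.

```python
# Model IDs as listed in the article, keyed by (size, variant).
GEMMA_CHECKPOINTS = {
    ("7b", "base"): "google/gemma-7b",
    ("7b", "instruct"): "google/gemma-7b-it",
    ("2b", "base"): "google/gemma-2b",
    ("2b", "instruct"): "google/gemma-2b-it",
}


def checkpoint_id(size: str, variant: str) -> str:
    """Look up a Hugging Face model ID, ignoring letter case."""
    return GEMMA_CHECKPOINTS[(size.lower(), variant.lower())]


def generate_reply(question: str, size: str = "2b", max_new_tokens: int = 128) -> str:
    """Ask an instruct checkpoint one question and return the decoded reply."""
    # Imported lazily so the lookup helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = checkpoint_id(size, "instruct")
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # The tokenizer's chat template inserts the turn markers for us.
    messages = [{"role": "user", "content": question}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # The templated prompt already carries its special tokens, so skip adding more.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


print(checkpoint_id("7B", "Instruct"))
```

Calling `generate_reply("Explain why the sky is blue")` would download the 2B instruct weights on first use, so it is best tried on a machine with enough memory and disk space.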
Gemma Demo on HuggingChat
Moreover, you can try out the Gemma 7B Instruct model on HuggingChat.
Chat Link: https://huggingface.co/chat?model=google/gemma-7b-it

Customizing Gemma with NVIDIA NeMo Framework
If you want to customize Gemma models on NVIDIA hardware, the NeMo Framework is a good place to start. It provides a set of tools and libraries for fine-tuning and customizing Gemma models according to your specific requirements. To learn more about customizing Gemma with the NeMo Framework, refer to the Gemma GitHub repository.
Gemma Technical Report by Google DeepMind
For those who crave in-depth technical details, Google DeepMind has released the “Gemma: Open Models Based on Gemini Research and Technology” report. This report provides a comprehensive overview of the Gemma models, their architecture, and the research behind them.
How to Get Started with Google Gemma
Are you excited to explore Gemma for yourself? You can visit the official Gemma website at ai.google.dev/gemma to access quickstart guides. Additionally, the official Google Developers Blog features more details about Gemma and its applications in the field of AI.
You can get started with Gemma using ready-to-use Colab and Kaggle notebooks, alongside integration with popular tools such as Hugging Face, MaxText, NVIDIA NeMo and TensorRT-LLM. Pre-trained and instruction-tuned Gemma models can run on your laptop, workstation, or Google Cloud. Google Cloud provides easy deployment of these models on Vertex AI and Google Kubernetes Engine (GKE).
With their strong out-of-the-box performance for their size, Gemma models are well suited to a variety of applications that require capable generative models.