Digital Product Studio

Google Has Upgraded the Gemma Family With Gemma 2 and PaliGemma Models

Since its launch, Google’s Gemma family of models has been widely adopted by developers worldwide. Millions of downloads within months of release show the appeal of these lightweight and capable AI models. That response has encouraged Google to keep upgrading the Gemma models for better performance and wider usability. In this article, we discuss the latest additions to the Gemma family announced by Google: Gemma 2 and PaliGemma.

Gemma 2: The Next Generation Model

Gemma 2 is the next generation of the Gemma lineup, and the announced model has 27 billion parameters. What sets Gemma 2 apart is its efficiency: despite its size, Google says it requires fewer computational resources than comparably capable models, which reduces deployment costs. Gemma 2 is still under development and promises breakthrough performance with a new, optimized architecture.

Key Features of Gemma 2

Some key features of Gemma 2 include:

1. Increased Size and Performance

At 27 billion parameters, Gemma 2 is said to match the capabilities of models twice its size. According to Google, this makes it comparable to Llama 3 70B while using about half the compute.

2. Lower Deployment Costs

Gemma 2’s efficient design allows it to run on less powerful hardware. It can operate on a single TPU host or NVIDIA GPUs, making deployment more affordable for all.

3. Versatile Tuning

Gemma 2 will support tuning on a range of platforms and tools, including Google Cloud, Hugging Face, and TensorRT. Developers can also optimize it to meet their specific application needs.

4. Benchmark Performance

According to the published benchmarks, Gemma 2 (27B) almost matches Meta’s Llama 3 (70B) on MMLU, HellaSwag, and GSM8K, and it outperforms xAI’s Grok-1 (314B) on MMLU and GSM8K despite its much smaller size, as shown in the chart below:

[Chart: benchmark comparison of Gemma 2 (27B), Llama 3 (70B), and Grok-1 (314B)]
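To put the 27 billion parameter figure and the deployment claims above in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at common numeric precisions. The precision table and the helper name `weight_memory_gb` are illustrative assumptions, not something Google has published for Gemma 2, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Bytes needed to store one parameter at common precisions.
# (Illustrative assumption: Gemma 2's shipped precisions are not stated in the article.)
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    gemma2_params = 27e9  # parameter count quoted in the article
    for precision in BYTES_PER_PARAM:
        size = weight_memory_gb(gemma2_params, precision)
        print(f"{precision:>8}: {size:.1f} GB")
```

By this estimate, the weights occupy about 54 GB in bfloat16 and about 13.5 GB at 4-bit precision, which is why lower-precision variants matter for running a model of this size on a single accelerator host.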

Coming Soon

The official launch is slated for the coming weeks. So, stay tuned for more updates on Gemma 2’s capabilities and availability.

PaliGemma: A Powerful Vision-Language Model

Along with Gemma 2, Google is releasing PaliGemma, an advanced open-source Vision-Language Model (VLM) inspired by PaLI-3. PaliGemma is engineered to excel in a wide range of vision-language tasks. From image and short video captioning to visual question answering, understanding the text in images, object detection, and object segmentation, PaliGemma does it all. 

Key Features of PaliGemma

1. Image Captioning

PaliGemma can generate natural language descriptions of images when prompted for a caption. This makes it possible to produce short summaries of what an image shows.

2. Visual Question Answering

It can comprehend images and answer free-form questions about the objects, scenes, and details they contain.

3. Object Detection

The model is trained to detect and localize objects present in an image when prompted. It outputs bounding box coordinates for detected objects.

4. Referring Expression Segmentation

Going beyond detection, PaliGemma can also segment specific objects in an image when they are referred to with natural language phrases. This fine-grained understanding is useful for applications that need precise segmentation.

5. Document Understanding

By combining its image and text understanding, PaliGemma can reason over multi-modal inputs that contain both visual and textual content, such as documents. This makes it suitable for tasks that require joint vision-language inference.
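As an illustration of the object detection feature listed above, the sketch below converts a detection-style output string into pixel-space bounding boxes. It assumes, based on the PaliGemma model documentation rather than this article, that each detected object is emitted as four `<locNNNN>` tokens in the order y_min, x_min, y_max, x_max, with values from 0 to 1023 normalized to the image size, followed by a label, and that objects are separated by `;`. The names `Box` and `parse_detections` are hypothetical helpers, so check the format against the model card before relying on it.

```python
import re
from typing import List, NamedTuple


class Box(NamedTuple):
    """A labelled bounding box in pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


# Assumed format: four consecutive <locNNNN> tokens, then a label up to the next ';'.
_DETECTION = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text: str, width: int, height: int) -> List[Box]:
    """Parse detection output into boxes scaled to a width x height image."""
    boxes = []
    for y0, x0, y1, x1, label in _DETECTION.findall(text):
        boxes.append(
            Box(
                label=label.strip(),
                # Assumed: location values are bins in [0, 1023], divided by 1024.
                x_min=int(x0) / 1024 * width,
                y_min=int(y0) / 1024 * height,
                x_max=int(x1) / 1024 * width,
                y_max=int(y1) / 1024 * height,
            )
        )
    return boxes


if __name__ == "__main__":
    sample = "<loc0256><loc0128><loc0768><loc0512> cat"  # made-up example string
    print(parse_detections(sample, width=640, height=480))
```

The boxes come back as plain tuples, so they can be passed directly to a drawing or cropping routine.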

How PaliGemma Works

PaliGemma takes both an image and text as input and produces text as output. Because the text prompt determines the task, the same model can answer detailed, context-aware questions about an image, caption images and short videos, detect objects, and read text embedded within images.
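Because the text input selects the task, a small helper can make the different prompt styles explicit. The sketch below builds prompts using the task prefixes described in the PaliGemma model card as I understand them (`caption`, `answer`, `detect`, `segment`, `ocr`). These prefixes and the function name `build_prompt` are assumptions that do not come from this article, so verify them against the model card for the checkpoint you use.

```python
def build_prompt(task: str, text: str = "", lang: str = "en") -> str:
    """Build a task prompt string for a PaliGemma-style model.

    The prefixes below are assumed conventions, not taken from the article.
    """
    if task == "caption":
        return f"caption {lang}"
    if task == "vqa":
        # `text` is the free-form question about the image.
        return f"answer {lang} {text}"
    if task == "detect":
        # `text` is the object (or objects) to localize.
        return f"detect {text}"
    if task == "segment":
        # `text` is the referring expression for the object to segment.
        return f"segment {text}"
    if task == "ocr":
        return "ocr"
    raise ValueError(f"Unknown task: {task!r}")


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("vqa", "What color is the car?"))
    print(build_prompt("detect", "cat"))
```

Keeping prompt construction in one place makes it easier to switch tasks without touching the code that runs the model.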

PaliGemma Models

The team at Google has released three types of models: pretrained (pt) models, mix models, and fine-tuned (ft) models. Each type comes in different image resolutions and multiple precisions. All models are published on the Hugging Face Hub with their model cards and licenses, and they are integrated with the transformers library.

1. Pre-trained Models 

Image-Text-to-Text pre-trained models in transformers format:

Image-Text-to-Text pre-trained models for use with the big_vision repo:

2. Fine-tuned models

Below are the fine-tuned models of PaliGemma:

3. Mix models

Below are the Image-Text-to-Text mix models of PaliGemma:

  1. google/paligemma-3b-mix-224
  2. google/paligemma-3b-mix-448
  3. google/paligemma-3b-mix-224-jax
  4. google/paligemma-3b-mix-448-jax
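The model IDs above follow a visible naming pattern: model type (`mix`), input resolution (`224` or `448`), and an optional `-jax` suffix. The sketch below splits an ID into those parts. Reading the `-jax` suffix as "JAX checkpoint" and everything else as "transformers format" is an assumption inferred from the list, and `parse_model_id` is a hypothetical helper rather than part of any library.

```python
from typing import NamedTuple


class PaliGemmaCheckpoint(NamedTuple):
    """Fields recovered from a PaliGemma model ID."""
    variant: str      # e.g. "mix"
    resolution: int   # input image resolution, e.g. 224
    framework: str    # "jax" or "transformers" (assumed meaning of the suffix)


def parse_model_id(model_id: str) -> PaliGemmaCheckpoint:
    """Split an ID such as 'google/paligemma-3b-mix-224-jax' into its parts."""
    name = model_id.split("/")[-1]   # drop the "google/" organization prefix
    parts = name.split("-")          # ["paligemma", "3b", "mix", "224", "jax"]
    if len(parts) < 4 or parts[0] != "paligemma":
        raise ValueError(f"Unexpected model ID: {model_id!r}")
    framework = "jax" if parts[-1] == "jax" else "transformers"
    return PaliGemmaCheckpoint(
        variant=parts[2], resolution=int(parts[3]), framework=framework
    )


if __name__ == "__main__":
    for model_id in (
        "google/paligemma-3b-mix-224",
        "google/paligemma-3b-mix-448-jax",
    ):
        print(model_id, "->", parse_model_id(model_id))
```

A helper like this is handy when a script needs to pick the right image size for a given checkpoint.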

How to Use PaliGemma

PaliGemma is available on GitHub, Hugging Face, Kaggle, and Vertex AI Model Garden, and it can be used through JAX and Hugging Face Transformers. Keras integration is also coming soon.
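For the Hugging Face Transformers route, a minimal sketch of running one of the mix checkpoints might look like the following. It assumes a recent transformers release that includes `PaliGemmaForConditionalGeneration`, along with `torch` and `Pillow`. The function name `describe_image`, the default prompt `caption en`, and the token limit are illustrative choices, and downloading the checkpoint may require accepting the model license on the Hugging Face Hub first.

```python
def describe_image(
    image_path,
    prompt="caption en",
    model_id="google/paligemma-3b-mix-224",
    max_new_tokens=30,
):
    """Run a PaliGemma checkpoint on one image and return the generated text."""
    # Heavy dependencies are imported here so that defining this function
    # does not require them to be installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Keep only the newly generated tokens, not the echoed prompt.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `describe_image("photo.jpg")` on a local image file would return a short caption. Passing a different prompt switches the task without any other code changes.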

Hugging Face Demo

Additionally, an interactive demo is available as a Hugging Face Space for exploring PaliGemma’s image-to-text abilities.

In Conclusion

With Gemma 2 and PaliGemma, Google is raising the bar of capabilities for open-source AI models. PaliGemma can understand images and process language, while Gemma 2 boosts size and efficiency to enable more use cases. Both offer expanded capabilities for developers and researchers to build diverse multimodal applications. 

Faizan Ali Naqvi

Research is my hobby and I love to learn new skills. I make sure that every piece of content that you read on this blog is easy to understand and fact checked!
