Google unveils PaliGemma, announces Gemma 2

PaliGemma is an open vision-language model designed for tasks such as image captioning, visual question answering, and object detection.

Credit: Anna Martyanova/Shutterstock

Google has expanded its Gemma family of AI models, introducing the PaliGemma vision-language model (VLM) and announcing Gemma 2, the next generation of Gemma models, which is based on a new architecture. The company also released the LLM Comparator as open source, an addition to its Responsible Generative AI Toolkit.

Google announced the new products on May 14. The company described PaliGemma as a powerful open VLM inspired by the PaLI-3 vision-language models, which were intended to be smaller, faster, and stronger. Built on components from the SigLIP vision model, PaliGemma is designed for a range of vision-language tasks including image and video captioning, visual question answering, understanding text in images, object detection, and object segmentation. PaliGemma is available on GitHub, Hugging Face, Kaggle, and Vertex AI.

Gemma 2, due to be formally launched in the coming weeks, features a new architecture designed for “breakthrough performance and efficiency,” Google said. At 27 billion parameters, Gemma 2 offers performance comparable to Llama 3 70B at less than half the size, according to the company. The efficient design also reduces deployment costs, with Gemma 2 requiring less than half the compute of comparable models. For fine-tuning, Gemma 2 can work with solutions ranging from Google Cloud to tools such as Axolotl.

Google also added to its Responsible Generative AI Toolkit by releasing the LLM Comparator as open source. Designed to help developers conduct model evaluations, the LLM Comparator is an interactive data visualization tool that lets users compare model responses side by side to assess their quality and safety.
