Google launches EmbeddingGemma, a powerful open-source model for on-device AI

Google has introduced EmbeddingGemma, a lightweight open-source embedding model designed for on-device AI. With 308M parameters, it ranks as the top multilingual embedding model under 500M parameters on the MTEB benchmark, while running in less than 200MB of RAM when quantised.

Optimized for speed, privacy, and offline use, it powers RAG pipelines, semantic search, and real-time applications across mobile and desktop devices.

Pragya Singha Roy | Updated on: Sep 05, 2025 | 12:18 PM

Google has announced a new open embedding model, EmbeddingGemma, which aims to bring high-quality, multilingual text embeddings to the devices people use daily. With only 308 million parameters, the model is efficient and privacy-friendly, letting developers build Retrieval Augmented Generation (RAG) pipelines, semantic search engines, and other AI applications that work without an internet connection.

EmbeddingGemma is the highest-performing multilingual text embedding model with fewer than 500M parameters on the Massive Text Embedding Benchmark (MTEB). It is based on the Gemma 3 architecture, supports more than 100 languages, and, when quantised, can run in less than 200MB of RAM. This makes it suitable for mobile phones, laptops, desktops and edge devices, where speed and resource efficiency are vital.

Embeddings in a compact design

The model architecture combines roughly 100M core model parameters with 200M embedding parameters to achieve state-of-the-art performance for its size with minimal resource consumption. Using Matryoshka Representation Learning, developers can choose an output dimension anywhere from the full 768 down to 128, trading quality against speed and storage according to their priorities. Google also claims embedding inference in under 15 milliseconds on EdgeTPU, fast enough for real-time interactions.
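
To illustrate what choosing a smaller output dimension means in practice, here is a minimal sketch of Matryoshka-style truncation: keep the first values of a vector, then rescale it to unit length so that cosine similarity still works. The function names `truncate_embedding` and `cosine` and the toy 8-dimensional vectors are illustrative assumptions, not part of Google's release; real EmbeddingGemma vectors have 768 dimensions.

```python
import math


def truncate_embedding(vector, dims):
    """Keep the first `dims` values of an embedding and rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero vector cannot be normalised; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product of two unit-length vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Toy 8-dimensional vectors standing in for 768-dimensional model output (assumed values).
doc = [0.9, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01, 0.01]
query = [0.8, 0.4, 0.1, 0.2, 0.01, 0.03, 0.02, 0.01]

full = cosine(truncate_embedding(doc, 8), truncate_embedding(query, 8))
short = cosine(truncate_embedding(doc, 4), truncate_embedding(query, 4))
print(f"full: {full:.3f}, truncated: {short:.3f}")
```

Shorter vectors take less storage and are quicker to compare, at some cost in retrieval quality. Truncation of this kind is only meaningful for models trained with Matryoshka Representation Learning, which concentrates the most useful information in the leading dimensions.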

Built for offline and private AI

EmbeddingGemma is designed with user privacy in mind, allowing sensitive information to be processed entirely on the device. Whether searching across files, emails, and notifications or powering personalised offline chatbots with Gemma 3n, the model opens up a range of applications that need no cloud connection. It can also be fine-tuned for particular domains, languages or tasks, which gives developers added flexibility.

To make adoption easy, EmbeddingGemma is already integrated with tools such as sentence-transformers, llama.cpp, MLX, Ollama, LiteRT, transformers.js, LMStudio, and Weaviate. Model weights are available to developers on Hugging Face, Kaggle, and Vertex AI, along with a set of detailed guides and quickstart notebooks. With this launch, Google positions EmbeddingGemma as its option for on-device AI, leaving its Gemini Embedding model to serve large-scale server applications.