Google has announced a new open embedding model, EmbeddingGemma, which aims to bring high-quality, multilingual text embeddings to the devices people use daily. With only 308 million parameters, the model is efficient and privacy-friendly, letting developers build Retrieval Augmented Generation (RAG) pipelines, semantic search engines, and other artificial intelligence applications that perform well without any connection to the internet.
According to Google, EmbeddingGemma is the highest-performing multilingual text embedding model with fewer than 500M parameters on the Massive Text Embedding Benchmark (MTEB). It is based on the Gemma 3 architecture, supports more than 100 languages and, when quantised, can run in less than 200MB of RAM. This makes it suitable for mobile phones, laptops, desktops and edge devices, where speed and resource efficiency are vital.
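To put the sub-200MB figure in context, here is a back-of-envelope sketch of how much storage 308 million weights need at different precisions. It counts weights only and assumes every parameter uses the same bit width, which is a simplification and not Google's published quantisation scheme; the helper name `weight_megabytes` is illustrative.

```python
# Rough weight-storage estimate for a 308M-parameter model.
# Assumption: uniform precision for all weights; activations,
# tokenizer and runtime overhead are ignored.

PARAMS = 308_000_000  # parameter count quoted in the article


def weight_megabytes(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in MB (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1_000_000


for label, bits in [("fp32", 32), ("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_megabytes(PARAMS, bits):.0f} MB")
```

Under these assumptions, only the 4-bit row lands below 200MB, which suggests that the quoted memory footprint depends on fairly aggressive quantisation.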
The model architecture combines roughly 100M core parameters with 200M embedding parameters to achieve state-of-the-art performance with minimal resource consumption. Using Matryoshka Representation Learning, developers can choose output dimensions from 768 down to 128, trading quality against speed according to their priorities. Google also claims embedding inference in under 15 milliseconds on EdgeTPU, fast enough for real-time interactions.
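With Matryoshka-style embeddings, a smaller output size is typically obtained by keeping the leading dimensions of the full vector and re-normalising it. The sketch below illustrates that idea in plain Python; the function name `truncate_embedding` and the 8-dimensional toy vector are illustrative assumptions, not real model output or an official API.

```python
import math


def truncate_embedding(vec: list[float], dim: int) -> list[float]:
    """Keep the first `dim` values of `vec` and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale for an all-zero prefix
    return [x / norm for x in head]


# Toy stand-in for a full-size embedding (a real one would have 768 values).
full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1]
short = truncate_embedding(full, 4)
print(len(short), short)
```

Shorter vectors take less storage and make similarity search faster, at some cost in retrieval quality; that is the trade-off the article describes.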
EmbeddingGemma is designed with user privacy in mind, allowing sensitive information to be processed entirely on the device. From searching across files, emails, and notifications to powering personalised offline chatbots with Gemma 3n, the model opens up a range of applications that do not need a cloud connection. It can also be fine-tuned for particular domains, languages or tasks, which gives developers added flexibility.
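The on-device search use case comes down to ranking locally stored embeddings by similarity to a query embedding. Here is a minimal sketch of that retrieval step, assuming the vectors have already been produced by an embedding model; the three-dimensional vectors, document names and the `top_k` helper are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical precomputed embeddings for items stored on the device.
docs = {
    "email_invoice": [0.9, 0.1, 0.0],
    "note_recipe": [0.0, 0.2, 0.9],
    "chat_payment": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # hypothetical embedding of a billing-related query

print(top_k(query, docs, k=2))
```

Because both the vectors and the ranking stay on the device, no document content has to leave it, which is the privacy argument made above.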
To make adoption easy, EmbeddingGemma is already integrated with frameworks and tools such as sentence-transformers, llama.cpp, MLX, Ollama, LiteRT, transformers.js, LMStudio, and Weaviate. Model weights are available to developers on Hugging Face, Kaggle, and Vertex AI, along with detailed guides and quickstart notebooks. With this launch, Google has a dedicated option for on-device AI, leaving its Gemini Embedding model to serve large server-side applications.