Breaking · googleaidevs

Google highlights laptop-ready Gemma 4 12B

June 3, 2026 at 12:07 EDT

On June 3, 2026, Google highlighted the launch of Gemma 4 12B, a unified, encoder-free multimodal model designed to run on laptops. Positioned between the mobile-oriented E4B and the larger 26B MoE model, it offers frontier-class reasoning and native audio support, and is released under the Apache 2.0 license. (Details)

The 12B is part of the Gemma 4 family that Google DeepMind formally announced around April 2, 2026. The family spans several sizes: E2B/E4B (edge and mobile), 12B (unified encoder-free), 26B A4B (MoE), and 31B (dense). The 12B is framed specifically as a "unified" model for laptops. (Google blog)

The Gemma series is a line of open-weight models built on Gemini research, and Gemma 4 is positioned as the "most capable open model byte for byte." It strengthens support for agentic workflows (multi-step planning, tool use, and autonomous execution) and for local execution, drawing comparisons with rival open models such as Llama and Qwen. The encoder-free architecture in the 12B removes the traditional standalone vision and audio encoders; inputs are instead fed directly into the LLM backbone through a linear projection, which is intended to improve efficiency. The Apache 2.0 license broadly permits commercial use and fine-tuning. (Hugging Face)
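
The encoder-free idea can be illustrated with a toy sketch: image patches are flattened and mapped by a single linear projection into the same embedding space as text tokens, then placed in one sequence for the backbone. This is not Gemma's actual implementation. The helper names, the patch size, and the embedding width below are hypothetical and chosen only for illustration, and the sketch assumes image dimensions divisible by the patch size.

```python
import random

def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch vectors."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + dy][left + dx]
                            for dy in range(patch) for dx in range(patch)])
    return patches

def linear_project(vectors, weight, bias):
    """Apply y = W x + b to each vector; weight has shape d_model x d_in."""
    return [[sum(w_ij * x_j for w_ij, x_j in zip(row, vec)) + b
             for row, b in zip(weight, bias)]
            for vec in vectors]

# Toy sizes, hypothetical and far smaller than any real model.
PATCH, D_MODEL = 2, 8
rng = random.Random(0)
image = [[rng.random() for _ in range(4)] for _ in range(4)]
weight = [[rng.gauss(0, 0.02) for _ in range(PATCH * PATCH)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

# No separate vision encoder: patches go straight through one projection.
image_tokens = linear_project(patchify(image, PATCH), weight, bias)
text_tokens = [[0.0] * D_MODEL for _ in range(3)]  # stand-in for text embeddings
sequence = image_tokens + text_tokens  # one mixed sequence for the backbone
print(len(sequence), len(sequence[0]))  # 7 8
```

A conventional design would run the patches through a dedicated multi-layer encoder before handing them to the language model; here the only modality-specific step is the projection itself.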

On specifications, the model supports native audio and includes Multi-Token Prediction (a built-in draft model for speculative decoding) to speed up inference. The context window reaches up to 256K tokens for the mid-size models, and the family supports function calling, structured output, dynamic vision resolution, and more than 140 languages. Inference memory requirements for the 12B are cited as 26.7GB at BF16, 13.4GB at SFP8, and 6.7GB at Q4_0 (rough figures including 20% overhead), with 16GB of VRAM or unified memory said to make agentic workflows practical. (Model overview)
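
The cited memory figures can be sanity-checked with simple arithmetic. The sketch below assumes the footprint is roughly parameters times bytes per parameter times 1.2 (the 20% overhead), treats GB as decimal gigabytes, and assumes a flat 4 bits per weight for Q4_0, ignoring per-block scale data. The article does not state the formula or the exact parameter count, so the function names and these assumptions are illustrative only. Under them, all three figures imply about 11.1 billion parameters.

```python
# Bytes per parameter for each precision. The Q4_0 value assumes a flat
# 4 bits per weight and ignores per-block scale data (an assumption).
BYTES_PER_PARAM = {"BF16": 2.0, "SFP8": 1.0, "Q4_0": 0.5}

# Figures cited for the 12B, in GB, including 20% overhead.
CITED_GB = {"BF16": 26.7, "SFP8": 13.4, "Q4_0": 6.7}
OVERHEAD = 0.20

def estimate_memory_gb(params_billion, precision, overhead=OVERHEAD):
    """Weights-only footprint plus a flat overhead fraction, in GB."""
    return params_billion * BYTES_PER_PARAM[precision] * (1 + overhead)

def implied_params_billion(cited_gb, precision, overhead=OVERHEAD):
    """Invert the estimate to see what parameter count a cited figure implies."""
    return cited_gb / (1 + overhead) / BYTES_PER_PARAM[precision]

def fits(budget_gb):
    """Return the precisions whose cited footprint fits in the given budget."""
    return [p for p, gb in CITED_GB.items() if gb <= budget_gb]

for precision, gb in CITED_GB.items():
    print(precision, round(implied_params_billion(gb, precision), 2))

print(fits(16))  # ['SFP8', 'Q4_0']
```

On a 16GB machine only the SFP8 and Q4_0 variants fit by these figures, and the estimate leaves out the KV cache growth that long contexts would add.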

The model is distributed through a wide range of channels, including Hugging Face, Kaggle, Ollama, LM Studio, vLLM, MLX, Google AI Edge, and Android AICore, and is also rolling out on Google Cloud Model Garden. (Google Cloud)

Reports of installations on Ollama and MLX followed immediately after the post. Japanese users said it "looks good as a backend for AI characters" and noted that running ollama pull gemma4:12b enables text and image support with strong performance on consumer hardware, while some pointed out that Unsloth's 4-bit GGUF can run on laptops with 8GB of RAM. (Reaction) Others shared test reports, including the assessment "weaker than 26B A4B or 31B but good if you prioritize efficiency" and an error case observed during fact-checking. Together, the responses reflect high expectations for easy local execution and agentic performance, tempered by notes about the gap with larger models. (Test report) On Reddit, community members shared an installation report on a MacBook Pro M2 Pro. (Reddit)
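
For readers who want to try the Ollama route, the following is a minimal sketch of querying a locally pulled model over Ollama's HTTP API from Python. It assumes an Ollama server running on the default port 11434 and uses the gemma4:12b tag as reported by users above, which has not been independently verified here. The helper names build_generate_request and generate are hypothetical.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(prompt, model="gemma4:12b", url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(prompt, model="gemma4:12b", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

After pulling the model, a call such as generate("Summarize the Apache 2.0 license in one sentence.") would return the model's reply as a string. Setting "stream" to False keeps the example simple by asking for one complete JSON response rather than a stream of chunks.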
