Breaking

Gemma 4 31B reaches over 1,800 tokens per second in Cerebras preview

June 30, 2026 at 15:34 EDT

Foundation Models
Infra & Chips
Open Source

Google DeepMind's open-weight multimodal model "Gemma 4 31B" is now available in Public Preview on Cerebras Inference Cloud, delivering inference speeds exceeding 1,800 tokens per second. Around June 29, 2026, Cerebras announced the launch on its official blog, highlighting its ability to run multimodal inference with image input at very high speed. The offering is a limited-period preview accessed through Cerebras Inference Cloud, with the model ID gemma-4-31b.

June 29, 2026 · Cerebras Inference Cloud

The Fastest Inference Is Now Multimodal

Cerebras puts Google DeepMind's open-weight Gemma 4 31B in public preview, clocking 1,851 output tokens/sec — its first image-capable model and roughly 35× a typical GPU endpoint.

1,851

output tokens / second (measured)

1.5s

time to first token (with reasoning)

200M

Gemma 4 downloads in ~2.5 months

Output speed: dedicated hardware vs GPU

Same model (Gemma 4 31B), same unit — tokens per second, drawn to scale.

~50/s

Typical GPU
(low tens)

1,851/s

Cerebras
wafer-scale

≈ 35× faster output

Intelligence held, latency crushed

On the Artificial Analysis Intelligence Index it lands neck-and-neck with Claude Haiku — while running far faster.

29 vs 30

Intelligence Index — Gemma 4 vs Claude Haiku

18×

faster than Haiku on Cerebras

Vision agent loop — now within practical latency

Image-driven workflows that stalled on GPUs can complete in real time.

Image input

→

Reasoning

→

Tool calls

→

Verify

→

Retry

Developer reaction

Near real-time — responses arrive between keystrokes. Cited uses: dashboard analysis, UI fix-patch generation, image-driven document summarization.

The fine print

Time-limited public preview with usage limits; best suited for multimodal-focused workloads. General availability and further optimization still to come.

Model: Gemma 4 31B (dense, multimodal)

License: Apache 2.0 (open-weight)

First Google DeepMind & first multimodal model on Cerebras

Continue reading

The rest of this article is for AI News Blitz readers. Choose an option below to keep reading.