ainewsblitz.com

Breaking

Gemma 4 31B reaches over 1,800 tokens per second in Cerebras preview

  • Foundation Models
  • Infra & Chips
  • Open Source

Google DeepMind's open-weight multimodal model Gemma 4 31B is now available in Public Preview on Cerebras Inference Cloud, where it delivers inference speeds exceeding 1,800 tokens per second. Cerebras announced the launch on its official blog around June 29, 2026, highlighting the model's ability to run multimodal inference with image input at very high speed. The offering is a limited-period preview, accessed through Cerebras Inference Cloud under the model ID gemma-4-31b.
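To put the headline figure in perspective, the short sketch below converts a decode rate into approximate generation time. It is a back-of-the-envelope illustration only: the function name `generation_seconds` is hypothetical, the rate is assumed to be steady at the quoted 1,800 tokens per second, and time-to-first-token, network latency, and queueing are ignored.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float = 1800.0) -> float:
    """Approximate time to generate num_tokens at a steady decode rate.

    Assumption: the rate is constant and overheads such as
    time-to-first-token and network latency are ignored.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical response lengths, chosen only for illustration.
    for tokens in (500, 1800, 9000):
        print(f"{tokens} tokens: about {generation_seconds(tokens):.2f} s")
```

Because the announced speed is described as exceeding 1,800 tokens per second, these figures are best read as rough upper bounds on pure decode time, not as measured end-to-end latency.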
