AI Industry Daily News
A roundup of the AI industry's day, centered on Codex Windows support, grok-build-0.1, Claude Opus 4.8, Command A+, and Rosalind Biodefense.
Today's highlights
Key topics and reactions
DeepSeek V4 Pro Ranks First in Output Speed and Latency on Together AI
DeepSeek V4 Pro took the top position in both output speed and latency in Artificial Analysis benchmarks when served on Together AI. The result is attributed to inference-system optimizations such as KV-cache tuning and prefix reuse, not to raw model capability alone.
Users report faster responses in agentic coding and long-context tool use. Local-execution reports also drew attention, with a 284B-parameter DeepSeek-V4-Flash GGUF build reportedly running above 1 token per second at around 8 watts on a Raspberry Pi 5.
Commenters noted that although the model leads on speed, the balance between pricing and quotas remains under discussion, and memory and power constraints persist in local environments.
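To put the reported Raspberry Pi 5 figures in perspective, the two numbers can be combined into an energy cost per token. This is a back-of-the-envelope sketch, assuming the roughly 8 watts is the device's steady draw during generation and that throughput is constant; the function name is illustrative and does not come from any report.

```python
def joules_per_token(power_watts: float, tokens_per_second: float) -> float:
    """Energy spent per generated token, assuming steady power and throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # 1 watt = 1 joule per second, so watts / (tokens per second) = joules per token.
    return power_watts / tokens_per_second


# Reported figures: around 8 W at (just above) 1 token per second.
upper_bound = joules_per_token(power_watts=8.0, tokens_per_second=1.0)
print(f"At most about {upper_bound:.1f} J per token")
```

Because the reported rate is above 1 token per second, about 8 joules per token is an upper bound under these assumptions, not a measured value.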
Grok Imagine Video 1.5 Preview Tops Image-to-Video Leaderboard With Elo 1357
xAI's video generation model Grok Imagine Video 1.5 Preview took the overall top spot on the image-to-video leaderboard with an Elo score of 1357, 49 points ahead of second place. Given an input image, its API produces clips of about 15 seconds, handling native audio sync, motion, camera work, and physics in a single pass.
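For a sense of what a 49-point Elo lead means, the gap can be converted into an expected head-to-head preference rate. The sketch below assumes the leaderboard uses the conventional Elo logistic formula with base 10 and a scale of 400, which the report does not confirm; the function name is illustrative.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score of the higher-rated side under the conventional Elo formula."""
    # Assumption: base-10 logistic curve with a 400-point scale.
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


gap = 49  # reported lead over second place
p = elo_expected_score(gap)
print(f"Expected preference rate at a {gap}-point gap: {p:.3f}")
```

Under these assumptions, the leader would be preferred in roughly 57% of direct comparisons against the runner-up, a clear but not overwhelming margin.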
Users highlighted its value for prototyping, with some engineers reporting that they had created their first video with it. Others praised its fine control over facial expressions.
Reports also noted artifacts and a 'body-switching' effect under load, and consistency remains limited even in the Heavy plan's priority queue. Simpler prompts and off-peak use reportedly reduce these issues, which are seen as rooted in the underlying image model.
Z.ai Releases Coding Model GLM-5.2 With 1M Context and MIT License
Z.ai released GLM-5.2, a coding-specialized model with a 1M-token context window. The model carries an MIT license, and the company said the open-source release itself is planned for next week.
The release adds to a wave of agentic coding models reaching inference platforms, positioning GLM-5.2 against proprietary alternatives.
Moonshot's Kimi-K2.7-Code and StepFun's Step 3.7 Flash Reach Inference Platforms
Moonshot's Kimi-K2.7-Code, an agentic coding model based on Kimi K2.6 aimed at long-running software development, became available on Together AI. StepFun's open-source multimodal reasoning model Step 3.7 Flash launched on DeepInfra.
MiniMax M3 is also running on Nous Research's Hermes Agent. According to a shared tally, more than 40 major models including Qwen3, Claude Opus 4.x, GPT-5.x, Gemini 3.x, and Grok 4.3 were released in the first half of 2026, bringing cumulative releases above 316.
Category highlights
Foundation Model Roundup
Z.ai released the 1M-context, MIT-licensed GLM-5.2 with open-sourcing planned next week. Moonshot's Kimi-K2.7-Code arrived on Together AI and StepFun's Step 3.7 Flash on DeepInfra, while MiniMax M3 runs on Nous Research's Hermes Agent. A shared tally counts over 40 major models released in the first half of 2026 and cumulative releases above 316.
Video Generation Activity
Grok Imagine Video 1.5 Preview leads image-to-video rankings. PixVerse showcased original works built with Canvas, HeyGen published a comparison demo of HyperFrames and After Effects, and India's avataar.ai announced Varya AI, a low-cost, fast video generation model backed by the IndiaAI Mission. Comparison lists covering Meta AI, Google Flow, Hailuo AI, Runway, Kling AI, and Google Veo were also shared.
Audio and Music Workflows
A workflow combining AI video annotation with ElevenLabs voice and music layers was shared, illustrating broader creative applications across audio and music production.
Local Execution Gains Traction
Reports of large models running locally are increasing. Examples include a 284B-parameter DeepSeek-V4-Flash GGUF build reportedly running above 1 token per second at around 8 watts on a Raspberry Pi 5, and the AMD Ryzen AI Max+ 395 running a 235B-parameter model on a single chip.
Key Trends
Agentic coding models specialized for long-running software development are arriving rapidly across inference platforms. Inference-system optimization is emerging as a differentiator alongside raw model capability. Open-source and sovereign AI efforts continue to grow, as does the shift toward end-to-end video pipelines.