Breaking

Grok Build 0.1 Ranks #15 in New Agent Arena, Shows Bash Gains

June 8, 2026 at 16:39 EDT

xAI's coding-focused model "Grok Build 0.1" has placed #15 in "Agent Arena," a new benchmark released by Arena.ai on June 8, 2026. The company's general-purpose model "Grok 4.3 (High)" landed at #17. Both models scored below average overall, yet Grok Build 0.1 showed a clear strength in terminal handling: its Bash Recovery score ranked #9, at 6.1% above the field average.

June 8, 2026 · Agent Arena

Grok Build 0.1 cracks the agentic-coding top 15, judged on real user tasks rather than synthetic test suites

xAI's new agentic coding model lands at #15 on a leaderboard built from 429,863 real user tasks, edging ahead of the company's general-purpose Grok 4.3 (High), which sits at #17, on practical task completion.

Overall rank: #15 (Grok 4.3 sat at #17)
Bash Recovery: +6.1%, its strongest skill
Real user tasks analyzed: 429,863 (millions of tool calls)
Price per M input / output tokens: $1 / $2 (a speed and low-cost play)
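
To make the listed pricing concrete, here is a minimal sketch of how the cost of a session could be estimated from the $1 / $2 per million input / output token rates. The function name and the token counts in the usage line are hypothetical, chosen only for illustration; actual billing may involve factors this article does not cover.

```python
# Rates as listed in the article: USD per million tokens.
INPUT_USD_PER_M = 1.0
OUTPUT_USD_PER_M = 2.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of a given token usage at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical session: 400,000 input tokens and 50,000 output tokens.
print(f"${estimate_cost_usd(400_000, 50_000):.2f}")  # prints $0.50
```

Each extra tool call adds tokens, so cost grows with the number of calls even at a low per-token rate.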

Metric breakdown vs the field

Each line gives the metric, the model's rank, and its signal relative to the field average (positive is above the field, negative is below).

Bash Recovery · #9 · +6.1%
Tool Hallucination · #19 · -3.5%
Net Improvement · #15 · -5.3%
Confirmed Success · #15 · -6.3%
Steerability · #15 · -7.0%
Praise vs Complaint · #18 · -15.8%

Bash Recovery is the lone above-average signal.
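
For readers who want to work with these figures, the following is a small sketch that transcribes the six metrics above and picks out the above-average signals and the weakest one. The dictionary layout and function names are assumptions made for illustration, not an Arena.ai data format.

```python
# Values transcribed from the breakdown above:
# metric name -> (rank, signal vs. field average in percent).
METRICS = {
    "Bash Recovery": (9, 6.1),
    "Tool Hallucination": (19, -3.5),
    "Net Improvement": (15, -5.3),
    "Confirmed Success": (15, -6.3),
    "Steerability": (15, -7.0),
    "Praise vs Complaint": (18, -15.8),
}


def above_average(metrics):
    """Return the names of metrics whose signal is above the field average."""
    return [name for name, (_, signal) in metrics.items() if signal > 0]


def weakest(metrics):
    """Return the name of the metric with the lowest signal."""
    return min(metrics, key=lambda name: metrics[name][1])


print(above_average(METRICS))  # ['Bash Recovery']
print(weakest(METRICS))        # Praise vs Complaint
```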

▲ What developers praise

Faithful to rules & workflows
Strong in bash / tool use
Fast — idea to prototype
Clean, readable code
Holds context across steps

▼ Where it falls short

Many tool calls burn credits
Raw ability below Sonnet/Claude
Lower steerability, more hallucinations
Manual fixes on complex architecture

Third-party task completion rates

Terminal Bench 2.0-style: 50.6%
