Breaking · Microsoft AI

Microsoft Unveils New Speech AI 'MAI-Transcribe-1.5', Up to 5x Faster

June 4, 2026 at 12:02 EDT

Microsoft AI on June 2, 2026 (around Build 2026) announced a new version of its speech-to-text model, MAI-Transcribe-1.5. It expands language support from 25 to 43 languages and claims to be up to 5x faster than competing models while maintaining high accuracy. According to an independent benchmark, it sits at the front of the accuracy-versus-speed tradeoff curve (the Pareto frontier).

In its official blog post, the company cited the main advances over its predecessor MAI-Transcribe-1 (released in 2025 with support for 25 languages) as broader language coverage plus improved accuracy and a new feature called keyword biasing (content biasing). This lets users supply a list of domain-specific keywords in advance so that terms in medicine, law, or specialized fields are recognized correctly in context — a feature in strong demand among enterprises. The company says it achieves a best-in-class word error rate (WER) on the multilingual FLEURS benchmark, and that using keyword biasing cuts WER by 30%.

On speed, the company claims processing roughly 270–276x faster than real time, and up to 5.7x faster than MAI-Transcribe-1 on long-form audio. According to the independent benchmark site Artificial Analysis, it scores 2.4% on the AA-WER accuracy metric (3rd place; Alibaba's Fun-Realtime-ASR-preview leads at 1.7%, with ElevenLabs Scribe v2 second at 2.2%), but ranks first on the Pareto frontier when speed is factored in. Robustness to real-world audio such as conference rooms, phone calls and street noise, support for multiple speakers, accents and fast speech, and automatic punctuation were also highlighted.

The model is offered through Microsoft Foundry / Azure and is listed in the Azure model catalog. Pricing is $0.36 per hour of audio (billed per second) on OpenRouter, and a comparable $6 per 1,000 minutes on Foundry. Streaming support is said to be coming soon. The model is already used in Copilot, Bing, OneDrive, PowerPoint and Azure Speech.

The announcement is part of the MAI model family advanced by the Microsoft AI team, alongside MAI-Image-2.5, MAI-Voice-2 and MAI-Code-1-Flash, also shown at Build 2026. Transcription is regarded as a foundational capability for meeting summaries, captions, call analysis, accessibility and searchable archives — a fiercely contested area where OpenAI Whisper, Google Gemini, ElevenLabs Scribe and Alibaba models compete.

The X thread posted right after the announcement drew positive replies such as "Impressive" and "Speed meets accuracy, game changer," but likes and reposts remain few. Concrete use-case reports and testing by developers and enterprises are expected to grow in the coming weeks.

Source post →