ainewsblitz.com

Breaking · xAI

xAI's Grok Voice APIs Go Live on Vapi, Touted for TTS and STT

xAI announced on June 3, 2026, via its official account that the Text-to-Speech (TTS) and Speech-to-Text (STT) APIs for its Grok voice AI are now available on Vapi, the enterprise platform for building voice AI agents. Quoting a post from Vapi, xAI urged users to "try the most natural TTS and cost-effective STT APIs" on Vapi, while Vapi itself stated that "Grok STT and Grok TTS are now live on Vapi."[1]

The Vapi availability follows xAI's official release of Grok STT and Grok TTS as standalone voice APIs on April 17, 2026. Both are said to extend Grok Voice and to be built on technology already deployed in production for Tesla vehicles and Starlink customer support. Vapi handles orchestration across STT, LLM and TTS, and the Grok API integration is pitched as letting developers build natural-sounding voice agents on Vapi at low cost.[3]

According to the published specs, TTS is priced at $15.00 per 1M characters and offers five expressive voices: eve, ara, rex, sal and leo. Expression can be finely controlled with inline speech tags such as laugh, sigh, whisper, emphasis, pause and slow. TTS supports both batch and streaming, output formats including MP3, WAV, PCM and μ-law, and more than 25 languages.[4][3] STT is priced at $0.10 per hour for batch (REST) and $0.20 per hour for streaming (WebSocket), and supports more than 25 languages, word-level timestamps, speaker diarization, multi-channel audio and inverse text normalization.[3] A realtime voice agent is listed at $0.05 per minute ($3.00 per hour). On Vapi, model provider fees apply separately, bring-your-own-key is supported, and Vapi hosting starts at $0.05 per minute with volume discounts.[5]
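As a rough illustration of how the quoted rates add up, the Python sketch below estimates the cost of a single call. The function names, the 10-minute call length and the 6,000-character figure are hypothetical and are not part of any xAI or Vapi API. The sketch also assumes that STT, TTS and Vapi hosting are simply billed side by side, and it leaves out LLM fees, which the article does not quote.

```python
# Rates as quoted in the article; confirm current pricing before relying on them.
TTS_USD_PER_MILLION_CHARS = 15.00
STT_BATCH_USD_PER_HOUR = 0.10
STT_STREAMING_USD_PER_HOUR = 0.20
REALTIME_AGENT_USD_PER_MINUTE = 0.05
VAPI_HOSTING_USD_PER_MINUTE = 0.05  # "starts at" rate, before volume discounts


def tts_cost(characters: int) -> float:
    """Cost in USD of synthesizing the given number of characters."""
    return characters / 1_000_000 * TTS_USD_PER_MILLION_CHARS


def stt_cost(audio_hours: float, streaming: bool = False) -> float:
    """Cost in USD of transcribing audio, batch (REST) or streaming (WebSocket)."""
    rate = STT_STREAMING_USD_PER_HOUR if streaming else STT_BATCH_USD_PER_HOUR
    return audio_hours * rate


def realtime_agent_cost(minutes: float) -> float:
    """Cost in USD of the realtime voice agent at the listed per-minute rate."""
    return minutes * REALTIME_AGENT_USD_PER_MINUTE


def vapi_hosting_cost(minutes: float) -> float:
    """Vapi hosting cost in USD at the starting rate (assumes no discount)."""
    return minutes * VAPI_HOSTING_USD_PER_MINUTE


if __name__ == "__main__":
    call_minutes = 10          # hypothetical call length
    spoken_characters = 6_000  # hypothetical amount of agent speech

    total = (
        stt_cost(call_minutes / 60, streaming=True)
        + tts_cost(spoken_characters)
        + vapi_hosting_cost(call_minutes)
    )
    print(f"Estimated cost, excluding LLM fees: ${total:.4f}")
```

At these rates the per-minute hosting charge accounts for most of the estimate, so the volume discounts mentioned above would matter more to the total than the STT rate.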

The move is positioned as a bid to differentiate on naturalness and price from rivals such as ElevenLabs, Deepgram and AssemblyAI. Online, some cite the Tesla/Starlink track record as a strength, while third-party mentions claim Grok undercuts these incumbents on price.

Reaction on X was largely positive, with the originating post drawing more than 45 replies. Comments included "cost-effective STT plus human-like TTS is going to crush it," "finally, voice AI that doesn't sound like a robot," and "Grok's TTS already sounds insanely natural." Users building voice agents on Vapi chimed in, and other replies mentioned business use cases such as customer support and regulated workflows, as well as plans for real call testing. Questions also appeared about supported languages, voice cloning support and whether the models are open source. A minority voiced ethical concerns over voice use, suggesting models may have been "trained on people without consent." Vapi's official account said it was "proud to be able to bring these models to voice developers on Vapi."[2]
