Breaking

OpenAI Cuts Inference Costs by Over Half on Some Models via Software

June 30, 2026 at 15:00 EDT

Foundation Models
Infra & Chips

In early June 2026, OpenAI engineers shared internally that they had found a software optimization that cuts inference costs by more than half on some existing models. According to The Information, once the optimization was applied, the number of Nvidia GPUs needed to serve traffic from ChatGPT's logged-out guest users dropped to "a couple hundred." (the-decoder)

Early June 2026 · OpenAI

An optimization that more than halved inference costs on some models

Without buying new chips, OpenAI engineers reportedly made existing servers efficient enough to serve all logged-out ChatGPT traffic on just a few hundred Nvidia GPUs. That eases one of the heaviest ongoing costs of running generative AI.

−50%+

Inference cost, more than halved on some existing models

~100s

GPUs reportedly needed to serve all logged-out ChatGPT traffic at peak

0

New chips procured: pure efficiency gains

Gross margin target for year-end

The savings are framed as room to lift margins, loosen usage limits, and offer more competitive API pricing.

39% earlier this year → 52% year-end target
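For a sense of scale, the two margin figures can be turned into a cost target with simple arithmetic. The sketch below assumes the standard definition, gross margin = (revenue − cost of revenue) / revenue, and holds revenue constant. Neither assumption comes from the report, and the function name is hypothetical.

```python
def required_cost_cut(margin_now: float, margin_target: float) -> float:
    """Fractional cut in cost of revenue needed to move gross margin
    from margin_now to margin_target, assuming revenue stays constant.

    Gross margin = (revenue - cost) / revenue, so cost / revenue = 1 - margin.
    """
    cost_share_now = 1.0 - margin_now        # 0.61 of revenue at a 39% margin
    cost_share_target = 1.0 - margin_target  # 0.48 of revenue at a 52% margin
    return (cost_share_now - cost_share_target) / cost_share_now


if __name__ == "__main__":
    cut = required_cost_cut(0.39, 0.52)
    print(f"Cost of revenue must fall by about {cut:.1%}")
```

Under these assumptions, reaching 52% would mean cutting cost of revenue by roughly a fifth (about 21%). Inference is only one part of cost of revenue, and the reported savings apply to some models and to logged-out traffic, so this number says nothing about what the optimization actually delivers.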

What's confirmed — and what isn't

The reported scope covers only logged-out ChatGPT users; any spillover to paid users or the API is unconfirmed.
The specific methods remain undisclosed; no technical write-up or benchmark has been released.
Effects on response quality and on paid traffic remain unverified.

Praise

Serving all logged-out traffic on a few hundred GPUs is seen as a major leap in inference efficiency, part of an industry-wide race to squeeze more out of existing hardware.

Concern

Some users link a perceived drop in response quality to the change and suspect that a quality tradeoff explains why it was limited to logged-out users. Others ask whether the savings will reach users, or whether the work simply repackages known techniques such as quantization.

Common levers for inference efficiency (which, if any, OpenAI used is not confirmed)

Quantization · KV cache · Batching · Speculative decoding · Routing to cheaper models
