In early June 2026, OpenAI engineers shared internally that they had found a software optimization that cuts inference costs by more than half on some existing models, The Information reports. After the optimization was applied, the number of Nvidia GPUs needed to serve ChatGPT traffic from logged-out guest users dropped to "a couple hundred." (the-decoder)