AI News Blitz

BREAKING

NVIDIA Touts DFlash: Up to 15x Faster Inference

How DFlash Speculative Decoding Works

1. Block diffusion draft

2. Parallel single pass

3. Batch verify

4. Accept multiple tokens
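
The four steps above follow the general draft-then-verify pattern of speculative decoding. Below is a minimal sketch of that pattern with greedy acceptance, not NVIDIA's implementation. `toy_draft`, `toy_target`, `speculative_step`, and `generate` are hypothetical names made up for illustration. The drafter is a plain function that returns a whole block per call, standing in for a block diffusion drafter. The verification loop stands in for what a real engine would do in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int


def toy_target(prefix: List[Token]) -> Token:
    """Hypothetical stand-in for the large model: next token by position."""
    return (len(prefix) * 3 + 1) % 10


def toy_draft(prefix: List[Token], block_size: int) -> List[Token]:
    """Hypothetical drafter: proposes a whole block in one call and is
    deliberately wrong at every fifth position."""
    block = []
    for i in range(block_size):
        pos = len(prefix) + i
        tok = (pos * 3 + 1) % 10
        if pos % 5 == 4:
            tok = (tok + 1) % 10  # injected draft error
        block.append(tok)
    return block


def speculative_step(
    prefix: List[Token],
    draft_block: Callable[[List[Token], int], List[Token]],
    target_next: Callable[[List[Token]], Token],
    block_size: int,
) -> List[Token]:
    """One draft-then-verify round with greedy acceptance."""
    # Steps 1-2: draft the whole block in a single call.
    proposal = draft_block(prefix, block_size)

    # Step 3: check each drafted position against the target model.
    # (A real engine scores all positions in one batched pass.)
    accepted: List[Token] = []
    for tok in proposal:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Step 4: the whole block matched, so the target adds one more token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(
    prefix: List[Token], n_new: int, block_size: int
) -> Tuple[List[Token], int]:
    """Generate n_new tokens; return them with the number of verify rounds."""
    out = list(prefix)
    rounds = 0
    while len(out) - len(prefix) < n_new:
        out.extend(speculative_step(out, toy_draft, toy_target, block_size))
        rounds += 1
    return out[len(prefix):len(prefix) + n_new], rounds
```

Because a rejected draft token is replaced by the target's own choice, the output matches what the target model alone would produce with greedy decoding. The gain comes from needing fewer target passes than tokens generated.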

Throughput Gains by Configuration

gpt-oss-120b: 15x

Gemma 4 31B: 5.8x

Qwen3-8B: 5.1x

Max block size (tokens)

Acceptance rate

Over EAGLE-3
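
Block size and acceptance rate together bound how many tokens one verification pass can yield. The sketch below uses the common simplified model from speculative decoding analyses, which assumes each drafted token is accepted independently with the same probability. That assumption and the function name `expected_tokens_per_pass` are illustrative. The inputs in the usage line are made-up values, not DFlash measurements.

```python
def expected_tokens_per_pass(acceptance_rate: float, block_size: int) -> float:
    """Expected tokens produced per target pass, assuming each drafted token
    is accepted independently with probability `acceptance_rate`.

    Counts the accepted prefix plus the one token the target always adds.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if block_size < 0:
        raise ValueError("block_size must be non-negative")
    if acceptance_rate == 1.0:
        # Every drafted token is kept, plus the target's extra token.
        return float(block_size + 1)
    # Geometric series: 1 + a + a^2 + ... + a^block_size
    return (1.0 - acceptance_rate ** (block_size + 1)) / (1.0 - acceptance_rate)


# Illustrative inputs only (not figures from the graphic).
print(expected_tokens_per_pass(0.8, 4))
```

Under this model, a larger block only helps while the acceptance rate stays high, since the series flattens quickly once drafted tokens start being rejected.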

Strengths and Known Limits

Strengths

● High on math and code

● Drop-in for vLLM, SGLang

● MIT license, 20+ checkpoints

Limits

● Lower on AgentBench

● Tapers with long context

● Draft bottleneck if quantized

DFlash Ecosystem Keeps Widening

NVIDIA says its open-source DFlash model can speed up LLM inference by up to fifteen times on Blackwell GPUs.