
99.8% of Your LLM's Power Gulps Go to Memory, Not Math

Ever wonder why your cutting-edge LLM runs hot enough to grill steaks? It turns out that roughly 99.8% of its inference power isn't spent crunching numbers; it's spent shuttling data between memory and the chip.

[Figure: NVIDIA GPU TDPs escalating from V100 to Blackwell, illustrating the power wall]

⚡ Key Takeaways

  • Data movement, not compute, accounts for roughly 99.8% of LLM inference energy, because memory bandwidth dominates the workload (see the back-of-envelope sketch after this list).
  • The post-2006 collapse of Dennard scaling turned GPU TDPs into a relentless upward escalator.
  • Optical interconnects may shatter the power wall, echoing how fiber optics transformed networking.
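
To see how data movement can dwarf arithmetic, here is a minimal back-of-envelope sketch in Python. The energy constants (about 1 pJ per on-chip FLOP versus hundreds of pJ per byte fetched from DRAM/HBM), the 70B-parameter model size, and the single-batch decode model (every weight byte read once per token) are illustrative assumptions, not figures from the article.

```python
# Back-of-envelope: energy split between compute and data movement
# for one LLM decode step. All constants below are assumptions.

MODEL_PARAMS = 70e9                  # assumed 70B-parameter model
BYTES_PER_PARAM = 2                  # fp16/bf16 weights
FLOPS_PER_TOKEN = 2 * MODEL_PARAMS   # ~2 FLOPs per parameter per token

PJ_PER_FLOP = 1.0                    # assumed on-chip fp16 op cost, ~1 pJ
PJ_PER_DRAM_BYTE = 500.0             # assumed off-chip access cost, ~hundreds of pJ

# Batch size 1, no weight reuse: every weight byte crosses the memory bus once.
compute_energy_pj = FLOPS_PER_TOKEN * PJ_PER_FLOP
memory_energy_pj = MODEL_PARAMS * BYTES_PER_PARAM * PJ_PER_DRAM_BYTE

total = compute_energy_pj + memory_energy_pj
print(f"compute: {compute_energy_pj / total:6.2%} of energy")
print(f"memory : {memory_energy_pj / total:6.2%} of energy")
```

With these ballpark constants the split lands near the headline 99.8% figure, but the exact ratio depends heavily on batch size, cache reuse, and the memory system's actual joules per byte.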
Originally reported by dev.to
