3866 Tokens/Second: Asthenosphere Unleashes AMD NPU's Full Fury
Picture this: an AMD Ryzen NPU churning out AI responses at 3866 effective tokens per second, no CPU or GPU in sight. Asthenosphere just turned your laptop into a speculative decoding beast.
DevTools Feed | Apr 03, 2026 | 4 min read
⚡ Key Takeaways
Asthenosphere achieves 3866 effective tok/s on AMD NPU with zero CPU/GPU usage.
Full 12-tile transformer pipeline enables speculative decoding at 91.8% acceptance.
Edge AI shift incoming: NPUs like Phoenix XDNA redefine on-device inference.
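For readers wondering how an acceptance rate turns into "effective" tokens per second, here is a rough sketch of the usual back-of-the-envelope model for speculative decoding. It assumes each drafted token is accepted independently with the same probability, which is a simplification. The draft length and the verification-pass rate are hypothetical parameters, since neither is given above, and this is not necessarily how Asthenosphere computes its own figure.

```python
def expected_tokens_per_step(acceptance_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Simplified model (an assumption, not the project's stated method):
    each of the `draft_len` drafted tokens is accepted independently with
    probability `acceptance_rate`, and the pass always yields one extra
    token from the target model. The expectation is the geometric sum
    1 + a + a^2 + ... + a^draft_len.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if acceptance_rate == 1.0:
        # Closed form below would divide by zero; every draft is accepted.
        return float(draft_len + 1)
    return (1.0 - acceptance_rate ** (draft_len + 1)) / (1.0 - acceptance_rate)


def effective_tokens_per_second(
    verify_passes_per_second: float,  # hypothetical: measured on your hardware
    acceptance_rate: float,
    draft_len: int,                   # hypothetical: not stated in the article
) -> float:
    """Effective throughput = verification passes/s x expected tokens per pass."""
    return verify_passes_per_second * expected_tokens_per_step(
        acceptance_rate, draft_len
    )


if __name__ == "__main__":
    # Illustrative inputs only; 0.918 is the acceptance rate quoted above,
    # while the draft length of 4 and 100 passes/s are made-up placeholders.
    per_step = expected_tokens_per_step(0.918, 4)
    print(f"expected tokens per verification pass: {per_step:.2f}")
    print(f"effective tok/s: {effective_tokens_per_second(100.0, 0.918, 4):.0f}")
```

The takeaway from the model is that a high acceptance rate matters most when the draft is long: at low acceptance, extra drafted tokens are mostly thrown away, so the effective rate barely exceeds the raw verification rate.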