I Fired Up LLMs on Intel's NPU — Shocking Load Times and CPU Wins
Intel promises NPUs will turbocharge on-device AI, but in my ThinkPad test the NPU took 96 seconds just to load a model, and the CPU smoked it. Here's the raw truth.
theAIcatchup
Apr 10, 2026
4 min read
⚡ Key Takeaways
- NPU load times kill usability: 96 s versus 5 s on the CPU.
- llama.cpp beats the OpenVINO backends, reaching 22 tok/s.
- Special export flags (--sym, group-size 128) are mandatory for NPU success.
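
Load time and tokens per second are the two numbers the takeaways lean on, and both come down to wall-clock timing around a load call and a generate call. Here is a minimal sketch of such a harness, not the setup actually used for the figures above. `time_load` and `tokens_per_second` are hypothetical helper names, and the lambdas at the bottom are stand-ins for a real backend.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def time_load(load: Callable[[], T]) -> Tuple[T, float]:
    """Call a model-loading function and return (model, seconds elapsed)."""
    start = time.perf_counter()
    model = load()
    return model, time.perf_counter() - start


def tokens_per_second(generate: Callable[[], Sequence]) -> float:
    """Call a generation function that returns its tokens and report throughput."""
    start = time.perf_counter()
    tokens = generate()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very fast stand-in.
    return len(tokens) / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Stand-ins only: swap in the real load and generate calls of the backend under test.
    model, load_s = time_load(lambda: "stand-in model")
    tps = tokens_per_second(lambda: ["tok"] * 64)
    print(f"load: {load_s:.2f} s, throughput: {tps:.1f} tok/s")
```

Timing the load separately from generation matters here, because a backend can post a decent tok/s figure and still be painful to use if every start costs a minute and a half.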
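
The export flags in the last takeaway can be pinned down in a small helper so they are never forgotten. This is a sketch under assumptions: the source names only `--sym` and a group size of 128, so the `optimum-cli export openvino` command and `--weight-format int4` are my guess at the exporter and weight format. The function name `build_npu_export_cmd`, the model ID, and the output directory are hypothetical placeholders.

```python
import shlex
from typing import List


def build_npu_export_cmd(model_id: str, output_dir: str, group_size: int = 128) -> List[str]:
    """Assemble an export command with symmetric quantization and a fixed group size."""
    return [
        "optimum-cli", "export", "openvino",  # assumption: exporter is not named in the source
        "--model", model_id,
        "--weight-format", "int4",            # assumption: weight format is not stated in the source
        "--sym",                              # from the source: symmetric quantization
        "--group-size", str(group_size),      # from the source: group size 128
        output_dir,
    ]


# Placeholder model ID and output directory, for illustration only.
cmd = build_npu_export_cmd("your-org/your-model", "model-npu-int4")
print(shlex.join(cmd))
```

The helper only builds the argument list. Pass it to `subprocess.run(cmd, check=True)` once the flags have been checked against the exporter version installed.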