Does Intel NPU run LLMs faster than CPU?

No. In my tests the CPU was quicker overall, especially since llama.cpp loaded the model almost instantly.

How to run LLMs on Intel NPU without crashing?

Export the model with optimum-cli using --sym --ratio 1.0 --group-size 128, then load it with openvino-genai's LLMPipeline on the "NPU" device.
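A minimal sketch of that two-step recipe, under stated assumptions: the three quantization flags come from the answer above, while the `export openvino` subcommand, the `--weight-format int4` setting, the model ID, and the output directory are my own placeholders and may need adjusting for your model and tool versions. The helper names `build_export_command` and `run_on_npu` are hypothetical, not part of any library.

```python
import subprocess


def build_export_command(model_id: str, output_dir: str) -> list[str]:
    """Build the optimum-cli argument list for an NPU-friendly export."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--weight-format", "int4",  # assumption: not stated in the article
        "--sym",                    # symmetric quantization (from the article)
        "--ratio", "1.0",           # quantize all eligible layers (from the article)
        "--group-size", "128",      # quantization group size (from the article)
        output_dir,
    ]


def run_on_npu(model_dir: str, prompt: str, max_new_tokens: int = 64):
    """Load the exported model on the NPU and generate a completion."""
    # Imported lazily so the command builder works without OpenVINO installed.
    import openvino_genai as ov_genai

    pipe = ov_genai.LLMPipeline(model_dir, "NPU")
    return pipe.generate(prompt, max_new_tokens=max_new_tokens)


def export_model(model_id: str, output_dir: str) -> None:
    """Run the export; raises CalledProcessError if optimum-cli fails."""
    subprocess.run(build_export_command(model_id, output_dir), check=True)
```

To try it, call `export_model("your-org/your-model", "model-npu")` once (both arguments are placeholders), then `print(run_on_npu("model-npu", "Hello"))`. Expect the first load on the NPU to take a while.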

The NPU is not there yet for LLMs: it is great for light tasks, but the CPU still rules inference speed.

Intel promises NPUs will turbocharge on-device AI, but my ThinkPad test? A 96-second model load slog on the NPU, and the CPU smoked it. Here's the raw truth.

theAIcatchup Apr 10, 2026 4 min read

#Core Ultra #Intel NPU #LLM inference #OpenVINO #llama.cpp

Originally reported by dev.to