TurboQuant on MacBook: One-Command Local LLM Stack
Forget cramming 70B models onto your MacBook. TurboQuant targets the real killer: exploding KV caches during inference. Here's the dead-simple stack to make it work.
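To see why the KV cache, rather than the weights, becomes the bottleneck at long context, a quick back-of-envelope helps. The formula is standard (keys plus values, per layer, per KV head, per token); the specific model dimensions below are illustrative assumptions for an 8B-class model, not TurboQuant's numbers:

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Memory for an uncompressed KV cache: 2 tensors (K and V),
    one per layer, each seq_len x n_kv_heads x head_dim elements."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Assumed 8B-class config: 32 layers, 8 KV heads (GQA), head_dim 128, fp16.
cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32_768)
print(f"{cache / 2**30:.0f} GiB")  # 4 GiB for a single 32k-token conversation
```

At 32k tokens that's already 4 GiB on top of the weights, and it grows linearly with context length and batch size, which is exactly the term TurboQuant attacks.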
theAIcatchup · Apr 09, 2026 · 4 min read
⚡ Key Takeaways
TurboQuant compresses KV cache, not model weights—key for long-context local LLMs.
One-command installer sets up Ollama + MLX sidecar + routing proxy on Apple Silicon.
Smart proxy routes short prompts to Ollama, long ones to TurboQuant—zero client changes.
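The routing proxy's core decision is just a length threshold: estimate the prompt size and pick a backend. A minimal sketch of that logic, where the ports, the 2048-token cutoff, and the 4-characters-per-token heuristic are all illustrative assumptions rather than TurboQuant's actual defaults:

```python
# Assumed backend endpoints (not TurboQuant's real defaults):
OLLAMA_URL = "http://localhost:11434"     # short prompts: plain Ollama
TURBOQUANT_URL = "http://localhost:8799"  # long prompts: MLX sidecar with compressed KV cache
TOKEN_THRESHOLD = 2048                    # assumed cutoff for "long context"

def estimate_tokens(messages: list[dict]) -> int:
    """Cheap token estimate: roughly 4 characters per token."""
    chars = sum(len(m.get("content", "")) for m in messages)
    return chars // 4

def pick_backend(messages: list[dict]) -> str:
    """Send long conversations to the KV-cache-compressing sidecar,
    everything else to stock Ollama."""
    if estimate_tokens(messages) > TOKEN_THRESHOLD:
        return TURBOQUANT_URL
    return OLLAMA_URL
```

Because both backends speak the same chat-completions-style API, the proxy can forward the request body unchanged to whichever URL `pick_backend` returns, which is why no client changes are needed.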