Gemma 4 Tool Calling Fixed in llama.cpp, But NVIDIA's cuBLAS Bug Torpedoes RTX Speed
Ever wonder why your local Gemma 4 model chokes on tools? Or why your RTX card runs like it's hungover? This week's updates fix one — and expose the other.
theAIcatchup · Apr 10, 2026 · 4 min read
The 60-Second TL;DR
llama.cpp's PR #21697 and Google's updated chat templates fix Gemma 4's tool calling and reasoning for local use (quick-start sketch below).
NVIDIA cuBLAS MatMul bug slashes RTX perf by 60% on FP32 batches, a killer for LLM inference (quick benchmark check below).
AmicoScript delivers local Whisper-Ollama audio processing: transcribe, diarize, summarize offline.
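Want to kick the tires on the tool-calling fix? Here's a minimal sketch, assuming a llama.cpp build that includes PR #21697 and one of Google's corrected Jinja templates. The model path, template file, and tool definition are illustrative placeholders:

```python
# Serve Gemma with the corrected chat template first (paths are placeholders):
#   llama-server -m gemma-4.gguf --jinja --chat-template-file gemma4.jinja
import requests

# A hypothetical example tool in the OpenAI function-calling format,
# which llama-server's OpenAI-compatible endpoint accepts.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server's default port
    json={
        "messages": [{"role": "user", "content": "Weather in Oslo right now?"}],
        "tools": tools,
    },
    timeout=120,
)

# With the fixed template, the tool call comes back as structured
# `tool_calls` instead of raw tags leaking into the message content.
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```

If `tool_calls` comes back populated rather than the call text spilling into `content`, the template fix is doing its job.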
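And if you suspect your RTX card is hit by the cuBLAS regression, a rough sanity check is to time a batched FP32 matmul yourself. This is a sketch, not a definitive repro: it assumes PyTorch with CUDA (whose FP32 `torch.bmm` routes through cuBLAS batched GEMM), and the sizes are arbitrary. Run it on the affected driver stack and an older one, then compare:

```python
import time
import torch

# Batched FP32 matmul; on CUDA this dispatches to cuBLAS batched GEMM.
batch, n = 64, 1024
a = torch.randn(batch, n, n, device="cuda")  # float32 by default
b = torch.randn(batch, n, n, device="cuda")

# Warm up so clock ramp-up and lazy initialization don't skew the timing.
for _ in range(5):
    torch.bmm(a, b)
torch.cuda.synchronize()

iters = 50
start = time.perf_counter()
for _ in range(iters):
    torch.bmm(a, b)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Each batched matmul performs batch * 2 * n^3 floating-point operations.
tflops = iters * batch * 2 * n**3 / elapsed / 1e12
print(f"FP32 batched matmul throughput: {tflops:.1f} TFLOPS")
```

A roughly 60% drop versus your card's usual numbers on the same script would match the reported regression.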