☁️ Cloud & Infrastructure

Gemma 4 Tool Calling Saved in llama.cpp, But NVIDIA's cuBLAS Bug Torpedoes RTX Speed

Ever wonder why your local Gemma 4 model chokes on tools? Or why your RTX card runs like it's hungover? This week's updates fix one — and expose the other.

[Image: code screenshot of the llama.cpp Gemma 4 fixes alongside an RTX GPU performance graph illustrating the cuBLAS bug]

⚡ Key Takeaways

  • llama.cpp's PR #21697 and Google's chat templates fix Gemma 4's tool calling and reasoning for local use.
  • An NVIDIA cuBLAS MatMul bug slashes RTX performance by 60% on FP32 batches — a killer for LLM inference.
  • AmicoScript delivers a local Whisper–Ollama audio pipeline: transcribe, diarize, and summarize fully offline.
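To try the tool-calling fix locally, llama.cpp's `llama-server` can load a Jinja chat template alongside the model. A minimal sketch follows — the GGUF filename and template filename are placeholders, not taken from the article or PR:

```shell
# Serve a local Gemma model with a tool-calling chat template.
# Model and template paths are illustrative placeholders.
llama-server \
  -m gemma-model-Q4_K_M.gguf \
  --jinja \
  --chat-template-file gemma_tool_template.jinja \
  --port 8080
```

`--jinja` enables llama.cpp's Jinja template engine, and `--chat-template-file` overrides the template baked into the GGUF — which is where the tool-calling fixes land.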
Published by theAIcatchup

Originally reported by dev.to
