Gemma 4 Tool Calling Fixed in llama.cpp, But NVIDIA's cuBLAS Bug Torpedoes RTX Speed
Ever wonder why your local Gemma 4 model chokes on tools? Or why your RTX card runs like it's hungover? This week's updates fix one — and expose the other.
theAIcatchup · Apr 10, 2026 · 4 min read
The 60-Second TL;DR
llama.cpp's PR #21697 and Google's updated chat templates fix Gemma 4's tool calling and reasoning for local use (quick-start sketch below).
NVIDIA cuBLAS MatMul bug slashes RTX perf by 60% on FP32 batches, a killer for LLM inference (quick benchmark check below).
AmicoScript delivers local Whisper-Ollama audio processing: transcribe, diarize, summarize offline.
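Want to kick the tires on the tool-calling fix? Here's a minimal sketch, assuming a llama.cpp build that includes PR #21697 and one of Google's corrected Jinja templates. The model path, template file, and tool definition are illustrative placeholders:

```python
# Serve Gemma with the corrected chat template first (paths are placeholders):
#   llama-server -m gemma-4.gguf --jinja --chat-template-file gemma4.jinja
import requests

# A hypothetical example tool in the OpenAI function-calling format,
# which llama-server's OpenAI-compatible endpoint accepts.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server's default port
    json={
        "messages": [{"role": "user", "content": "Weather in Oslo right now?"}],
        "tools": tools,
    },
    timeout=120,
)

# With the fixed template, the tool call comes back as structured
# `tool_calls` instead of raw tags leaking into the message content.
print(resp.json()["choices"][0]["message"].get("tool_calls"))
```

If `tool_calls` comes back populated rather than the call text spilling into `content`, the template fix is doing its job.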
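And if you suspect your RTX card is hit by the cuBLAS regression, a rough sanity check is to time a batched FP32 matmul yourself. This is a sketch, not a definitive repro: it assumes PyTorch with CUDA (whose FP32 `torch.bmm` routes through cuBLAS batched GEMM), and the sizes are arbitrary. Run it on the affected driver stack and an older one, then compare:

```python
import time
import torch

# Batched FP32 matmul; on CUDA this dispatches to cuBLAS batched GEMM.
batch, n = 64, 1024
a = torch.randn(batch, n, n, device="cuda")  # float32 by default
b = torch.randn(batch, n, n, device="cuda")

# Warm up so clock ramp-up and lazy initialization don't skew the timing.
for _ in range(5):
    torch.bmm(a, b)
torch.cuda.synchronize()

iters = 50
start = time.perf_counter()
for _ in range(iters):
    torch.bmm(a, b)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Each batched matmul performs batch * 2 * n^3 floating-point operations.
tflops = iters * batch * 2 * n**3 / elapsed / 1e12
print(f"FP32 batched matmul throughput: {tflops:.1f} TFLOPS")
```

A roughly 60% drop versus your card's usual numbers on the same script would match the reported regression.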