Gemma 4 Crumbles to Yesterday's Jailbreak — Zero-Shot Transfer Strikes Again
Imagine crafting a jailbreak for an AI model, only to find it slices through the next version like a hot knife through yesterday's butter. That's zero-shot attack transfer hitting Gemma 4 right out of the gate.
DevTools Feed · Apr 03, 2026 · 4 min read · 10 views
⚡ Key Takeaways
Zero-shot jailbreaks from Gemma 3 transfer untouched to Gemma 4, highlighting stagnant safety.
Responsible disclosure fails even with self-censorship, as AI filters confuse research with harm.
This predicts a shift to continuous, agile safety auditing to match rapid model releases.