ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
Paper • 2509.25843 • Published • 19
The Curious Case of Analogies: Investigating Analogical Reasoning in Large Language Models