Tag: cs.CR

Academic · 1 min

Manifold of Failure: Behavioral Attraction Basins in Language Models

arXiv:2602.22291v1 · Announce Type: new · Abstract: While prior work has focused on projecting adversarial examples back onto the manifold of natural data to restore safety, we …

Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Blake Gatto
7 views

Academic · 1 min

TFL: Targeted Bit-Flip Attack on Large Language Model

arXiv:2602.17837v1 · Announce Type: cross · Abstract: Large language models (LLMs) are increasingly deployed in safety- and security-critical applications, raising concerns about their robustness to model …

Jingkai Guo, Chaitali Chakrabarti, Deliang Fan
7 views