A Lightweight Explainable Guardrail for Prompt Safety
arXiv:2602.15853v1 (announce type: cross)
Abstract: We propose a lightweight explainable guardrail (LEG) method for the classification of unsafe prompts. LEG uses a multi-task learning architecture …
Md Asiful Islam, Mihai Surdeanu