Tag: cs.CR

Academic · 1 min

XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

arXiv:2604.05242v1 Announce Type: new Abstract: Multi-bit watermarking has emerged as a promising solution for embedding imperceptible binary messages into Large Language Model (LLM)-generated text, enabling …

Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang
Academic · 1 min

Learning the Signature of Memorization in Autoregressive Language Models

arXiv:2604.03199v1 Announce Type: new Abstract: All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibration), each bounded …

David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko
Academic · 1 min

UK AISI Alignment Evaluation Case-Study

arXiv:2604.00788v1 Announce Type: new Abstract: This technical report presents methods developed by the UK AI Security Institute for assessing whether advanced AI systems reliably follow …

Alexandra Souly, Robert Kirk, Jacob Merizian, Abby D'Cruz, Xander Davies
Academic · 1 min

Internal Safety Collapse in Frontier Large Language Models

arXiv:2603.23509v1 Announce Type: new Abstract: This work identifies a critical failure mode in frontier large language models (LLMs), which we term Internal Safety Collapse (ISC): …

Yutao Wu, Xiao Liu, Yifeng Gao, Xiang Zheng, Hanxun Huang, Yige Li, Cong Wang, Bo Li, Xingjun Ma, Yu-Gang Jiang
Academic · 1 min

RedacBench: Can AI Erase Your Secrets?

arXiv:2603.20208v1 Announce Type: new Abstract: Modern language models can readily extract sensitive information from unstructured text, making redaction -- the selective removal of such information …

Hyunjun Jeon, Kyuyoung Kim, Jinwoo Shin
Academic · 1 min

MAPLE: Metadata Augmented Private Language Evolution

arXiv:2603.19258v1 Announce Type: cross Abstract: While differentially private (DP) fine-tuning of large language models (LLMs) is a powerful tool, it is often computationally prohibitive or …

Eli Chien, Yuzheng Hu, Ryan McKenna, Shanshan Wu, Zheng Xu, Peter Kairouz