

Angelika Romanou, Mark Ibrahim, Candace Ross, Chantal Shaib, Kerem Okta, Sam Bell, Elia Ovalle, Jesse Dodge, Antoine Bosselut, Koustuv Sinha, Adina Williams


Academic · 1 min

Brittlebench: Quantifying LLM robustness via prompt sensitivity

arXiv:2603.13285v1. Abstract: Existing evaluation methods largely rely on clean, static benchmarks, which can overestimate true model performance by failing to capture the …

26 views Mar 17

