DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following
arXiv:2603.03321v1 (Announce Type: cross)
Abstract: Evaluating instruction following in Large Language Models requires decomposing instructions into verifiable requirements and assessing their satisfaction, tasks currently dependent on manual …
Nardine Basta, Dali Kaafar