Testing the Limits of Truth Directions in LLMs
arXiv:2604.03754v1. Abstract: Large language models (LLMs) have been shown to encode truth of statements in their activation space along a linear truth …
Angelos Poulis, Mark Crovella, Evimaria Terzi