Understanding Moral Reasoning Trajectories in Large Language Models: Toward Probing-Based Explainability
arXiv:2603.16017v1
Abstract: Large language models (LLMs) increasingly participate in morally sensitive decision-making, yet how they organize ethical frameworks across reasoning steps remains …
Fan Huang, Haewoon Kwak, Jisun An