HyFI: Hyperbolic Feature Interpolation for Brain-Vision Alignment
arXiv:2603.22721v1 Announce Type: new Abstract: Recent progress in artificial intelligence has encouraged numerous attempts to understand and decode the human visual system from brain signals. These prior works typically align neural activity independently with semantic and perceptual features extracted from images...
Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment
arXiv:2603.23114v1 Announce Type: new Abstract: A human's moral decision depends heavily on the context. Yet research on LLM morality has largely studied fixed scenarios. We address this gap by introducing Contextual MoralChoice, a dataset of moral dilemmas with systematic contextual...
The Efficiency Attenuation Phenomenon: A Computational Challenge to the Language of Thought Hypothesis
arXiv:2603.22312v1 Announce Type: new Abstract: This paper computationally investigates whether thought requires a language-like format, as posited by the Language of Thought (LoT) hypothesis. We introduce the ``AI Private Language'' thought experiment: if two artificial agents develop an efficient, inscrutable...
PRISM: A Dual View of LLM Reasoning through Semantic Flow and Latent Computation
arXiv:2603.22754v1 Announce Type: new Abstract: Large language models (LLMs) solve complex problems by generating multi-step reasoning traces. Yet these traces are typically analyzed from only one of two perspectives: the sequence of tokens across different reasoning steps in the generated...
LLM Olympiad: Why Model Evaluation Needs a Sealed Exam
arXiv:2603.23292v1 Announce Type: new Abstract: Benchmarks and leaderboards are how NLP most often communicates progress, but in the LLM era they are increasingly easy to misread. Scores can reflect benchmark-chasing, hidden evaluation choices, or accidental exposure to test content --...
AgriPestDatabase-v1.0: A Structured Insect Dataset for Training Agricultural Large Language Model
arXiv:2603.22777v1 Announce Type: new Abstract: Agricultural pest management increasingly relies on timely and accurate access to expert knowledge, yet high quality labeled data and continuous expert support remain limited, particularly for farmers operating in rural regions with unstable/no internet connectivity....
PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference
arXiv:2603.22943v1 Announce Type: new Abstract: Personalized text-to-image generation lets users fine-tune diffusion models into repositories of concept-specific checkpoints, but serving these repositories efficiently is difficult for two reasons: natural-language requests are often ambiguous and can be misrouted to visually similar...
RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue
arXiv:2603.23346v1 Announce Type: new Abstract: Real-time spoken dialogue systems face a fundamental tension between latency and response quality. End-to-end speech-to-speech (S2S) models respond immediately and naturally handle turn-taking, backchanneling, and interruption, but produce semantically weaker outputs. Cascaded pipelines (ASR ->...
Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning
arXiv:2603.22942v1 Announce Type: new Abstract: Translating Natural Language to SQL (NL2SQL) remains a critical bottleneck for the democratization of data in enterprises. Although Large Language Models (LLMs) such as Gemini 2.5 have demonstrated impressive zero-shot capabilities, their high inference...
How Utilitarian Are OpenAI's Models Really? Replicating and Reinterpreting Pfeffer, Krügel, and Uhl (2025)
arXiv:2603.22730v1 Announce Type: new Abstract: Pfeffer, Krügel, and Uhl (2025) report that OpenAI's reasoning model o1-mini produces more utilitarian responses to the trolley problem and footbridge dilemma than the non-reasoning model GPT-4o. I replicate their study with four current OpenAI...
MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models
arXiv:2603.23085v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet, existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, leaving them vulnerable to spurious...
Minibal: Balanced Game-Playing Without Opponent Modeling
arXiv:2603.23059v1 Announce Type: new Abstract: Recent advances in game AI, such as AlphaZero and Athénan, have achieved superhuman performance across a wide range of board games. While highly powerful, these agents are ill-suited for human-AI interaction, as they consistently overwhelm...
Improving LLM Predictions via Inter-Layer Structural Encoders
arXiv:2603.22665v1 Announce Type: new Abstract: The standard practice in Large Language Models (LLMs) is to base predictions on the final-layer token representations. Recent studies, however, show that intermediate layers encode substantial information, which may contain more task-relevant features than the...
KALAVAI: Predicting When Independent Specialist Fusion Works -- A Quantitative Model for Post-Hoc Cooperative LLM Training
arXiv:2603.22755v1 Announce Type: new Abstract: Independently trained domain specialists can be fused post-hoc into a single model that outperforms any individual specialist, and the gain is predictable: gain = 0.82 x divergence - 2.72 (R^2 = 0.856, n=6, 3-26% divergence)....
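The abstract's headline claim is a simple linear predictor of fusion gain from specialist divergence. A minimal sketch of that reported model (the function name is ours; the coefficients, R^2, and 3-26% fitted range are taken verbatim from the abstract):

```python
def predicted_fusion_gain(divergence_pct: float) -> float:
    """Predict the gain of post-hoc specialist fusion from divergence (%).

    Linear fit reported in the KALAVAI abstract:
        gain = 0.82 * divergence - 2.72   (R^2 = 0.856, n = 6)
    Only meaningful inside the fitted 3-26% divergence range.
    """
    return 0.82 * divergence_pct - 2.72

# At 10% divergence the fit predicts a gain of 0.82 * 10 - 2.72 = 5.48.
```

Note the negative intercept: below roughly 3.3% divergence the fit predicts no benefit from fusion, consistent with the paper framing fusion as worthwhile only when specialists are sufficiently different.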
DALDALL: Data Augmentation for Lexical and Semantic Diversity in the Legal Domain by Leveraging LLM-Persona
arXiv:2603.22765v1 Announce Type: new Abstract: Data scarcity remains a persistent challenge in low-resource domains. While existing data augmentation methods leverage the generative capabilities of large language models (LLMs) to produce large volumes of synthetic data, these approaches often prioritize quantity...
Efficient Hallucination Detection: Adaptive Bayesian Estimation of Semantic Entropy with Guided Semantic Exploration
arXiv:2603.22812v1 Announce Type: new Abstract: Large language models (LLMs) have achieved remarkable success in various natural language processing tasks, yet they remain prone to generating factually incorrect outputs known as hallucinations. While recent approaches have shown promise for hallucination detection...
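For readers unfamiliar with the underlying quantity, a minimal sketch of plain semantic entropy (not this paper's adaptive Bayesian estimator): sample several answers to the same prompt, group them into meaning-equivalent clusters, and take the entropy over cluster frequencies; high entropy signals uncertainty and possible hallucination. The function name and the use of pre-computed cluster IDs are our simplifying assumptions.

```python
import math
from collections import Counter

def semantic_entropy(cluster_ids):
    """Entropy (in nats) over semantic clusters of sampled answers.

    `cluster_ids` assigns each sampled answer to a meaning cluster,
    e.g. via bidirectional entailment checks (done upstream here).
    """
    counts = Counter(cluster_ids)
    n = len(cluster_ids)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Five samples agreeing on one meaning -> entropy 0 (confident);
# five mutually distinct meanings -> maximal entropy log(5).
```

The cost of this baseline comes from sampling and pairwise clustering, which is the expense the paper's adaptive estimation and guided exploration aim to reduce.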
RadTimeline: Timeline Summarization for Longitudinal Radiological Lung Findings
arXiv:2603.22820v1 Announce Type: new Abstract: Tracking findings in longitudinal radiology reports is crucial for accurately identifying disease progression, and the time-consuming process would benefit from automatic summarization. This work introduces a structured summarization task, where we frame longitudinal report summarization...
Analysing LLM Persona Generation and Fairness Interpretation in Polarised Geopolitical Contexts
arXiv:2603.22837v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly utilised for social simulation and persona generation, necessitating an understanding of how they represent geopolitical identities. In this paper, we analyse personas generated for Palestinian and Israeli identities by...
EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
arXiv:2603.22910v1 Announce Type: new Abstract: The increasing memory demand of the Key-Value (KV) cache poses a significant bottleneck for Large Language Models (LLMs) in long-context applications. Existing low-rank compression methods often rely on irreversible parameter transformations, sacrificing the flexibility to...
Multilingual KokoroChat: A Multi-LLM Ensemble Translation Method for Creating a Multilingual Counseling Dialogue Dataset
arXiv:2603.22913v1 Announce Type: new Abstract: To address the critical scarcity of high-quality, publicly available counseling dialogue datasets, we created Multilingual KokoroChat by translating KokoroChat, a large-scale manually authored Japanese counseling corpus, into both English and Chinese. A key challenge in...
Set-Valued Prediction for Large Language Models with Feasibility-Aware Coverage Guarantees
arXiv:2603.22966v1 Announce Type: new Abstract: Large language models (LLMs) inherently operate over a large generation space, yet conventional usage typically reports the most likely generation (MLG) as a point prediction, which underestimates the model's capability: although the top-ranked response can...
PaperVoyager: Building Interactive Web with Visual Language Models
arXiv:2603.22999v1 Announce Type: new Abstract: Recent advances in visual language models have enabled autonomous agents for complex reasoning, tool use, and document understanding. However, existing document agents mainly transform papers into static artifacts such as summaries, webpages, or slides, which...
When Language Models Lose Their Mind: The Consequences of Brain Misalignment
arXiv:2603.23091v1 Announce Type: new Abstract: While brain-aligned large language models (LLMs) have garnered attention for their potential as cognitive models and for the potential of enhanced safety and trustworthiness in AI, the role of this brain alignment for linguistic competence remains...
HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature
arXiv:2603.23136v1 Announce Type: new Abstract: Automated knowledge graph (KG) construction is essential for navigating the rapidly expanding body of scientific literature. However, existing approaches struggle to recognize long multi-word entities, often fail to generalize across domains, and typically overlook the...
From Synthetic to Native: Benchmarking Multilingual Intent Classification in Logistics Customer Service
arXiv:2603.23172v1 Announce Type: new Abstract: Multilingual intent classification is central to customer-service systems on global logistics platforms, where models must process noisy user queries across languages and hierarchical label spaces. Yet most existing multilingual benchmarks rely on machine-translated text, which...
Beyond Hard Constraints: Budget-Conditioned Reachability For Safe Offline Reinforcement Learning
arXiv:2603.22292v1 Announce Type: new Abstract: Sequential decision making using Markov Decision Processes underpins many real-world applications. Both model-based and model-free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization with safety constraints, often...
Efficient Embedding-based Synthetic Data Generation for Complex Reasoning Tasks
arXiv:2603.22294v1 Announce Type: new Abstract: Synthetic Data Generation (SDG), leveraging Large Language Models (LLMs), has recently been recognized and broadly adopted as an effective approach to improve the performance of smaller but more resource and compute efficient LLMs through fine-tuning....
Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores
arXiv:2603.22299v1 Announce Type: new Abstract: Large language models (LLMs) are often confidently wrong, making reliable uncertainty estimation (UE) essential. Output-based heuristics are cheap but brittle, while probing internal representations is effective yet high-dimensional and hard to transfer. We propose a...
Latent Semantic Manifolds in Large Language Models
arXiv:2603.22301v1 Announce Type: new Abstract: Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM hidden states...
Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models
arXiv:2603.22303v1 Announce Type: new Abstract: Hallucinations in large language models (LLMs) remain a central obstacle to trustworthy deployment, motivating detectors that are accurate, lightweight, and broadly applicable. Since an LLM with a prompt defines a conditional distribution, we argue that...