CWoMP: Morpheme Representation Learning for Interlinear Glossing
arXiv:2603.18184v1 Announce Type: new Abstract: Interlinear glossed text (IGT) is a standard notation for language documentation which is linguistically rich but laborious to produce manually. Recent automated IGT methods treat glosses as character sequences, neglecting their compositional structure. We propose...
How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence
arXiv:2603.18203v1 Announce Type: new Abstract: The dominant paradigms of artificial intelligence were shaped by learning theories from psychology: behaviorism inspired reinforcement learning, cognitivism gave rise to deep learning and memory-augmented architectures, and constructivism influenced curriculum learning and compositional approaches. This...
From Noise to Signal: When Outliers Seed New Topics
arXiv:2603.18358v1 Announce Type: new Abstract: Outliers in dynamic topic modeling are typically treated as noise, yet we show that some can serve as early signals of emerging topics. We introduce a temporal taxonomy of news-document trajectories that defines how documents...
Synthetic Data Generation for Training Diversified Commonsense Reasoning Models
arXiv:2603.18361v1 Announce Type: new Abstract: Conversational agents are required to respond to their users not only with high quality (i.e., commonsense-bearing) responses, but also considering multiple plausible alternative scenarios, reflecting the diversity in their responses. Despite the growing need...
PowerFlow: Unlocking the Dual Nature of LLMs via Principled Distribution Matching
arXiv:2603.18363v1 Announce Type: new Abstract: Unsupervised Reinforcement Learning from Internal Feedback (RLIF) has emerged as a promising paradigm for eliciting the latent capabilities of Large Language Models (LLMs) without external supervision. However, current methods rely on heuristic intrinsic rewards, which...
AutoScreen-FW: An LLM-based Framework for Resume Screening
arXiv:2603.18390v1 Announce Type: new Abstract: Corporate recruiters often need to screen many resumes within a limited time, which increases their burden and may cause suitable candidates to be overlooked. To address these challenges, prior work has explored LLM-based automated resume...
TopoChunker: Topology-Aware Agentic Document Chunking Framework
arXiv:2603.18409v1 Announce Type: new Abstract: Current document chunking methods for Retrieval-Augmented Generation (RAG) typically linearize text. This forced linearization strips away intrinsic topological hierarchies, creating "semantic fragmentation" that degrades downstream retrieval quality. In this paper, we propose TopoChunker, an agentic...
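The abstract is cut off before it describes how TopoChunker works. Purely as a point of contrast, the sketch below shows a simple, hypothetical heading-path baseline: it splits a Markdown-like document at `#` headings and tags each chunk with its full heading path, so some of the hierarchy survives linearization. The function name `hierarchical_chunks` and the heading convention are assumptions; this is not the paper's agentic method.

```python
def hierarchical_chunks(lines):
    # Hypothetical baseline: split at '#' headings and attach the heading path
    # (e.g. "Setup > Install") to every chunk of body text.
    path, chunks, buf = [], [], []

    def flush():
        # Emit the buffered body text under the current heading path.
        if buf:
            chunks.append((" > ".join(path), "\n".join(buf)))
            buf.clear()

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped[level:].strip()
            # Drop headings at this level or deeper, then descend into the new one.
            del path[level - 1:]
            path.append(title)
        elif stripped:
            buf.append(stripped)
    flush()
    return chunks


doc = ["# A", "intro", "## B", "text b", "## C", "text c", "# D", "text d"]
for heading_path, text in hierarchical_chunks(doc):
    print(heading_path, "|", text)
```

Each chunk keeps its ancestry ("A > B", "A > C"), which a flat fixed-size splitter would discard.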
Multimodal Task Interference: A Benchmark and Analysis of History-Target Mismatch in Multimodal LLMs
arXiv:2603.18425v1 Announce Type: new Abstract: Task interference, the performance degradation caused by task switches within a single conversation, has been studied exclusively in text-only settings despite the growing prevalence of multimodal dialogue systems. We introduce a benchmark for evaluating this...
The Truncation Blind Spot: How Decoding Strategies Systematically Exclude Human-Like Token Choices
arXiv:2603.18482v1 Announce Type: new Abstract: Standard decoding strategies for text generation, including top-k, nucleus sampling, and contrastive search, select tokens based on likelihood, restricting selection to high-probability regions. Human language production operates differently: tokens are chosen for communicative appropriateness rather...
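To make the truncation argument concrete, here is a minimal sketch of the token sets that top-k and nucleus (top-p) sampling keep from a next-token distribution. The probabilities and the "human-chosen" token index are illustrative assumptions, not data from the paper.

```python
def top_k_support(probs, k):
    # Keep the k highest-probability token indices (ties broken by lower index).
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    return set(order[:k])


def nucleus_support(probs, p):
    # Keep the smallest set of most-probable tokens whose cumulative mass reaches p.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


# Toy next-token distribution over five tokens (illustrative numbers only).
probs = [0.5, 0.3, 0.1, 0.06, 0.04]
human_choice = 3  # hypothetical index of the token a human writer actually used

print(top_k_support(probs, 2))                       # {0, 1}
print(nucleus_support(probs, 0.85))                  # {0, 1, 2}
print(human_choice in nucleus_support(probs, 0.85))  # False
```

Any token outside the kept set has zero chance of being sampled, however appropriate it is in context; that is the exclusion the title refers to.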
When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making
arXiv:2603.18530v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for high-stakes decisions, yet their susceptibility to spurious features remains poorly characterized. We introduce ICE-Guard, a framework applying intervention consistency testing to detect three types of spurious feature...
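ICE-Guard's exact protocol is truncated in the abstract. The basic idea of an intervention consistency test can be sketched as follows: hold a prompt fixed, swap only a name, and measure how often the decision flips. `decide`, the templates, and the toy biased model are hypothetical stand-ins for an LLM call, not the framework's API.

```python
def flip_rate(decide, templates, original_name, swapped_name):
    # Fraction of prompts whose decision changes when only the name is swapped.
    flips = 0
    for template in templates:
        before = decide(template.format(name=original_name))
        after = decide(template.format(name=swapped_name))
        flips += before != after
    return flips / len(templates)


# Toy "model" with a planted name bias, standing in for an LLM call.
def biased_decide(prompt):
    return "approve" if "Alice" in prompt else "reject"


templates = [
    "Applicant {name} has 5 years of experience. Approve or reject?",
    "{name} requests a loan of $10,000 with a stable income. Approve or reject?",
]
print(flip_rate(biased_decide, templates, "Alice", "Bob"))  # 1.0
```

A consistent decision-maker should score 0.0 on a name swap that carries no task-relevant information.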
Cross-Lingual LLM-Judge Transfer via Evaluation Decomposition
arXiv:2603.18557v1 Announce Type: new Abstract: As large language models are increasingly deployed across diverse real-world applications, extending automated evaluation beyond English has become a critical challenge. Existing evaluation approaches are predominantly English-focused, and adapting them to other languages is hindered...
ICE: Intervention-Consistent Explanation Evaluation with Statistical Grounding for LLMs
arXiv:2603.18579v1 Announce Type: new Abstract: Evaluating whether explanations faithfully reflect a model's reasoning remains an open problem. Existing benchmarks use single interventions without statistical testing, making it impossible to distinguish genuine faithfulness from chance-level performance. We introduce ICE (Intervention-Consistent Explanation),...
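The abstract argues that single interventions without statistical testing cannot separate genuine faithfulness from chance. One standard way to add that grounding is a randomization test: compare the effect of the explanation-guided intervention with a null distribution of effects from random interventions. The sketch below uses made-up scores and is not ICE's actual procedure or metric.

```python
import random


def empirical_p_value(observed, null_scores):
    # One-sided p-value with the +1 correction: how often a null score
    # is at least as large as the observed one.
    count = sum(1 for s in null_scores if s >= observed)
    return (count + 1) / (len(null_scores) + 1)


# Hypothetical setup: score drop when removing tokens an explanation marks as
# important, versus drops from removing the same number of random tokens.
rng = random.Random(0)
observed_drop = 0.42
random_drops = [rng.uniform(0.0, 0.3) for _ in range(999)]
print(empirical_p_value(observed_drop, random_drops))  # 0.001
```

With 999 null samples the smallest attainable p-value is 0.001, so the number of random interventions bounds how strong a faithfulness claim can be.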
Language Model Maps for Prompt-Response Distributions via Log-Likelihood Vectors
arXiv:2603.18593v1 Announce Type: new Abstract: We propose a method that represents language models by log-likelihood vectors over prompt-response pairs and constructs model maps for comparing their conditional distributions. In this space, distances between models approximate the KL divergence between the...
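The abstract says that distances between log-likelihood vectors approximate KL divergence, but it is cut off before the construction. The identity behind such an approximation is that KL(p || q) equals the expected log-likelihood difference under p. The toy sketch below checks that identity on a two-response distribution; the responses and probabilities are invented, and this is not the paper's mapping.

```python
import math


def log_likelihood_vector(dist, responses):
    # One model's log p(y | x) for each response y to a fixed prompt x.
    return [math.log(dist[y]) for y in responses]


def kl_from_vectors(p_probs, ll_p, ll_q):
    # KL(p || q) = E_{y ~ p}[log p(y) - log q(y)], computed exactly over a
    # finite response set; p_probs must be in the same order as the vectors.
    return sum(w * (a - b) for w, a, b in zip(p_probs, ll_p, ll_q))


responses = ["yes", "no"]
p = {"yes": 0.5, "no": 0.5}
q = {"yes": 0.25, "no": 0.75}
ll_p = log_likelihood_vector(p, responses)
ll_q = log_likelihood_vector(q, responses)
kl = kl_from_vectors([p[y] for y in responses], ll_p, ll_q)
print(round(kl, 4))  # 0.1438
```

With real models the response set is not enumerable, so the expectation would have to be estimated from sampled prompt-response pairs.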
A Comparative Empirical Study of Catastrophic Forgetting Mitigation in Sequential Task Adaptation for Continual Natural Language Processing Systems
arXiv:2603.18641v1 Announce Type: new Abstract: Neural language models deployed in real-world applications must continually adapt to new tasks and domains without forgetting previously acquired knowledge. This work presents a comparative empirical study of catastrophic forgetting mitigation in continual intent classification....
Automatic detection of Gen-AI texts: A comparative framework of neural models
arXiv:2603.18750v1 Announce Type: new Abstract: The rapid proliferation of Large Language Models has significantly increased the difficulty of distinguishing between human-written and AI-generated texts, raising critical issues across academic, editorial, and social domains. This paper investigates the problem of...
Progressive Training for Explainable Citation-Grounded Dialogue: Reducing Hallucination to Zero in English-Hindi LLMs
arXiv:2603.18911v1 Announce Type: new Abstract: Knowledge-grounded dialogue systems aim to generate informative, contextually relevant responses by conditioning on external knowledge sources. However, most existing approaches focus exclusively on English, lack explicit citation mechanisms for verifying factual claims, and offer limited...
RADIUS: Ranking, Distribution, and Significance - A Comprehensive Alignment Suite for Survey Simulation
arXiv:2603.19002v1 Announce Type: new Abstract: Simulation of surveys using LLMs is emerging as a powerful application for generating human-like responses at scale. Prior work evaluates survey simulation using metrics borrowed from other domains, which are often ad hoc, fragmented, and...
Engineering Verifiable Modularity in Transformers via Per-Layer Supervision
arXiv:2603.18029v1 Announce Type: new Abstract: Transformers resist surgical control. Ablating an attention head identified as critical for capitalization produces minimal behavioral change because distributed redundancy compensates for damage. This Hydra effect renders interpretability illusory: we may identify components through correlation,...
InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
arXiv:2603.18031v1 Announce Type: new Abstract: Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs)...
MST-Direct: Matching via Sinkhorn Transport for Multivariate Geostatistical Simulation with Complex Non-Linear Dependencies
arXiv:2603.18036v1 Announce Type: new Abstract: Multivariate geostatistical simulation requires the faithful reproduction of complex non-linear dependencies among geological variables, including bimodal distributions, step functions, and heteroscedastic relationships. Traditional methods such as the Gaussian Copula and LU Decomposition assume linear correlation...
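The abstract is truncated before it explains how MST-Direct uses Sinkhorn transport. For reference, here is a minimal sketch of the standard Sinkhorn-Knopp iteration for entropy-regularized optimal transport between two discrete distributions. `eps`, `n_iters`, and the toy cost matrix are assumptions, and nothing here reflects the paper's matching scheme.

```python
import math


def sinkhorn(a, b, cost, eps=0.1, n_iters=500):
    # Entropic optimal transport via Sinkhorn-Knopp scaling.
    # a, b: source and target weights (each summing to 1);
    # cost: len(a) x len(b) matrix of pairwise costs.
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


plan = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
print([[round(x, 3) for x in row] for row in plan])  # [[0.5, 0.0], [0.0, 0.5]]
```

Smaller `eps` gives a sharper, closer-to-permutation plan but can underflow `math.exp`; a log-domain variant avoids that in practice.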
Adapting Methods for Domain-Specific Japanese Small LMs: Scale, Architecture, and Quantization
arXiv:2603.18037v1 Announce Type: new Abstract: This paper presents a systematic methodology for building domain-specific Japanese small language models using QLoRA fine-tuning. We address three core questions: optimal training scale, base-model selection, and architecture-aware quantization. Stage 1 (Training scale): Scale-learning experiments...
Quotient Geometry and Persistence-Stable Metrics for Swarm Configurations
arXiv:2603.18041v1 Announce Type: new Abstract: Swarm and constellation reconfiguration can be viewed as motion of an unordered point configuration in an ambient space. Here, we provide persistence-stable, symmetry-invariant geometric representations for comparing and monitoring multi-agent configuration data. We introduce a...
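The abstract introduces persistence-stable, symmetry-invariant representations but is cut off before defining them. As a much simpler illustration of what "symmetry-invariant" means for an unordered configuration, the sorted list of pairwise distances does not change when agents are relabeled or the whole swarm is translated or rotated. This is a generic invariant, not the paper's quotient metric or a persistence-based one, and it is not complete (some non-congruent configurations share a profile); the function names are hypothetical.

```python
import math
from itertools import combinations


def sorted_pairwise_distances(points):
    # Sorted multiset of pairwise distances: unchanged by relabeling agents
    # or by rotating/translating the whole configuration.
    return sorted(math.dist(p, q) for p, q in combinations(points, 2))


def distance_profile_gap(config_a, config_b):
    # Max absolute difference between sorted distance profiles.
    da, db = sorted_pairwise_distances(config_a), sorted_pairwise_distances(config_b)
    if len(da) != len(db):
        raise ValueError("configurations must have the same number of agents")
    return max((abs(x - y) for x, y in zip(da, db)), default=0.0)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# Same square, agents relabeled and shifted by (5, 5).
moved = [(6, 6), (5, 5), (6, 5), (5, 6)]
print(distance_profile_gap(square, moved))  # 0.0
```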
SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training
arXiv:2603.18079v1 Announce Type: new Abstract: Large Language Model (LLM) agents have shown strong results on multi-turn tool-use tasks, yet they operate in isolation during training, failing to leverage experiences accumulated across episodes. Existing experience-augmented methods address this by organizing trajectories...
Probabilistic Federated Learning on Uncertain and Heterogeneous Data with Model Personalization
arXiv:2603.18083v1 Announce Type: new Abstract: Conventional federated learning (FL) frameworks often suffer from training degradation due to data uncertainty and heterogeneity across local clients. Probabilistic approaches such as Bayesian neural networks (BNNs) can mitigate this issue by explicitly modeling uncertainty,...
BoundAD: Boundary-Aware Negative Generation for Time Series Anomaly Detection
arXiv:2603.18111v1 Announce Type: new Abstract: Contrastive learning methods for time series anomaly detection (TSAD) heavily depend on the quality of negative sample construction. However, existing strategies based on random perturbations or pseudo-anomaly injection often struggle to simultaneously preserve temporal semantic...
VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models
arXiv:2603.18113v1 Announce Type: new Abstract: As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central objective in trustworthy AI. This challenge becomes even more pronounced when...
LLM-Augmented Computational Phenotyping of Long Covid
arXiv:2603.18115v1 Announce Type: new Abstract: Phenotypic characterization is essential for understanding heterogeneity in chronic diseases and for guiding personalized interventions. Long COVID is a complex and persistent condition, yet its clinical subphenotypes remain poorly understood. In this work, we propose an...
Discovering What You Can Control: Interventional Boundary Discovery for Reinforcement Learning
arXiv:2603.18257v1 Announce Type: new Abstract: Selecting relevant state dimensions in the presence of confounded distractors is a causal identification problem: observational statistics alone cannot reliably distinguish dimensions that correlate with actions from those that actions cause. We formalize this as...
Sharpness-Aware Minimization in Logit Space Efficiently Enhances Direct Preference Optimization
arXiv:2603.18258v1 Announce Type: new Abstract: Direct Preference Optimization (DPO) has emerged as a popular algorithm for aligning pretrained large language models with human preferences, owing to its simplicity and training stability. However, DPO suffers from the recently identified squeezing effect...
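The abstract builds on DPO but is truncated before it describes the logit-space SAM modification. For reference, here is a minimal sketch of the standard DPO objective for a single preference pair. The log-probabilities are made-up inputs, `beta=0.1` is an assumed default, and the sharpness-aware part of the paper is not shown.

```python
import math


def softplus(x):
    # log(1 + e^x), written to avoid overflow for large |x|.
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Arguments are sequence log-probabilities under the policy (pi_*) and
    # the frozen reference model (ref_*) for the preferred and rejected responses.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)


print(round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 4))  # 0.6931 (= log 2)
print(dpo_loss(-8.0, -12.0, -10.0, -10.0) < math.log(2))  # True: chosen response favored
```

The loss depends only on the difference of policy-to-reference log-ratios, so raising the rejected response's probability is penalized even when the chosen one also rises.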
Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails
arXiv:2603.18280v1 Announce Type: new Abstract: Current alignment evaluation mostly measures whether models encode dangerous concepts and whether they refuse harmful requests. Both miss the layer where alignment often operates: routing from concept detection to behavioral policy. We study political censorship...