Why Are Linear RNNs More Parallelizable?
arXiv:2603.03612v1 Announce Type: new Abstract: The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs...
JANUS: Structured Bidirectional Generation for Guaranteed Constraints and Analytical Uncertainty
arXiv:2603.03748v1 Announce Type: new Abstract: High-stakes synthetic data generation faces a fundamental Quadrilemma: achieving Fidelity to the original distribution, Control over complex logical constraints, Reliability in uncertainty estimation, and Efficiency in computational cost -- simultaneously. State-of-the-art Deep Generative Models (CTGAN,...
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
arXiv:2603.03756v1 Announce Type: new Abstract: While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training...
Relational In-Context Learning via Synthetic Pre-training with Structural Prior
arXiv:2603.03805v1 Announce Type: new Abstract: Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce and structurally heterogeneous, making...
ITLC at SemEval-2026 Task 11: Normalization and Deterministic Parsing for Formal Reasoning in LLMs
arXiv:2603.02676v1 Announce Type: new Abstract: Large language models suffer from content effects in reasoning tasks, particularly in multi-lingual contexts. We introduce a novel method that reduces these biases through explicit structural abstraction that transforms syllogisms into canonical logical representations and...
HateMirage: An Explainable Multi-Dimensional Dataset for Decoding Faux Hate and Subtle Online Abuse
arXiv:2603.02684v1 Announce Type: new Abstract: Subtle and indirect hate speech remains an underexplored challenge in online safety research, particularly when harmful intent is embedded within misleading or manipulative narratives. Existing hate speech datasets primarily capture overt toxicity, underrepresenting the nuanced...
Faster, Cheaper, More Accurate: Specialised Knowledge Tracing Models Outperform LLMs
arXiv:2603.02830v1 Announce Type: new Abstract: Predicting future student responses to questions is particularly valuable for educational learning platforms where it enables effective interventions. One of the key approaches to do this has been through the use of knowledge tracing (KT)...
PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
arXiv:2603.03054v1 Announce Type: new Abstract: Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning...
Evaluating Performance Drift from Model Switching in Multi-Turn LLM Systems
arXiv:2603.03111v1 Announce Type: new Abstract: Deployed multi-turn LLM systems routinely switch models mid-interaction due to upgrades, cross-provider routing, and fallbacks. Such handoffs create a context mismatch: the model generating later turns must condition on a dialogue prefix authored by a...
Using Learning Progressions to Guide AI Feedback for Science Learning
arXiv:2603.03249v1 Announce Type: new Abstract: Generative artificial intelligence (AI) offers scalable support for formative feedback, yet most AI-generated feedback relies on task-specific rubrics authored by domain experts. While effective, rubric authoring is time-consuming and limits scalability across instructional contexts. Learning...
MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
arXiv:2603.02482v1 Announce Type: cross Abstract: Safety evaluation and red-teaming of large language models remain predominantly text-centric, and existing frameworks lack the infrastructure to systematically test whether alignment generalizes to audio, image, and video inputs. We present MUSE (Multimodal Unified Safety...
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
arXiv:2603.02556v1 Announce Type: cross Abstract: Reasoning has emerged as a key capability of large language models. In linguistic tasks, this capability can be enhanced by self-improving techniques that refine reasoning paths for subsequent finetuning. However, extending these language-based self-improving approaches...
Concept Heterogeneity-aware Representation Steering
arXiv:2603.02237v1 Announce Type: new Abstract: Representation steering offers a lightweight mechanism for controlling the behavior of large language models (LLMs) by intervening on internal activations at inference time. Most existing methods rely on a single global steering direction, typically obtained...
EdgeFLow: Serverless Federated Learning via Sequential Model Migration in Edge Networks
arXiv:2603.02562v1 Announce Type: new Abstract: Federated Learning (FL) has emerged as a transformative distributed learning paradigm in the era of Internet of Things (IoT), reconceptualizing data processing methodologies. However, FL systems face significant communication bottlenecks due to inevitable client-server data...
CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging
arXiv:2603.00573v1 Announce Type: new Abstract: Large language models (LLMs) achieve remarkable performance on diverse downstream and domain-specific tasks via parameter-efficient fine-tuning (PEFT). However, existing PEFT methods, particularly MoE-LoRA architectures, suffer from limited parameter efficiency and coarse-grained adaptation due to the...
Super Research: Answering Highly Complex Questions with Large Language Models through Super Deep and Super Wide Research
arXiv:2603.00582v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated proficiency in Deep Research or Wide Search, their capacity to solve highly complex questions-those requiring long-horizon planning, massive evidence gathering, and synthesis across heterogeneous sources-remains largely unexplored. We...
SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
arXiv:2603.00669v1 Announce Type: new Abstract: Sustainability disclosure standards (e.g., GRI, SASB, TCFD, IFRS S2) are comprehensive yet lengthy, terminology-dense, and highly cross-referential, hindering structured analysis and downstream use. We present SSKG Hub (Sustainability Standards Knowledge Graph Hub), a research prototype...
CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning
arXiv:2603.00889v1 Announce Type: new Abstract: Large Language Models (LLMs) have recently exhibited remarkable reasoning capabilities, largely enabled by supervised fine-tuning (SFT)- and reinforcement learning (RL)-based post-training on high-quality reasoning data. However, reproducing and extending these capabilities in open and scalable...
KVSlimmer: Theoretical Insights and Practical Optimizations for Asymmetric KV Merging
arXiv:2603.00907v1 Announce Type: new Abstract: The growing computational and memory demands of the Key-Value (KV) cache significantly limit the ability of Large Language Models (LLMs). While KV merging has emerged as a promising solution, existing methods that rely on empirical...
How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning
arXiv:2603.01070v1 Announce Type: new Abstract: Solving complex geometric problems inherently requires interleaved reasoning: a tight alternation between constructing diagrams and performing logical deductions. Although recent Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities in visual generation and plotting, we...
StaTS: Spectral Trajectory Schedule Learning for Adaptive Time Series Forecasting with Frequency Guided Denoiser
arXiv:2603.00037v1 Announce Type: new Abstract: Diffusion models have been used for probabilistic time series forecasting and show strong potential. However, fixed noise schedules often produce intermediate states that are hard to invert and a terminal state that deviates from the...
REMIND: Rethinking Medical High-Modality Learning under Missingness--A Long-Tailed Distribution Perspective
arXiv:2603.00046v1 Announce Type: new Abstract: Medical multi-modal learning is critical for integrating information from a large set of diverse modalities. However, when leveraging a high number of modalities in real clinical applications, it is often impractical to obtain full-modality observations...
TRIZ-RAGNER: A Retrieval-Augmented Large Language Model for TRIZ-Aware Named Entity Recognition in Patent-Based Contradiction Mining
arXiv:2602.23656v1 Announce Type: new Abstract: TRIZ-based contradiction mining is a fundamental task in patent analysis and systematic innovation, as it enables the identification of improving and worsening technical parameters that drive inventive problem solving. However, existing approaches largely rely on...
The Astonishing Ability of Large Language Models to Parse Jabberwockified Language
arXiv:2602.23928v1 Announce Type: new Abstract: We show that large language models (LLMs) have an astonishing ability to recover meaning from severely degraded English texts. Texts in which content words have been randomly substituted by nonsense strings, e.g., "At the ghybe...
CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning
arXiv:2602.24142v1 Announce Type: new Abstract: Mobile Agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, action decision and action function. However, existing agents struggle to achieve both decoupled enhancement and balanced integration of these...
MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
arXiv:2602.24188v1 Announce Type: new Abstract: We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective communication about private information. This enables an interactive scaling analysis, in which a fixed...
Do LLMs Benefit From Their Own Words?
arXiv:2602.24287v1 Announce Type: new Abstract: Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on...
HiDrop: Hierarchical Vision Token Reduction in MLLMs via Late Injection, Concave Pyramid Pruning, and Early Exit
arXiv:2602.23699v1 Announce Type: cross Abstract: The quadratic computational cost of processing vision tokens in Multimodal Large Language Models (MLLMs) hinders their widespread adoption. While progressive vision token pruning offers a promising solution, current methods misinterpret shallow layer functions and use...
Active Value Querying to Minimize Additive Error in Subadditive Set Function Learning
arXiv:2602.23529v1 Announce Type: new Abstract: Subadditive set functions play a pivotal role in computational economics (especially in combinatorial auctions), combinatorial optimization or artificial intelligence applications such as interpretable machine learning. However, specifying a set function requires assigning values to an...
FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation
arXiv:2602.23636v1 Announce Type: new Abstract: Ensuring the safety of LLM-generated content is essential for real-world deployment. Most existing guardrail models formulate moderation as a fixed binary classification task, implicitly assuming a fixed definition of harmfulness. In practice, enforcement strictness -...