How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning
arXiv:2603.01070v1 Announce Type: new Abstract: Solving complex geometric problems inherently requires interleaved reasoning: a tight alternation between constructing diagrams and performing logical deductions. Although recent Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities in visual generation and plotting, we...
StaTS: Spectral Trajectory Schedule Learning for Adaptive Time Series Forecasting with Frequency Guided Denoiser
arXiv:2603.00037v1 Announce Type: new Abstract: Diffusion models have been used for probabilistic time series forecasting and show strong potential. However, fixed noise schedules often produce intermediate states that are hard to invert and a terminal state that deviates from the...
Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment
arXiv:2603.00042v1 Announce Type: new Abstract: We identify the Spectral Energy Gain in extreme model compression, where low-rank binary approximations outperform tiny-rank floating-point baselines for heavy-tailed spectra. However, prior attempts fail to realize this potential, trailing state-of-the-art 1-bit methods. We attribute...
Property-Driven Evaluation of GNN Expressiveness at Scale: Datasets, Framework, and Study
arXiv:2603.00044v1 Announce Type: new Abstract: Advancing trustworthy AI requires principled software engineering approaches to model evaluation. Graph Neural Networks (GNNs) have achieved remarkable success in processing graph-structured data; however, their expressiveness in capturing fundamental graph properties remains an open challenge....
MAML-KT: Addressing Cold Start Problem in Knowledge Tracing for New Students via Few-Shot Model-Agnostic Meta Learning
arXiv:2603.00137v1 Announce Type: new Abstract: Knowledge tracing (KT) models are commonly evaluated by training on early interactions from all students and testing on later responses. While effective for measuring average predictive performance, this evaluation design obscures a cold start scenario...
Bridging Policy and Real-World Dynamics: LLM-Augmented Rebalancing for Shared Micromobility Systems
arXiv:2603.00176v1 Announce Type: new Abstract: Shared micromobility services such as e-scooters and bikes have become an integral part of urban transportation, yet their efficiency critically depends on effective vehicle rebalancing. Existing methods either optimize for average demand patterns or employ...
Task-Driven Subspace Decomposition for Knowledge Sharing and Isolation in LoRA-based Continual Learning
arXiv:2603.00191v1 Announce Type: new Abstract: Continual Learning (CL) requires models to sequentially adapt to new tasks without forgetting old knowledge. Recently, Low-Rank Adaptation (LoRA), a representative Parameter-Efficient Fine-Tuning (PEFT) method, has gained increasing attention in CL. Several LoRA-based CL methods...
A medical coding language model trained on clinical narratives from a population-wide cohort of 1.8 million patients
arXiv:2603.00221v1 Announce Type: new Abstract: Medical coding translates clinical documentation into standardized codes for billing, research, and public health, but manual coding is time-consuming and error-prone. Existing automation efforts rely on small datasets that poorly represent real-world patient heterogeneity. We...
Vectorized Adaptive Histograms for Sparse Oblique Forests
arXiv:2603.00326v1 Announce Type: new Abstract: Classification using sparse oblique random forests provides guarantees on uncertainty and confidence while controlling for specific error types. However, they use more data and more compute than other tree ensembles because they create deep trees...
Quantifying Catastrophic Forgetting in IoT Intrusion Detection Systems
arXiv:2603.00363v1 Announce Type: new Abstract: Distribution shifts in attack patterns within RPL-based IoT networks pose a critical threat to the reliability and security of large-scale connected systems. Intrusion Detection Systems (IDS) trained on static datasets often fail to generalize to...
Deep Learning-Based Meat Freshness Detection with Segmentation and OOD-Aware Classification
arXiv:2603.00368v1 Announce Type: new Abstract: In this study, we present a meat freshness classification framework from Red-Green-Blue (RGB) images that supports both packaged and unpackaged meat datasets. The system classifies four in-distribution (ID) meat classes and uses an out-of-distribution (OOD)-aware...
Weight Updates as Activation Shifts: A Principled Framework for Steering
arXiv:2603.00425v1 Announce Type: new Abstract: Activation steering promises to be an extremely parameter-efficient form of adaptation, but its effectiveness depends on critical design choices -- such as intervention location and parameterization -- that currently rely on empirical heuristics rather than...
Benchmarking Few-shot Transferability of Pre-trained Models with Improved Evaluation Protocols
arXiv:2603.00478v1 Announce Type: new Abstract: Few-shot transfer has been revolutionized by stronger pre-trained models and improved adaptation algorithms. However, the field lacks a unified, rigorous evaluation protocol that is both challenging and realistic for real-world usage. In this work, we establish FEWTRANS,...
Episode 41: Thinking through Rupture in International Economic Law: Views from Latin America - EJIL: The Podcast!
Multi-Agent Causal Reasoning for Suicide Ideation Detection Through Online Conversations
arXiv:2602.23577v1 Announce Type: new Abstract: Suicide remains a pressing global public health concern. While social media platforms offer opportunities for early risk detection through online conversation trees, existing approaches face two major limitations: (1) They rely on predefined rules (e.g.,...
LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning
arXiv:2602.23610v1 Announce Type: new Abstract: The reasoning capability of large language models (LLMs), defined as their ability to analyze, infer, and make decisions based on input information, is essential for building intelligent task-oriented dialogue systems. However, existing benchmarks do not...
Structured Prompt Optimization for Few-Shot Text Classification via Semantic Alignment in Latent Space
arXiv:2602.23753v1 Announce Type: new Abstract: This study addresses the issues of semantic entanglement, unclear label structure, and insufficient feature representation in few-shot text classification, and proposes an optimization framework based on structured prompts to enhance semantic understanding and task adaptation...
Divide and Conquer: Accelerating Diffusion-Based Large Language Models via Adaptive Parallel Decoding
arXiv:2602.23792v1 Announce Type: new Abstract: Diffusion-based large language models (dLLMs) have shown promising performance across various reasoning tasks, establishing themselves as an alternative to autoregressive large language models (LLMs). Unlike autoregressive LLMs that generate one token per step based on...
Task Complexity Matters: An Empirical Study of Reasoning in LLMs for Sentiment Analysis
arXiv:2602.24060v1 Announce Type: new Abstract: Large language models (LLMs) with reasoning capabilities have fueled a compelling narrative that reasoning universally improves performance across language tasks. We test this claim through a comprehensive evaluation of 504 configurations across seven model families--including...
CoME: Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning
arXiv:2602.24142v1 Announce Type: new Abstract: Mobile Agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, action decision and action function. However, existing agents struggle to achieve both decoupled enhancement and balanced integration of these...
Task-Centric Acceleration of Small-Language Models
arXiv:2602.24174v1 Announce Type: new Abstract: Small language models (SLMs) have emerged as efficient alternatives to large language models for task-specific applications. However, they are often employed in high-volume, low-latency settings, where efficiency is crucial. We propose TASC, Task-Adaptive Sequence Compression,...
MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
arXiv:2602.24188v1 Announce Type: new Abstract: We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective communication about private information. This enables an interactive scaling analysis, in which a fixed...
Controllable Reasoning Models Are Private Thinkers
arXiv:2602.24210v1 Announce Type: new Abstract: AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the unintended leakage of private information to external parties. We propose...
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
arXiv:2602.23866v1 Announce Type: cross Abstract: Software engineering agents (SWE) are improving rapidly, with recent gains largely driven by reinforcement learning (RL). However, RL training is constrained by the scarcity of large-scale task collections with reproducible execution environments and reliable test...
U-CAN: Utility-Aware Contrastive Attenuation for Efficient Unlearning in Generative Recommendation
arXiv:2602.23400v1 Announce Type: new Abstract: Generative Recommendation (GenRec) typically leverages Large Language Models (LLMs) to redefine personalization as an instruction-driven sequence generation task. However, fine-tuning on user logs inadvertently encodes sensitive attributes into model parameters, raising critical privacy concerns. Existing...
Uncertainty-aware Language Guidance for Concept Bottleneck Models
arXiv:2602.23495v1 Announce Type: new Abstract: Concept Bottleneck Models (CBMs) provide inherent interpretability by first mapping input samples to high-level semantic concepts, followed by a combination of these concepts for the final classification. However, the annotation of human-understandable concepts requires extensive...
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
arXiv:2602.23504v1 Announce Type: new Abstract: Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data are heterogeneous. Clustered FL tackles this by grouping similar clients. However,...
Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents
arXiv:2602.23556v1 Announce Type: new Abstract: Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed, training requires frequent irregular communication that stalls forward progress. Moreover, fetched data...
BTTackler: A Diagnosis-based Framework for Efficient Deep Learning Hyperparameter Optimization
arXiv:2602.23630v1 Announce Type: new Abstract: Hyperparameter optimization (HPO) is known to be costly in deep learning, especially when leveraging automated approaches. Most of the existing automated HPO methods are accuracy-based, i.e., accuracy metrics are used to guide the trials of...
FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation
arXiv:2602.23636v1 Announce Type: new Abstract: Ensuring the safety of LLM-generated content is essential for real-world deployment. Most existing guardrail models formulate moderation as a fixed binary classification task, implicitly assuming a fixed definition of harmfulness. In practice, enforcement strictness -...