On the Emotion Understanding of Synthesized Speech
arXiv:2603.16483v1 Announce Type: new Abstract: Emotion is a core paralinguistic feature in voice interaction. It is widely believed that emotion understanding models learn fundamental representations that transfer to synthesized speech, making emotion understanding results a plausible reward or evaluation metric...
AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents
arXiv:2603.16496v1 Announce Type: new Abstract: Large language model (LLM) agents increasingly rely on external memory to support long-horizon interaction, personalized assistance, and multi-step reasoning. However, existing memory systems still face three core challenges: they often rely too heavily on semantic...
How often do Answers Change? Estimating Recency Requirements in Question Answering
arXiv:2603.16544v1 Announce Type: new Abstract: Large language models (LLMs) often rely on outdated knowledge when answering time-sensitive questions, leading to confident yet incorrect responses. Without explicit signals indicating whether up-to-date information is required, models struggle to decide when to retrieve...
Discovering the Hidden Role of Gini Index In Prompt-based Classification
arXiv:2603.15654v1 Announce Type: new Abstract: In classification tasks, the long-tailed minority classes usually offer the predictions that are most important. Yet these classes consistently exhibit low accuracies, whereas a few high-performing classes dominate the game. We pursue a foundational understanding...
Transition Flow Matching
arXiv:2603.15689v1 Announce Type: new Abstract: Mainstream flow matching methods typically focus on learning the local velocity field, which inherently requires multiple integration steps during generation. In contrast, Mean Velocity Flow models establish a relationship between the local velocity field and...
Embedding-Aware Feature Discovery: Bridging Latent Representations and Interpretable Features in Event Sequences
arXiv:2603.15713v1 Announce Type: new Abstract: Industrial financial systems operate on temporal event sequences such as transactions, user actions, and system logs. While recent research emphasizes representation learning and large language models, production systems continue to rely heavily on handcrafted statistical...
Meta-TTRL: A Metacognitive Framework for Self-Improving Test-Time Reinforcement Learning in Unified Multimodal Models
arXiv:2603.15724v1 Announce Type: new Abstract: Existing test-time scaling (TTS) methods for unified multimodal models (UMMs) in text-to-image (T2I) generation primarily rely on search or sampling strategies that produce only instance-level improvements, limiting the ability to learn from prior inferences and...
Mask Is What DLLM Needs: A Masked Data Training Paradigm for Diffusion LLMs
arXiv:2603.15803v1 Announce Type: new Abstract: Discrete diffusion models offer global context awareness and flexible parallel generation. However, uniform random noise schedulers in standard DLLM training overlook the highly non-uniform information density inherent in real-world sequences. This wastes optimization resources on...
FlashSampling: Fast and Memory-Efficient Exact Sampling
arXiv:2603.15854v1 Announce Type: new Abstract: Sampling from a categorical distribution is mathematically simple, but in large-vocabulary decoding, it often triggers extra memory traffic and extra kernels after the LM head. We present FlashSampling, an exact sampling primitive that fuses sampling...
Counteractive RL: Rethinking Core Principles for Efficient and Scalable Deep Reinforcement Learning
arXiv:2603.15871v1 Announce Type: new Abstract: Following the pivotal success of learning strategies to win at tasks, solely by interacting with an environment without any supervision, agents have gained the ability to make sequential decisions in complex MDPs. Yet, reinforcement learning...
Game-Theory-Assisted Reinforcement Learning for Border Defense: Early Termination based on Analytical Solutions
arXiv:2603.15907v1 Announce Type: new Abstract: Game theory provides the gold standard for analyzing adversarial engagements, offering strong optimality guarantees. However, these guarantees often become brittle when assumptions such as perfect information are violated. Reinforcement learning (RL), by contrast, is adaptive...
Deriving Hyperparameter Scaling Laws via Modern Optimization Theory
arXiv:2603.15958v1 Announce Type: new Abstract: Hyperparameter transfer has become an important component of modern large-scale training recipes. Existing methods, such as muP, primarily focus on transfer between model sizes, with transfer across batch sizes and training horizons often relying on...
W2T: LoRA Weights Already Know What They Can Do
arXiv:2603.15990v1 Announce Type: new Abstract: Each LoRA checkpoint compactly stores task-specific updates in low-rank weight matrices, offering an efficient way to adapt large language models to new tasks and domains. In principle, these weights already encode what the adapter does...
The Importance of Being Smoothly Calibrated
arXiv:2603.16015v1 Announce Type: new Abstract: Recent work has highlighted the centrality of smooth calibration [Kakade and Foster, 2008] as a robust measure of calibration error. We generalize, unify, and extend previous results on smooth calibration, both as a robust calibration...
Noisy Data is Destructive to Reinforcement Learning with Verifiable Rewards
arXiv:2603.16140v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven recent capability advances of large language models across various domains. Recent studies suggest that improved RLVR algorithms allow models to learn effectively from incorrect annotations, achieving performance...
Apple can delist apps "with or without cause," judge says in loss for Musi app
Judge tosses Musi case against Apple, sanctions lawyers for "mak[ing] up facts."
AI’s ‘boys’ club’ could widen the wealth gap for women, says Rana el Kaliouby
AI investor Rana el Kaliouby warns that if women are shut out of AI funding and leadership, the consequences will be grim.
Her Fundamental Right To Procreate: The Unconstitutionality Of Abortion Bans
As she was wheeled into surgery, Amber Thurman said to her mother, “Promise me you’ll take care of my son.” She was suffering a rare complication from the abortion pill that she was legally prescribed at nine weeks of pregnancy....
A Critical Analysis Of Rap Shield Laws
For years, scholars have been sounding the alarm on “rap on trial,” or the use of rap as evidence in criminal proceedings, pointing out that the fundamental characteristics of rap music make it uniquely susceptible to misinterpretation and prejudice. Scholars...
Can We Trust LLMs on Memristors? Diving into Reasoning Ability under Non-Ideality
arXiv:2603.13725v1 Announce Type: new Abstract: Memristor-based analog compute-in-memory (CIM) architectures provide a promising substrate for the efficient deployment of Large Language Models (LLMs), owing to superior energy efficiency and computational density. However, these architectures suffer from precision issues caused by...
Think First, Diffuse Fast: Improving Diffusion Language Model Reasoning via Autoregressive Plan Conditioning
arXiv:2603.13243v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) generate text via iterative denoising but consistently underperform on multi-step reasoning. We hypothesize this gap stems from a coordination problem: AR models build coherence token-by-token, while diffusion models must coordinate...
LLM Routing as Reasoning: A MaxSAT View
arXiv:2603.13612v1 Announce Type: new Abstract: Routing a query through an appropriate LLM is challenging, particularly when user preferences are expressed in natural language and model attributes are only partially observable. We propose a constraint-based interpretation of language-conditioned LLM routing, formulating...
ManiBench: A Benchmark for Testing Visual-Logic Drift and Syntactic Hallucinations in Manim Code Generation
arXiv:2603.13251v1 Announce Type: new Abstract: Traditional benchmarks like HumanEval and MBPP test logic and syntax effectively, but fail when code must produce dynamic, pedagogical visuals. We introduce ManiBench, a specialized benchmark evaluating LLM performance in generating Manim CE code, where...
Traffic and weather driven hybrid digital twin for bridge monitoring
arXiv:2603.14028v1 Announce Type: new Abstract: A hybrid digital twin framework is presented for bridge condition monitoring using existing traffic cameras and weather APIs, reducing reliance on dedicated sensor installations. The approach is demonstrated on the Peace Bridge (99 years in...
Early Rug Pull Warning for BSC Meme Tokens via Multi-Granularity Wash-Trading Pattern Profiling
arXiv:2603.13830v1 Announce Type: new Abstract: The high-frequency issuance and short-cycle speculation of meme tokens in decentralized finance (DeFi) have significantly amplified rug-pull risk. Existing approaches still struggle to provide stable early warning under scarce anomalies, incomplete labels, and limited interpretability....
A Systematic Evaluation Protocol of Graph-Derived Signals for Tabular Machine Learning
arXiv:2603.13998v1 Announce Type: new Abstract: While graph-derived signals are widely used in tabular learning, existing studies typically rely on limited experimental setups and average performance comparisons, leaving the statistical reliability and robustness of observed gains largely unexplored. Consequently, it remains...
vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models
arXiv:2603.13966v1 Announce Type: new Abstract: Vision Language Action VLA models are typically evaluated using per benchmark scripts maintained independently by each model repository, leading to duplicated code, dependency conflicts, and underspecified protocols. We present vla eval, an open source evaluation...
GhanaNLP Parallel Corpora: Comprehensive Multilingual Resources for Low-Resource Ghanaian Languages
arXiv:2603.13793v1 Announce Type: new Abstract: Low resource languages present unique challenges for natural language processing due to the limited availability of digitized and well structured linguistic data. To address this gap, the GhanaNLP initiative has developed and curated 41,513 parallel...
Learning When to Trust in Contextual Bandits
arXiv:2603.13356v1 Announce Type: new Abstract: Standard approaches to Robust Reinforcement Learning assume that feedback sources are either globally trustworthy or globally adversarial. In this paper, we challenge this assumption and we identify a more subtle failure mode. We term this...
DeceptGuard :A Constitutional Oversight Framework For Detecting Deception in LLM Agents
arXiv:2603.13791v1 Announce Type: new Abstract: Reliable detection of deceptive behavior in Large Language Model (LLM) agents is an essential prerequisite for safe deployment in high-stakes agentic contexts. Prior work on scheming detection has focused exclusively on black-box monitors that observe...