Immigration Enforcement and Constraints on Information Commandeering
The debate over American immigration policy reflects deep moral divides over the meaning of American identity and the scope of fundamental individual rights like due process and the freedom of movement. Although the modern American immigration system no longer includes...
DiligenceSquared uses AI, voice agents to make M&A research affordable
Instead of relying on expensive management consultants, the startup uses AI voice agents to conduct interviews with customers of the companies the PE firms are considering buying.
From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG
arXiv:2603.03292v1 Announce Type: cross Abstract: Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation (RAG) mitigates these issues, existing methods...
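The paper's multi-round agentic pipeline is not detailed in the truncated abstract; the general retrieve-reason-retrieve loop it builds on can be sketched as below. The `retrieve` function, the toy corpus, and the stop-when-no-new-evidence rule are all illustrative assumptions, not the authors' method.

```python
def retrieve(query, corpus, k=2):
    """Toy lexical retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:k]

def multi_round_rag(query, corpus, max_rounds=3):
    """Iteratively retrieve evidence, folding gathered context into the query."""
    context = []
    for _ in range(max_rounds):
        augmented = query + " " + " ".join(context)
        new = [d for d in retrieve(augmented, corpus) if d not in context]
        if not new:  # a crude "consensus": no fresh evidence arrived this round
            break
        context.extend(new)
    return context

corpus = [
    "metformin is a first-line treatment for type 2 diabetes",
    "metformin can cause gastrointestinal side effects",
    "aspirin reduces fever and inflammation",
]
evidence = multi_round_rag("side effects of metformin for diabetes", corpus)
```

Each round widens the query with evidence already found, so later rounds can surface documents the original phrasing would have missed.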
TATRA: Training-Free Instance-Adaptive Prompting Through Rephrasing and Aggregation
arXiv:2603.03298v1 Announce Type: cross Abstract: Large Language Models (LLMs) have substantially improved in alignment, yet their behavior remains highly sensitive to prompt phrasing. This brittleness has motivated automated prompt engineering, but most existing methods (i) require a task-specific training set, (ii)...
Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
arXiv:2603.03306v1 Announce Type: cross Abstract: Recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs with significantly reduced token usage. While TOON shows solid accuracy in LLM comprehension, there is a...
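The intuition behind TOON's savings can be shown with a rough size comparison. The encoding below is a simplification of TOON's actual grammar (declare the keys once in a header, then emit bare comma-separated rows), used only to illustrate why a header-once layout beats JSON's per-record key repetition on uniform arrays; consult the TOON specification for the exact syntax.

```python
import json

# 50 uniform records: JSON repeats every key in every record.
records = [{"id": i, "name": f"user{i}", "active": True} for i in range(50)]
json_text = json.dumps(records)

# TOON-style (simplified): keys appear once, rows carry only values.
keys = list(records[0])
rows = "\n".join(",".join(str(r[k]) for k in keys) for r in records)
toon_like = f"records[{len(records)}]{{{','.join(keys)}}}:\n{rows}"

assert len(toon_like) < len(json_text)  # keys are not repeated per record
```

Character counts only approximate token counts, but the repeated-key overhead they expose is exactly what a tokenizer pays for as well.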
Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery
arXiv:2603.03322v1 Announce Type: cross Abstract: Recent advancements in Large Language Model (LLM) agents have demonstrated remarkable potential in automatic knowledge discovery. However, rigorously evaluating an AI's capacity for knowledge discovery remains a critical challenge. Existing benchmarks predominantly rely on static...
Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
arXiv:2603.03332v1 Announce Type: new Abstract: Chain-of-Thought (CoT) prompting has emerged as a foundational technique for eliciting reasoning from Large Language Models (LLMs), yet the robustness of this approach to corruptions in intermediate reasoning steps remains poorly understood. This paper presents...
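The truncated abstract does not specify the paper's perturbation protocol; the basic experimental primitive, corrupting one intermediate step of a chain of thought and comparing the model's answer before and after, can be sketched as follows. The example problem and the corruption are made up for illustration.

```python
def perturb_cot(steps, step_idx, corruption):
    """Return a copy of the reasoning chain with one step replaced."""
    perturbed = list(steps)
    perturbed[step_idx] = corruption
    return perturbed

cot = [
    "Step 1: the train covers 120 km in 2 hours.",
    "Step 2: speed = 120 / 2 = 60 km/h.",
    "Step 3: in 5 hours it covers 60 * 5 = 300 km.",
]
# Inject an arithmetic error into the middle step only.
corrupted = perturb_cot(cot, 1, "Step 2: speed = 120 / 2 = 90 km/h.")
```

Feeding both chains back to a model and checking whether the final answer tracks the corrupted step (or silently recovers) is one way such robustness probes are typically run.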
Training-free Dropout Sampling for Semantic Token Acceptance in Speculative Decoding
arXiv:2603.03333v1 Announce Type: new Abstract: Speculative decoding accelerates large language model inference by proposing tokens with a lightweight draft model and selectively accepting them using a target model. This work introduces DropMatch, a novel approach that matches draft tokens to...
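DropMatch's semantic matching is not described in the truncated abstract; the draft-then-verify loop it plugs into looks roughly like this. The "models" are stand-in functions mapping a position to a token, and exact-match acceptance replaces the probabilistic acceptance rule real systems use, purely to keep the sketch self-contained.

```python
def speculative_step(start, draft_model, target_model, k=4):
    """Accept the longest draft prefix the target agrees with, then correct."""
    accepted = []
    for pos in range(start, start + k):
        proposal = draft_model(pos)
        truth = target_model(pos)
        if proposal == truth:
            accepted.append(proposal)  # verified draft token, keep going
        else:
            accepted.append(truth)     # substitute the target's token and stop
            break
    return accepted

draft = lambda pos: pos                      # toy draft: token == position
target = lambda pos: pos if pos < 2 else -1  # diverges from position 2 on

tokens = speculative_step(0, draft, target, k=4)
```

The speedup comes from the target model scoring all k draft positions in one parallel pass instead of k sequential decoding steps.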
Prompt-Dependent Ranking of Large Language Models with Uncertainty Quantification
arXiv:2603.03336v1 Announce Type: new Abstract: Rankings derived from pairwise comparisons are central to many economic and computational systems. In the context of large language models (LLMs), rankings are typically constructed from human preference data and presented as leaderboards that guide...
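As background for how leaderboards turn pairwise preferences into scores, here is a minimal Bradley-Terry fit via the classic MM update; the paper's prompt-dependent, uncertainty-quantified method is more involved and is not shown. The head-to-head counts are hypothetical.

```python
def bradley_terry(wins, iters=200):
    """MM algorithm: wins[i][j] = times model i beat model j."""
    n = len(wins)
    w = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (w[i] + w[j])
                        for j in range(n) if j != i)
            new.append(total_wins / denom if denom else w[i])
        s = sum(new)
        w = [x / s for x in new]  # normalize: scores are only identified up to scale
    return w

wins = [
    [0, 8, 9],   # model 0 beat model 1 eight times, model 2 nine times
    [2, 0, 7],
    [1, 3, 0],
]
scores = bradley_terry(wins)
ranking = sorted(range(3), key=lambda i: -scores[i])
```

A single scalar per model, as here, is exactly the prompt-independent assumption the abstract pushes back on.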
Tracing Pharmacological Knowledge In Large Language Models
arXiv:2603.03407v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong empirical performance across pharmacology and drug discovery tasks, yet the internal mechanisms by which they encode pharmacological knowledge remain poorly understood. In this work, we investigate how drug-group...
A theoretical model of dynamical grammatical gender shifting based on a set-valued set function
arXiv:2603.03510v1 Announce Type: new Abstract: This study investigates the diverse characteristics of nouns, focusing on both semantic (e.g., countable/uncountable) and morphosyntactic (e.g., masculine/feminine) distinctions. We explore inter-word variations for gender markers in noun morphology. Grammatical gender shift is a widespread...
Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence
arXiv:2603.03523v1 Announce Type: new Abstract: We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces, where data are generated online from a single trajectory under a Markovian behavior policy. To avoid maintaining an infinite-dimensional, function-valued estimate,...
Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts
arXiv:2603.03535v1 Announce Type: new Abstract: While large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks, their performance on individual tasks depends on the fine-tuning strategy. Fusing independently trained models with different strengths has shown promise...
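The three fusion strategies the title contrasts can be sketched on toy one-parameter "adapters"; the experts, data, and routing rule below are illustrative assumptions, not the paper's setup.

```python
expert_a = {"w": 1.0, "b": 0.0}   # hypothetical adapter tuned for task A
expert_b = {"w": 3.0, "b": 2.0}   # hypothetical adapter tuned for task B

def predict(adapter, x):
    return adapter["w"] * x + adapter["b"]

# Merging: average the parameters, then run a single model.
merged = {k: (expert_a[k] + expert_b[k]) / 2 for k in expert_a}

# Ensembling: run every expert, average the outputs.
def ensemble(x):
    return (predict(expert_a, x) + predict(expert_b, x)) / 2

# Routing: dispatch each input to one expert (here a trivial sign rule).
def route(x):
    return predict(expert_a, x) if x < 0 else predict(expert_b, x)
```

For purely linear experts, merging and ensembling coincide, which is why the interesting trade-offs among the three only emerge once nonlinearity, cost, and task structure enter the picture.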
Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs
arXiv:2603.03538v1 Announce Type: new Abstract: Large language models with chain-of-thought generation have demonstrated great potential for producing complex mathematical proofs. However, their reasoning can often go astray, leading to increasing interest in formal and learned verifiers. A major challenge in...
NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
arXiv:2603.03597v1 Announce Type: new Abstract: The rapid progress of large language models (LLMs) is increasingly constrained by memory and deployment costs, motivating compression methods for practical deployment. Many state-of-the-art compression pipelines leverage the low-rank structure of trained weight matrices, a...
Why Are Linear RNNs More Parallelizable?
arXiv:2603.03612v1 Announce Type: new Abstract: The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs...
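The standard answer to the title question, which the abstract's analysis presumably refines, is that a linear recurrence h_t = a_t * h_{t-1} + b_t is composition of affine maps, an associative operation, so a parallel prefix scan can compute all states in O(log T) depth. The sketch below checks that composing (a, b) pairs with the associative combine reproduces the sequential recurrence; the numbers are arbitrary.

```python
def combine(f, g):
    """Compose two affine maps: apply f = (a1, b1), then g = (a2, b2)."""
    a1, b1 = f
    a2, b2 = g
    return (a1 * a2, a2 * b1 + b2)

def sequential(steps, h0=0.0):
    """Reference: run the recurrence h = a * h + b left to right."""
    h = h0
    for a, b in steps:
        h = a * h + b
    return h

def tree_reduce(steps):
    """Pairwise reduction -- the shape a parallel scan exploits."""
    while len(steps) > 1:
        steps = [combine(steps[i], steps[i + 1]) if i + 1 < len(steps)
                 else steps[i] for i in range(0, len(steps), 2)]
    return steps[0]

steps = [(0.5, 1.0), (2.0, -1.0), (1.5, 0.5), (0.1, 3.0)]
a, b = tree_reduce(steps)  # one affine map summarizing the whole sequence
h0 = 2.0
```

Nonlinear RNNs admit no such associative summary of a span of timesteps, which is the usual explanation for why they resist this parallelization.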
A Stein Identity for q-Gaussians with Bounded Support
arXiv:2603.03673v1 Announce Type: new Abstract: Stein's identity is a fundamental tool in machine learning with applications in generative models, stochastic optimization, and other problems involving gradients of expectations under Gaussian distributions. Less attention has been paid to problems with non-Gaussian...
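For context, the classical identity the paper generalizes: for a standard Gaussian and any sufficiently smooth f with suitable integrability, Stein's identity reads

```latex
\mathbb{E}\!\left[ X f(X) \right] \;=\; \mathbb{E}\!\left[ f'(X) \right],
\qquad X \sim \mathcal{N}(0, 1),
```

which lets gradients of expectations be estimated without differentiating through the density. Extending such an identity to q-Gaussians with bounded support is nontrivial precisely because the boundary terms in the underlying integration by parts no longer vanish for free.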
Relational In-Context Learning via Synthetic Pre-training with Structural Prior
arXiv:2603.03805v1 Announce Type: new Abstract: Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce and structurally heterogeneous, making...
Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
arXiv:2603.03818v1 Announce Type: new Abstract: Continual learning is a long-standing challenge in robot policy learning, where a policy must acquire new skills over time without catastrophically forgetting previously learned ones. While prior work has extensively studied continual learning in relatively...
Fairness Begins with State: Purifying Latent Preferences for Hierarchical Reinforcement Learning in Interactive Recommendation
arXiv:2603.03820v1 Announce Type: new Abstract: Interactive recommender systems (IRS) are increasingly optimized with Reinforcement Learning (RL) to capture the sequential nature of user-system dynamics. However, existing fairness-aware methods often suffer from a fundamental oversight: they assume the observed user state...
HateMirage: An Explainable Multi-Dimensional Dataset for Decoding Faux Hate and Subtle Online Abuse
arXiv:2603.02684v1 Announce Type: new Abstract: Subtle and indirect hate speech remains an underexplored challenge in online safety research, particularly when harmful intent is embedded within misleading or manipulative narratives. Existing hate speech datasets primarily capture overt toxicity, underrepresenting the nuanced...
Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization
arXiv:2603.02701v1 Announce Type: new Abstract: Optimizing communication topology is fundamental to the efficiency and effectiveness of Large Language Model (LLM)-based Multi-Agent Systems (MAS). While recent approaches utilize reinforcement learning to dynamically construct task-specific graphs, they typically rely on single-sample policy...
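The group-relative advantage at the core of GRPO, which the paper adapts to topology learning, is simple to state: sample a group of responses for the same prompt, then normalize each reward by the group's mean and standard deviation, removing the need for a learned value model. The reward values below are made up for illustration.

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's statistics."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# One group of 4 sampled rollouts for the same prompt.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Because advantages are centered within the group, they sum to zero: above-average rollouts are reinforced exactly as much as below-average ones are suppressed.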
PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
arXiv:2603.03054v1 Announce Type: new Abstract: Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning...
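The per-example clip-and-noise step underlying DP-SGD, the standard mechanism behind differentially private fine-tuning and RLHF pipelines like the one described, can be sketched as below. The gradients, clip norm, and `noise` callable are illustrative; real systems calibrate sigma to a target (epsilon, delta) with a privacy accountant.

```python
def dp_gradient(per_example_grads, clip_norm, sigma, noise):
    """Clip each example's gradient in L2 norm, sum, add noise, average."""
    n = len(per_example_grads)
    total = [0.0] * len(per_example_grads[0])
    for g in per_example_grads:
        norm = sum(x * x for x in g) ** 0.5
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        total = [t + x * scale for t, x in zip(total, g)]
    # Noise scaled to the clip norm bounds any one example's influence.
    return [(t + noise(sigma * clip_norm)) / n for t in total]

grads = [[3.0, 4.0], [0.3, 0.4]]  # per-example gradients, norms 5.0 and 0.5
# sigma=0 here only to make the arithmetic checkable; real DP needs sigma > 0.
g = dp_gradient(grads, clip_norm=1.0, sigma=0.0, noise=lambda s: 0.0)
```

Clipping bounds each patient conversation's contribution to the update, and the added noise is what converts that bound into a formal privacy guarantee.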
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
arXiv:2603.03202v1 Announce Type: new Abstract: As large language models (LLMs) advance their mathematical capabilities toward the IMO level, the scarcity of challenging, high-quality problems for training and evaluation has become a significant bottleneck. Simultaneously, recent code agents have demonstrated sophisticated...
A Directed Graph Model and Experimental Framework for Design and Study of Time-Dependent Text Visualisation
arXiv:2603.02422v1 Announce Type: cross Abstract: Exponential growth in the quantity of digital news, social media, and other textual sources makes it difficult for humans to keep up with rapidly evolving narratives about world events. Various visualisation techniques have been touted...
MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models
arXiv:2603.02482v1 Announce Type: cross Abstract: Safety evaluation and red-teaming of large language models remain predominantly text-centric, and existing frameworks lack the infrastructure to systematically test whether alignment generalizes to audio, image, and video inputs. We present MUSE (Multimodal Unified Safety...
Through the Lens of Contrast: Self-Improving Visual Reasoning in VLMs
arXiv:2603.02556v1 Announce Type: cross Abstract: Reasoning has emerged as a key capability of large language models. In linguistic tasks, this capability can be enhanced by self-improving techniques that refine reasoning paths for subsequent finetuning. However, extending these language-based self-improving approaches...
FlashEvaluator: Expanding Search Space with Parallel Evaluation
arXiv:2603.02565v1 Announce Type: cross Abstract: The Generator-Evaluator (G-E) framework, i.e., evaluating K sequences from a generator and selecting the top-ranked one according to evaluator scores, is a foundational paradigm in tasks such as Recommender Systems (RecSys) and Natural Language Processing...
RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning
arXiv:2603.02215v1 Announce Type: new Abstract: Chemical reaction prediction is pivotal for accelerating drug discovery and synthesis planning. Despite advances in data-driven models, current approaches are hindered by an overemphasis on parameter and dataset scaling. Some methods coupled with evaluation techniques...
NExT-Guard: Training-Free Streaming Safeguard without Token-Level Labels
arXiv:2603.02219v1 Announce Type: new Abstract: Large language models are increasingly deployed in streaming scenarios, rendering conventional post-hoc safeguards ineffective as they fail to interdict unsafe content in real-time. While streaming safeguards based on token-level supervised training could address this, they...