IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference
arXiv:2603.03325v1 Announce Type: cross Abstract: Large language models (LLMs) have become integral to modern Human-AI collaboration workflows, where accurately understanding user intent serves as a crucial step for generating satisfactory responses. Context-aware intent understanding, which involves inferring user intentions from...
SE-Search: Self-Evolving Search Agent via Memory and Dense Reward
arXiv:2603.03293v1 Announce Type: new Abstract: Retrieval augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents further cast RAG as an autonomous, multi-turn information-seeking process. However, existing...
Benchmarking Legal RAG: The Promise and Limits of AI Statutory Surveys
arXiv:2603.03300v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) offers significant potential for legal AI, yet systematic benchmarks are sparse. Prior work introduced LaborBench to benchmark RAG models based on ostensible ground truth from an exhaustive, multi-month, manual enumeration of all...
StructLens: A Structural Lens for Language Models via Maximum Spanning Trees
arXiv:2603.03328v1 Announce Type: new Abstract: Language exhibits inherent structures, a property that explains both language acquisition and language change. Given this characteristic, we expect language models to manifest internal structures as well. While interpretability research has investigated the components of...
Tracing Pharmacological Knowledge In Large Language Models
arXiv:2603.03407v1 Announce Type: new Abstract: Large language models (LLMs) have shown strong empirical performance across pharmacology and drug discovery tasks, yet the internal mechanisms by which they encode pharmacological knowledge remain poorly understood. In this work, we investigate how drug-group...
A theoretical model of dynamical grammatical gender shifting based on set-valued set function
arXiv:2603.03510v1 Announce Type: new Abstract: This study investigates the diverse characteristics of nouns, focusing on both semantic (e.g., countable/uncountable) and morphosyntactic (e.g., masculine/feminine) distinctions. We explore inter-word variations for gender markers in noun morphology. Grammatical gender shift is a widespread...
Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts
arXiv:2603.03535v1 Announce Type: new Abstract: While large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks, their performance on individual tasks depends on the fine-tuning strategy. Fusing independently trained models with different strengths has shown promise...
Riemannian Optimization in Modular Systems
arXiv:2603.03610v1 Announce Type: new Abstract: Understanding how systems built out of modular components can be jointly optimized is an important problem in biology, engineering, and machine learning. The backpropagation algorithm is one such solution and has been instrumental in the...
MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
arXiv:2603.03756v1 Announce Type: new Abstract: While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training...
Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning
arXiv:2603.03818v1 Announce Type: new Abstract: Continual learning is a long-standing challenge in robot policy learning, where a policy must acquire new skills over time without catastrophically forgetting previously learned ones. While prior work has extensively studied continual learning in relatively...
Nodes Are Early, Edges Are Late: Probing Diagram Representations in Large Vision-Language Models
arXiv:2603.02865v1 Announce Type: new Abstract: Large vision-language models (LVLMs) demonstrate strong performance on diagram understanding benchmarks, yet they still struggle with understanding relationships between elements, particularly those represented by nodes and directed edges (e.g., arrows and lines). To investigate the...
LaTeX Compilation: Challenges in the Era of LLMs
arXiv:2603.02873v1 Announce Type: new Abstract: As large language models (LLMs) increasingly assist scientific writing, limitations and the significant token cost of TeX become more and more visible. This paper analyzes TeX's fundamental defects in compilation and user experience design to...
Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?
arXiv:2603.03202v1 Announce Type: new Abstract: As large language models (LLMs) advance their mathematical capabilities toward the IMO level, the scarcity of challenging, high-quality problems for training and evaluation has become a significant bottleneck. Simultaneously, recent code agents have demonstrated sophisticated...
Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use
arXiv:2603.03205v1 Announce Type: new Abstract: Agentic language models operate in a fundamentally different safety regime than chat models: they must plan, call tools, and execute long-horizon actions where a single misstep, such as accessing files or entering credentials, can cause...
Using Learning Progressions to Guide AI Feedback for Science Learning
arXiv:2603.03249v1 Announce Type: new Abstract: Generative artificial intelligence (AI) offers scalable support for formative feedback, yet most AI-generated feedback relies on task-specific rubrics authored by domain experts. While effective, rubric authoring is time-consuming and limits scalability across instructional contexts. Learning...
Routing Absorption in Sparse Attention: Why Random Gates Are Hard to Beat
arXiv:2603.02227v1 Announce Type: cross Abstract: Can a transformer learn which attention entries matter during training? In principle, yes: attention distributions are highly concentrated, and a small gate network can identify the important entries post-hoc with near-perfect accuracy. In practice, barely....
Safety Training Persists Through Helpfulness Optimization in LLM Agents
arXiv:2603.02229v1 Announce Type: cross Abstract: Safety post-training has been studied extensively in single-step "chat" settings where safety typically refers to refusing harmful requests. We study an "agentic" (i.e., multi-step, tool-use) setting where safety refers to harmful actions directly taken by...
MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction
arXiv:2603.02221v1 Announce Type: new Abstract: In healthcare tabular predictions, classical models with feature engineering often outperform neural approaches. Recent advances in Large Language Models enable the integration of domain knowledge into feature engineering, offering a promising direction. However, existing approaches...
Characterizing and Predicting Wildfire Evacuation Behavior: A Dual-Stage ML Approach
arXiv:2603.02223v1 Announce Type: new Abstract: Wildfire evacuation behavior is highly variable and influenced by complex interactions among household resources, preparedness, and situational cues. Using a large-scale MTurk survey of residents in California, Colorado, and Oregon, this study integrates unsupervised and...
Graph Attention Based Prioritization of Disease Responsible Genes from Multimodal Alzheimer's Network
arXiv:2603.02273v1 Announce Type: new Abstract: Prioritizing disease-associated genes is central to understanding the molecular mechanisms of complex disorders such as Alzheimer's disease (AD). Traditional network-based approaches rely on static centrality measures and often fail to capture cross-modal biological heterogeneity. We...
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles
arXiv:2603.02406v1 Announce Type: new Abstract: Generative models have recently advanced $\textit{de novo}$ protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) Existing methods cannot jointly learn protein geometry and design tasks,...
A Unified Revisit of Temperature in Classification-Based Knowledge Distillation
arXiv:2603.02430v1 Announce Type: new Abstract: A central idea of knowledge distillation is to expose relational structure embedded in the teacher's weights for the student to learn, which is often facilitated using a temperature parameter. Despite its widespread use, there remains...
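The abstract refers to the temperature parameter used to expose the teacher's relational structure. The sketch below shows the widely used formulation (Hinton et al., 2015), not necessarily the one this paper revisits. Dividing logits by a temperature T > 1 softens the teacher's softmax, so the relative probabilities it assigns to non-target classes become visible to the student. The student is then trained to match this softened distribution via KL divergence, scaled by T² to keep gradient magnitudes comparable across temperatures. The logits below are made-up illustrative values.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a larger temperature flattens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher_T || student_T) multiplied by T^2 (the common distillation term)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl


if __name__ == "__main__":
    # Illustrative logits only; not taken from the paper.
    teacher = [6.0, 2.0, 1.0]
    student = [4.0, 3.0, 0.5]
    for t in (1.0, 4.0):
        probs = [round(x, 3) for x in softmax(teacher, t)]
        print(f"T={t}: teacher probs={probs}, kd_loss={kd_loss(teacher, student, t):.4f}")
```

At T = 1 the teacher's distribution is nearly one-hot. At T = 4 the secondary classes carry noticeable mass, and this "dark knowledge" is the relational signal that temperature is meant to expose.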
Court unanimously sides with government in immigration dispute
The Supreme Court unanimously sided with the federal government on Wednesday in Urias-Orellana v. Bondi, holding in an opinion by Justice Ketanji Brown Jackson that federal courts of appeals must […]
Lawsuit: Google Gemini sent man on violent missions, set suicide "countdown"
Gemini allegedly called man its "husband," said they could be together in death.
Distribution-Aware Companding Quantization of Large Language Models
arXiv:2603.00364v1 Announce Type: new Abstract: Large language models such as GPT and Llama are trained with a next-token prediction loss. In this work, we suggest that training language models to predict multiple future tokens at once results in higher sample...
Policy Compliance of User Requests in Natural Language for AI Systems
arXiv:2603.00369v1 Announce Type: new Abstract: Consider an organization whose users send requests in natural language to an AI system that fulfills them by carrying out specific tasks. In this paper, we consider the problem of ensuring such user requests comply...
CIRCUS: Circuit Consensus under Uncertainty via Stability Ensembles
arXiv:2603.00523v1 Announce Type: new Abstract: Mechanistic circuit discovery is notoriously sensitive to arbitrary analyst choices, especially pruning thresholds and feature dictionaries, often yielding brittle "one-shot" explanations with no principled notion of uncertainty. We reframe circuit discovery as an uncertainty-quantification problem...
Super Research: Answering Highly Complex Questions with Large Language Models through Super Deep and Super Wide Research
arXiv:2603.00582v1 Announce Type: new Abstract: While Large Language Models (LLMs) have demonstrated proficiency in Deep Research or Wide Search, their capacity to solve highly complex questions (those requiring long-horizon planning, massive evidence gathering, and synthesis across heterogeneous sources) remains largely unexplored. We...
From Literature to Hypotheses: An AI Co-Scientist System for Biomarker-Guided Drug Combination Hypothesis Generation
arXiv:2603.00612v1 Announce Type: new Abstract: The rapid growth of biomedical literature and curated databases has made it increasingly difficult for researchers to systematically connect biomarker mechanisms to actionable drug combination hypotheses. We present AI Co-Scientist (CoDHy), an interactive, human-in-the-loop system...
SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
arXiv:2603.00669v1 Announce Type: new Abstract: Sustainability disclosure standards (e.g., GRI, SASB, TCFD, IFRS S2) are comprehensive yet lengthy, terminology-dense, and highly cross-referential, hindering structured analysis and downstream use. We present SSKG Hub (Sustainability Standards Knowledge Graph Hub), a research prototype...