MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier
arXiv:2603.03756v1 Announce Type: new Abstract: While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, $P(\text{hypothesis}|\text{background})$ ($P(h|b)$), unexplored. We demonstrate that directly training...
LEA: Label Enumeration Attack in Vertical Federated Learning
arXiv:2603.03777v1 Announce Type: new Abstract: A typical Vertical Federated Learning (VFL) scenario involves several participants collaboratively training a machine learning model, where each party has different features for the same samples, with labels held exclusively by one party. Since labels...
Inverse Contextual Bandits without Rewards: Learning from a Non-Stationary Learner via Suffix Imitation
arXiv:2603.03778v1 Announce Type: new Abstract: We study the Inverse Contextual Bandit (ICB) problem, in which a learner seeks to optimize a policy while an observer, who cannot access the learner's rewards and only observes actions, aims to recover the underlying...
k-hop Fairness: Addressing Disparities in Graph Link Prediction Beyond First-Order Neighborhoods
arXiv:2603.03867v1 Announce Type: new Abstract: Link prediction (LP) plays a central role in graph-based applications, particularly in social recommendation. However, real-world graphs often reflect structural biases, most notably homophily, the tendency of nodes with similar attributes to connect. While this...
Graph-GRPO: Stabilizing Multi-Agent Topology Learning via Group Relative Policy Optimization
arXiv:2603.02701v1 Announce Type: new Abstract: Optimizing communication topology is fundamental to the efficiency and effectiveness of Large Language Model (LLM)-based Multi-Agent Systems (MAS). While recent approaches utilize reinforcement learning to dynamically construct task-specific graphs, they typically rely on single-sample policy...
ACE-Merging: Data-Free Model Merging with Adaptive Covariance Estimation
arXiv:2603.02945v1 Announce Type: new Abstract: Model merging aims to combine multiple task-specific expert models into a single model while preserving generalization across diverse tasks. However, interference among experts, especially when they are trained on different objectives, often leads to significant...
TrustMH-Bench: A Comprehensive Benchmark for Evaluating the Trustworthiness of Large Language Models in Mental Health
arXiv:2603.03047v1 Announce Type: new Abstract: While Large Language Models (LLMs) demonstrate significant potential in providing accessible mental health support, their practical deployment raises critical trustworthiness concerns due to the domain's high-stakes and safety-sensitive nature. Existing evaluation paradigms for general-purpose LLMs...
PrivMedChat: End-to-End Differentially Private RLHF for Medical Dialogue Systems
arXiv:2603.03054v1 Announce Type: new Abstract: Large language models are increasingly used for patient-facing medical assistance and clinical decision support, but adapting them to clinical dialogue often requires supervision derived from doctor-patient conversations that may contain sensitive information. Conventional supervised fine-tuning...
Evaluating Performance Drift from Model Switching in Multi-Turn LLM Systems
arXiv:2603.03111v1 Announce Type: new Abstract: Deployed multi-turn LLM systems routinely switch models mid-interaction due to upgrades, cross-provider routing, and fallbacks. Such handoffs create a context mismatch: the model generating later turns must condition on a dialogue prefix authored by a...
UniSkill: A Dataset for Matching University Curricula to Professional Competencies
arXiv:2603.03134v1 Announce Type: new Abstract: Skill extraction and recommendation systems have been studied from recruiter, applicant, and education perspectives. While AI applications in job advertisements have received broad attention, deficiencies in the instructed skills side remain a challenge. In this...
RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning
arXiv:2603.02215v1 Announce Type: new Abstract: Chemical reaction prediction is pivotal for accelerating drug discovery and synthesis planning. Despite advances in data-driven models, current approaches are hindered by an overemphasis on parameter and dataset scaling. Some methods coupled with evaluation techniques...
ATPO: Adaptive Tree Policy Optimization for Multi-Turn Medical Dialogue
arXiv:2603.02216v1 Announce Type: new Abstract: Effective information seeking in multi-turn medical dialogues is critical for accurate diagnosis, especially when dealing with incomplete information. Aligning Large Language Models (LLMs) for these interactive scenarios is challenging due to the uncertainty inherent in...
MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction
arXiv:2603.02221v1 Announce Type: new Abstract: In healthcare tabular predictions, classical models with feature engineering often outperform neural approaches. Recent advances in Large Language Models enable the integration of domain knowledge into feature engineering, offering a promising direction. However, existing approaches...
Physics-Informed Neural Networks with Architectural Physics Embedding for Large-Scale Wave Field Reconstruction
arXiv:2603.02231v1 Announce Type: new Abstract: Large-scale wave field reconstruction requires precise solutions but faces challenges with computational efficiency and accuracy. Physics-based numerical methods such as the Finite Element Method (FEM) provide high accuracy but struggle with large-scale or high-frequency problems due...
Concept Heterogeneity-aware Representation Steering
arXiv:2603.02237v1 Announce Type: new Abstract: Representation steering offers a lightweight mechanism for controlling the behavior of large language models (LLMs) by intervening on internal activations at inference time. Most existing methods rely on a single global steering direction, typically obtained...
Length Generalization Bounds for Transformers
arXiv:2603.02238v1 Announce Type: new Abstract: Length generalization is a key property of a learning algorithm that enables it to make correct predictions on inputs of any length, given finite training data. To provide such a guarantee, one needs to be...
High-order Knowledge Based Network Controllability Robustness Prediction: A Hypergraph Neural Network Approach
arXiv:2603.02265v1 Announce Type: new Abstract: In order to evaluate the invulnerability of networks against various types of attacks and provide guidance for potential performance enhancement as well as controllability maintenance, network controllability robustness (NCR) has attracted increasing attention in recent...
PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis
arXiv:2603.02268v1 Announce Type: new Abstract: EEG foundation models are typically pretrained on narrow-source clinical archives and evaluated on benchmarks from the same ecosystem, leaving unclear whether representations encode neural physiology or recording-distribution artifacts. We introduce PRISM (Population Representative Invariant Signal...
The Malignant Tail: Spectral Segregation of Label Noise in Over-Parameterized Networks
arXiv:2603.02293v1 Announce Type: new Abstract: While implicit regularization facilitates benign overfitting in low-noise regimes, recent theoretical work predicts a sharp phase transition to harmful overfitting as the noise-to-signal ratio increases. We experimentally isolate the geometric mechanism of this transition: the...
Preconditioned Score and Flow Matching
arXiv:2603.02337v1 Announce Type: new Abstract: Flow matching and score-based diffusion train vector fields under intermediate distributions $p_t$, whose geometry can strongly affect their optimization. We show that the covariance $\Sigma_t$ of $p_t$ governs optimization bias: when $\Sigma_t$ is ill-conditioned, and...
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles
arXiv:2603.02406v1 Announce Type: new Abstract: Generative models have recently advanced $\textit{de novo}$ protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) Existing methods cannot jointly learn protein geometry and design tasks,...
Spectral Regularization for Diffusion Models
arXiv:2603.02447v1 Announce Type: new Abstract: Diffusion models are typically trained using pointwise reconstruction objectives that are agnostic to the spectral and multi-scale structure of natural signals. We propose a loss-level spectral regularization framework that augments standard diffusion training with differentiable...
Thermodynamic Regulation of Finite-Time Gibbs Training in Energy-Based Models: A Restricted Boltzmann Machine Study
arXiv:2603.02525v1 Announce Type: new Abstract: Restricted Boltzmann Machines (RBMs) are typically trained using finite-length Gibbs chains under a fixed sampling temperature. This practice implicitly assumes that the stochastic regime remains valid as the energy landscape evolves during learning. We argue...
Bridging Diffusion Guidance and Anderson Acceleration via Hopfield Dynamics
arXiv:2603.02531v1 Announce Type: new Abstract: Classifier-Free Guidance (CFG) has significantly enhanced the generative quality of diffusion models by extrapolating between conditional and unconditional outputs. However, its high inference cost and limited applicability to distilled or single-step models have shifted research...
EdgeFLow: Serverless Federated Learning via Sequential Model Migration in Edge Networks
arXiv:2603.02562v1 Announce Type: new Abstract: Federated Learning (FL) has emerged as a transformative distributed learning paradigm in the era of Internet of Things (IoT), reconceptualizing data processing methodologies. However, FL systems face significant communication bottlenecks due to inevitable client-server data...
CVPR 2026 News and Resources for Press
Opinions for Wednesday, March 4
We were live as the court released its opinions in Urias-Orellana v. Bondi and Galette v. New Jersey Transit Corp. The post Opinions for Wednesday, March 4 appeared first on SCOTUSblog.
The US military is still using Claude — but defense-tech clients are fleeing
As the U.S. continues its aerial attack on Iran, Anthropic models are being used for many targeting decisions.
From Literature to Hypotheses: An AI Co-Scientist System for Biomarker-Guided Drug Combination Hypothesis Generation
arXiv:2603.00612v1 Announce Type: new Abstract: The rapid growth of biomedical literature and curated databases has made it increasingly difficult for researchers to systematically connect biomarker mechanisms to actionable drug combination hypotheses. We present AI Co-Scientist (CoDHy), an interactive, human-in-the-loop system...
MedGPT-oss: Training a General-Purpose Vision-Language Model for Biomedicine
arXiv:2603.00842v1 Announce Type: new Abstract: Biomedical multimodal assistants have the potential to unify radiology, pathology, and clinical-text reasoning, yet a critical deployment gap remains: top-performing systems are either closed-source or computationally prohibitive, precluding the on-premises deployment required for patient privacy...