Boost Like a (Var)Pro: Trust-Region Gradient Boosting via Variable Projection
arXiv:2603.23658v1 Announce Type: new Abstract: Gradient boosting, a method of building additive ensembles from weak learners, has established itself as a practical and theoretically-motivated approach to approximate functions, especially using decision tree weak learners. Comparable methods for smooth parametric learners,...
CDMT-EHR: A Continuous-Time Diffusion Framework for Generating Mixed-Type Time-Series Electronic Health Records
arXiv:2603.23719v1 Announce Type: new Abstract: Electronic health records (EHRs) are invaluable for clinical research, yet privacy concerns severely restrict data sharing. Synthetic data generation offers a promising solution, but EHRs present unique challenges: they contain both numerical and categorical features...
BXRL: Behavior-Explainable Reinforcement Learning
arXiv:2603.23738v1 Announce Type: new Abstract: A major challenge of Reinforcement Learning is that agents often learn undesired behaviors that seem to defy the reward structure they were given. Explainable Reinforcement Learning (XRL) methods can answer queries such as "explain this...
Self Paced Gaussian Contextual Reinforcement Learning
arXiv:2603.23755v1 Announce Type: new Abstract: Curriculum learning improves reinforcement learning (RL) efficiency by sequencing tasks from simple to complex. However, many self-paced curriculum methods rely on computationally expensive inner-loop optimizations, limiting their scalability in high-dimensional context spaces. In this paper,...
Probabilistic Geometric Alignment via Bayesian Latent Transport for Domain-Adaptive Foundation Models
arXiv:2603.23783v1 Announce Type: new Abstract: Adapting large-scale foundation models to new domains with limited supervision remains a fundamental challenge due to latent distribution mismatch, unstable optimization dynamics, and miscalibrated uncertainty propagation. This paper introduces an uncertainty-aware probabilistic latent transport framework...
Latent Algorithmic Structure Precedes Grokking: A Mechanistic Study of ReLU MLPs on Modular Arithmetic
arXiv:2603.23784v1 Announce Type: new Abstract: Grokking-the phenomenon where validation accuracy of neural networks on modular addition of two integers rises long after training data has been memorized-has been characterized in previous works as producing sinusoidal input weight distributions in transformers...
Manifold Generalization Provably Proceeds Memorization in Diffusion Models
arXiv:2603.23792v1 Announce Type: new Abstract: Diffusion models often generate novel samples even when the learned score is only \emph{coarse} -- a phenomenon not accounted for by the standard view of diffusion training as density estimation. In this paper, we show...
Resolving gradient pathology in physics-informed epidemiological models
arXiv:2603.23799v1 Announce Type: new Abstract: Physics-informed neural networks (PINNs) are increasingly used in mathematical epidemiology to bridge the gap between noisy clinical data and compartmental models, such as the susceptible-exposed-infected-removed (SEIR) model. However, training these hybrid networks is often unstable...
Deep Neural Regression Collapse
arXiv:2603.23805v1 Announce Type: new Abstract: Neural Collapse is a phenomenon that helps identify sparse and low rank structures in deep classifiers. Recent work has extended the definition of neural collapse to regression problems, albeit only measuring the phenomenon at the...
Circuit Complexity of Hierarchical Knowledge Tracing and Implications for Log-Precision Transformers
arXiv:2603.23823v1 Announce Type: new Abstract: Knowledge tracing models mastery over interconnected concepts, often organized by prerequisites. We analyze hierarchical prerequisite propagation through a circuit-complexity lens to clarify what is provable about transformer-style computation on deep concept hierarchies. Using recent results...
Unveiling Hidden Convexity in Deep Learning: a Sparse Signal Processing Perspective
arXiv:2603.23831v1 Announce Type: new Abstract: Deep neural networks (DNNs), particularly those using Rectified Linear Unit (ReLU) activation functions, have achieved remarkable success across diverse machine learning tasks, including image recognition, audio processing, and language modeling. Despite this success, the non-convex...
Symbolic--KAN: Kolmogorov-Arnold Networks with Discrete Symbolic Structure for Interpretable Learning
arXiv:2603.23854v1 Announce Type: new Abstract: Symbolic discovery of governing equations is a long-standing goal in scientific machine learning, yet a fundamental trade-off persists between interpretability and scalable learning. Classical symbolic regression methods yield explicit analytic expressions but rely on combinatorial...
An Invariant Compiler for Neural ODEs in AI-Accelerated Scientific Simulation
arXiv:2603.23861v1 Announce Type: new Abstract: Neural ODEs are increasingly used as continuous-time models for scientific and sensor data, but unconstrained neural ODEs can drift and violate domain invariants (e.g., conservation laws), yielding physically implausible solutions. In turn, this can compound...
Deep Convolutional Neural Networks for predicting highest priority functional group in organic molecules
arXiv:2603.23862v1 Announce Type: new Abstract: Our work addresses the problem of predicting the highest priority functional group present in an organic molecule. Functional Groups are groups of bound atoms that determine the physical and chemical properties of organic molecules. In...
Can VLMs Reason Robustly? A Neuro-Symbolic Investigation
arXiv:2603.23867v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have been applied to a wide range of reasoning tasks, yet it remains unclear whether they can reason robustly under distribution shifts. In this paper, we study covariate shifts in which the...
HDPO: Hybrid Distillation Policy Optimization via Privileged Self-Distillation
arXiv:2603.23871v1 Announce Type: new Abstract: Large language models trained with reinforcement learning (RL) for mathematical reasoning face a fundamental challenge: on problems the model cannot solve at all - "cliff" prompts - the RL gradient vanishes entirely, preventing any learning...
Off-Policy Safe Reinforcement Learning with Constrained Optimistic Exploration
arXiv:2603.23889v1 Announce Type: new Abstract: When safety is formulated as a limit of cumulative cost, safe reinforcement learning (RL) aims to learn policies that maximize return subject to the cost constraint in data collection and deployment. Off-policy safe RL methods,...
Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs
arXiv:2603.23926v1 Announce Type: new Abstract: Online reinforcement learning in infinite-horizon Markov decision processes (MDPs) remains less theoretically and algorithmically developed than its episodic counterpart, with many algorithms suffering from high ``burn-in'' costs and failing to adapt to benign instance-specific complexity....
GRMLR: Knowledge-Enhanced Small-Data Learning for Deep-Sea Cold Seep Stage Inference
arXiv:2603.23961v1 Announce Type: new Abstract: Deep-sea cold seep stage assessment has traditionally relied on costly, high-risk manned submersible operations and visual surveys of macrofauna. Although microbial communities provide a promising and more cost-effective alternative, reliable inference remains challenging because the...
Kirchhoff-Inspired Neural Networks for Evolving High-Order Perception
arXiv:2603.23977v1 Announce Type: new Abstract: Deep learning architectures are fundamentally inspired by neuroscience, particularly the structure of the brain's sensory pathways, and have achieved remarkable success in learning informative data representations. Although these architectures mimic the communication mechanisms of biological...
Transcending Classical Neural Network Boundaries: A Quantum-Classical Synergistic Paradigm for Seismic Data Processing
arXiv:2603.23984v1 Announce Type: new Abstract: In recent years, a number of neural-network (NN) methods have exhibited good performance in seismic data processing, such as denoising, interpolation, and frequency-band extension. However, these methods rely on stacked perceptrons and standard activation functions,...
Diet Your LLM: Dimension-wise Global Pruning of LLMs via Merging Task-specific Importance Score
arXiv:2603.23985v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, but their massive scale poses significant challenges for practical deployment. Structured pruning offers a promising solution by removing entire dimensions or layers, yet existing methods face critical...
Can we generate portable representations for clinical time series data using LLMs?
arXiv:2603.23987v1 Announce Type: new Abstract: Deploying clinical ML is slow and brittle: models that work at one hospital often degrade under distribution shifts at the next. In this work, we study a simple question -- can large language models (LLMs)...
Understanding the Challenges in Iterative Generative Optimization with LLMs
arXiv:2603.23994v1 Announce Type: new Abstract: Generative optimization uses large language models (LLMs) to iteratively improve artifacts (such as code, workflows or prompts) using execution feedback. It is a promising approach to building self-improving agents, yet in practice remains brittle: despite...
i-IF-Learn: Iterative Feature Selection and Unsupervised Learning for High-Dimensional Complex Data
arXiv:2603.24025v1 Announce Type: new Abstract: Unsupervised learning of high-dimensional data is challenging due to irrelevant or noisy features obscuring underlying structures. It's common that only a few features, called the influential features, meaningfully define the clusters. Recovering these influential features...
Lagrangian Relaxation Score-based Generation for Mixed Integer linear Programming
arXiv:2603.24033v1 Announce Type: new Abstract: Predict-and-search (PaS) methods have shown promise for accelerating mixed-integer linear programming (MILP) solving. However, existing approaches typically assume variable independence and rely on deterministic single-point predictions, which limits solution diversityand often necessitates extensive downstream search...
Birthright citizenship: more on Pete Patterson’s claims
Attorney Pete Patterson’s latest post on birthright citizenship repeats the biggest mistakes of his original post and also makes some new mistakes, chasing irrelevances and mangling the key legal issues. […]The postBirthright citizenship: more on Pete Patterson’s claimsappeared first onSCOTUSblog.
Court to consider ability of federal courts to confirm arbitration awards
Next week’s argument in Jules v Andre Balazs Properties considers a technical question about the jurisdiction of federal courts to enforce an arbitration award. It is the immediate successor of […]The postCourt to consider ability of federal courts to confirm...
Justices dubious about “harsh” rules for omissions by bankrupt debtors
Yesterday’s argument in Keathley v. Buddy Ayers Construction displayed a bench almost uniformly skeptical of a lower court’s absolute standard for responding to the failure of a debtor in bankruptcy […]The postJustices dubious about “harsh” rules for omissions by bankrupt...
Justices to hear argument on whether a crime’s “contemplated effects” can expand venue beyond where offense was committed
The Supreme Court will hear oral argument on Monday in Abouammo v. United States, in which it will consider whether federal prosecutors can try a defendant not only in the […]The postJustices to hear argument on whether a crime’s “contemplated...