CapTrack: Multifaceted Evaluation of Forgetting in LLM Post-Training
arXiv:2603.06610v1 Announce Type: new Abstract: Large language model (LLM) post-training enhances latent skills, unlocks value alignment, improves performance, and enables domain adaptation. Unfortunately, post-training is known to induce forgetting, especially in the ubiquitous use-case of leveraging third-party pre-trained models, which...
Consensus is Not Verification: Why Crowd Wisdom Strategies Fail for LLM Truthfulness
arXiv:2603.06612v1 Announce Type: new Abstract: Pass@k and other methods of scaling inference compute can improve language model performance in domains with external verifiers, including mathematics and code, where incorrect candidates can be filtered reliably. This raises a natural question: can...
OptiRoulette Optimizer: A New Stochastic Meta-Optimizer for up to 5.3x Faster Convergence
arXiv:2603.06613v1 Announce Type: new Abstract: This paper presents OptiRoulette, a stochastic meta-optimizer that selects update rules during training instead of fixing a single optimizer. The method combines warmup optimizer locking, random sampling from an active optimizer pool, compatibility-aware learning-rate scaling...
Not all tokens are needed (NAT): token-efficient reinforcement learning
arXiv:2603.06619v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a key driver of progress in large language models, but scaling RL to long chain-of-thought (CoT) trajectories is increasingly constrained by backpropagation over every generated token. Even with optimized rollout...
Reward Under Attack: Analyzing the Robustness and Hackability of Process Reward Models
arXiv:2603.06621v1 Announce Type: new Abstract: Process Reward Models (PRMs) are rapidly becoming the backbone of LLM reasoning pipelines, yet we demonstrate that state-of-the-art PRMs are systematically exploitable under adversarial optimization pressure. To address this, we introduce a three-tiered diagnostic framework...
Pavement Missing Condition Data Imputation through Collective Learning-Based Graph Neural Networks
arXiv:2603.06625v1 Announce Type: new Abstract: Pavement condition data is important in providing information regarding the current state of the road network and in determining the needs of maintenance and rehabilitation treatments. However, the condition data is often incomplete due to...
Grouter: Decoupling Routing from Representation for Accelerated MoE Training
arXiv:2603.06626v1 Announce Type: new Abstract: Traditional Mixture-of-Experts (MoE) training typically proceeds without any structural priors, effectively requiring the model to simultaneously train expert weights while searching for an optimal routing policy within a vast combinatorial space. This entanglement often leads...
Leakage Safe Graph Features for Interpretable Fraud Detection in Temporal Transaction Networks
arXiv:2603.06632v1 Announce Type: new Abstract: Illicit transaction detection is often driven by transaction-level attributes; however, fraudulent behavior may also manifest through network structure such as central hubs, high-flow intermediaries, and coordinated neighborhoods. This paper presents a time-respecting,...
HEARTS: Benchmarking LLM Reasoning on Health Time Series
arXiv:2603.06638v1 Announce Type: new Abstract: The rise of large language models (LLMs) has shifted time series analysis from narrow analytics to general-purpose reasoning. Yet, existing benchmarks cover only a small set of health time series modalities and tasks, failing to...
ERP-RiskBench: Leakage-Safe Ensemble Learning for Financial Risk
arXiv:2603.06671v1 Announce Type: new Abstract: Financial risk detection in Enterprise Resource Planning (ERP) systems is an important but underexplored application of machine learning. Published studies in this area tend to suffer from vague dataset descriptions, leakage-prone pipelines, and evaluation practices...
From Statistical Fidelity to Clinical Consistency: Scalable Generation and Auditing of Synthetic Patient Trajectories
arXiv:2603.06720v1 Announce Type: new Abstract: Access to electronic health records (EHRs) for digital health research is often limited by privacy regulations and institutional barriers. Synthetic EHRs have been proposed as a way to enable safe and sovereign data sharing; however,...
ProtAlign: Contrastive learning paradigm for Sequence and structure alignment
arXiv:2603.06722v1 Announce Type: new Abstract: Protein language models often take into consideration the alignment between a protein sequence and its textual description. However, they do not take structural information into consideration. Traditional methods treat sequence and structure separately, limiting the...
Bi Directional Feedback Fusion for Activity Aware Forecasting of Indoor CO2 and PM2.5
arXiv:2603.06724v1 Announce Type: new Abstract: Indoor air quality (IAQ) forecasting plays a critical role in safeguarding occupant health, ensuring thermal comfort, and supporting intelligent building control. However, predicting future concentrations of key pollutants such as carbon dioxide (CO2) and fine...
Safe Transformer: An Explicit Safety Bit For Interpretable And Controllable Alignment
arXiv:2603.06727v1 Announce Type: new Abstract: Current safety alignment methods encode safe behavior implicitly within model parameters, creating a fundamental opacity: we cannot easily inspect why a model refuses a request, nor intervene when its safety judgments fail. We propose Safe...
Orion: Characterizing and Programming Apple's Neural Engine for LLM Training and Inference
arXiv:2603.06728v1 Announce Type: new Abstract: Over two billion Apple devices ship with a Neural Processing Unit (NPU), the Apple Neural Engine (ANE), yet this accelerator remains largely unused for large language model workloads. CoreML, Apple's public ML framework,...
Heterogeneous Decentralized Diffusion Models
arXiv:2603.06741v1 Announce Type: new Abstract: Training frontier-scale diffusion models often requires substantial computational resources concentrated in tightly coupled clusters, limiting participation to well-resourced institutions. While Decentralized Diffusion Models (DDM) enable training multiple experts in isolation, existing approaches require 1176 GPU-days...
Improved Constrained Generation by Bridging Pretrained Generative Models
arXiv:2603.06742v1 Announce Type: new Abstract: Constrained generative modeling is fundamental to applications such as robotic control and autonomous driving, where models must respect physical laws and safety-critical constraints. In real-world settings, these constraints rarely take the form of simple linear...
Enhancing Instruction Following of LLMs via Activation Steering with Dynamic Rejection
arXiv:2603.06745v1 Announce Type: new Abstract: Large Language Models (LLMs), despite advances in instruction tuning, often fail to follow complex user instructions. Activation steering techniques aim to mitigate this by manipulating model internals, but they carry a potential risk of oversteering, where...
Property-driven Protein Inverse Folding With Multi-Objective Preference Alignment
arXiv:2603.06748v1 Announce Type: new Abstract: Protein sequence design must balance designability, defined as the ability to recover a target backbone, with multiple, often competing, developability properties such as solubility, thermostability, and expression. Existing approaches address these properties through post hoc...
Latent Autoencoder Ensemble Kalman Filter for Data assimilation
arXiv:2603.06752v1 Announce Type: new Abstract: The ensemble Kalman filter (EnKF) is widely used for data assimilation in high-dimensional systems, but its performance often deteriorates for strongly nonlinear dynamics due to the structural mismatch between the Kalman update and the underlying...
In birthright citizenship case, Justice Department urges court to treat an old concept in a new way
Immigration Matters is a recurring series by César Cuauhtémoc García Hernández that analyzes the court’s immigration docket, highlighting emerging legal questions about new policy and enforcement practices. President Donald Trump’s […] The post In birthright citizenship case, Justice Department urges court to...
SCOTUStoday for Monday, March 9
Just 22% of U.S. registered voters have “a great deal” (7%) or “quite a bit” (15%) of confidence in the Supreme Court, according to a new NBC News poll shared […] The post SCOTUStoday for Monday, March 9 appeared first on SCOTUSblog.
Anthropic sues Defense Department over supply-chain risk designation
Anthropic filed suit against the Department of Defense on Monday after the agency labeled it a supply-chain risk. The complaint calls the DOD's actions "unprecedented and unlawful."
Qualcomm’s partnership with Neura Robotics is just the beginning
Neura Robotics is going to build new robots on top of Qualcomm's new IQ10 processors that were released at CES.
Agentic LLM Planning via Step-Wise PDDL Simulation: An Empirical Characterisation
arXiv:2603.06064v1 Announce Type: new Abstract: Task planning, the problem of sequencing actions to reach a goal from an initial state, is a core capability required of autonomous robotic systems. Whether large language models (LLMs) can serve as viable planners alongside...
Spatiotemporal Heterogeneity of AI-Driven Traffic Flow Patterns and Land Use Interaction: A GeoAI-Based Analysis of Multimodal Urban Mobility
arXiv:2603.05581v1 Announce Type: cross Abstract: Urban traffic flow is governed by the complex, nonlinear interaction between land use configuration and spatiotemporally heterogeneous mobility demand. Conventional global regression and time-series models cannot simultaneously capture these multi-scale dynamics across multiple travel modes....
Tool-Genesis: A Task-Driven Tool Creation Benchmark for Self-Evolving Language Agent
arXiv:2603.05578v1 Announce Type: cross Abstract: Research on self-evolving language agents has accelerated, drawing increasing attention to their ability to create, adapt, and maintain tools from task requirements. However, existing benchmarks predominantly rely on predefined specifications, which limits scalability and hinders...
PRISM: Personalized Refinement of Imitation Skills for Manipulation via Human Instructions
arXiv:2603.05574v1 Announce Type: cross Abstract: This paper presents PRISM: an instruction-conditioned refinement method for imitation policies in robotic manipulation. This approach bridges Imitation Learning (IL) and Reinforcement Learning (RL) frameworks into a seamless pipeline, such that an imitation policy on...
CBR-to-SQL: Rethinking Retrieval-based Text-to-SQL using Case-based Reasoning in the Healthcare Domain
arXiv:2603.05569v1 Announce Type: cross Abstract: Extracting insights from Electronic Health Record (EHR) databases often requires SQL expertise, creating a barrier for healthcare decision-making and research. While a promising approach is to use Large Language Models (LLMs) to translate natural language...
Towards Efficient and Stable Ocean State Forecasting: A Continuous-Time Koopman Approach
arXiv:2603.05560v1 Announce Type: cross Abstract: We investigate the Continuous-Time Koopman Autoencoder (CT-KAE) as a lightweight surrogate model for long-horizon ocean state forecasting in a two-layer quasi-geostrophic (QG) system. By projecting nonlinear dynamics into a latent space governed by a linear...