Hyperagents
arXiv:2603.19461v1 Announce Type: new Abstract: Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limiting how fast such...
Transformers are Stateless Differentiable Neural Computers
arXiv:2603.19272v1 Announce Type: cross Abstract: Differentiable Neural Computers (DNCs) were introduced as recurrent architectures equipped with an addressable external memory supporting differentiable read and write operations. Transformers, in contrast, are nominally feedforward architectures based on multi-head self-attention. In this work...
LSR: Linguistic Safety Robustness Benchmark for Low-Resource West African Languages
arXiv:2603.19273v1 Announce Type: cross Abstract: Safety alignment in large language models relies predominantly on English-language training data. When harmful intent is expressed in low-resource languages, refusal mechanisms that hold in English frequently fail to activate. We introduce LSR (Linguistic Safety...
HypeLoRA: Hyper-Network-Generated LoRA Adapters for Calibrated Language Model Fine-Tuning
arXiv:2603.19278v1 Announce Type: cross Abstract: Modern Transformer-based models frequently suffer from miscalibration, producing overconfident predictions that do not reflect true empirical frequencies. This work investigates the calibration dynamics of LoRA: Low-Rank Adaptation and a novel hyper-network-based adaptation framework as parameter-efficient...
A Visualization for Comparative Analysis of Regression Models
arXiv:2603.19291v1 Announce Type: cross Abstract: As regression is a widely studied problem, many methods have been proposed to solve it, each of them often requiring setting different hyper-parameters. Therefore, selecting the proper method for a given application may be very...
Spelling Correction in Healthcare Query-Answer Systems: Methods, Retrieval Impact, and Empirical Evaluation
arXiv:2603.19249v1 Announce Type: new Abstract: Healthcare question-answering (QA) systems face a persistent challenge: users submit queries with spelling errors at rates substantially higher than those found in the professional documents they search. This paper presents the first controlled study of...
MOSAIC: Modular Opinion Summarization using Aspect Identification and Clustering
arXiv:2603.19277v1 Announce Type: new Abstract: Reviews are central to how travelers evaluate products on online marketplaces, yet existing summarization research often emphasizes end-to-end quality while overlooking benchmark reliability and the practical utility of granular insights. To address this, we propose...
Automatic Analysis of Collaboration Through Human Conversational Data Resources: A Review
arXiv:2603.19292v1 Announce Type: new Abstract: Collaboration is a task-oriented, high-level human behavior. In most cases, conversation serves as the primary medium for information exchange and coordination, making conversational data a valuable resource for the automatic analysis of collaborative processes. In...
Prompt-tuning with Attribute Guidance for Low-resource Entity Matching
arXiv:2603.19321v1 Announce Type: new Abstract: Entity Matching (EM) is an important task that determines the logical relationship between two entities, such as Same, Different, or Undecidable. Traditional EM approaches rely heavily on supervised learning, which requires large amounts of high-quality...
Scalable Prompt Routing via Fine-Grained Latent Task Discovery
arXiv:2603.19415v1 Announce Type: new Abstract: Prompt routing dynamically selects the most appropriate large language model from a pool of candidates for each query, optimizing performance while managing costs. As model pools scale to include dozens of frontier models with narrow...
Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure
arXiv:2603.19426v1 Announce Type: new Abstract: Prior work uses linear probes on benchmark prompts as evidence of evaluation awareness in large language models. Because evaluation context is typically entangled with benchmark format and genre, it is unclear whether probe-based signals reflect...
Vocabulary shapes cross-lingual variation of word-order learnability in language models
arXiv:2603.19427v1 Announce Type: new Abstract: Why do some languages like Czech permit free word order, while others like English do not? We address this question by pretraining transformer language models on a spectrum of synthetic word-order variants of natural languages....
BrainSCL: Subtype-Guided Contrastive Learning for Brain Disorder Diagnosis
arXiv:2603.19295v1 Announce Type: new Abstract: Mental disorder populations exhibit pronounced heterogeneity -- that is, the significant differences between samples -- poses a significant challenge to the definition of positive pairs in contrastive learning. To address this, we propose a subtype-guided...
Parameter-Efficient Token Embedding Editing for Clinical Class-Level Unlearning
arXiv:2603.19302v1 Announce Type: new Abstract: Machine unlearning is increasingly important for clinical language models, where privacy regulations and institutional policies may require removing sensitive information from deployed systems without retraining from scratch. In practice, deletion requests must balance effective forgetting...
Exploring Subnetwork Interactions in Heterogeneous Brain Network via Prior-Informed Graph Learning
arXiv:2603.19307v1 Announce Type: new Abstract: Modeling the complex interactions among functional subnetworks is crucial for the diagnosis of mental disorders and the identification of functional pathways. However, learning the interactions of the underlying subnetworks remains a significant challenge for existing...
LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
arXiv:2603.19312v1 Announce Type: new Abstract: Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision...
MSNet and LS-Net: Scalable Multi-Scale Multi-Representation Networks for Time Series Classification
arXiv:2603.19315v1 Announce Type: new Abstract: Time series classification (TSC) performance depends not only on architectural design but also on the diversity of input representations. In this work, we propose a scalable multi-scale convolutional framework that systematically integrates structured multi-representation inputs...
FalconBC: Flow matching for Amortized inference of Latent-CONditioned physiologic Boundary Conditions
arXiv:2603.19331v1 Announce Type: new Abstract: Boundary condition tuning is a fundamental step in patient-specific cardiovascular modeling. Despite an increase in offline training cost, recent methods in data-driven variational inference can efficiently estimate the joint posterior distribution of boundary conditions, with...
DAPA: Distribution Aware Piecewise Activation Functions for On-Device Transformer Inference and Training
arXiv:2603.19338v1 Announce Type: new Abstract: Non-linear activation functions play a pivotal role in on-device inference and training, as they not only consume substantial hardware resources but also impose a significant impact on system performance and energy efficiency. In this work,...
Optimizing Resource-Constrained Non-Pharmaceutical Interventions for Multi-Cluster Outbreak Control Using Hierarchical Reinforcement Learning
arXiv:2603.19397v1 Announce Type: new Abstract: Non-pharmaceutical interventions (NPIs), such as diagnostic testing and quarantine, are crucial for controlling infectious disease outbreaks but are often constrained by limited resources, particularly in early outbreak stages. In real-world public health settings, resources must...
TRACE: Trajectory Recovery with State Propagation Diffusion for Urban Mobility
arXiv:2603.19474v1 Announce Type: new Abstract: High-quality GPS trajectories are essential for location-based web services and smart city applications, including navigation, ride-sharing and delivery. However, due to low sampling rates and limited infrastructure coverage during data collection, real-world trajectories are often...
Subspace Kernel Learning on Tensor Sequences
arXiv:2603.19546v1 Announce Type: new Abstract: Learning from structured multi-way data, represented as higher-order tensors, requires capturing complex interactions across tensor modes while remaining computationally efficient. We introduce Uncertainty-driven Kernel Tensor Learning (UKTL), a novel kernel framework for $M$-mode tensors that...
Wearable Foundation Models Should Go Beyond Static Encoders
arXiv:2603.19564v1 Announce Type: new Abstract: Wearable foundation models (WFMs), trained on large volumes of data collected by affordable, always-on devices, have demonstrated strong performance on short-term, well-defined health monitoring tasks, including activity recognition, fitness tracking, and cardiovascular signal assessment. However,...
DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management
arXiv:2603.19621v1 Announce Type: new Abstract: Deep Reinforcement Learning (DRL) provides a general-purpose methodology for training inventory policies that can leverage big data and compute. However, off-the-shelf implementations of DRL have seen mixed success, often plagued by high sensitivity to the...
Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
arXiv:2603.19633v1 Announce Type: new Abstract: This work introduces a new approximate proximal sampler that operates solely with zeroth-order information of the potential function. Prior theoretical analyses have revealed that proximal sampling corresponds to alternating forward and backward iterations of the...
RiboSphere: Learning Unified and Efficient Representations of RNA Structures
arXiv:2603.19636v1 Announce Type: new Abstract: Accurate RNA structure modeling remains difficult because RNA backbones are highly flexible, non-canonical interactions are prevalent, and experimentally determined 3D structures are comparatively scarce. We introduce \emph{RiboSphere}, a framework that learns \emph{discrete} geometric representations of...
Heavy-Tailed and Long-Range Dependent Noise in Stochastic Approximation: A Finite-Time Analysis
arXiv:2603.19648v1 Announce Type: new Abstract: Stochastic approximation (SA) is a fundamental iterative framework with broad applications in reinforcement learning and optimization. Classical analyses typically rely on martingale difference or Markov noise with bounded second moments, but many practical settings, including...
Scale-Dependent Radial Geometry and Metric Mismatch in Wasserstein Propagation for Reverse Diffusion
arXiv:2603.19670v1 Announce Type: new Abstract: Existing analyses of reverse diffusion often propagate sampling error in the Euclidean geometry underlying \(\Wtwo\) along the entire reverse trajectory. Under weak log-concavity, however, Gaussian smoothing can create contraction first at large separations while short...
Cursor admits its new coding model was built on top of Moonshot AI’s Kimi
Building on top of a Chinese model feels particularly fraught right now.
Elon Musk unveils chip manufacturing plans for SpaceX and Tesla
Elon Musk recently outlined ambitious plans for a chip-building collaboration Tesla and SpaceX — but he has a history of overpromising.