BEAVER: A Training-Free Hierarchical Prompt Compression Method via Structure-Aware Page Selection
arXiv:2603.19635v1 Announce Type: new Abstract: The exponential expansion of context windows in LLMs has unlocked capabilities for long-document understanding but introduced severe bottlenecks in inference latency and information utilization. Existing compression methods often suffer from high training costs or semantic...
Maximizing mutual information between user-contexts and responses improve LLM personalization with no additional data
arXiv:2603.19294v1 Announce Type: new Abstract: While post-training has successfully improved large language models (LLMs) across a variety of domains, these gains heavily rely on human-labeled data or external verifiers. Existing data has already been exploited, and new high-quality data is...
BrainSCL: Subtype-Guided Contrastive Learning for Brain Disorder Diagnosis
arXiv:2603.19295v1 Announce Type: new Abstract: Mental disorder populations exhibit pronounced heterogeneity -- that is, the significant differences between samples -- poses a significant challenge to the definition of positive pairs in contrastive learning. To address this, we propose a subtype-guided...
PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovascular Risk Modelling
arXiv:2603.19299v1 Announce Type: new Abstract: In recent years, progress in medical informatics and machine learning has been accelerated by the availability of openly accessible benchmark datasets. However, patient-level electronic medical record (EMR) data are rarely available for teaching or methodological...
Exploring Subnetwork Interactions in Heterogeneous Brain Network via Prior-Informed Graph Learning
arXiv:2603.19307v1 Announce Type: new Abstract: Modeling the complex interactions among functional subnetworks is crucial for the diagnosis of mental disorders and the identification of functional pathways. However, learning the interactions of the underlying subnetworks remains a significant challenge for existing...
MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
arXiv:2603.19310v1 Announce Type: new Abstract: Training large language models (LLMs) for complex reasoning via reinforcement learning requires reward labels that specify whether the generated rollouts are correct. However, obtaining reward labels at scale often requires expensive human labeling or time-consuming...
DPxFin: Adaptive Differential Privacy for Anti-Money Laundering Detection via Reputation-Weighted Federated Learning
arXiv:2603.19314v1 Announce Type: new Abstract: In the modern financial system, combating money laundering is a critical challenge complicated by data privacy concerns and increasingly complex fraud transaction patterns. Although federated learning (FL) is a promising problem-solving approach as it allows...
MSNet and LS-Net: Scalable Multi-Scale Multi-Representation Networks for Time Series Classification
arXiv:2603.19315v1 Announce Type: new Abstract: Time series classification (TSC) performance depends not only on architectural design but also on the diversity of input representations. In this work, we propose a scalable multi-scale convolutional framework that systematically integrates structured multi-representation inputs...
A General Deep Learning Framework for Wireless Resource Allocation under Discrete Constraints
arXiv:2603.19322v1 Announce Type: new Abstract: While deep learning (DL)-based methods have achieved remarkable success in continuous wireless resource allocation, efficient solutions for problems involving discrete variables remain challenging. This is primarily due to the zero-gradient issue in backpropagation, the difficulty...
Target Concept Tuning Improves Extreme Weather Forecasting
arXiv:2603.19325v1 Announce Type: new Abstract: Deep learning models for meteorological forecasting often fail in rare but high-impact events such as typhoons, where relevant data is scarce. Existing fine-tuning methods typically face a trade-off between overlooking these extreme events and overfitting...
DAPA: Distribution Aware Piecewise Activation Functions for On-Device Transformer Inference and Training
arXiv:2603.19338v1 Announce Type: new Abstract: Non-linear activation functions play a pivotal role in on-device inference and training, as they not only consume substantial hardware resources but also impose a significant impact on system performance and energy efficiency. In this work,...
Anatomical Heterogeneity in Transformer Language Models
arXiv:2603.19348v1 Announce Type: new Abstract: Current transformer language models are trained with uniform computational budgets across all layers, implicitly assuming layer homogeneity. We challenge this assumption through empirical analysis of SmolLM2-135M, a 30-layer, 135M-parameter causal language model, using five diagnostic...
Warm-Start Flow Matching for Guaranteed Fast Text/Image Generation
arXiv:2603.19360v1 Announce Type: new Abstract: Current auto-regressive (AR) LLMs, diffusion-based text/image generative models, and recent flow matching (FM) algorithms are capable of generating premium quality text/image samples. However, the inference or sample generation in these models is often very time-consuming...
GeoLAN: Geometric Learning of Latent Explanatory Directions in Large Language Models
arXiv:2603.19460v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate strong performance, but they often lack transparency. We introduce GeoLAN, a training framework that treats token representations as geometric trajectories and applies stickiness conditions inspired by recent developments related to...
Global Convergence of Multiplicative Updates for the Matrix Mechanism: A Collaborative Proof with Gemini 3
arXiv:2603.19465v1 Announce Type: new Abstract: We analyze a fixed-point iteration $v \leftarrow \phi(v)$ arising in the optimization of a regularized nuclear norm objective involving the Hadamard product structure, posed in~\cite{denisov} in the context of an optimization problem over the space...
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL
arXiv:2603.19470v1 Announce Type: new Abstract: Off-policy problems such as policy staleness and training-inference mismatch, has become a major bottleneck for training stability and further exploration for LLM RL. To enhance inference efficiency, the distribution gap between the inference and updated...
ICLAD: In-Context Learning for Unified Tabular Anomaly Detection Across Supervision Regimes
arXiv:2603.19497v1 Announce Type: new Abstract: Anomaly detection on tabular data is commonly studied under three supervision regimes, including one-class settings that assume access to anomaly-free training samples, fully unsupervised settings with unlabeled and potentially contaminated training data, and semi-supervised settings...
ARMOR: Adaptive Resilience Against Model Poisoning Attacks in Continual Federated Learning for Mobile Indoor Localization
arXiv:2603.19594v1 Announce Type: new Abstract: Indoor localization has become increasingly essential for applications ranging from asset tracking to delivering personalized services. Federated learning (FL) offers a privacy-preserving approach by training a centralized global model (GM) using distributed data from mobile...
Demonstrations, CoT, and Prompting: A Theoretical Analysis of ICL
arXiv:2603.19611v1 Announce Type: new Abstract: In-Context Learning (ICL) enables pretrained LLMs to adapt to downstream tasks by conditioning on a small set of input-output demonstrations, without any parameter updates. Although there have been many theoretical efforts to explain how ICL...
On Performance Guarantees for Federated Learning with Personalized Constraints
arXiv:2603.19617v1 Announce Type: new Abstract: Federated learning (FL) has emerged as a communication-efficient algorithmic framework for distributed learning across multiple agents. While standard FL formulations capture unconstrained or globally constrained problems, many practical settings involve heterogeneous resource or model constraints,...
DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management
arXiv:2603.19621v1 Announce Type: new Abstract: Deep Reinforcement Learning (DRL) provides a general-purpose methodology for training inventory policies that can leverage big data and compute. However, off-the-shelf implementations of DRL have seen mixed success, often plagued by high sensitivity to the...
Continual Learning for Food Category Classification Dataset: Enhancing Model Adaptability and Performance
arXiv:2603.19624v1 Announce Type: new Abstract: Conventional machine learning pipelines often struggle to recognize categories absent from the original trainingset. This gap typically reduces accuracy, as fixed datasets rarely capture the full diversity of a domain. To address this, we propose...
Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
arXiv:2603.19633v1 Announce Type: new Abstract: This work introduces a new approximate proximal sampler that operates solely with zeroth-order information of the potential function. Prior theoretical analyses have revealed that proximal sampling corresponds to alternating forward and backward iterations of the...
RiboSphere: Learning Unified and Efficient Representations of RNA Structures
arXiv:2603.19636v1 Announce Type: new Abstract: Accurate RNA structure modeling remains difficult because RNA backbones are highly flexible, non-canonical interactions are prevalent, and experimentally determined 3D structures are comparatively scarce. We introduce \emph{RiboSphere}, a framework that learns \emph{discrete} geometric representations of...
An exclusive tour of Amazon’s Trainium lab, the chip that’s won over Anthropic, OpenAI, even Apple
Shortly after Amazon announced its $50 billion investment in OpenAI, AWS invited me on a private tour of the chip lab at the heart of the deal.
Why Wall Street wasn’t won over by Nvidia’s big conference
Despite investor fears of an AI bubble, Nvidia's latest conference shows that most in the industry aren't concerned by that possibility.
An Onto-Relational-Sophic Framework for Governing Synthetic Minds
arXiv:2603.18633v1 Announce Type: new Abstract: The rapid evolution of artificial intelligence, from task-specific systems to foundation models exhibiting broad, flexible competence across reasoning, creative synthesis, and social interaction, has outpaced the conceptual and governance frameworks designed to manage it. Current...
TeachingCoach: A Fine-Tuned Scaffolding Chatbot for Instructional Guidance to Instructors
arXiv:2603.18189v1 Announce Type: new Abstract: Higher education instructors often lack timely and pedagogically grounded support, as scalable instructional guidance remains limited and existing tools rely on generic chatbot advice or non-scalable teaching center human-human consultations. We present TeachingCoach, a pedagogically...
ZEBRAARENA: A Diagnostic Simulation Environment for Studying Reasoning-Action Coupling in Tool-Augmented LLMs
arXiv:2603.18614v1 Announce Type: new Abstract: Tool-augmented large language models (LLMs) must tightly couple multi-step reasoning with external actions, yet existing benchmarks often confound this interplay with complex environment dynamics, memorized knowledge or dataset contamination. In this paper, we introduce ZebraArena,...
Agentic Flow Steering and Parallel Rollout Search for Spatially Grounded Text-to-Image Generation
arXiv:2603.18627v1 Announce Type: new Abstract: Precise Text-to-Image (T2I) generation has achieved great success but is hindered by the limited relational reasoning of static text encoders and the error accumulation in open-loop sampling. Without real-time feedback, initial semantic ambiguities during the...