PMIScore: An Unsupervised Approach to Quantify Dialogue Engagement
arXiv:2603.13796v1 Announce Type: new Abstract: High dialogue engagement is a crucial indicator of an effective conversation. A reliable measure of engagement could help benchmark large language models, enhance the effectiveness of human-computer interactions, or improve personal communication skills. However, quantifying...
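The abstract is truncated before the method; the name suggests a pointwise-mutual-information (PMI) score over dialogue. Purely as an illustration of what such a measure can look like, here is a minimal sketch that scores a response by PMI(context; response) = log p(response | context) - log p(response) under an off-the-shelf causal LM; the model choice, scoring granularity, and boundary handling are assumptions, not the paper's design:

```python
# Hypothetical sketch: PMI between a dialogue context and a response,
# estimated with an off-the-shelf causal LM. Not the paper's method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def log_prob(text: str, prefix: str = "") -> float:
    """Summed log-probability of `text`, optionally conditioned on `prefix`.
    Tokenization at the prefix/response boundary is handled naively, and the
    first token of an unconditioned string is skipped for simplicity."""
    ids = tok(prefix + text, return_tensors="pt").input_ids
    n_prefix = len(tok(prefix).input_ids) if prefix else 1
    logp = lm(ids).logits[0, :-1].log_softmax(-1)   # next-token distributions
    targets = ids[0, 1:]
    token_lp = logp[torch.arange(targets.numel()), targets]
    return token_lp[n_prefix - 1:].sum().item()

def pmi_engagement(context: str, response: str) -> float:
    # PMI(context; response) = log p(response | context) - log p(response)
    return log_prob(response, prefix=context) - log_prob(response)

print(pmi_engagement("How was your trip to Kyoto?\n", "The temples were stunning."))
```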
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
arXiv:2603.13853v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG), based on large language models (LLMs), serves as a vital approach to retrieving and leveraging external knowledge in various domain applications. When confronted with complex multi-hop questions, single-round retrieval is often insufficient...
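The abstract motivates multi-round retrieval for multi-hop questions. A minimal sketch of the generic plan-and-retrieve loop such systems build on, with `llm` and `retrieve` as hypothetical stand-ins rather than the paper's API:

```python
# Hypothetical sketch of a multi-round plan-and-retrieve loop for
# multi-hop questions; `llm` and `retrieve` are stand-in callables.
def agentic_search(question: str, llm, retrieve, max_hops: int = 4) -> str:
    evidence: list[str] = []
    for _ in range(max_hops):
        # Ask the model what to look up next, given the evidence so far.
        subquery = llm(f"Question: {question}\nEvidence: {evidence}\n"
                       "Next search query (or DONE):")
        if subquery.strip() == "DONE":
            break
        evidence.extend(retrieve(subquery, k=3))  # one retrieval round per hop
    return llm(f"Question: {question}\nEvidence: {evidence}\nAnswer:")
```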
OmniCompliance-100K: A Multi-Domain, Rule-Grounded, Real-World Safety Compliance Dataset
arXiv:2603.13933v1 Announce Type: new Abstract: Ensuring the safety and compliance of large language models (LLMs) is of paramount importance. However, existing LLM safety datasets often rely on ad-hoc taxonomies for data generation and suffer from a significant shortage of rule-grounded,...
sebis at ArchEHR-QA 2026: How Much Can You Do Locally? Evaluating Grounded EHR QA on a Single Notebook
arXiv:2603.13962v1 Announce Type: new Abstract: Clinical question answering over electronic health records (EHRs) can help clinicians and patients access relevant medical information more efficiently. However, many recent approaches rely on large cloud-based models, which are difficult to deploy in clinical...
FLUX: Data Worth Training On
arXiv:2603.13972v1 Announce Type: new Abstract: Modern large language model training is no longer limited by data availability, but by the inability of existing preprocessing pipelines to simultaneously achieve massive scale and high data quality. Current approaches are forced to sacrifice...
Rethinking Evaluation in Retrieval-Augmented Personalized Dialogue: A Cognitive and Linguistic Perspective
arXiv:2603.14217v1 Announce Type: new Abstract: In cognitive science and linguistic theory, dialogue is not seen as a chain of independent utterances but rather as a joint activity sustained by coherence, consistency, and shared understanding. However, many systems for open-domain and...
SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
arXiv:2603.14303v1 Announce Type: new Abstract: Existing KV cache compression methods generally operate on discrete tokens or non-semantic chunks. However, such approaches often lead to semantic fragmentation, where linguistically coherent units are disrupted, causing irreversible information loss and degradation in model...
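To make the idea concrete: a chunk-level KV merge can be sketched as pooling per-token keys and values within each semantically coherent chunk. The fixed chunk labels and mean pooling below are illustrative assumptions; the paper's semantic chunking and clustered merging are not shown in the abstract:

```python
# Hypothetical sketch: merge per-token KV entries within semantic chunks
# by mean pooling, shrinking the cache from T tokens to C chunks.
import torch

def merge_kv_by_chunks(keys, values, chunk_ids):
    """keys, values: (T, d); chunk_ids: (T,) int label of each token's chunk."""
    merged_k, merged_v = [], []
    for c in chunk_ids.unique(sorted=True):
        mask = chunk_ids == c
        merged_k.append(keys[mask].mean(0))    # one key per chunk
        merged_v.append(values[mask].mean(0))  # one value per chunk
    return torch.stack(merged_k), torch.stack(merged_v)

T, d = 12, 8
k, v = torch.randn(T, d), torch.randn(T, d)
chunks = torch.tensor([0]*4 + [1]*5 + [2]*3)   # e.g. three semantic chunks
ck, cv = merge_kv_by_chunks(k, v, chunks)
print(ck.shape)  # torch.Size([3, 8])
```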
Mind the Shift: Decoding Monetary Policy Stance from FOMC Statements with Large Language Models
arXiv:2603.14313v1 Announce Type: new Abstract: Federal Open Market Committee (FOMC) statements are a major source of monetary-policy information, and even subtle changes in their wording can move global financial markets. A central task is therefore to measure the hawkish-dovish stance...
Motivation in Large Language Models
arXiv:2603.14347v1 Announce Type: new Abstract: Motivation is a central driver of human behavior, shaping decisions, goals, and task performance. As large language models (LLMs) become increasingly aligned with human preferences, we ask whether they exhibit something akin to motivation. We...
Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling
arXiv:2603.14355v1 Announce Type: new Abstract: Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large language models (LLMs). However, it often suppresses rather than eliminates unsafe behaviors, leaving rare but critical failures...
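A sketch of the general recipe the title describes, surfacing rare failures by sampling many diverse responses per prompt; `generate` and `is_unsafe` are hypothetical stand-ins, and the paper's sampling strategy is presumably more efficient than this brute-force baseline:

```python
# Hypothetical sketch: probe for long-tail unsafe completions by sampling
# many high-temperature responses; `generate` and `is_unsafe` are stand-ins.
def long_tail_probe(prompt, generate, is_unsafe, n=64, temperature=1.2):
    """Return the subset of sampled responses flagged unsafe."""
    seen, failures = set(), []
    for _ in range(n):
        resp = generate(prompt, temperature=temperature)  # high T for diversity
        if resp in seen:
            continue            # skip duplicates; we want coverage, not counts
        seen.add(resp)
        if is_unsafe(prompt, resp):
            failures.append(resp)
    return failures
```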
PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark
arXiv:2603.14456v1 Announce Type: new Abstract: Persian poses unique audio understanding challenges through its classical poetry, traditional music, and pervasive code-switching, none of which are captured by existing benchmarks. We introduce PARSA-Bench (Persian Audio Reasoning and Speech Assessment Benchmark), the first benchmark for...
Translational Gaps in Graph Transformers for Longitudinal EHR Prediction: A Critical Appraisal of GT-BEHRT
arXiv:2603.13231v1 Announce Type: new Abstract: Transformer-based models have improved predictive modeling on longitudinal electronic health records through large-scale self-supervised pretraining. However, most EHR transformer architectures treat each clinical encounter as an unordered collection of codes, which limits their ability to...
RFX-Fuse: Breiman and Cutler's Unified ML Engine + Native Explainable Similarity
arXiv:2603.13234v1 Announce Type: new Abstract: Breiman and Cutler's original Random Forest was designed as a unified ML engine -- not merely an ensemble predictor. Their implementation included classification, regression, unsupervised learning, proximity-based similarity, outlier detection, missing value imputation, and visualization...
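Proximity is the one piece of that engine with a crisp textbook definition: two samples are similar in proportion to how often they land in the same leaf. A minimal sketch with scikit-learn (the dataset and forest size are arbitrary choices, not the paper's):

```python
# A minimal sketch of Breiman-Cutler proximities with scikit-learn:
# proximity(i, j) = fraction of trees in which samples i and j share a leaf.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

leaves = rf.apply(X)                      # (n_samples, n_trees) leaf indices
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(-1)  # (n, n) in [0, 1]

# 1 - proximity behaves like a dissimilarity, usable for outlier detection,
# imputation, or MDS visualization, as in the original Random Forest code.
print(prox.shape, prox[0, :5].round(2))
```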
Beyond Attention: True Adaptive World Models via Spherical Kernel Operator
arXiv:2603.13263v1 Announce Type: new Abstract: The pursuit of world-model-based artificial intelligence has predominantly relied on projecting high-dimensional observations into parameterized latent spaces, wherein transition dynamics are subsequently learned. However, this conventional paradigm is mathematically flawed: it merely displaces...
Knowledge, Rules and Their Embeddings: Two Paths towards Neuro-Symbolic JEPA
arXiv:2603.13265v1 Announce Type: new Abstract: Modern self-supervised predictive architectures excel at capturing complex statistical correlations from high-dimensional data but lack mechanisms to internalize verifiable human logic, leaving them susceptible to spurious correlations and shortcut learning. Conversely, traditional rule-based inference systems...
Spatially Aware Deep Learning for Microclimate Prediction from High-Resolution Geospatial Imagery
arXiv:2603.13273v1 Announce Type: new Abstract: Microclimate models are essential for linking climate to ecological processes, yet most physically based frameworks estimate temperature independently for each spatial unit and rely on simplified representations of lateral heat exchange. As a result, the...
FastODT: A tree-based framework for efficient continual learning
arXiv:2603.13276v1 Announce Type: new Abstract: Machine learning models deployed in real-world settings must operate under evolving data distributions and constrained computational resources. This challenge is particularly acute in non-stationary domains such as energy time series, weather monitoring, and environmental sensing....
Learning Retrieval Models with Sparse Autoencoders
arXiv:2603.13277v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) provide a powerful mechanism for decomposing the dense representations produced by Large Language Models (LLMs) into interpretable latent features. We posit that SAEs constitute a natural foundation for Learned Sparse Retrieval (LSR),...
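The connection can be sketched directly: an SAE encoder maps a dense embedding to nonnegative latent activations, which can serve as SPLADE-style term weights scored by a sparse dot product. The random weights below are shape-checking stand-ins; a trained SAE's sparsity penalty is what makes the codes actually sparse:

```python
# Hypothetical sketch: an SAE encoder's latent activations as term weights
# for learned sparse retrieval. Weights are random stand-ins, not a real SAE.
import torch

d_model, d_latent = 64, 512
W_enc = torch.randn(d_latent, d_model) / d_model**0.5
b_enc = torch.zeros(d_latent)

def sae_terms(x: torch.Tensor) -> torch.Tensor:
    """Map a dense LLM embedding to nonnegative 'term' weights."""
    return torch.relu(W_enc @ x + b_enc)

doc, query = torch.randn(d_model), torch.randn(d_model)
score = sae_terms(query) @ sae_terms(doc)   # sparse dot product, SPLADE-style
print(float(score))
```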
Demand Acceptance using Reinforcement Learning for Dynamic Vehicle Routing Problem with Emission Quota
arXiv:2603.13279v1 Announce Type: new Abstract: This paper introduces and formalizes the Dynamic and Stochastic Vehicle Routing Problem with Emission Quota (DS-QVRP-RR), a novel routing problem that integrates dynamic demand acceptance and routing with a global emission constraint. A key contribution...
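For intuition, a non-RL baseline for the acceptance decision might simply accept a demand when its value covers the emission cost and the quota is not breached; the thresholds below are invented, and the paper's RL policy is not shown in the abstract:

```python
# Hypothetical baseline: threshold-based demand acceptance under a global
# emission quota. Not the paper's RL method; all constants are illustrative.
def accept(demand_value: float, est_emission: float,
           used: float, quota: float, price_per_kg: float = 2.0) -> bool:
    """Accept iff the demand 'pays' for its emissions within the remaining quota."""
    if used + est_emission > quota:
        return False                        # would breach the quota
    return demand_value >= price_per_kg * est_emission
```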
FedTreeLoRA: Reconciling Statistical and Functional Heterogeneity in Federated LoRA Fine-Tuning
arXiv:2603.13282v1 Announce Type: new Abstract: Federated Learning (FL) with Low-Rank Adaptation (LoRA) has become a standard for privacy-preserving LLM fine-tuning. However, existing personalized methods predominantly operate under a restrictive Flat-Model Assumption: they address client-side statistical heterogeneity but treat the model...
Brittlebench: Quantifying LLM robustness via prompt sensitivity
arXiv:2603.13285v1 Announce Type: new Abstract: Existing evaluation methods largely rely on clean, static benchmarks, which can overestimate true model performance by failing to capture the noise and variability inherent in real-world user inputs. This is especially true for language models,...
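One natural way to operationalize prompt sensitivity is the spread of accuracy across paraphrased prompt templates; a minimal sketch, with `model` as a hypothetical stand-in (the benchmark's actual metric may differ):

```python
# Hypothetical sketch: brittleness as the spread of accuracy across
# paraphrased prompt templates; `model` is a stand-in callable.
import statistics

def brittleness(model, templates, examples):
    """Population std. dev. of accuracy over templates; 0 means fully robust."""
    accs = []
    for tmpl in templates:
        correct = sum(model(tmpl.format(q=q)) == a for q, a in examples)
        accs.append(correct / len(examples))
    return statistics.pstdev(accs)
```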
Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
arXiv:2603.13292v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) pose critical safety challenges, as they are susceptible not only to adversarial attacks such as jailbreaking but also to inadvertently generating harmful content for benign users. While internal safety alignment...
A Robust Framework for Secure Cardiovascular Risk Prediction: An Architectural Case Study of Differentially Private Federated Learning
arXiv:2603.13293v1 Announce Type: new Abstract: Accurate cardiovascular risk prediction is crucial for preventive healthcare; however, the development of robust Artificial Intelligence (AI) models is hindered by the fragmentation of clinical data across institutions due to stringent privacy regulations. This paper...
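The core mechanism of differentially private federated learning can be sketched in a few lines: clip each client update, then add calibrated Gaussian noise before averaging. The clip norm and noise multiplier below are illustrative, not the paper's configuration:

```python
# A minimal sketch of differentially private federated averaging:
# clip each client update, add Gaussian noise, then aggregate.
import numpy as np

def dp_fedavg(client_updates, clip=1.0, noise_mult=1.1,
              rng=np.random.default_rng(0)):
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip / max(norm, 1e-12)))  # bound influence
    avg = np.mean(clipped, axis=0)
    noise = rng.normal(0, noise_mult * clip / len(client_updates), size=avg.shape)
    return avg + noise                                          # noisy global update

updates = [np.random.default_rng(i).normal(size=10) for i in range(5)]
print(dp_fedavg(updates)[:3])
```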
FusionCast: Enhancing Precipitation Nowcasting with Asymmetric Cross-Modal Fusion and Future Radar Priors
arXiv:2603.13298v1 Announce Type: new Abstract: Deep learning has significantly improved the accuracy of precipitation nowcasting. However, most existing multimodal models typically use simple channel concatenation or interpolation methods for data fusion, which often overlook the feature differences between different modalities....
DreamReader: An Interpretability Toolkit for Text-to-Image Models
arXiv:2603.13299v1 Announce Type: new Abstract: Despite the rapid adoption of text-to-image (T2I) diffusion models, causal and representation-level analysis remains fragmented and largely limited to isolated probing techniques. To address this gap, we introduce DreamReader: a unified framework that formalizes diffusion...
Evidence-based Distributional Alignment for Large Language Models
arXiv:2603.13305v1 Announce Type: new Abstract: Distributional alignment enables large language models (LLMs) to predict how a target population distributes its responses across answer options, rather than collapsing disagreement into a single consensus answer. However, existing LLM-based distribution prediction is often...
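Such predictions are typically scored by comparing the model's answer-option distribution to the population's; a minimal sketch using total variation distance (the metric and the example shares are assumptions, not the paper's):

```python
# Hypothetical sketch: score distributional alignment by total variation
# distance between predicted and observed answer-option distributions.
def total_variation(p: list[float], q: list[float]) -> float:
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

population = [0.50, 0.30, 0.15, 0.05]   # observed response shares (invented)
model_pred = [0.40, 0.35, 0.20, 0.05]   # LLM-predicted shares (invented)
print(total_variation(population, model_pred))  # 0.1; 0 = perfect alignment
```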
Task Expansion and Cross Refinement for Open-World Conditional Modeling
arXiv:2603.13308v1 Announce Type: new Abstract: Open-world conditional modeling (OCM) requires a single model to answer arbitrary conditional queries across heterogeneous datasets, where observed variables and targets vary and arise from a vast open-ended task universe. Because any finite collection of...
Linear Predictability of Attention Heads in Large Language Models
arXiv:2603.13314v1 Announce Type: new Abstract: Large language model (LLM) inference is increasingly bottlenecked by the Key-Value (KV) cache, yet the fine-grained structure of attention-head activations remains poorly understood. We show that pretrained Transformers exhibit a pervasive inter-head linear structure: for...
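The claimed inter-head linear structure can be tested directly: fit a least-squares map from one head's activations to another's and measure R^2. A minimal sketch on synthetic activations (the paper's probing setup is not shown in the abstract):

```python
# Hypothetical sketch: how linearly predictable is head j from head i?
# Fit least squares and report R^2 on synthetic activations.
import numpy as np

def head_r2(acts_i: np.ndarray, acts_j: np.ndarray) -> float:
    """acts_*: (n_tokens, d_head). R^2 of the linear fit i -> j."""
    W, *_ = np.linalg.lstsq(acts_i, acts_j, rcond=None)
    resid = acts_j - acts_i @ W
    return 1.0 - resid.var() / acts_j.var()

rng = np.random.default_rng(0)
h_i = rng.normal(size=(1000, 64))
h_j = h_i @ rng.normal(size=(64, 64)) + 0.1 * rng.normal(size=(1000, 64))
print(round(head_r2(h_i, h_j), 3))   # close to 1.0 for a near-linear pair
```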
Evaluating Large Language Models for Gait Classification Using Text-Encoded Kinematic Waveforms
arXiv:2603.13317v1 Announce Type: new Abstract: Background: Machine learning (ML) enhances gait analysis but often lacks the level of interpretability desired for clinical adoption. Large Language Models (LLMs) may offer explanatory capabilities and confidence-aware outputs when applied to structured kinematic data....
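The phrase "text-encoded kinematic waveforms" suggests serializing joint-angle series into prompt text; a minimal sketch of one such encoding, which is an assumption rather than the paper's exact format:

```python
# Hypothetical sketch: serialize a kinematic waveform as text so an LLM
# can classify it. The encoding scheme here is invented for illustration.
def encode_waveform(angles_deg: list[float], joint: str = "knee") -> str:
    vals = ", ".join(f"{a:.1f}" for a in angles_deg)
    return f"{joint} flexion over gait cycle (degrees): {vals}"

print(encode_waveform([5.2, 12.8, 20.1, 15.4, 8.9]))
```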
Residual Stream Analysis of Overfitting And Structural Disruptions
arXiv:2603.13318v1 Announce Type: new Abstract: Ensuring that large language models (LLMs) remain both helpful and harmless poses a significant challenge: fine-tuning on repetitive safety datasets, where unsafe prompts are paired with standard refusal templates, often leads to false refusals, in...