MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models
arXiv:2603.23085v1 Announce Type: new Abstract: Vision-Language Models (VLMs) have enabled interpretable medical diagnosis by integrating visual perception with linguistic reasoning. Yet, existing medical chain-of-thought (CoT) models lack explicit mechanisms to represent and enforce causal reasoning, leaving them vulnerable to spurious...
Can LLM Agents Generate Real-World Evidence? Evaluating Observational Studies in Medical Databases
arXiv:2603.22767v1 Announce Type: new Abstract: Observational studies can yield clinically actionable evidence at scale, but executing them on real-world databases is open-ended and requires coherent decisions across cohort construction, analysis, and reporting. Prior evaluations of LLM agents emphasize isolated steps...
Detecting Non-Membership in LLM Training Data via Rank Correlations
arXiv:2603.22707v1 Announce Type: new Abstract: As large language models (LLMs) are trained on increasingly vast and opaque text corpora, determining which data contributed to training has become essential for copyright enforcement, compliance auditing, and user trust. While prior work focuses...
Improving Safety Alignment via Balanced Direct Preference Optimization
arXiv:2603.22829v1 Announce Type: new Abstract: With the rapid development and widespread application of Large Language Models (LLMs), their potential safety risks have attracted widespread attention. Reinforcement Learning from Human Feedback (RLHF) has been adopted to enhance the safety performance of...
CAPITU: A Benchmark for Evaluating Instruction-Following in Brazilian Portuguese with Literary Context
arXiv:2603.22576v1 Announce Type: new Abstract: We introduce CAPITU, a benchmark for evaluating instruction-following capabilities of Large Language Models (LLMs) in Brazilian Portuguese. Unlike existing benchmarks that focus on English or use generic prompts, CAPITU contextualizes all tasks within eight canonical...
Can Large Language Models Reason and Optimize Under Constraints?
arXiv:2603.23004v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated great capabilities across diverse natural language tasks; yet their ability to solve abstraction and optimization problems with constraints remains scarcely explored. In this paper, we investigate whether LLMs can...
Ran Score: an LLM-based Evaluation Score for Radiology Report Generation
arXiv:2603.22935v1 Announce Type: new Abstract: Chest X-ray report generation and automated evaluation are limited by poor recognition of low-prevalence abnormalities and inadequate handling of clinically important language, including negation and ambiguity. We develop a clinician-guided framework combining human expertise and...
JFTA-Bench: Evaluating LLMs' Ability to Track and Analyze Malfunctions Using Fault Trees
arXiv:2603.22978v1 Announce Type: new Abstract: In the maintenance of complex systems, fault trees are used to locate problems and provide targeted solutions. To enable fault trees stored as images to be directly processed by large language models, which can assist...
RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue
arXiv:2603.23346v1 Announce Type: new Abstract: Real-time spoken dialogue systems face a fundamental tension between latency and response quality. End-to-end speech-to-speech (S2S) models respond immediately and naturally handle turn-taking, backchanneling, and interruption, but produce semantically weaker outputs. Cascaded pipelines (ASR ->...
KALAVAI: Predicting When Independent Specialist Fusion Works -- A Quantitative Model for Post-Hoc Cooperative LLM Training
arXiv:2603.22755v1 Announce Type: new Abstract: Independently trained domain specialists can be fused post-hoc into a single model that outperforms any individual specialist, and the gain is predictable: gain = 0.82 x divergence - 2.72 (R^2 = 0.856, n=6, 3-26% divergence)....
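The abstract above quotes an explicit fitted line for the fusion gain. As an illustrative sketch only, the relation can be written out directly; the break-even divergence below is derived from that formula alone, not from the paper's full results.

```python
# Sketch of the linear fusion-gain predictor quoted in the KALAVAI
# abstract: gain = 0.82 * divergence - 2.72, with divergence in percent
# (fitted on n=6 points over a 3-26% divergence range, R^2 = 0.856).

def predicted_fusion_gain(divergence_pct: float) -> float:
    """Predicted post-hoc fusion gain for a given specialist divergence,
    per the abstract's fitted line."""
    return 0.82 * divergence_pct - 2.72

# Setting gain = 0 gives the implied break-even divergence: below it,
# the line predicts that fusing specialists hurts rather than helps.
break_even_pct = 2.72 / 0.82  # roughly 3.3% divergence

if __name__ == "__main__":
    for d in (3, 10, 26):  # endpoints and midpoint of the stated 3-26% range
        print(f"divergence {d}% -> predicted gain {predicted_fusion_gain(d):+.2f}")
```

Note that 3% divergence, the low end of the stated range, sits just below the implied break-even point, so the line predicts a small negative gain there.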
Efficient Hallucination Detection: Adaptive Bayesian Estimation of Semantic Entropy with Guided Semantic Exploration
arXiv:2603.22812v1 Announce Type: new Abstract: Large language models (LLMs) have achieved remarkable success in various natural language processing tasks, yet they remain prone to generating factually incorrect outputs known as hallucinations. While recent approaches have shown promise for hallucination detection...
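Semantic entropy, the quantity this paper estimates adaptively, measures uncertainty over *meanings* rather than surface strings: sample several answers, group them into semantic-equivalence classes, and take the entropy of the class distribution. A minimal sketch, assuming a toy `same_meaning` check in place of the bidirectional-entailment model real systems use:

```python
import math

# Toy stand-in for a bidirectional-entailment check; real semantic-entropy
# pipelines use an NLI model here. This is an assumption for illustration.
def same_meaning(a: str, b: str) -> bool:
    return a.strip().lower() == b.strip().lower()

def semantic_entropy(samples: list[str]) -> float:
    # Greedily cluster sampled answers into semantic-equivalence classes.
    clusters: list[list[str]] = []
    for s in samples:
        for c in clusters:
            if same_meaning(s, c[0]):
                c.append(s)
                break
        else:
            clusters.append([s])
    # Entropy of the empirical distribution over clusters: many mutually
    # inconsistent meanings -> high entropy -> likely hallucination.
    n = len(samples)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)

# Three of four samples agree on one meaning, so entropy is low-moderate.
answers = ["Paris", "paris", "Lyon", "Paris"]
print(f"{semantic_entropy(answers):.3f}")
```

The cost driver in practice is the number of samples and entailment calls per query, which is exactly what an adaptive Bayesian estimator, as in the title above, would try to reduce.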
RadTimeline: Timeline Summarization for Longitudinal Radiological Lung Findings
arXiv:2603.22820v1 Announce Type: new Abstract: Tracking findings in longitudinal radiology reports is crucial for accurately identifying disease progression, and the time-consuming process would benefit from automatic summarization. This work introduces a structured summarization task, where we frame longitudinal report summarization...
Analysing LLM Persona Generation and Fairness Interpretation in Polarised Geopolitical Contexts
arXiv:2603.22837v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly utilised for social simulation and persona generation, necessitating an understanding of how they represent geopolitical identities. In this paper, we analyse personas generated for Palestinian and Israeli identities by...
EchoKV: Efficient KV Cache Compression via Similarity-Based Reconstruction
arXiv:2603.22910v1 Announce Type: new Abstract: The increasing memory demand of the Key-Value (KV) cache poses a significant bottleneck for Large Language Models (LLMs) in long-context applications. Existing low-rank compression methods often rely on irreversible parameter transformations, sacrificing the flexibility to...
Multilingual KokoroChat: A Multi-LLM Ensemble Translation Method for Creating a Multilingual Counseling Dialogue Dataset
arXiv:2603.22913v1 Announce Type: new Abstract: To address the critical scarcity of high-quality, publicly available counseling dialogue datasets, we created Multilingual KokoroChat by translating KokoroChat, a large-scale manually authored Japanese counseling corpus, into both English and Chinese. A key challenge in...
Set-Valued Prediction for Large Language Models with Feasibility-Aware Coverage Guarantees
arXiv:2603.22966v1 Announce Type: new Abstract: Large language models (LLMs) inherently operate over a large generation space, yet conventional usage typically reports the most likely generation (MLG) as a point prediction, which underestimates the model's capability: although the top-ranked response can...
PaperVoyager: Building Interactive Web with Visual Language Models
arXiv:2603.22999v1 Announce Type: new Abstract: Recent advances in visual language models have enabled autonomous agents for complex reasoning, tool use, and document understanding. However, existing document agents mainly transform papers into static artifacts such as summaries, webpages, or slides, which...
From Synthetic to Native: Benchmarking Multilingual Intent Classification in Logistics Customer Service
arXiv:2603.23172v1 Announce Type: new Abstract: Multilingual intent classification is central to customer-service systems on global logistics platforms, where models must process noisy user queries across languages and hierarchical label spaces. Yet most existing multilingual benchmarks rely on machine-translated text, which...
Efficient Embedding-based Synthetic Data Generation for Complex Reasoning Tasks
arXiv:2603.22294v1 Announce Type: new Abstract: Synthetic Data Generation (SDG), leveraging Large Language Models (LLMs), has recently been recognized and broadly adopted as an effective approach to improve the performance of smaller but more resource and compute efficient LLMs through fine-tuning....
Latent Semantic Manifolds in Large Language Models
arXiv:2603.22301v1 Announce Type: new Abstract: Large Language Models (LLMs) perform internal computations in continuous vector spaces yet produce discrete tokens -- a fundamental mismatch whose geometric consequences remain poorly understood. We develop a mathematical framework that interprets LLM hidden states...
Sample Transform Cost-Based Training-Free Hallucination Detector for Large Language Models
arXiv:2603.22303v1 Announce Type: new Abstract: Hallucinations in large language models (LLMs) remain a central obstacle to trustworthy deployment, motivating detectors that are accurate, lightweight, and broadly applicable. Since an LLM with a prompt defines a conditional distribution, we argue that...
Full waveform inversion method based on diffusion model
arXiv:2603.22307v1 Announce Type: new Abstract: Seismic full-waveform inversion is a core technology for obtaining high-resolution subsurface model parameters. However, its highly nonlinear characteristics and strong dependence on the initial model often lead to the inversion process getting trapped in local...
Enhancing AI-Based Tropical Cyclone Track and Intensity Forecasting via Systematic Bias Correction
arXiv:2603.22314v1 Announce Type: new Abstract: Tropical cyclones (TCs) pose severe threats to life, infrastructure, and economies in tropical and subtropical regions, underscoring the critical need for accurate and timely forecasts of both track and intensity. Recent advances in AI-based weather...
Bridging the Gap Between Climate Science and Machine Learning in Climate Model Emulation
arXiv:2603.22320v1 Announce Type: new Abstract: While climate models provide insights for climate decision-making, their use is constrained by significant computational and technical demands. Although machine learning (ML) emulators offer a way to bypass the high computational costs, their effective use...
DAQ: Delta-Aware Quantization for Post-Training LLM Weight Compression
arXiv:2603.22324v1 Announce Type: new Abstract: We introduce Delta-Aware Quantization (DAQ), a data-free post-training quantization framework that preserves the knowledge acquired during post-training. Standard quantization objectives minimize reconstruction error but are agnostic to the base model, allowing quantization noise to disproportionately...
A Direct Classification Approach for Reliable Wind Ramp Event Forecasting under Severe Class Imbalance
arXiv:2603.22326v1 Announce Type: new Abstract: Decision support systems are essential for maintaining grid stability in low-carbon power systems, such as wind power plants, by providing real-time alerts to control room operators regarding potential events, including Wind Power Ramp Events (WPREs)....
Beyond the Mean: Distribution-Aware Loss Functions for Bimodal Regression
arXiv:2603.22328v1 Announce Type: new Abstract: Despite the strong predictive performance achieved by machine learning models across many application domains, assessing their trustworthiness through reliable estimates of predictive confidence remains a critical challenge. This issue arises in scenarios where the likelihood...
Conformal Risk Control for Safety-Critical Wildfire Evacuation Mapping: A Comparative Study of Tabular, Spatial, and Graph-Based Models
arXiv:2603.22331v1 Announce Type: new Abstract: Every wildfire prediction model deployed today shares a dangerous property: none of these methods provides formal guarantees on how much fire spread is missed. Despite extensive work on wildfire spread prediction using deep learning, no...
Large Language Models for Missing Data Imputation: Understanding Behavior, Hallucination Effects, and Control Mechanisms
arXiv:2603.22332v1 Announce Type: new Abstract: Data imputation is a cornerstone technique for handling missing values in real-world datasets, which are often plagued by missingness. Despite recent progress, prior studies on Large Language Models-based imputation remain limited by scalability challenges, restricted...
FAAR: Format-Aware Adaptive Rounding for NVFP4
arXiv:2603.22370v1 Announce Type: new Abstract: Deploying large language models (LLMs) on edge devices requires extremely low-bit quantization. Ultra-low precision formats such as NVFP4 offer a promising solution for reducing memory footprint and accelerating computation. However, existing quantization methods typically rely...