OasisSimp: An Open-source Asian-English Sentence Simplification Dataset
arXiv:2603.14111v1 Announce Type: new Abstract: Sentence simplification aims to make complex text more accessible by reducing linguistic complexity while preserving the original meaning. However, progress in this area remains limited for mid-resource and low-resource languages due to the scarcity of...
Selective Fine-Tuning of GPT Architectures for Parameter-Efficient Clinical Text Classification
arXiv:2603.14183v1 Announce Type: new Abstract: The rapid expansion of electronic health record (EHR) systems has generated large volumes of unstructured clinical narratives that contain valuable information for disease identification, patient cohort discovery, and clinical decision support. Extracting structured knowledge from...
Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring
arXiv:2603.14251v1 Announce Type: new Abstract: Large Reasoning Language Models (LRLMs) demonstrate impressive capabilities on complex tasks by utilizing long Chain-of-Thought reasoning. However, they are prone to overthinking, which generates redundant reasoning steps that degrade both performance and efficiency. Recently, early-exit...
Automatic Inter-document Multi-hop Scientific QA Generation
arXiv:2603.14257v1 Announce Type: new Abstract: Existing automatic scientific question generation studies mainly focus on single-document factoid QA, overlooking the inter-document reasoning crucial for scientific understanding. We present AIM-SciQA, an automated framework for generating multi-document, multi-hop scientific QA datasets. AIM-SciQA extracts...
SemantiCache: Efficient KV Cache Compression via Semantic Chunking and Clustered Merging
arXiv:2603.14303v1 Announce Type: new Abstract: Existing KV cache compression methods generally operate on discrete tokens or non-semantic chunks. However, such approaches often lead to semantic fragmentation, where linguistically coherent units are disrupted, causing irreversible information loss and degradation in model...
Creative Convergence or Imitation? Genre-Specific Homogeneity in LLM-Generated Chinese Literature
arXiv:2603.14430v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in narrative generation. However, they often produce structurally homogenized stories, frequently following repetitive arrangements and combinations of plot events along with stereotypical resolutions. In this paper, we...
PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark
arXiv:2603.14456v1 Announce Type: new Abstract: Persian poses unique audio understanding challenges through its classical poetry, traditional music, and pervasive code-switching - none captured by existing benchmarks. We introduce PARSA-Bench (Persian Audio Reasoning and Speech Assessment Benchmark), the first benchmark for...
Continual Fine-Tuning with Provably Accurate and Parameter-Free Task Retrieval
arXiv:2603.13235v1 Announce Type: new Abstract: Continual fine-tuning aims to adapt a pre-trained backbone to new tasks sequentially while preserving performance on earlier tasks whose data are no longer available. Existing approaches fall into two categories which include input- and parameter-adaptation....
From Stochastic Answers to Verifiable Reasoning: Interpretable Decision-Making with LLM-Generated Code
arXiv:2603.13287v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for high-stakes decision-making, yet existing approaches struggle to reconcile scalability, interpretability, and reproducibility. Black-box models obscure their reasoning, while recent LLM-based rule systems rely on per-sample evaluation, causing...
DreamReader: An Interpretability Toolkit for Text-to-Image Models
arXiv:2603.13299v1 Announce Type: new Abstract: Despite the rapid adoption of text-to-image (T2I) diffusion models, causal and representation-level analysis remains fragmented and largely limited to isolated probing techniques. To address this gap, we introduce DreamReader: a unified framework that formalizes diffusion...
Linear Predictability of Attention Heads in Large Language Models
arXiv:2603.13314v1 Announce Type: new Abstract: Large language model (LLM) inference is increasingly bottlenecked by the Key-Value (KV) cache, yet the fine-grained structure of attention-head activations remains poorly understood. We show that pretrained Transformers exhibit a pervasive inter-head linear structure: for...
Residual Stream Analysis of Overfitting And Structural Disruptions
arXiv:2603.13318v1 Announce Type: new Abstract: Ensuring that large language models (LLMs) remain both helpful and harmless poses a significant challenge: fine-tuning on repetitive safety datasets, where unsafe prompts are paired with standard refusal templates, often leads to false refusals, in...
The Challenge of Out-Of-Distribution Detection in Motor Imagery BCIs
arXiv:2603.13324v1 Announce Type: new Abstract: Machine Learning classifiers used in Brain-Computer Interfaces make classifications based on the distribution of data they were trained on. When they need to make inferences on samples that fall outside of this distribution, they can...
AI-Driven Predictive Maintenance with Real-Time Contextual Data Fusion for Connected Vehicles: A Multi-Dataset Evaluation
arXiv:2603.13343v1 Announce Type: new Abstract: Most vehicle predictive maintenance systems rely exclusively on internal diagnostic signals and are validated on deterministic synthetic data, limiting the credibility of reported metrics. This paper presents a simulation-validated proof-of-concept framework for V2X-augmented predictive maintenance,...
Nvidia’s DLSS 5 uses generative AI to boost photorealism in video games, with ambitions beyond gaming
Nvidia’s new DLSS 5 uses generative AI and structured graphics data to make video games more realistic. CEO Jensen Huang says the approach could eventually spread to other industries.
On Using Machine Learning to Early Detect Catastrophic Failures in Marine Diesel Engines
arXiv:2603.12733v1 Announce Type: new Abstract: Catastrophic failures of marine engines imply severe loss of functionality and destroy or damage the systems irreversibly. Being sudden and often unpredictable events, they pose a severe threat to navigation, crew, and passengers. The abrupt...
Predictive Analytics for Foot Ulcers Using Time-Series Temperature and Pressure Data
arXiv:2603.12278v1 Announce Type: cross Abstract: Diabetic foot ulcers (DFUs) are a severe complication of diabetes, often resulting in significant morbidity. This paper presents a predictive analytics framework utilizing time-series data captured by wearable foot sensors -- specifically NTC thin-film thermocouples...
AI Planning Framework for LLM-Based Web Agents
arXiv:2603.12710v1 Announce Type: new Abstract: Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why...
Aligning Language Models from User Interactions
arXiv:2603.12273v1 Announce Type: cross Abstract: Multi-turn user interactions are among the most abundant data produced by language models, yet we lack effective methods to learn from them. While typically discarded, these interactions often contain useful information: follow-up user messages may...
When Right Meets Wrong: Bilateral Context Conditioning with Reward-Confidence Correction for GRPO
arXiv:2603.13134v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) has emerged as an effective method for training reasoning models. While it computes advantages based on group mean, GRPO treats each output as an independent sample during the optimization and...
Maximum Entropy Exploration Without the Rollouts
arXiv:2603.12325v1 Announce Type: cross Abstract: Efficient exploration remains a central challenge in reinforcement learning, serving as a useful pretraining objective for data collection, particularly when an external reward function is unavailable. A principled formulation of the exploration problem is to...
SPARROW: Learning Spatial Precision and Temporal Referential Consistency in Pixel-Grounded Video MLLMs
arXiv:2603.12382v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) have advanced from image-level reasoning to pixel-level grounding, but extending these capabilities to videos remains challenging as models must achieve spatial precision and temporally consistent reference tracking. Existing video MLLMs...
Test-Time Strategies for More Efficient and Accurate Agentic RAG
arXiv:2603.12396v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems face challenges with complex, multihop questions, and agentic frameworks such as Search-R1 (Jin et al., 2025), which operates iteratively, have been proposed to address these complexities. However, such approaches can introduce...
Revisiting Model Stitching In the Foundation Model Era
arXiv:2603.12433v1 Announce Type: cross Abstract: Model stitching, connecting early layers of one model (source) to later layers of another (target) via a light stitch layer, has served as a probe of representational compatibility. Prior work finds that models trained on...
TRACE: Temporal Rule-Anchored Chain-of-Evidence on Knowledge Graphs for Interpretable Stock Movement Prediction
arXiv:2603.12500v1 Announce Type: cross Abstract: We present a Temporal Rule-Anchored Chain-of-Evidence (TRACE) on knowledge graphs for interpretable stock movement prediction that unifies symbolic relational priors, dynamic graph exploration, and LLM-guided decision making in a single end-to-end pipeline. The approach performs...
When LLM Judge Scores Look Good but Best-of-N Decisions Fail
arXiv:2603.12520v1 Announce Type: cross Abstract: Large language models are often used as judges to score candidate responses, then validated with a single global metric such as correlation with reference labels. This can be misleading when the real deployment task is...
Not Just the Destination, But the Journey: Reasoning Traces Causally Shape Generalization Behaviors
arXiv:2603.12397v1 Announce Type: new Abstract: Chain-of-Thought (CoT) is often viewed as a window into LLM decision-making, yet recent work suggests it may function merely as post-hoc rationalization. This raises a critical alignment question: Does the reasoning trace causally shape model...
Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis
arXiv:2603.12423v1 Announce Type: new Abstract: Negation remains a persistent challenge for modern language models, often causing reversed meanings or factual errors. In this work, we conduct a causal analysis of how GPT-2 Small internally processes such linguistic transformations. We examine...
AgentDrift: Unsafe Recommendation Drift Under Tool Corruption Hidden by Ranking Metrics in LLM Agents
arXiv:2603.12564v1 Announce Type: new Abstract: Tool-augmented LLM agents increasingly serve as multi-turn advisors in high-stakes domains, yet their evaluation relies on ranking-quality metrics that measure what is recommended but not whether it is safe for the user. We introduce a...
LMEB: Long-horizon Memory Embedding Benchmark
arXiv:2603.12572v1 Announce Type: new Abstract: Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage retrieval and fail to assess models' ability to handle...