Agentic AI for Commercial Insurance Underwriting with Adversarial Self-Critique
arXiv:2602.13213v1 Announce Type: new Abstract: Commercial insurance underwriting is a labor-intensive process that requires manual review of extensive documentation to assess risk and determine policy pricing. While AI offers substantial efficiency improvements, existing solutions lack comprehensive reasoning capabilities and internal...
Variation is the Key: A Variation-Based Framework for LLM-Generated Text Detection
arXiv:2602.13226v1 Announce Type: new Abstract: Detecting text generated by large language models (LLMs) is crucial but challenging. Existing detectors depend on impractical assumptions, such as white-box settings, or solely rely on text-level features, leading to imprecise detection ability. In this...
PlotChain: Deterministic Checkpointed Evaluation of Multimodal LLMs on Engineering Plot Reading
arXiv:2602.13232v1 Announce Type: new Abstract: We present PlotChain, a deterministic, generator-based benchmark for evaluating multimodal large language models (MLLMs) on engineering plot reading: recovering quantitative values from classic plots (e.g., Bode/FFT, step response, stress-strain, pump curves) rather than OCR-only extraction or...
NL2LOGIC: AST-Guided Translation of Natural Language into First-Order Logic with Large Language Models
arXiv:2602.13237v1 Announce Type: new Abstract: Automated reasoning is critical in domains such as law and governance, where verifying claims against facts in documents requires both accuracy and interpretability. Recent work adopts structured reasoning pipelines that translate natural language into first-order...
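As a worked illustration of the natural-language-to-first-order-logic translation task the abstract describes (the sentence and predicate names are our own illustrative choices, not drawn from the paper's data), a claim such as "Every contractor who signed the agreement is liable" would translate to:

    \forall x\, \big(\mathrm{Contractor}(x) \land \mathrm{Signed}(x, \mathrm{agreement}) \rightarrow \mathrm{Liable}(x)\big)

Checking such a formula against facts extracted from a document is what makes this style of pipeline both verifiable and interpretable.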
AST-PAC: AST-guided Membership Inference for Code
arXiv:2602.13240v1 Announce Type: new Abstract: Code Large Language Models are frequently trained on massive datasets containing restrictively licensed source code. This creates urgent data governance and copyright challenges. Membership Inference Attacks (MIAs) can serve as an auditing mechanism to detect...
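For context, a minimal sketch of the classic loss-thresholding membership-inference baseline (not the AST-guided attack proposed here) is shown below; token_logprobs is a hypothetical helper returning per-token log-probabilities of a snippet under the audited code model, and the threshold would have to be calibrated on known non-member code.

    # Loss-based MIA baseline: training members tend to receive lower loss.
    def mia_score(snippet, token_logprobs):
        lps = token_logprobs(snippet)
        return -sum(lps) / len(lps)   # mean negative log-likelihood

    def is_likely_member(snippet, token_logprobs, threshold=2.0):
        # threshold calibrated on held-out, known non-member code (assumption)
        return mia_score(snippet, token_logprobs) < threshold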
DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing
arXiv:2602.13318v1 Announce Type: new Abstract: Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection, coherent slide organization, layout-aware rendering, and robust multi-turn instruction following. However, existing benchmarks and evaluation protocols do...
Contrastive explanations of BDI agents
arXiv:2602.13323v1 Announce Type: new Abstract: The ability of autonomous systems to provide explanations is important for supporting transparency and aiding the development of (appropriate) trust. Prior work has defined a mechanism for Belief-Desire-Intention (BDI) agents to be able to answer...
On-Policy Supervised Fine-Tuning for Efficient Reasoning
arXiv:2602.13407v1 Announce Type: new Abstract: Large reasoning models (LRMs) are commonly trained with reinforcement learning (RL) to explore long chain-of-thought reasoning, achieving strong performance at high computational cost. Recent methods add multi-reward objectives to jointly optimize correctness and brevity, but...
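A minimal sketch of the kind of multi-reward objective the abstract mentions, trading off correctness against brevity, might look as follows; the linear combination and the coefficient lam are illustrative assumptions, not the paper's formulation.

    # Illustrative multi-reward signal: reward correct answers, penalize length.
    def combined_reward(is_correct, num_tokens, max_tokens=4096, lam=0.2):
        brevity_penalty = lam * min(num_tokens / max_tokens, 1.0)
        return float(is_correct) - brevity_penalty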
Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization
arXiv:2602.13653v1 Announce Type: new Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have substantially driven the progress of autonomous agents for Graphical User Interfaces (GUIs). Nevertheless, in real-world applications, GUI agents often face non-stationary environments, leading to...
AllMem: A Memory-centric Recipe for Efficient Long-context Modeling
arXiv:2602.13680v1 Announce Type: new Abstract: Large Language Models (LLMs) encounter significant performance bottlenecks in long-sequence tasks due to the computational complexity and memory overhead inherent in the self-attention mechanism. To address these challenges, we introduce AllMem, a novel and efficient...
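The self-attention bottleneck referred to above is the standard quadratic cost of materializing the full attention-score matrix: for a sequence of length n and head dimension d,

    \mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d}}\right) V, \qquad QK^{\top} \in \mathbb{R}^{n \times n},

so time and memory grow as O(n^2) in context length, the scaling that memory-centric long-context methods aim to mitigate.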
Language Model Memory and Memory Models for Language
arXiv:2602.13466v1 Announce Type: new Abstract: The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of 'memory', is widely employed but not well characterized. We find that language model embeddings typically...
Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
arXiv:2602.13517v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated impressive reasoning capabilities by scaling test-time compute via long Chain-of-Thought (CoT). However, recent findings suggest that raw token counts are unreliable proxies for reasoning quality: increased generation length does...
Small Reward Models via Backward Inference
arXiv:2602.13551v1 Announce Type: new Abstract: Reward models (RMs) play a central role throughout the language model (LM) pipeline, particularly in non-verifiable domains. However, the dominant LLM-as-a-Judge paradigm relies on the strong reasoning capabilities of large models, while alternative approaches require...
DistillLens: Symmetric Knowledge Distillation Through Logit Lens
arXiv:2602.13567v1 Announce Type: new Abstract: Standard Knowledge Distillation (KD) compresses Large Language Models (LLMs) by optimizing final outputs, yet it typically treats the reasoning process in the teacher's intermediate layers as a black box. While feature-based distillation attempts to bridge this gap,...
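The output-level objective the abstract contrasts with is the standard Hinton-style KD loss, a KL divergence between temperature-softened teacher and student logits; the sketch below shows only that baseline, not the logit-lens method proposed here.

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, T=2.0):
        # KL divergence on temperature-softened distributions;
        # the T*T factor keeps gradient magnitudes comparable across temperatures.
        s = F.log_softmax(student_logits / T, dim=-1)
        t = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(s, t, reduction="batchmean") * (T * T)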
Elo-Evolve: A Co-evolutionary Framework for Language Model Alignment
arXiv:2602.13575v1 Announce Type: new Abstract: Current alignment methods for Large Language Models (LLMs) rely on compressing vast amounts of human preference data into static, absolute reward functions, leading to data scarcity, noise sensitivity, and training instability. We introduce Elo-Evolve, a...
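As a point of reference for the Elo framing, a minimal sketch of the standard Elo update from a single pairwise preference is shown below; this is the textbook rule, not necessarily the update used by Elo-Evolve.

    def elo_update(rating_a, rating_b, score_a, k=32.0):
        # score_a: 1.0 if response A is preferred, 0.0 if B is, 0.5 for a tie.
        expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
        rating_a += k * (score_a - expected_a)
        rating_b += k * ((1.0 - score_a) - (1.0 - expected_a))
        return rating_a, rating_b

    print(elo_update(1500.0, 1500.0, score_a=1.0))  # the winner gains 16 points

Because updates depend only on relative comparisons, ratings adapt as the pool of competitors evolves rather than being frozen into a static absolute reward.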
Metaphors' journeys across time and genre: tracking the evolution of literary metaphors with temporal embeddings
arXiv:2602.13701v1 Announce Type: new Abstract: Metaphors are a distinctive feature of literary language, yet they remain less studied experimentally than everyday metaphors. Moreover, previous psycholinguistic and computational approaches overlooked the temporal dimension, although many literary metaphors were coined centuries apart...
Beyond Words: Evaluating and Bridging Epistemic Divergence in User-Agent Interaction via Theory of Mind
arXiv:2602.13832v1 Announce Type: new Abstract: Large Language Models (LLMs) have developed rapidly and are widely applied to both general-purpose and professional tasks to assist human users. However, they still struggle to comprehend and respond to users' true needs when...
Speculative Decoding with a Speculative Vocabulary
arXiv:2602.13836v1 Announce Type: new Abstract: Speculative decoding has rapidly emerged as a leading approach for accelerating language model (LM) inference, as it offers substantial speedups while yielding identical outputs. This relies upon a small draft model, tasked with predicting the...
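For orientation, the standard draft-then-verify loop that speculative decoding builds on is sketched below; draft_model and target_model are hypothetical callables returning next-token probability dictionaries, and the full algorithm additionally resamples from an adjusted distribution on rejection (omitted here), which is what preserves the target model's output distribution.

    import random

    def speculative_step(prefix, draft_model, target_model, k=4):
        # 1. The cheap draft model proposes k tokens greedily.
        draft, ctx = [], list(prefix)
        for _ in range(k):
            probs = draft_model(ctx)
            tok = max(probs, key=probs.get)
            draft.append(tok)
            ctx.append(tok)
        # 2. The target model verifies each drafted token, accepting it with
        #    probability min(1, p_target / p_draft), stopping at the first rejection.
        accepted, ctx = [], list(prefix)
        for tok in draft:
            p_t = target_model(ctx).get(tok, 0.0)
            p_d = draft_model(ctx).get(tok, 1e-9)
            if random.random() >= min(1.0, p_t / p_d):
                break
            accepted.append(tok)
            ctx.append(tok)
        return accepted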
PrivAct: Internalizing Contextual Privacy Preservation via Multi-Agent Preference Training
arXiv:2602.13840v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly deployed in personalized tasks involving sensitive, context-dependent information, where privacy violations may arise in agents' actions because contextual privacy is implicit. Existing approaches rely on external,...
Tutoring Large Language Models to be Domain-adaptive, Precise, and Safe
arXiv:2602.13860v1 Announce Type: new Abstract: The overarching research direction of this work is the development of a "Responsible Intelligence" framework designed to reconcile the immense generative power of Large Language Models (LLMs) with the stringent requirements of real-world deployment. As...
Pre-Editorial Normalization for Automatically Transcribed Medieval Manuscripts in Old French and Latin
arXiv:2602.13905v1 Announce Type: new Abstract: Recent advances in Automatic Text Recognition (ATR) have improved access to historical archives, yet a methodological divide persists between palaeographic transcriptions and normalized digital editions. While ATR models trained on more palaeographically-oriented datasets such as...
The Sufficiency-Conciseness Trade-off in LLM Self-Explanation from an Information Bottleneck Perspective
arXiv:2602.14002v1 Announce Type: new Abstract: Large Language Models increasingly rely on self-explanations, such as chain-of-thought reasoning, to improve performance on multi-step question answering. While these explanations enhance accuracy, they are often verbose and costly to generate, raising...
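The information-bottleneck view referenced in the title is the standard objective of Tishby et al.: compress the input X into a representation Z that stays predictive of the target Y,

    \min_{p(z \mid x)} \; I(X; Z) \;-\; \beta\, I(Z; Y),

where, in this setting, Z would plausibly correspond to the self-explanation, X to the question, and Y to the answer (that mapping is our reading, not stated in the abstract); the tension between the two mutual-information terms mirrors the sufficiency-conciseness trade-off named in the title.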
GRRM: Group Relative Reward Modeling for Machine Translation
arXiv:2602.14028v1 Announce Type: new Abstract: While Group Relative Policy Optimization (GRPO) offers a powerful framework for LLM post-training, its effectiveness in open-ended domains like Machine Translation hinges on accurate intra-group ranking. We identify that standard Scalar Quality Metrics (SQM) fall...
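The intra-group ranking the abstract refers to enters GRPO through group-relative advantages: rewards for a group of candidate translations of the same source are normalized within the group, so only their relative ordering and spread matter. A minimal sketch, where rewards would come from a scalar quality metric or reward model:

    def group_relative_advantages(rewards, eps=1e-8):
        mean = sum(rewards) / len(rewards)
        std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
        return [(r - mean) / (std + eps) for r in rewards]

    # Shifting or rescaling the reward metric leaves these advantages unchanged;
    # what GRPO is sensitive to is the intra-group ordering.
    print(group_relative_advantages([0.62, 0.71, 0.58, 0.66]))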
LogitsCoder: Towards Efficient Chain-of-Thought Path Search via Logits Preference Decoding for Code Generation
arXiv:2602.14054v1 Announce Type: new Abstract: Code generation remains a challenging task that requires precise and structured reasoning. Existing Test-Time Scaling (TTS) methods, including structured tree search, have made progress in exploring reasoning paths but still face two major challenges:...
AI Hallucination from Students' Perspective: A Thematic Analysis
arXiv:2602.17671v1 Announce Type: cross Abstract: As students increasingly rely on large language models, hallucinations pose a growing threat to learning. To mitigate this, AI literacy must expand beyond prompt engineering to address how students should detect and respond to LLM...
Robust Pre-Training of Medical Vision-and-Language Models with Domain-Invariant Multi-Modal Masked Reconstruction
arXiv:2602.17689v1 Announce Type: cross Abstract: Medical vision-language models show strong potential for joint reasoning over medical images and clinical text, but their performance often degrades under domain shift caused by variations in imaging devices, acquisition protocols, and reporting styles. Existing...
A Case Study of Selected PTQ Baselines for Reasoning LLMs on Ascend NPU
arXiv:2602.17693v1 Announce Type: cross Abstract: Post-Training Quantization (PTQ) is crucial for efficient model deployment, yet its effectiveness on Ascend NPU remains under-explored compared to GPU architectures. This paper presents a case study of representative PTQ baselines applied to reasoning-oriented models...
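As a baseline reference for what PTQ does, a minimal sketch of symmetric per-tensor int8 weight quantization is shown below; it is hardware-agnostic and not specific to Ascend NPU kernels or to the particular PTQ baselines studied in the paper.

    import numpy as np

    def quantize_int8(w):
        scale = np.abs(w).max() / 127.0               # per-tensor scale from calibration
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale           # approximate reconstruction

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(w)
    print(np.abs(w - dequantize(q, s)).max())         # worst-case quantization error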
UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
arXiv:2602.17709v1 Announce Type: cross Abstract: All-atom molecular simulation serves as a quintessential "computational microscope" for understanding the machinery of life, yet it remains fundamentally limited by the trade-off between quantum-mechanical (QM) accuracy and biological scale. We present UBio-MolFM, a universal...
"Everyone's using it, but no one is allowed to talk about it": College Students' Experiences Navigating the Higher Education Environment in a Generative AI World
arXiv:2602.17720v1 Announce Type: cross Abstract: Higher education students are increasingly using generative AI in their academic work. However, existing institutional practices have not yet adapted to this shift. Through semi-structured interviews with 23 college students, our study examines the environmental...
Five Fatal Assumptions: Why T-Shirt Sizing Systematically Fails for AI Projects
arXiv:2602.17734v1 Announce Type: cross Abstract: Agile estimation techniques, particularly T-shirt sizing, are widely used in software development for their simplicity and utility in scoping work. However, when we apply these methods to artificial intelligence initiatives -- especially those involving large...