Querying Structured Data Through Natural Language Using Language Models
arXiv:2604.03057v1 Announce Type: new Abstract: This paper presents an open source methodology for allowing users to query structured non textual datasets through natural language Unlike Retrieval Augmented Generation RAG which struggles with numerical and highly structured information our approach trains...
Conditional Sampling via Wasserstein Autoencoders and Triangular Transport
arXiv:2604.02644v1 Announce Type: new Abstract: We present Conditional Wasserstein Autoencoders (CWAEs), a framework for conditional simulation that exploits low-dimensional structure in both the conditioned and the conditioning variables. The key idea is to modify a Wasserstein autoencoder to use a...
Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains
arXiv:2604.02343v1 Announce Type: cross Abstract: We study the compression of LLM-generated text across lossless and lossy regimes, characterizing a compression-compute frontier where more compression is possible at the cost of more compute. For lossless compression, domain-adapted LoRA adapters can improve...
Failing to Falsify: Evaluating and Mitigating Confirmation Bias in Language Models
arXiv:2604.02485v1 Announce Type: new Abstract: Confirmation bias, the tendency to seek evidence that supports rather than challenges one's belief, hinders one's reasoning ability. We examine whether large language models (LLMs) exhibit confirmation bias by adapting the rule-discovery study from human...
DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models
arXiv:2604.02733v1 Announce Type: new Abstract: Reasoning benchmarks typically evaluate whether a model derives the correct answer from a fixed premise set, but they under-measure a closely related capability that matters in dynamic environments: belief revision under minimal evidence change. We...
Mitigating Data Scarcity in Spaceflight Applications for Offline Reinforcement Learning Using Physics-Informed Deep Generative Models
arXiv:2604.02438v1 Announce Type: new Abstract: The deployment of reinforcement learning (RL)-based controllers on physical systems is often limited by poor generalization to real-world scenarios, known as the simulation-to-reality (sim-to-real) gap. This gap is particularly challenging in spaceflight, where real-world training...
Automatic Textbook Formalization
arXiv:2604.03071v1 Announce Type: new Abstract: We present a case study where an automatic AI system formalizes a textbook with more than 500 pages of graduate-level algebraic combinatorics to Lean. The resulting formalization represents a new milestone in textbook formalization scale...
VALOR: Value-Aware Revenue Uplift Modeling with Treatment-Gated Representation for B2B Sales
arXiv:2604.02472v1 Announce Type: new Abstract: B2B sales organizations must identify "persuadable" accounts within zero-inflated revenue distributions to optimize expensive human resource allocation. Standard uplift frameworks struggle with treatment signal collapse in high-dimensional spaces and a misalignment between regression calibration and...
CharTool: Tool-Integrated Visual Reasoning for Chart Understanding
arXiv:2604.02794v1 Announce Type: new Abstract: Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large language models (MLLMs) due to the lack of high-quality training data, as well as the...
Beyond Precision: Importance-Aware Recall for Factuality Evaluation in Long-Form LLM Generation
arXiv:2604.03141v1 Announce Type: new Abstract: Evaluating the factuality of long-form output generated by large language models (LLMs) remains challenging, particularly when responses are open-ended and contain many fine-grained factual statements. Existing evaluation methods primarily focus on precision: they decompose a...
Dynamic Mask Enhanced Intelligent Multi-UAV Deployment for Urban Vehicular Networks
arXiv:2604.02358v1 Announce Type: cross Abstract: Vehicular Ad Hoc Networks (VANETs) play a crucial role in realizing vehicle-road collaboration and intelligent transportation. However, urban VANETs often face challenges such as frequent link disconnections and subnet fragmentation, which hinder reliable connectivity. To...
Let's Have a Conversation: Designing and Evaluating LLM Agents for Interactive Optimization
arXiv:2604.02666v1 Announce Type: new Abstract: Optimization is as much about modeling the right problem as solving it. Identifying the right objectives, constraints, and trade-offs demands extensive interaction between researchers and stakeholders. Large language models can empower decision-makers with optimization capabilities...
Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents
arXiv:2604.03173v1 Announce Type: new Abstract: Large language models and deep research agents supply citation URLs to support their claims, yet the reliability of these citations has not been systematically measured. We address six research questions about citation URL validity using...
LLM Reasoning with Process Rewards for Outcome-Guided Steps
arXiv:2604.02341v1 Announce Type: cross Abstract: Mathematical reasoning in large language models has improved substantially with reinforcement learning using verifiable rewards, where final answers can be checked automatically and converted into reliable training signals. Most such pipelines optimize outcome correctness only,...
SocioEval: A Template-Based Framework for Evaluating Socioeconomic Status Bias in Foundation Models
arXiv:2604.02660v1 Announce Type: new Abstract: As Large Language Models (LLMs) increasingly power decision-making systems across critical domains, understanding and mitigating their biases becomes essential for responsible AI deployment. Although bias assessment frameworks have proliferated for attributes such as race and...
Product-Stability: Provable Convergence for Gradient Descent on the Edge of Stability
arXiv:2604.02653v1 Announce Type: new Abstract: Empirically, modern deep learning training often occurs at the Edge of Stability (EoS), where the sharpness of the loss exceeds the threshold below which classical convergence analysis applies. Despite recent progress, existing theoretical explanations of...
Principled and Scalable Diversity-Aware Retrieval via Cardinality-Constrained Binary Quadratic Programming
arXiv:2604.02554v1 Announce Type: new Abstract: Diversity-aware retrieval is essential for Retrieval-Augmented Generation (RAG), yet existing methods lack theoretical guarantees and face scalability issues as the number of retrieved passages $k$ increases. We propose a principled formulation of diversity retrieval as...
Generalization Limits of Reinforcement Learning Alignment
arXiv:2604.02652v1 Announce Type: new Abstract: The safety of large language models (LLMs) relies on alignment techniques such as reinforcement learning from human feedback (RLHF). However, recent theoretical analyses suggest that reinforcement learning-based training does not acquire new capabilities but merely...
DIGITAL DIPLOMACY AND ARTIFICIAL INTELLIGENCE: REGULATION ASPECTS IN INTERNATIONAL LAW
The article examines the legal aspects of regulating artificial intelligence in the context of digital diplomacy. The author examines the process of transformation of traditional diplomatic institutions under the influence of digitalization and the introduction of artificial intelligence technologies, analyzes...
Pragmatics Meets Culture: Culturally-adapted Artwork Description Generation and Evaluation
arXiv:2604.02557v1 Announce Type: new Abstract: Language models are known to exhibit various forms of cultural bias in decision-making tasks, yet much less is known about their degree of cultural familiarity in open-ended text generation tasks. In this paper, we introduce...
FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models
arXiv:2604.02967v1 Announce Type: new Abstract: Recent Large Reasoning Models (LRMs) like DeepSeek-R1 have demonstrated remarkable success in complex reasoning tasks, exhibiting human-like patterns in exploring multiple alternative solutions. Upon closer inspection, however, we uncover a surprising phenomenon: The First is...
Interpretable Deep Reinforcement Learning for Element-level Bridge Life-cycle Optimization
arXiv:2604.02528v1 Announce Type: new Abstract: The new Specifications for the National Bridge Inventory (SNBI), in effect from 2022, emphasize the use of element-level condition states (CS) for risk-based bridge management. Instead of a general component rating, element-level condition data use...
AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models
arXiv:2604.02617v1 Announce Type: new Abstract: Scientific and Technical Intelligence (S&TI) analysis requires verifying complex technical claims across rapidly growing literature, where existing approaches fail to bridge the verification gap between surface-level accuracy and deeper methodological validity. We present AutoVerifier, an...
TRACE: Traceroute-based Internet Route change Analysis with Ensemble Learning
arXiv:2604.02361v1 Announce Type: cross Abstract: Detecting Internet routing instability is a critical yet challenging task, particularly when relying solely on endpoint active measurements. This study introduces TRACE, a MachineLearning (ML)pipeline designed to identify route changes using only traceroute latency data,...
SIEVE: Sample-Efficient Parametric Learning from Natural Language
arXiv:2604.02339v1 Announce Type: new Abstract: Natural language context-such as instructions, knowledge, or feedback-contains rich signal for adapting language models. While in-context learning provides adaptation via the prompt, parametric learning persists into model weights and can improve performance further, though is...
Domain-Adapted Retrieval for In-Context Annotation of Pedagogical Dialogue Acts
arXiv:2604.03127v1 Announce Type: new Abstract: Automated annotation of pedagogical dialogue is a high-stakes task where LLMs often fail without sufficient domain grounding. We present a domain-adapted RAG pipeline for tutoring move annotation. Rather than fine-tuning the generative model, we adapt...
Improving Role Consistency in Multi-Agent Collaboration via Quantitative Role Clarity
arXiv:2604.02770v1 Announce Type: new Abstract: In large language model (LLM)-driven multi-agent systems, disobey role specification (failure to adhere to the defined responsibilities and constraints of an assigned role, potentially leading to an agent behaving like another) is a major failure...
Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation
arXiv:2604.03174v1 Announce Type: new Abstract: Large language models (LLMs) encode vast world knowledge in their parameters, yet they remain fundamentally limited by static knowledge, finite context windows, and weakly structured causal reasoning. This survey provides a unified account of augmentation...
Internalized Reasoning for Long-Context Visual Document Understanding
arXiv:2604.02371v1 Announce Type: cross Abstract: Visual long-document understanding is critical for enterprise, legal, and scientific applications, yet the best performing open recipes have not explored reasoning, a capability which has driven leaps in math and code performance. We introduce a...
Fast NF4 Dequantization Kernels for Large Language Model Inference
arXiv:2604.02556v1 Announce Type: new Abstract: Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment. While NF4 (4-bit NormalFloat) quantization enables 4$\times$ memory reduction, inference on current NVIDIA GPUs (e.g.,...