Aligning Language Models from User Interactions
arXiv:2603.12273v1 Announce Type: cross Abstract: Multi-turn user interactions are among the most abundant data produced by language models, yet we lack effective methods to learn from them. While typically discarded, these interactions often contain useful information: follow-up user messages may...
Developing and evaluating a chatbot to support maternal health care
arXiv:2603.13168v1 Announce Type: new Abstract: The ability to provide trustworthy maternal health information using phone-based chatbots can have a significant impact, particularly in low-resource settings where users have low health literacy and limited access to care. However, deploying such systems...
Thermodynamics of Reinforcement Learning Curricula
arXiv:2603.12324v1 Announce Type: cross Abstract: Connections between statistical mechanics and machine learning have repeatedly proven fruitful, providing insight into optimization, generalization, and representation learning. In this work, we follow this tradition by leveraging results from non-equilibrium thermodynamics to formalize curriculum...
On Using Machine Learning to Early Detect Catastrophic Failures in Marine Diesel Engines
arXiv:2603.12733v1 Announce Type: new Abstract: Catastrophic failures of marine engines imply severe loss of functionality and destroy or damage the systems irreversibly. Being sudden and often unpredictable events, they pose a severe threat to navigation, crew, and passengers. The abrupt...
Beyond Final Answers: CRYSTAL Benchmark for Transparent Multimodal Reasoning Evaluation
arXiv:2603.13099v1 Announce Type: new Abstract: We introduce **CRYSTAL** (*__C__lear __R__easoning via __Y__ielded __S__teps, __T__raceability and __L__ogic*), a diagnostic benchmark with 6,372 instances that evaluates multimodal reasoning through verifiable intermediate steps. We propose two complementary metrics: *Match F1*, which scores step-level...
Structured Distillation for Personalized Agent Memory: 11x Token Reduction with Retrieval Preservation
arXiv:2603.13017v1 Announce Type: new Abstract: Long conversations with an AI agent create a simple problem for one user: the history is useful, but carrying it verbatim is expensive. We study personalized agent memory: one user's conversation history with an agent,...
When Right Meets Wrong: Bilateral Context Conditioning with Reward-Confidence Correction for GRPO
arXiv:2603.13134v1 Announce Type: new Abstract: Group Relative Policy Optimization (GRPO) has emerged as an effective method for training reasoning models. While it computes advantages based on group mean, GRPO treats each output as an independent sample during the optimization and...
Detecting Miscitation on the Scholarly Web through LLM-Augmented Text-Rich Graph Learning
arXiv:2603.12290v1 Announce Type: cross Abstract: Scholarly web is a vast network of knowledge connected by citations. However, this system is increasingly compromised by miscitation, where references do not support or even contradict the claims they are cited for. Current miscitation...
DART: Input-Difficulty-AwaRe Adaptive Threshold for Early-Exit DNNs
arXiv:2603.12269v1 Announce Type: cross Abstract: Early-exit deep neural networks enable adaptive inference by terminating computation when sufficient confidence is achieved, reducing cost for edge AI accelerators in resource-constrained settings. Existing methods, however, rely on suboptimal exit policies, ignore input difficulty,...
ToolTree: Efficient LLM Agent Tool Planning via Dual-Feedback Monte Carlo Tree Search and Bidirectional Pruning
arXiv:2603.12740v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly applied to complex, multi-step tasks that require interaction with diverse external tools across various domains. However, current LLM agent tool planning methods typically rely on greedy, reactive tool...
ODRL Policy Comparison Through Normalisation
arXiv:2603.12926v1 Announce Type: new Abstract: The ODRL language has become the standard for representing policies and regulations for digital rights. However its complexity is a barrier to its usage, which has caused many related theoretical and practical works to focus...
AI Planning Framework for LLM-Based Web Agents
arXiv:2603.12710v1 Announce Type: new Abstract: Developing autonomous agents for web-based tasks is a core challenge in AI. While Large Language Model (LLM) agents can interpret complex user requests, they often operate as black boxes, making it difficult to diagnose why...
Context is all you need: Towards autonomous model-based process design using agentic AI in flowsheet simulations
arXiv:2603.12813v1 Announce Type: new Abstract: Agentic AI systems integrating large language models (LLMs) with reasoning and tooluse capabilities are transforming various domains - in particular, software development. In contrast, their application in chemical process flowsheet modelling remains largely unexplored. In...
Efficient and Interpretable Multi-Agent LLM Routing via Ant Colony Optimization
arXiv:2603.12933v1 Announce Type: new Abstract: Large Language Model (LLM)-driven Multi-Agent Systems (MAS) have demonstrated strong capability in complex reasoning and tool use, and heterogeneous agent pools further broaden the quality--cost trade-off space. Despite these advances, real-world deployment is often constrained...
Prompt Injection as Role Confusion
arXiv:2603.12277v1 Announce Type: cross Abstract: Language models remain vulnerable to prompt injection attacks despite extensive safety training. We trace this failure to role confusion: models infer roles from how text is written, not where it comes from. We design novel...
VQQA: An Agentic Approach for Video Evaluation and Quality Improvement
arXiv:2603.12310v1 Announce Type: cross Abstract: Despite rapid advancements in video generation models, aligning their outputs with complex user intent remains challenging. Existing test-time optimization methods are typically either computationally expensive or require white-box access to model internals. To address this,...
From Garbage to Gold: A Data-Architectural Theory of Predictive Robustness
arXiv:2603.12288v1 Announce Type: cross Abstract: Tabular machine learning presents a paradox: modern models achieve state-of-the-art performance using high-dimensional (high-D), collinear, error-prone data, defying the "Garbage In, Garbage Out" mantra. To help resolve this, we synthesize principles from Information Theory, Latent...
Synthetic Data Generation for Brain-Computer Interfaces: Overview, Benchmarking, and Future Directions
arXiv:2603.12296v1 Announce Type: cross Abstract: Deep learning has achieved transformative performance across diverse domains, largely driven by the large-scale, high-quality training data. In contrast, the development of brain-computer interfaces (BCIs) is fundamentally constrained by the limited, heterogeneous, and privacy-sensitive neural...
HCP-DCNet: A Hierarchical Causal Primitive Dynamic Composition Network for Self-Improving Causal Understanding
arXiv:2603.12305v1 Announce Type: cross Abstract: The ability to understand and reason about cause and effect -- encompassing interventions, counterfactuals, and underlying mechanisms -- is a cornerstone of robust artificial intelligence. While deep learning excels at pattern recognition, it fundamentally lacks...
Diagnosing Retrieval Bias Under Multiple In-Context Knowledge Updates in Large Language Models
arXiv:2603.12271v1 Announce Type: cross Abstract: LLMs are widely used in knowledge-intensive tasks where the same fact may be revised multiple times within context. Unlike prior work focusing on one-shot updates or single conflicts, multi-update scenarios contain multiple historically valid versions...
Predictive Analytics for Foot Ulcers Using Time-Series Temperature and Pressure Data
arXiv:2603.12278v1 Announce Type: cross Abstract: Diabetic foot ulcers (DFUs) are a severe complication of diabetes, often resulting in significant morbidity. This paper presents a predictive analytics framework utilizing time-series data captured by wearable foot sensors -- specifically NTC thin-film thermocouples...
Maximum Entropy Exploration Without the Rollouts
arXiv:2603.12325v1 Announce Type: cross Abstract: Efficient exploration remains a central challenge in reinforcement learning, serving as a useful pretraining objective for data collection, particularly when an external reward function is unavailable. A principled formulation of the exploration problem is to...
Optimizing Task Completion Time Updates Using POMDPs
arXiv:2603.12340v1 Announce Type: cross Abstract: Managing announced task completion times is a fundamental control problem in project management. While extensive research exists on estimating task durations and task scheduling, the problem of when and how to update completion times communicated...
SPARROW: Learning Spatial Precision and Temporal Referential Consistency in Pixel-Grounded Video MLLMs
arXiv:2603.12382v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) have advanced from image-level reasoning to pixel-level grounding, but extending these capabilities to videos remains challenging as models must achieve spatial precision and temporally consistent reference tracking. Existing video MLLMs...
Test-Time Strategies for More Efficient and Accurate Agentic RAG
arXiv:2603.12396v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) systems face challenges with complex, multihop questions, and agentic frameworks such as Search-R1 (Jin et al., 2025), which operates iteratively, have been proposed to address these complexities. However, such approaches can introduce...
Revisiting Model Stitching In the Foundation Model Era
arXiv:2603.12433v1 Announce Type: cross Abstract: Model stitching, connecting early layers of one model (source) to later layers of another (target) via a light stitch layer, has served as a probe of representational compatibility. Prior work finds that models trained on...
Unmasking Biases and Reliability Concerns in Convolutional Neural Networks Analysis of Cancer Pathology Images
arXiv:2603.12445v1 Announce Type: cross Abstract: Convolutional Neural Networks have shown promising effectiveness in identifying different types of cancer from radiographs. However, the opaque nature of CNNs makes it difficult to fully understand the way they operate, limiting their assessment to...
Operationalising Cyber Risk Management Using AI: Connecting Cyber Incidents to MITRE ATT&CK Techniques, Security Controls, and Metrics
arXiv:2603.12455v1 Announce Type: cross Abstract: The escalating frequency of cyber-attacks poses significant challenges for organisations, particularly small enterprises constrained by limited in-house expertise, insufficient knowledge, and financial resources. This research presents a novel framework that leverages Natural Language Processing to...
Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs
arXiv:2603.12458v1 Announce Type: cross Abstract: While Large Language Models (LLMs) achieve expert-level performance on standard medical benchmarks through single-hop factual recall, they severely struggle with the complex, multi-hop diagnostic reasoning required in real-world clinical settings. A primary obstacle is "shortcut...
CLARE: Classification-based Regression for Electron Temperature Prediction
arXiv:2603.12470v1 Announce Type: cross Abstract: Electron temperature (Te) is an important parameter governing space weather in the upper atmosphere, but has historically been underexplored in the space weather machine learning literature. We present CLARE, a machine learning model for predicting...