Boosting for Vector-Valued Prediction and Conditional Density Estimation
arXiv:2602.18866v1 Announce Type: new Abstract: Despite the widespread use of boosting in structured prediction, a general theoretical understanding of aggregation beyond scalar losses remains incomplete. We study vector-valued and conditional density prediction under general divergences and identify stability conditions under...
CUICurate: A GraphRAG-based Framework for Automated Clinical Concept Curation for NLP applications
arXiv:2602.17949v1 Announce Type: new Abstract: Background: Clinical named entity recognition tools commonly map free text to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). For many downstream tasks, however, the clinically meaningful unit is not a single CUI but...
Agentic Adversarial QA for Improving Domain-Specific LLMs
arXiv:2602.18137v1 Announce Type: new Abstract: Large Language Models (LLMs), despite extensive pretraining on broad internet corpora, often struggle to adapt effectively to specialized domains. There is growing interest in fine-tuning these models for such domains; however, progress is constrained by...
RVR: Retrieve-Verify-Retrieve for Comprehensive Question Answering
arXiv:2602.18425v1 Announce Type: new Abstract: Comprehensively retrieving diverse documents is crucial to address queries that admit a wide range of valid answers. We introduce retrieve-verify-retrieve (RVR), a multi-round retrieval framework designed to maximize answer coverage. Initially, a retriever takes the...
LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs
arXiv:2602.17681v1 Announce Type: cross Abstract: Post-training quantization (PTQ) is a widely used approach for reducing the memory and compute costs of large language models (LLMs). Recent studies have shown that applying invertible transformations to activations can significantly improve quantization robustness...
Tethered Reasoning: Decoupling Entropy from Hallucination in Quantized LLMs via Manifold Steering
arXiv:2602.17691v1 Announce Type: cross Abstract: Quantized language models face a fundamental dilemma: low sampling temperatures yield repetitive, mode-collapsed outputs, while high temperatures (T > 2.0) cause trajectory divergence and semantic incoherence. We present HELIX, a geometric framework that decouples output...
ADAPT: Hybrid Prompt Optimization for LLM Feature Visualization
arXiv:2602.17867v1 Announce Type: cross Abstract: Understanding what features are encoded by learned directions in LLM activation space requires identifying inputs that strongly activate them. Feature visualization, which optimizes inputs to maximally activate a target direction, offers an alternative to costly...
BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs
arXiv:2602.17680v1 Announce Type: new Abstract: Existing Protein Language Models (PLMs) often suffer from limited adaptability to multiple tasks and exhibit poor generalization across diverse biological contexts. In contrast, general-purpose Large Language Models (LLMs) lack the capability to interpret protein sequences...
Provable Adversarial Robustness in In-Context Learning
arXiv:2602.17743v1 Announce Type: new Abstract: Large language models adapt to new tasks through in-context learning (ICL) without parameter updates. Current theoretical explanations for this capability assume test tasks are drawn from a distribution similar to that seen during pretraining. This...
Calibrated Adaptation: Bayesian Stiefel Manifold Priors for Reliable Parameter-Efficient Fine-Tuning
arXiv:2602.17809v1 Announce Type: new Abstract: Parameter-efficient fine-tuning methods such as LoRA enable practical adaptation of large language models but provide no principled uncertainty estimates, leading to poorly calibrated predictions and unreliable behavior under domain shift. We introduce Stiefel-Bayes Adapters (SBA),...
Avoid What You Know: Divergent Trajectory Balance for GFlowNets
arXiv:2602.17827v1 Announce Type: new Abstract: Generative Flow Networks (GFlowNets) are a flexible family of amortized samplers trained to generate discrete and compositional objects with probability proportional to a reward function. However, learning efficiency is constrained by the model's ability to...
Distribution-Free Sequential Prediction with Abstentions
arXiv:2602.17918v1 Announce Type: new Abstract: We study a sequential prediction problem in which an adversary is allowed to inject arbitrarily many adversarial instances in a stream of i.i.d. instances, but at each round, the learner may also abstain from making...
How AI agents could destroy the economy
Citrini Research imagines a report from two years in the future, in which unemployment has doubled and the total value of the stock market has fallen by more than a third.
Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation
Trustworthy Artificial Intelligence (AI) is based on seven technical requirements sustained over three main pillars that should be met throughout the system’s entire life cycle: it should be (1) lawful, (2) ethical, and (3) robust, both from a technical and...
World-Model-Augmented Web Agents with Action Correction
arXiv:2602.15384v1 Announce Type: new Abstract: Web agents based on large language models have demonstrated promising capability in automating web tasks. However, current web agents struggle to reason out sensible actions due to the limitations of predicting environment changes, and might...
Improving LLM Reliability through Hybrid Abstention and Adaptive Detection
arXiv:2602.15391v1 Announce Type: new Abstract: Large Language Models (LLMs) deployed in production environments face a fundamental safety-utility trade-off: strict filtering mechanisms prevent harmful outputs but often block benign queries, while relaxed controls risk unsafe content generation. Conventional...
On inferring cumulative constraints
arXiv:2602.15635v1 Announce Type: new Abstract: Cumulative constraints are central in scheduling with constraint programming, yet propagation is typically performed per constraint, missing multi-resource interactions and causing severe slowdowns on some benchmarks. I present a preprocessing method for inferring additional cumulative...
PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
arXiv:2602.15669v1 Announce Type: new Abstract: Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves...
Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings
arXiv:2602.15791v1 Announce Type: new Abstract: Accurate representation of building semantics, encompassing both generic object types and specific subtypes, is essential for effective AI model training in the architecture, engineering, construction, and operation (AECO) industry. Conventional encoding methods (e.g., one-hot) often...
Safe-SDL: Establishing Safety Boundaries and Control Mechanisms for AI-Driven Self-Driving Laboratories
arXiv:2602.15061v1 Announce Type: cross Abstract: The emergence of Self-Driving Laboratories (SDLs) transforms scientific discovery methodology by integrating AI with robotic automation to create closed-loop experimental systems capable of autonomous hypothesis generation, experimentation, and analysis. While promising to compress research timelines...
TAROT: Test-driven and Capability-adaptive Curriculum Reinforcement Fine-tuning for Code Generation with Large Language Models
arXiv:2602.15449v1 Announce Type: new Abstract: Large Language Models (LLMs) are shifting the coding paradigm toward what is known as vibe coding, yet synthesizing algorithmically sophisticated and robust code remains a critical challenge. Incentivizing the deep reasoning capabilities of LLMs is essential to...
Clinically Inspired Symptom-Guided Depression Detection from Emotion-Aware Speech Representations
arXiv:2602.15578v1 Announce Type: new Abstract: Depression manifests through a diverse set of symptoms such as sleep disturbance, loss of interest, and concentration difficulties. However, most existing works treat depression prediction either as a binary label or an overall severity score...
How Uncertain Is the Grade? A Benchmark of Uncertainty Metrics for LLM-Based Automatic Assessment
arXiv:2602.16039v1 Announce Type: new Abstract: The rapid rise of large language models (LLMs) is reshaping the landscape of automatic assessment in education. While these systems demonstrate substantial advantages in adaptability to diverse question types and flexibility in output formats, they...
GPSBench: Do Large Language Models Understand GPS Coordinates?
arXiv:2602.16105v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in applications that interact with the physical world, such as navigation, robotics, or mapping, making robust geospatial reasoning a critical capability. Despite this, LLMs' ability to reason about...
Learning Personalized Agents from Human Feedback
arXiv:2602.16173v1 Announce Type: new Abstract: Modern AI agents are powerful but often fail to align with the idiosyncratic, evolving preferences of individual users. Prior approaches typically rely on static datasets, either training implicit preference models on interaction history or encoding...
EnterpriseGym Corecraft: Training Generalizable Agents on High-Fidelity RL Environments
arXiv:2602.16179v1 Announce Type: new Abstract: We show that training AI agents on high-fidelity reinforcement learning environments produces capabilities that generalize beyond the training distribution. We introduce Corecraft, the first environment in EnterpriseGym, Surge AI's suite of agentic RL environments. Corecraft...
Framework of Thoughts: A Foundation Framework for Dynamic and Optimized Reasoning based on Chains, Trees, and Graphs
arXiv:2602.16512v1 Announce Type: new Abstract: Prompting schemes such as Chain of Thought, Tree of Thoughts, and Graph of Thoughts can significantly enhance the reasoning capabilities of large language models. However, most existing schemes require users to define static, problem-specific reasoning...
EdgeNav-QE: QLoRA Quantization and Dynamic Early Exit for LAM-based Navigation on Edge Devices
arXiv:2602.15836v1 Announce Type: cross Abstract: Large Action Models (LAMs) have shown immense potential in autonomous navigation by bridging high-level reasoning with low-level control. However, deploying these multi-billion parameter models on edge devices remains a significant challenge due to memory constraints...
Artificial intelligence in nursing: Priorities and opportunities from an international invitational think‐tank of the Nursing and Artificial Intelligence Leadership Collaborative
Abstract: Aim: To develop a consensus paper on the central points of an international invitational think‐tank on nursing and artificial intelligence (AI). Methods: We established the Nursing and Artificial Intelligence Leadership (NAIL) Collaborative, comprising interdisciplinary experts in AI development, biomedical...
Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey
arXiv:2602.15851v1 Announce Type: cross Abstract: Applications of narrative theories using large language models (LLMs) deliver promising use-cases in automatic story generation and understanding tasks. Our survey examines how natural language processing (NLP) research engages with fields of narrative studies, and...