PERSONA: Dynamic and Compositional Inference-Time Personality Control via Activation Vector Algebra
arXiv:2602.15669v1 Announce Type: new Abstract: Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves...
GlobeDiff: State Diffusion Process for Partial Observability in Multi-Agent Systems
arXiv:2602.15776v1 Announce Type: new Abstract: In the realm of multi-agent systems, the challenge of \emph{partial observability} is a critical barrier to effective coordination and decision-making. Existing approaches, such as belief state estimation and inter-agent communication, often fall short. Belief-based methods...
Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings
arXiv:2602.15791v1 Announce Type: new Abstract: Accurate representation of building semantics, encompassing both generic object types and specific subtypes, is essential for effective AI model training in the architecture, engineering, construction, and operation (AECO) industry. Conventional encoding methods (e.g., one-hot) often...
EduResearchBench: A Hierarchical Atomic Task Decomposition Benchmark for Full-Lifecycle Educational Research
arXiv:2602.15034v1 Announce Type: cross Abstract: While Large Language Models (LLMs) are reshaping the paradigm of AI for Social Science (AI4SS), rigorously evaluating their capabilities in scholarly writing remains a major challenge. Existing benchmarks largely emphasize single-shot, monolithic generation and thus...
CircuChain: Disentangling Competence and Compliance in LLM Circuit Analysis
arXiv:2602.15037v1 Announce Type: cross Abstract: As large language models (LLMs) advance toward expert-level performance in engineering domains, reliable reasoning under user-specified constraints becomes critical. In circuit analysis, for example, a numerically correct solution is insufficient if it violates established methodological...
Indic-TunedLens: Interpreting Multilingual Models in Indian Languages
arXiv:2602.15038v1 Announce Type: cross Abstract: Multilingual large language models (LLMs) are increasingly deployed in linguistically diverse regions like India, yet most interpretability tools remain tailored to English. Prior work reveals that LLMs often operate in English centric representation spaces, making...
GRACE: an Agentic AI for Particle Physics Experiment Design and Simulation
arXiv:2602.15039v1 Announce Type: cross Abstract: We present GRACE, a simulation-native agent for autonomous experimental design in high-energy and nuclear physics. Given multimodal input in the form of a natural-language prompt or a published experimental paper, the agent extracts a structured...
Reconstructing Carbon Monoxide Reanalysis with Machine Learning
arXiv:2602.15056v1 Announce Type: cross Abstract: The Copernicus Atmospheric Monitoring Service provides reanalysis products for atmospheric composition by combining model simulations with satellite observations. The quality of these products depends strongly on the availability of the observational data, which can vary...
CLOT: Closed-Loop Global Motion Tracking for Whole-Body Humanoid Teleoperation
arXiv:2602.15060v1 Announce Type: cross Abstract: Long-horizon whole-body humanoid teleoperation remains challenging due to accumulated global pose drift, particularly on full-sized humanoids. Although recent learning-based tracking methods enable agile and coordinated motions, they typically operate in the robot's local frame and...
Safe-SDL:Establishing Safety Boundaries and Control Mechanisms for AI-Driven Self-Driving Laboratories
arXiv:2602.15061v1 Announce Type: cross Abstract: The emergence of Self-Driving Laboratories (SDLs) transforms scientific discovery methodology by integrating AI with robotic automation to create closed-loop experimental systems capable of autonomous hypothesis generation, experimentation, and analysis. While promising to compress research timelines...
Structural Divergence Between AI-Agent and Human Social Networks in Moltbook
arXiv:2602.15064v1 Announce Type: cross Abstract: Large populations of AI agents are increasingly embedded in online environments, yet little is known about how their collective interaction patterns compare to human social systems. Here, we analyze the full interaction network of Moltbook,...
An effective Genetic Programming Hyper-Heuristic for Uncertain Agile Satellite Scheduling
arXiv:2602.15070v1 Announce Type: cross Abstract: This paper investigates a novel problem, namely the Uncertain Agile Earth Observation Satellite Scheduling Problem (UAEOSSP). Unlike the static AEOSSP, it takes into account a range of uncertain factors (e.g., task profit, resource consumption, and...
ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
arXiv:2602.15521v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) effectively scales model capacity while preserving computational efficiency through sparse expert activation. However, training high-quality MoEs from scratch is prohibitively expensive. A promising alternative is to convert pretrained dense models into sparse MoEs....
ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling
arXiv:2602.15537v1 Announce Type: new Abstract: Pure speech language models aim to learn language directly from raw audio without textual resources. A key challenge is that discrete tokens from self-supervised speech encoders result in excessively long sequences, motivating recent work on...
Beyond Static Pipelines: Learning Dynamic Workflows for Text-to-SQL
arXiv:2602.15564v1 Announce Type: new Abstract: Text-to-SQL has recently achieved impressive progress, yet remains difficult to apply effectively in real-world scenarios. This gap stems from the reliance on single static workflows, fundamentally limiting scalability to out-of-distribution and long-tail scenarios. Instead of...
STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens
arXiv:2602.15620v1 Announce Type: new Abstract: Reinforcement Learning (RL) has significantly improved large language model reasoning, but existing RL fine-tuning methods rely heavily on heuristic techniques such as entropy regularization and reweighting to maintain stability. In practice, they often experience late-stage...
Evidence-Grounded Subspecialty Reasoning: Evaluating a Curated Clinical Intelligence Layer on the 2025 Endocrinology Board-Style Examination
arXiv:2602.16050v1 Announce Type: new Abstract: Background: Large language models have demonstrated strong performance on general medical examinations, but subspecialty clinical reasoning remains challenging due to rapidly evolving guidelines and nuanced evidence hierarchies. Methods: We evaluated January Mirror, an evidence-grounded clinical...
Leveraging Large Language Models for Causal Discovery: a Constraint-based, Argumentation-driven Approach
arXiv:2602.16481v1 Announce Type: new Abstract: Causal discovery seeks to uncover causal relations from data, typically represented as causal graphs, and is essential for predicting the effects of interventions. While expert knowledge is required to construct principled causal graphs, many statistical...
Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments
arXiv:2602.16653v1 Announce Type: new Abstract: Agent Skill framework, now widely and officially supported by major players such as GitHub Copilot, LangChain, and OpenAI, performs especially well with proprietary models by improving context engineering, reducing hallucinations, and boosting task accuracy. Based...
Redefining boundaries in innovation and knowledge domains: Investigating the impact of generative artificial intelligence on copyright and intellectual property rights
Narrative Theory-Driven LLM Methods for Automatic Story Generation and Understanding: A Survey
arXiv:2602.15851v1 Announce Type: cross Abstract: Applications of narrative theories using large language models (LLMs) deliver promising use-cases in automatic story generation and understanding tasks. Our survey examines how natural language processing (NLP) research engages with fields of narrative studies, and...
Kalman-Inspired Runtime Stability and Recovery in Hybrid Reasoning Systems
arXiv:2602.15855v1 Announce Type: cross Abstract: Hybrid reasoning systems that combine learned components with model-based inference are increasingly deployed in tool-augmented decision loops, yet their runtime behavior under partial observability and sustained evidence mismatch remains poorly understood. In practice, failures often...
Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective
arXiv:2602.15856v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent research on soft...
AIdentifyAGE Ontology for Decision Support in Forensic Dental Age Assessment
arXiv:2602.16714v1 Announce Type: new Abstract: Age assessment is crucial in forensic and judicial decision-making, particularly in cases involving undocumented individuals and unaccompanied minors, where legal thresholds determine access to protection, healthcare, and judicial procedures. Dental age assessment is widely recognized...
IndicJR: A Judge-Free Benchmark of Jailbreak Robustness in South Asian Languages
arXiv:2602.16832v1 Announce Type: new Abstract: Safety alignment of large language models (LLMs) is mostly evaluated in English and contract-bound, leaving multilingual vulnerabilities understudied. We introduce \textbf{Indic Jailbreak Robustness (IJR)}, a judge-free benchmark for adversarial safety across 12 Indic and South...
Narrow fine-tuning erodes safety alignment in vision-language agents
arXiv:2602.16931v1 Announce Type: new Abstract: Lifelong multimodal agents must continuously adapt to new tasks through post-training, but this creates fundamental tension between acquiring capabilities and preserving safety alignment. We demonstrate that fine-tuning aligned vision-language models on narrow-domain harmful datasets induces...
HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing
arXiv:2602.16976v1 Announce Type: new Abstract: Here's the corrected paragraph with all punctuation and formatting issues fixed: Financial risk systems usually follow a two-step routine: a model predicts return or risk, and then an optimizer makes a decision such as a...
Fundamental Limits of Black-Box Safety Evaluation: Information-Theoretic and Computational Barriers from Latent Context Conditioning
arXiv:2602.16984v1 Announce Type: new Abstract: Black-box safety evaluation of AI systems assumes model behavior on test distributions reliably predicts deployment performance. We formalize and challenge this assumption through latent context-conditioned policies -- models whose outputs depend on unobserved internal variables...
RFEval: Benchmarking Reasoning Faithfulness under Counterfactual Reasoning Intervention in Large Reasoning Models
arXiv:2602.17053v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) exhibit strong performance, yet often produce rationales that sound plausible but fail to reflect their true decision process, undermining reliability and trust. We introduce a formal framework for reasoning faithfulness, defined...
Continual learning and refinement of causal models through dynamic predicate invention
arXiv:2602.17217v1 Announce Type: new Abstract: Efficiently navigating complex environments requires agents to internalize the underlying logic of their world, yet standard world modelling methods often struggle with sample inefficiency, lack of transparency, and poor scalability. We propose a framework for...