Do Personality Traits Interfere? Geometric Limitations of Steering in Large Language Models
arXiv:2602.15847v1 Announce Type: cross Abstract: Personality steering in large language models (LLMs) commonly relies on injecting trait-specific steering vectors, implicitly assuming that personality traits can be controlled independently. In this work, we examine whether this assumption holds by analysing the...
Institutionalizing trust in AI governance: from ethical principles to legal design
Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints
arXiv:2602.15852v1 Announce Type: cross Abstract: Clinical natural language processing (NLP) models have shown promise for supporting hospital discharge planning by leveraging narrative clinical documentation. However, note-based models are particularly vulnerable to temporal and lexical leakage, where documentation artifacts encode future...
Rethinking Soft Compression in Retrieval-Augmented Generation: A Query-Conditioned Selector Perspective
arXiv:2602.15856v1 Announce Type: cross Abstract: Retrieval-Augmented Generation (RAG) effectively grounds Large Language Models (LLMs) with external knowledge and is widely applied to Web-related tasks. However, its scalability is hindered by excessive context length and redundant retrievals. Recent research on soft...
State Design Matters: How Representations Shape Dynamic Reasoning in Large Language Models
arXiv:2602.15858v1 Announce Type: cross Abstract: As large language models (LLMs) move from static reasoning tasks toward dynamic environments, their success depends on the ability to navigate and respond to an environment that changes as they interact at inference time. An...
NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey
arXiv:2602.15866v1 Announce Type: cross Abstract: Natural Language Processing (NLP) is integral to social media analytics but often processes content containing Personally Identifiable Information (PII), behavioral cues, and metadata, raising privacy risks such as surveillance, profiling, and targeted advertising. To systematically...
Fly0: Decoupling Semantic Grounding from Geometric Planning for Zero-Shot Aerial Navigation
arXiv:2602.15875v1 Announce Type: cross Abstract: Current Visual-Language Navigation (VLN) methodologies face a trade-off between semantic understanding and control precision. While Multimodal Large Language Models (MLLMs) offer superior reasoning, deploying them as low-level controllers leads to high latency, trajectory oscillations, and...
IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation
arXiv:2602.15878v1 Announce Type: cross Abstract: In industrial scenarios, data augmentation is an effective approach to improve model performance. However, it is not uniformly beneficial. There is no theoretical research or established estimation for the optimal sample size (OSS) in...
FUTURE-VLA: Forecasting Unified Trajectories Under Real-time Execution
arXiv:2602.15882v1 Announce Type: cross Abstract: General vision-language models increasingly support unified spatiotemporal reasoning over long video streams, yet deploying such capabilities on robots remains constrained by the prohibitive latency of processing long-horizon histories and generating high-dimensional future predictions. To bridge...
Evidence for Daily and Weekly Periodic Variability in GPT-4o Performance
arXiv:2602.15889v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly used in research both as tools and as objects of investigation. Much of this work implicitly assumes that LLM performance under fixed conditions (identical model snapshot, hyperparameters, and prompt)...
Doc-to-LoRA: Learning to Instantly Internalize Contexts
arXiv:2602.15902v1 Announce Type: cross Abstract: Long input sequences are central to in-context learning, document understanding, and multi-step reasoning of Large Language Models (LLMs). However, the quadratic attention cost of Transformers makes inference memory-intensive and slow. While context distillation (CD) can...
Retrieval Augmented (Knowledge Graph), and Large Language Model-Driven Design Structure Matrix (DSM) Generation of Cyber-Physical Systems
arXiv:2602.16715v1 Announce Type: new Abstract: We explore the potential of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and Graph-based RAG (GraphRAG) for generating Design Structure Matrices (DSMs). We test these methods on two distinct use cases -- a power screwdriver...
Mobility-Aware Cache Framework for Scalable LLM-Based Human Mobility Simulation
arXiv:2602.16727v1 Announce Type: new Abstract: Large-scale human mobility simulation is critical for applications such as urban planning, epidemiology, and transportation analysis. Recent works treat large language models (LLMs) as human agents to simulate realistic mobility behaviors using structured reasoning, but...
Improved Upper Bounds for Slicing the Hypercube
arXiv:2602.16807v1 Announce Type: new Abstract: A collection of hyperplanes $\mathcal{H}$ slices all edges of the $n$-dimensional hypercube $Q_n$ with vertex set $\{-1,1\}^n$ if, for every edge $e$ in the hypercube, there exists a hyperplane in $\mathcal{H}$ intersecting $e$ in its...
Node Learning: A Framework for Adaptive, Decentralised and Collaborative Network Edge AI
arXiv:2602.16814v1 Announce Type: new Abstract: The expansion of AI toward the edge increasingly exposes the cost and fragility of centralised intelligence. Data transmission, latency, energy consumption, and dependence on large data centres create bottlenecks that scale poorly across heterogeneous,...
AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks
arXiv:2602.16901v1 Announce Type: new Abstract: LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon attacks that exploit multi-turn user-agent-environment interactions to achieve objectives infeasible in single-turn settings. To measure...
LLM-WikiRace: Benchmarking Long-term Planning and Reasoning over Real-World Knowledge Graphs
arXiv:2602.16902v1 Announce Type: new Abstract: We introduce LLM-Wikirace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-Wikirace, models must efficiently navigate Wikipedia hyperlinks step by step to reach a target page from a...
DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs
arXiv:2602.16935v1 Announce Type: new Abstract: While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of disconnected events. This lack of temporal awareness facilitates a "Safety Gap" where adversarial tactics, like...
LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation
arXiv:2602.16953v1 Announce Type: new Abstract: Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow to obtain, making online reinforcement learning (RL) impractical. High-coverage hardware verification exemplifies this challenge due...
HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing
arXiv:2602.16976v1 Announce Type: new Abstract: Financial risk systems usually follow a two-step routine: a model predicts return or risk, and then an optimizer makes a decision such as a...
M2F: Automated Formalization of Mathematical Literature at Scale
arXiv:2602.17016v1 Announce Type: new Abstract: Automated formalization of mathematics enables mechanical verification but remains limited to isolated theorems and short snippets. Scaling to textbooks and research papers is largely unaddressed, as it requires managing cross-file dependencies, resolving imports, and ensuring...
IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents
arXiv:2602.17049v1 Announce Type: new Abstract: Computer-use agents operate over long horizons under noisy perception, multi-window contexts, and evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly solve routine subproblems, leading to error...
Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning
arXiv:2602.17062v1 Announce Type: new Abstract: Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during training, often...
How AI Coding Agents Communicate: A Study of Pull Request Description Characteristics and Human Review Responses
arXiv:2602.17084v1 Announce Type: new Abstract: The rapid adoption of large language models has led to the emergence of AI coding agents that autonomously create pull requests on GitHub. However, how these agents differ in their pull request description characteristics, and...
Owen-based Semantics and Hierarchy-Aware Explanation (O-Shap)
arXiv:2602.17107v1 Announce Type: new Abstract: Shapley value-based methods have become foundational in explainable artificial intelligence (XAI), offering theoretically grounded feature attributions through cooperative game theory. However, in practice, particularly in vision tasks, the assumption of feature independence breaks down, as...
Instructor-Aligned Knowledge Graphs for Personalized Learning
arXiv:2602.17111v1 Announce Type: new Abstract: Mastering educational concepts requires understanding both their prerequisites (e.g., recursion before merge sort) and sub-concepts (e.g., merge sort as part of sorting algorithms). Capturing these dependencies is critical for identifying students' knowledge gaps and enabling...
Epistemology of Generative AI: The Geometry of Knowing
arXiv:2602.17116v1 Announce Type: new Abstract: Generative AI presents an unprecedented challenge to our understanding of knowledge and its production. Unlike previous technological transformations, where engineering understanding preceded or accompanied deployment, generative AI operates through mechanisms whose epistemic character remains obscure,...
Bonsai: A Framework for Convolutional Neural Network Acceleration Using Criterion-Based Pruning
arXiv:2602.17145v1 Announce Type: new Abstract: As the need for more accurate and powerful Convolutional Neural Networks (CNNs) increases, so too do their size, execution time, memory footprint, and power consumption. To overcome this, solutions such as pruning have been proposed...
Continual learning and refinement of causal models through dynamic predicate invention
arXiv:2602.17217v1 Announce Type: new Abstract: Efficiently navigating complex environments requires agents to internalize the underlying logic of their world, yet standard world modelling methods often struggle with sample inefficiency, lack of transparency, and poor scalability. We propose a framework for...
All Leaks Count, Some Count More: Interpretable Temporal Contamination Detection in LLM Backtesting
arXiv:2602.17234v1 Announce Type: new Abstract: To evaluate whether LLMs can accurately predict future events, we need the ability to backtest them on events that have already resolved. This requires models to reason only with information available at a specified past...