Human-Centered Explainable AI for Security Enhancement: A Deep Intrusion Detection Framework
arXiv:2602.13271v1 Announce Type: new Abstract: The increasing complexity and frequency of cyber-threats demand intrusion detection systems (IDS) that are not only accurate but also interpretable. This paper presented a novel IDS framework that integrated Explainable Artificial Intelligence (XAI) to enhance...
TemporalBench: A Benchmark for Evaluating LLM-Based Agents on Contextual and Event-Informed Time Series Tasks
arXiv:2602.13272v1 Announce Type: new Abstract: It is unclear whether strong forecasting performance reflects genuine temporal understanding or the ability to reason under contextual and event-driven conditions. We introduce TemporalBench, a multi-domain benchmark designed to evaluate temporal reasoning behavior under progressively...
Artificial Organisations
arXiv:2602.13275v1 Announce Type: new Abstract: Alignment research focuses on making individual AI systems reliable. Human institutions achieve reliable collective behaviour differently: they mitigate the risk posed by misaligned individuals through organisational structure. Multi-agent AI systems should follow this institutional model...
BEAGLE: Behavior-Enforced Agent for Grounded Learner Emulation
arXiv:2602.13280v1 Announce Type: new Abstract: Simulating student learning behaviors in open-ended problem-solving environments holds potential for education research, from training adaptive tutoring systems to stress-testing pedagogical interventions. However, collecting authentic data is challenging due to privacy concerns and the high...
Mirror: A Multi-Agent System for AI-Assisted Ethics Review
arXiv:2602.13292v1 Announce Type: new Abstract: Ethics review is a foundational mechanism of modern research governance, yet contemporary systems face increasing strain as ethical risks arise as structural consequences of large-scale, interdisciplinary scientific practice. The demand for consistent and defensible decisions...
Situation Graph Prediction: Structured Perspective Inference for User Modeling
arXiv:2602.13319v1 Announce Type: new Abstract: Perspective-Aware AI requires modeling evolving internal states--goals, emotions, contexts--not merely preferences. Progress is limited by a data bottleneck: digital footprints are privacy-sensitive and perspective states are rarely labeled. We propose Situation Graph Prediction (SGP), a...
Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis of the Model Context Protocol
arXiv:2602.13320v1 Announce Type: new Abstract: As AI agents powered by large language models (LLMs) increasingly use external tools for high-stakes decisions, a critical reliability question arises: how do errors propagate across sequential tool calls? We introduce the first theoretical framework...
Detecting Jailbreak Attempts in Clinical Training LLMs Through Automated Linguistic Feature Extraction
arXiv:2602.13321v1 Announce Type: new Abstract: Detecting jailbreak attempts in clinical training large language models (LLMs) requires accurate modeling of linguistic deviations that signal unsafe or off-task user behavior. Prior work on the 2-Sigma clinical simulation platform showed that manually annotated...
Contrastive explanations of BDI agents
arXiv:2602.13323v1 Announce Type: new Abstract: The ability of autonomous systems to provide explanations is important for supporting transparency and aiding the development of (appropriate) trust. Prior work has defined a mechanism for Belief-Desire-Intention (BDI) agents to be able to answer...
Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
arXiv:2602.13367v1 Announce Type: new Abstract: We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small...
On-Policy Supervised Fine-Tuning for Efficient Reasoning
arXiv:2602.13407v1 Announce Type: new Abstract: Large reasoning models (LRMs) are commonly trained with reinforcement learning (RL) to explore long chain-of-thought reasoning, achieving strong performance at high computational cost. Recent methods add multi-reward objectives to jointly optimize correctness and brevity, but...
NeuroWeaver: An Autonomous Evolutionary Agent for Exploring the Programmatic Space of EEG Analysis Pipelines
arXiv:2602.13473v1 Announce Type: new Abstract: Although foundation models have demonstrated remarkable success in general domains, the application of these models to electroencephalography (EEG) analysis is constrained by substantial data requirements and high parameterization. These factors incur prohibitive computational costs, thereby...
OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage
arXiv:2602.13477v1 Announce Type: new Abstract: As Large Language Model (LLM) agents become more capable, their coordinated use in the form of multi-agent systems is anticipated to emerge as a practical paradigm. Prior work has examined the safety and misuse risks...
OpAgent: Operator Agent for Web Navigation
arXiv:2602.13559v1 Announce Type: new Abstract: To fulfill user instructions, autonomous web agents must contend with the inherent complexity and volatile nature of real-world websites. Conventional paradigms predominantly rely on Supervised Fine-Tuning (SFT) or Offline Reinforcement Learning (RL) using static datasets....
Differentiable Rule Induction from Raw Sequence Inputs
arXiv:2602.13583v1 Announce Type: new Abstract: Rule learning-based models are widely used in highly interpretable scenarios due to their transparent structures. Inductive logic programming (ILP), a form of machine learning, induces rules from facts while maintaining interpretability. Differentiable ILP models enhance...
A First Proof Sprint
arXiv:2602.13587v1 Announce Type: new Abstract: This monograph reports a multi-agent proof sprint on ten research-level problems, combining rapid draft generation with adversarial verification, targeted repair, and explicit provenance. The workflow uses wiring-diagram decompositions of claim dependencies to localize gaps and...
Hippocampus: An Efficient and Scalable Memory Module for Agentic AI
arXiv:2602.13594v1 Announce Type: new Abstract: Agentic AI require persistent memory to store user-specific histories beyond the limited context window of LLMs. Existing memory systems use dense vector databases or knowledge-graph traversal (or hybrid), incurring high retrieval latency and poor storage...
The Quantization Trap: Breaking Linear Scaling Laws in Multi-Hop Reasoning
arXiv:2602.13595v1 Announce Type: new Abstract: Neural scaling laws provide a predictable recipe for AI advancement: reducing numerical precision should linearly improve computational efficiency and energy profile (E proportional to bits). In this paper, we demonstrate that this scaling law breaks...
Guided Collaboration in Heterogeneous LLM-Based Multi-Agent Systems via Entropy-Based Understanding Assessment and Experience Retrieval
arXiv:2602.13639v1 Announce Type: new Abstract: With recent breakthroughs in large language models (LLMs) for reasoning, planning, and complex task generation, artificial intelligence systems are transitioning from isolated single-agent architectures to multi-agent systems with collaborative intelligence. However, in heterogeneous multi-agent systems...
Building Autonomous GUI Navigation via Agentic-Q Estimation and Step-Wise Policy Optimization
arXiv:2602.13653v1 Announce Type: new Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have substantially driven the progress of autonomous agents for Graphical User Interface (GUI). Nevertheless, in real-world applications, GUI agents are often faced with non-stationary environments, leading to...
AllMem: A Memory-centric Recipe for Efficient Long-context Modeling
arXiv:2602.13680v1 Announce Type: new Abstract: Large Language Models (LLMs) encounter significant performance bottlenecks in long-sequence tasks due to the computational complexity and memory overhead inherent in the self-attention mechanism. To address these challenges, we introduce \textsc{AllMem}, a novel and efficient...
PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
arXiv:2602.13691v1 Announce Type: new Abstract: Recent advancements in Large Language Model (LLM) agents have demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon multi-step tool planning is challenging, because the exploration space suffers from a combinatorial explosion....
Using Machine Learning to Enhance the Detection of Obfuscated Abusive Words in Swahili: A Focus on Child Safety
arXiv:2602.13455v1 Announce Type: new Abstract: The rise of digital technology has dramatically increased the potential for cyberbullying and online abuse, necessitating enhanced measures for detection and prevention, especially among children. This study focuses on detecting abusive obfuscated language in Swahili,...
Language Model Memory and Memory Models for Language
arXiv:2602.13466v1 Announce Type: new Abstract: The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of `memory', is widely employed but not well characterized. We find that language model embeddings typically...
From Perceptions To Evidence: Detecting AI-Generated Content In Turkish News Media With A Fine-Tuned Bert Classifier
arXiv:2602.13504v1 Announce Type: new Abstract: The rapid integration of large language models into newsroom workflows has raised urgent questions about the prevalence of AI-generated content in online media. While computational studies have begun to quantify this phenomenon in English-language outlets,...
Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
arXiv:2602.13517v1 Announce Type: new Abstract: Large language models (LLMs) have demonstrated impressive reasoning capabilities by scaling test-time compute via long Chain-of-Thought (CoT). However, recent findings suggest that raw token counts are unreliable proxies for reasoning quality: increased generation length does...
Elo-Evolve: A Co-evolutionary Framework for Language Model Alignment
arXiv:2602.13575v1 Announce Type: new Abstract: Current alignment methods for Large Language Models (LLMs) rely on compressing vast amounts of human preference data into static, absolute reward functions, leading to data scarcity, noise sensitivity, and training instability. We introduce Elo-Evolve, a...
Metaphors' journeys across time and genre: tracking the evolution of literary metaphors with temporal embeddings
arXiv:2602.13701v1 Announce Type: new Abstract: Metaphors are a distinctive feature of literary language, yet they remain less studied experimentally than everyday metaphors. Moreover, previous psycholinguistic and computational approaches overlooked the temporal dimension, although many literary metaphors were coined centuries apart...
On Theoretically-Driven LLM Agents for Multi-Dimensional Discourse Analysis
arXiv:2602.13713v1 Announce Type: new Abstract: Identifying the strategic uses of reformulation in discourse remains a key challenge for computational argumentation. While LLMs can detect surface-level similarity, they often fail to capture the pragmatic functions of rephrasing, such as its role...
RMPL: Relation-aware Multi-task Progressive Learning with Stage-wise Training for Multimedia Event Extraction
arXiv:2602.13748v1 Announce Type: new Abstract: Multimedia Event Extraction (MEE) aims to identify events and their arguments from documents that contain both text and images. It requires grounding event semantics across different modalities. Progress in MEE is limited by the lack...