Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning
arXiv:2602.20722v1 Announce Type: new Abstract: Traditional on-policy Reinforcement Learning with Verifiable Rewards (RLVR) frameworks suffer from experience waste and reward homogeneity, which directly hinders learning efficiency on difficult samples during large language models post-training. In this paper, we introduce Batch...
Modality-Guided Mixture of Graph Experts with Entropy-Triggered Routing for Multimodal Recommendation
arXiv:2602.20723v1 Announce Type: new Abstract: Multimodal recommendation enhances ranking by integrating user-item interactions with item content, which is particularly effective under sparse feedback and long-tail distributions. However, multimodal signals are inherently heterogeneous and can conflict in specific contexts, making effective...
PyVision-RL: Forging Open Agentic Vision Models via RL
arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn reasoning, limiting the benefits of agentic behavior. We introduce PyVision-RL, a reinforcement learning framework for...
Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs
arXiv:2602.20878v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) achieve strong performance on visual question answering benchmarks, yet often rely on spurious correlations rather than genuine causal reasoning. Existing evaluations primarily assess the correctness of the answers, making it unclear...
Motivation is Something You Need
arXiv:2602.21064v1 Announce Type: new Abstract: This work introduces a novel training paradigm that draws from affective neuroscience. Inspired by the interplay of emotions and cognition in the human brain and more specifically the SEEKING motivational state, we design a dual-model...
The Initial Exploration Problem in Knowledge Graph Exploration
arXiv:2602.21066v1 Announce Type: new Abstract: Knowledge Graphs (KGs) enable the integration and representation of complex information across domains, but their semantic richness and structural complexity create substantial barriers for lay users without expertise in semantic web technologies. When encountering an...
Multimodal Multi-Agent Empowered Legal Judgment Prediction
arXiv:2601.12815v5 Announce Type: cross Abstract: Legal Judgment Prediction (LJP) aims to predict the outcomes of legal cases based on factual descriptions, serving as a fundamental task to advance the development of legal systems. Traditional methods often rely on statistical analyses...
Benchmarking Early Deterioration Prediction Across Hospital-Rich and MCI-Like Emergency Triage Under Constrained Sensing
arXiv:2602.20168v1 Announce Type: cross Abstract: Emergency triage decisions are made under severe information constraints, yet most data-driven deterioration models are evaluated using signals unavailable during initial assessment. We present a leakage-aware benchmarking framework for early deterioration prediction that evaluates model...
Benchmarking Distilled Language Models: Performance and Efficiency in Resource-Constrained Settings
arXiv:2602.20164v1 Announce Type: new Abstract: Knowledge distillation offers a transformative pathway to developing powerful, yet efficient, small language models (SLMs) suitable for resource-constrained environments. In this paper, we benchmark the performance and computational cost of distilled models against their vanilla...
InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation
arXiv:2602.20294v1 Announce Type: new Abstract: Simulating real personalities with large language models requires grounding generation in authentic personal data. Existing evaluation approaches rely on demographic surveys, personality questionnaires, or short AI-led interviews as proxies, but lack direct assessment against what...
Disentangling Geometry, Performance, and Training in Language Models
arXiv:2602.20433v1 Announce Type: new Abstract: Geometric properties of Transformer weights, particularly the unembedding matrix, have been widely useful in language model interpretability research. Yet, their utility for estimating downstream performance remains unclear. In this work, we systematically investigate the relationship...
Personal Information Parroting in Language Models
arXiv:2602.20580v1 Announce Type: new Abstract: Modern language models (LM) are trained on large scrapes of the Web, containing millions of personal information (PI) instances, many of which LMs memorize, increasing privacy risks. In this work, we develop the regexes and...
A Dynamic Survey of Soft Set Theory and Its Extensions
arXiv:2602.21268v1 Announce Type: new Abstract: Soft set theory provides a direct framework for parameterized decision modeling by assigning to each attribute (parameter) a subset of a given universe, thereby representing uncertainty in a structured way [1, 2]. Over the past...
The ASIR Courage Model: A Phase-Dynamic Framework for Truth Transitions in Human and AI Systems
arXiv:2602.21745v1 Announce Type: new Abstract: We introduce the ASIR (Awakened Shared Intelligence Relationship) Courage Model, a phase-dynamic framework that formalizes truth-disclosure as a state transition rather than a personality trait. The mode characterizes the shift from suppression (S0) to expression...
fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation
arXiv:2602.21746v1 Announce Type: new Abstract: In a previous work, we introduced the fuzzy Ethical Decision-Making framework (fEDM), a risk-based ethical reasoning architecture grounded in fuzzy logic. The original model combined a fuzzy Ethical Risk Assessment module (fERA) with ethical decision...
Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
arXiv:2602.21814v1 Announce Type: new Abstract: Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per condition, 6 conditions, 120 total trials) examining which prompt...
Distill and Align Decomposition for Enhanced Claim Verification
arXiv:2602.21857v1 Announce Type: new Abstract: Complex claim verification requires decomposing sentences into verifiable subclaims, yet existing methods struggle to align decomposition quality with verification performance. We propose a reinforcement learning (RL) approach that jointly optimizes decomposition quality and verifier alignment...
2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support
arXiv:2602.21889v1 Announce Type: new Abstract: Across a growing number of fields, human decision making is supported by predictions from AI models. However, we still lack a deep understanding of the effects of adoption of these technologies. In this paper, we...
Petri Net Relaxation for Infeasibility Explanation and Sequential Task Planning
arXiv:2602.22094v1 Announce Type: new Abstract: Plans often change due to changes in the situation or our understanding of the situation. Sometimes, a feasible plan may not even exist, and identifying such infeasibilities is useful to determine when requirements need adjustment....
Inference-time Alignment via Sparse Junction Steering
arXiv:2602.21215v1 Announce Type: cross Abstract: Token-level steering has emerged as a pivotal approach for inference-time alignment, enabling fine grained control over large language models by modulating their output distributions without parameter updates. While effective, existing methods rely on dense intervention...
EQ-5D Classification Using Biomedical Entity-Enriched Pre-trained Language Models and Multiple Instance Learning
arXiv:2602.21216v1 Announce Type: cross Abstract: The EQ-5D (EuroQol 5-Dimensions) is a standardized instrument for the evaluation of health-related quality of life. In health economics, systematic literature reviews (SLRs) depend on the correct identification of publications that use the EQ-5D, but...
Applied Sociolinguistic AI for Community Development (ASA-CD): A New Scientific Paradigm for Linguistically-Grounded Social Intervention
arXiv:2602.21217v1 Announce Type: cross Abstract: This paper establishes Applied Sociolinguistic AI for Community Development (ASA-CD) as a novel scientific paradigm for addressing community challenges through linguistically grounded, AI-enabled intervention. ASA-CD introduces three key contributions: (1) linguistic biomarkers as computational indicators...
Field-Theoretic Memory for AI Agents: Continuous Dynamics for Context Preservation
arXiv:2602.21220v1 Announce Type: cross Abstract: We present a memory system for AI agents that treats stored information as continuous fields governed by partial differential equations rather than discrete entries in a database. The approach draws from classical field theory: memories...
Task-Aware LoRA Adapter Composition via Similarity Retrieval in Vector Databases
arXiv:2602.21222v1 Announce Type: cross Abstract: Parameter efficient fine tuning methods like LoRA have enabled task specific adaptation of large language models, but efficiently composing multiple specialized adapters for unseen tasks remains challenging. We present a novel framework for dynamic LoRA...
Gains, Losses, and Judges: Framing and the Judiciary
ARTICLE Gains, Losses, and Judges: Framing and the Judiciary Jeffrey J. Rachlinski* & Andrew J. Wistrich** Losses hurt more than foregone gains—an asymmetry that psychologists call “loss aversion.” Losses cause more regret than foregone gains, and people struggle harder to...
Google looks to tackle longstanding RCS spam in India — but not alone
Google is integrating carrier-level filtering into RCS in India through a partnership with Airtel to strengthen protections against spam.
OpenAI reveals more details about its agreement with the Pentagon
By CEO Sam Altman’s own admission, OpenAI’s deal with the Department of Defense was “definitely rushed,” and “the optics don’t look good.”
Anthropic’s Claude rises to No. 1 in the App Store following Pentagon dispute
Anthropic’s chatbot Claude seems to have benefited from the attention around the company’s fraught negotiations with the Pentagon.
SaaS in, SaaS out: Here’s what’s driving the SaaSpocalypse
What's behind the SaaSpocalypse? It simply seems a new supreme has risen.
Global Trade Realignment: How Geopolitical Shifts Are Reshaping International Commerce
The global trade landscape is undergoing a fundamental transformation driven by geopolitical tensions, technological competition, and shifting alliances.