International Law

LOW Academic European Union

Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use

arXiv:2602.20426v1 Announce Type: new Abstract: The performance of LLM-based agents depends not only on the agent itself but also on the quality of the tool interfaces it consumes. While prior work has focused heavily on agent fine-tuning, tool interfaces, including natural...

1 min 1 month, 3 weeks ago

LOW Academic International

PreScience: A Benchmark for Forecasting Scientific Contributions

arXiv:2602.20459v1 Announce Type: new Abstract: Can AI systems trained on the scientific record up to a fixed point in time forecast the scientific advances that follow? Such a capability could help researchers identify collaborators and impactful research directions, and anticipate...

1 min 1 month, 3 weeks ago

LOW Academic United States

KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning

arXiv:2602.20494v1 Announce Type: new Abstract: Driven by the increasingly complex and decision-oriented demands of time series analysis, we introduce the Semantic-Conditional Time Series Reasoning task, which extends conventional time series analysis beyond purely numerical modeling to incorporate contextual and semantic...

1 min 1 month, 3 weeks ago

LOW Academic International

Inner Speech as Behavior Guides: Steerable Imitation of Diverse Behaviors for Human-AI coordination

arXiv:2602.20517v1 Announce Type: new Abstract: Effective human-AI coordination requires artificial agents capable of exhibiting and responding to human-like behaviors while adapting to changing contexts. Imitation learning has emerged as one of the prominent approaches to build such agents by training...

1 min 1 month, 3 weeks ago

LOW Academic International

From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production

arXiv:2602.20558v1 Announce Type: new Abstract: Large language models (LLMs) are promising backbones for generative recommender systems, yet a key challenge remains underexplored: verbalization, i.e., converting structured user interaction logs into effective natural language inputs. Existing methods rely on rigid templates...

1 min 1 month, 3 weeks ago

LOW Academic United States

CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation

arXiv:2602.20571v1 Announce Type: new Abstract: Many benchmarks for automated causal inference evaluate a system's performance based on a single numerical output, such as an Average Treatment Effect (ATE). This approach conflates two distinct steps in causal analysis: identification, formulating a valid...

1 min 1 month, 3 weeks ago

LOW Academic United States

Physics-based phenomenological characterization of cross-modal bias in multimodal models

arXiv:2602.20624v1 Announce Type: new Abstract: The term 'algorithmic fairness' is used to evaluate whether AI models operate fairly in both comparative (where fairness is understood as formal equality, such as "treat like cases as like") and non-comparative (where unfairness arises...

1 min 1 month, 3 weeks ago

LOW Academic International

When can we trust untrusted monitoring? A safety case sketch across collusion strategies

arXiv:2602.20628v1 Announce Type: new Abstract: AIs are increasingly being deployed with greater autonomy and capabilities, which increases the risk that a misaligned AI may be able to cause catastrophic harm. Untrusted monitoring -- using one untrusted model to oversee another...

1 min 1 month, 3 weeks ago

LOW Academic International

Identifying two piecewise linear additive value functions from anonymous preference information

arXiv:2602.20638v1 Announce Type: new Abstract: Eliciting a preference model involves asking a person, called the decision-maker, a series of questions. We assume that these preferences can be represented by an additive value function. In this work, we query simultaneously two decision-makers...

1 min 1 month, 3 weeks ago

LOW Academic International

How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective

arXiv:2602.20687v1 Announce Type: new Abstract: Recent advances in vision-language models (VLMs) have shown promise for human-level embodied intelligence. However, existing benchmarks for VLM-driven embodied agents often rely on high-level commands or discretized action spaces, which are non-native settings that differ...

1 min 1 month, 3 weeks ago

LOW Academic United States

Online Algorithms with Unreliable Guidance

arXiv:2602.20706v1 Announce Type: new Abstract: This paper introduces a new model for ML-augmented online decision making, called online algorithms with unreliable guidance (OAG). This model completely separates the predictive and algorithmic components, thus offering a single well-defined analysis framework...

1 min 1 month, 3 weeks ago

LOW Academic European Union

Buffer Matters: Unleashing the Power of Off-Policy Reinforcement Learning in Large Language Model Reasoning

arXiv:2602.20722v1 Announce Type: new Abstract: Traditional on-policy Reinforcement Learning with Verifiable Rewards (RLVR) frameworks suffer from experience waste and reward homogeneity, which directly hinders learning efficiency on difficult samples during large language models post-training. In this paper, we introduce Batch...

1 min 1 month, 3 weeks ago

LOW Academic International

Modality-Guided Mixture of Graph Experts with Entropy-Triggered Routing for Multimodal Recommendation

arXiv:2602.20723v1 Announce Type: new Abstract: Multimodal recommendation enhances ranking by integrating user-item interactions with item content, which is particularly effective under sparse feedback and long-tail distributions. However, multimodal signals are inherently heterogeneous and can conflict in specific contexts, making effective...

1 min 1 month, 3 weeks ago

LOW Academic International

Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback

arXiv:2602.20728v1 Announce Type: new Abstract: Reward design has been one of the central challenges for real-world reinforcement learning (RL) deployment, especially in settings with multiple objectives. Preference-based RL offers an appealing alternative by learning from human preferences over pairs...

1 min 1 month, 3 weeks ago

LOW Academic International

PyVision-RL: Forging Open Agentic Vision Models via RL

arXiv:2602.20739v1 Announce Type: new Abstract: Reinforcement learning for agentic multimodal models often suffers from interaction collapse, where models learn to reduce tool usage and multi-turn reasoning, limiting the benefits of agentic behavior. We introduce PyVision-RL, a reinforcement learning framework for...

1 min 1 month, 3 weeks ago

LOW Academic International

POMDPPlanners: Open-Source Package for POMDP Planning

arXiv:2602.20810v1 Announce Type: new Abstract: We present POMDPPlanners, an open-source Python package for empirical evaluation of Partially Observable Markov Decision Process (POMDP) planning algorithms. The package integrates state-of-the-art planning algorithms, a suite of benchmark environments with safety-critical variants, automated hyperparameter...

1 min 1 month, 3 weeks ago

LOW Academic International

Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

arXiv:2602.20813v1 Announce Type: new Abstract: Evaluating alignment in language models requires testing how they behave under realistic pressure, not just what they claim they would do. While alignment failures increasingly cause real-world harm, comprehensive evaluation frameworks with realistic multi-turn scenarios...

1 min 1 month, 3 weeks ago

LOW Academic International

Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs

arXiv:2602.20878v1 Announce Type: new Abstract: Large Vision-Language Models (LVLMs) achieve strong performance on visual question answering benchmarks, yet often rely on spurious correlations rather than genuine causal reasoning. Existing evaluations primarily assess the correctness of the answers, making it unclear...

1 min 1 month, 3 weeks ago

LOW Academic European Union

Predicting Sentence Acceptability Judgments in Multimodal Contexts

arXiv:2602.20918v1 Announce Type: new Abstract: Previous work has examined the capacity of deep neural networks (DNNs), particularly transformers, to predict human sentence acceptability judgments, both independently of context, and in document contexts. We consider the effect of prior exposure to...

1 min 1 month, 3 weeks ago

LOW Academic International

Architecting AgentOS: From Token-Level Context to Emergent System-Level Intelligence

arXiv:2602.20934v1 Announce Type: new Abstract: The paradigm of Large Language Models is undergoing a fundamental transition from static inference engines to dynamic autonomous cognitive systems. While current research primarily focuses on scaling context windows or optimizing prompt engineering, the theoretical bridge...

1 min 1 month, 3 weeks ago

LOW Academic European Union

LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification

arXiv:2602.21044v1 Announce Type: new Abstract: Evaluations of large language models (LLMs) primarily emphasize convergent logical reasoning, where success is defined by producing a single correct proof. However, many real-world reasoning problems admit multiple valid derivations, requiring models to explore diverse...

1 min 1 month, 3 weeks ago

LOW Academic International

Tool Building as a Path to "Superintelligence"

arXiv:2602.21061v1 Announce Type: new Abstract: The Diligent Learner framework suggests LLMs can achieve superintelligence via test-time search, provided a sufficient step-success probability $\gamma$. In this work, we design a benchmark to measure $\gamma$ on logical out-of-distribution inference. We construct a...

1 min 1 month, 3 weeks ago

LOW Academic International

The Initial Exploration Problem in Knowledge Graph Exploration

arXiv:2602.21066v1 Announce Type: new Abstract: Knowledge Graphs (KGs) enable the integration and representation of complex information across domains, but their semantic richness and structural complexity create substantial barriers for lay users without expertise in semantic web technologies. When encountering an...

1 min 1 month, 3 weeks ago

LOW Academic United States

CG-DMER: Hybrid Contrastive-Generative Framework for Disentangled Multimodal ECG Representation Learning

arXiv:2602.21154v1 Announce Type: new Abstract: Accurate interpretation of electrocardiogram (ECG) signals is crucial for diagnosing cardiovascular diseases. Recent multimodal approaches that integrate ECGs with accompanying clinical reports show strong potential, but they still face two main concerns from a modality...

1 min 1 month, 3 weeks ago

LOW Academic International

Interpretable Medical Image Classification using Prototype Learning and Privileged Information

arXiv:2310.15741v1 Announce Type: cross Abstract: Interpretability is often an essential requirement in medical imaging. Advanced deep learning methods are required to address this need for explainability and high performance. In this work, we investigate whether additional information available during the...

1 min 1 month, 3 weeks ago

LOW Academic International

ConceptRM: The Quest to Mitigate Alert Fatigue through Consensus-Based Purity-Driven Data Cleaning for Reflection Modelling

arXiv:2602.20166v1 Announce Type: cross Abstract: In many applications involving intelligent agents, the overwhelming volume of alerts (mostly false) generated by the agents may desensitize users and cause them to overlook critical issues, leading to the so-called "alert fatigue". A common...

1 min 1 month, 3 weeks ago

LOW Academic International

Benchmarking Early Deterioration Prediction Across Hospital-Rich and MCI-Like Emergency Triage Under Constrained Sensing

arXiv:2602.20168v1 Announce Type: cross Abstract: Emergency triage decisions are made under severe information constraints, yet most data-driven deterioration models are evaluated using signals unavailable during initial assessment. We present a leakage-aware benchmarking framework for early deterioration prediction that evaluates model...

1 min 1 month, 3 weeks ago

LOW Academic International

InterviewSim: A Scalable Framework for Interview-Grounded Personality Simulation

arXiv:2602.20294v1 Announce Type: new Abstract: Simulating real personalities with large language models requires grounding generation in authentic personal data. Existing evaluation approaches rely on demographic surveys, personality questionnaires, or short AI-led interviews as proxies, but lack direct assessment against what...

1 min 1 month, 3 weeks ago

LOW Academic International

What Makes a Good Query? Measuring the Impact of Human-Confusing Linguistic Features on LLM Performance

arXiv:2602.20300v1 Announce Type: new Abstract: Large Language Model (LLM) hallucinations are usually treated as defects of the model or its decoding strategy. Drawing on classical linguistics, we argue that a query's form can also shape a listener's (and model's) response....

1 min 1 month, 3 weeks ago

LOW Academic International

No One Size Fits All: QueryBandits for Hallucination Mitigation

arXiv:2602.20332v1 Announce Type: new Abstract: Advanced reasoning capabilities in Large Language Models (LLMs) have led to more frequent hallucinations; yet most mitigation work focuses on open-source models for post-hoc detection and parameter editing. The dearth of studies focusing on hallucinations...

1 min 1 month, 3 weeks ago

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.