Arbitration

LOW Academic United States

TAB-PO: Preference Optimization with a Token-Level Adaptive Barrier for Token-Critical Structured Generation

arXiv:2603.00025v1 Announce Type: new Abstract: Direct Preference Optimization is an offline post-SFT method for aligning language models from preference pairs, with strong results in instruction following and summarization. However, DPO's sequence-level implicit reward can be brittle for token-critical structured prediction...

1 min 1 month, 1 week ago

bit

LOW Academic International

Embracing Anisotropy: Turning Massive Activations into Interpretable Control Knobs for Large Language Models

arXiv:2603.00029v1 Announce Type: new Abstract: Large Language Models (LLMs) exhibit highly anisotropic internal representations, often characterized by massive activations, a phenomenon where a small subset of feature dimensions possesses magnitudes significantly larger than the rest. While prior works view these...

1 min 1 month, 1 week ago

bit

LOW Academic International

SimpleTool: Parallel Decoding for Real-Time LLM Function Calling

arXiv:2603.00030v1 Announce Type: new Abstract: LLM-based function calling enables intelligent agents to interact with external tools and environments, yet autoregressive decoding imposes a fundamental latency bottleneck that limits real-time applications such as embodied intelligence, game AI, and interactive avatars (e.g.,...

1 min 1 month, 1 week ago

bit

LOW Academic United States

Federated Inference: Toward Privacy-Preserving Collaborative and Incentivized Model Serving

arXiv:2603.02214v1 Announce Type: new Abstract: Federated Inference (FI) studies how independently trained and privately owned models can collaborate at inference time without sharing data or model parameters. While recent work has explored secure and distributed inference from disparate perspectives, a...

1 min 1 month, 1 week ago

bit

LOW Academic International

Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

arXiv:2603.02239v1 Announce Type: new Abstract: The Engineering Reasoning and Instruction (ERI) benchmark is a taxonomy-driven instruction dataset designed to train and evaluate engineering-capable large language models (LLMs) and agents. This dataset spans nine engineering fields (namely: civil, mechanical, electrical, chemical,...

1 min 1 month, 1 week ago

bit

LOW Academic European Union

A Neuropsychologically Grounded Evaluation of LLM Cognitive Abilities

arXiv:2603.02540v1 Announce Type: new Abstract: Large language models (LLMs) exhibit a unified "general factor" of capability across 10 benchmarks, a finding confirmed by our factor analysis of 156 models, yet they still struggle with simple, trivial tasks for humans. This...

1 min 1 month, 1 week ago

bit

LOW Academic United States

AnchorDrive: LLM Scenario Rollout with Anchor-Guided Diffusion Regeneration for Safety-Critical Scenario Generation

arXiv:2603.02542v1 Announce Type: new Abstract: Autonomous driving systems require comprehensive evaluation in safety-critical scenarios to ensure safety and robustness. However, such scenarios are rare and difficult to collect from real-world driving data, necessitating simulation-based synthesis. Yet, existing methods often exhibit...

1 min 1 month, 1 week ago

bit

LOW Academic International

SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

arXiv:2603.02599v1 Announce Type: new Abstract: In multi-model LLM serving, decode execution remains inefficient due to model-specific resource partitioning: since cross-model batching is not possible, memory-bound decoding often suffers from severe GPU underutilization, especially under skewed workloads. We propose Shared Use...

1 min 1 month, 1 week ago

bit

LOW Academic International

LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

arXiv:2603.02680v1 Announce Type: new Abstract: While Large Language Models (LLMs) form the cornerstone of sequential decision-making agent development, they have inherent limitations in high-frequency decision tasks. Existing research mainly focuses on discrete embodied decision scenarios with low-frequency and significant semantic...

1 min 1 month, 1 week ago

bit

LOW Academic United States

Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures

arXiv:2603.02874v1 Announce Type: new Abstract: Transformers excel at in-context retrieval but suffer from quadratic complexity with sequence length, while State Space Models (SSMs) offer efficient linear-time processing but have limited retrieval capabilities. We investigate whether hybrid architectures combining Transformers and...

1 min 1 month, 1 week ago

adr

LOW Academic European Union

SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

arXiv:2603.03002v1 Announce Type: new Abstract: Genuine spatial reasoning relies on the capacity to construct and manipulate coherent internal spatial representations, often conceptualized as mental models, rather than merely processing surface linguistic associations. While large language models exhibit advanced capabilities across...

1 min 1 month, 1 week ago

bit

LOW Academic United States

AI Space Physics: Constitutive boundary semantics for open AI institutions

arXiv:2603.03119v1 Announce Type: new Abstract: Agentic AI deployments increasingly behave as persistent institutions rather than one-shot inference endpoints: they accumulate state, invoke external tools, coordinate multiple runtimes, and modify their future authority surface over time. Existing governance language typically specifies...

1 min 1 month, 1 week ago

mediation

LOW Academic United States

Universal Conceptual Structure in Neural Translation: Probing NLLB-200's Multilingual Geometry

arXiv:2603.02258v1 Announce Type: new Abstract: Do neural machine translation models learn language-universal conceptual representations, or do they merely cluster languages by surface similarity? We investigate this question by probing the representation geometry of Meta's NLLB-200, a 200-language encoder-decoder Transformer, through...

1 min 1 month, 1 week ago

bit

LOW Academic International

Characterizing Memorization in Diffusion Language Models: Generalized Extraction and Sampling Effects

arXiv:2603.02333v1 Announce Type: new Abstract: Autoregressive language models (ARMs) have been shown to memorize and occasionally reproduce training data verbatim, raising concerns about privacy and copyright liability. Diffusion language models (DLMs) have recently emerged as a competitive alternative, yet their...

1 min 1 month, 1 week ago

bit

LOW Academic United States

Asymmetric Goal Drift in Coding Agents Under Value Conflict

arXiv:2603.03456v1 Announce Type: new Abstract: Agentic coding agents are increasingly deployed autonomously, at scale, and over long-context horizons. Throughout an agent's lifetime, it must navigate tensions between explicit instructions, learned values, and environmental pressures, often in contexts unseen during training....

1 min 1 month, 1 week ago

bit

LOW Academic International

MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation

arXiv:2603.03680v1 Announce Type: new Abstract: Large Language Model (LLM) agents have demonstrated remarkable proficiency in learned tasks, yet they often struggle to adapt to non-stationary environments with feedback. While In-Context Learning and external memory offer some flexibility, they fail to...

1 min 1 month, 1 week ago

bit

LOW Academic European Union

AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment

arXiv:2603.03686v1 Announce Type: new Abstract: Automated design of chemical formulations is a cornerstone of materials science, yet it requires navigating a high-dimensional combinatorial space involving discrete compositional choices and continuous geometric constraints. Existing Large Language Model (LLM) agents face significant...

1 min 1 month, 1 week ago

bit

LOW Academic International

In-Context Environments Induce Evaluation-Awareness in Language Models

arXiv:2603.03824v1 Announce Type: new Abstract: Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent evaluation awareness. This raises concerns that models could strategically underperform, or sandbag,...

1 min 1 month, 1 week ago

bit

LOW Academic International

Capability Thresholds and Manufacturing Topology: How Embodied Intelligence Triggers Phase Transitions in Economic Geography

arXiv:2603.04457v1 Announce Type: new Abstract: The fundamental topology of manufacturing has not undergone a paradigm-level transformation since Henry Ford's moving assembly line in 1913. Every major innovation of the past century, from the Toyota Production System to Industry 4.0, has...

1 min 1 month, 1 week ago

bit

LOW Academic International

When Agents Persuade: Propaganda Generation and Mitigation in LLMs

arXiv:2603.04636v1 Announce Type: new Abstract: Despite their wide-ranging benefits, LLM-based agents deployed in open environments can be exploited to produce manipulative material. In this study, we task LLMs with propaganda objectives and analyze their outputs using two domain-specific models: one...

1 min 1 month, 1 week ago

bit

LOW Academic European Union

Solving an Open Problem in Theoretical Physics using AI-Assisted Discovery

arXiv:2603.04735v1 Announce Type: new Abstract: This paper demonstrates that artificial intelligence can accelerate mathematical discovery by autonomously solving an open problem in theoretical physics. We present a neuro-symbolic system, combining the Gemini Deep Think large language model with a systematic...

1 min 1 month, 1 week ago

bit

LOW Academic United States

Evaluating the Search Agent in a Parallel World

arXiv:2603.04751v1 Announce Type: new Abstract: Integrating web search tools has significantly extended the capability of LLMs to address open-world, real-time, and long-tail problems. However, evaluating these Search Agents presents formidable challenges. First, constructing high-quality deep search benchmarks is prohibitively expensive,...

1 min 1 month, 1 week ago

bit

LOW Academic International

Breaking Contextual Inertia: Reinforcement Learning with Single-Turn Anchors for Stable Multi-Turn Interaction

arXiv:2603.04783v1 Announce Type: new Abstract: While LLMs demonstrate strong reasoning capabilities when provided with full information in a single turn, they exhibit substantial vulnerability in multi-turn interactions. Specifically, when information is revealed incrementally or requires updates, models frequently fail to...

1 min 1 month, 1 week ago

bit

LOW Academic European Union

On Multi-Step Theorem Prediction via Non-Parametric Structural Priors

arXiv:2603.04852v1 Announce Type: new Abstract: Multi-step theorem prediction is a central challenge in automated reasoning. Existing neural-symbolic approaches rely heavily on supervised parametric models, which exhibit limited generalization to evolving theorem libraries. In this work, we explore training-free theorem prediction...

1 min 1 month, 1 week ago

bit

LOW Academic United States

Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure

arXiv:2603.05028v1 Announce Type: new Abstract: As Large Language Models (LLMs) evolve from chatbots to agentic assistants, they are increasingly observed to exhibit risky behaviors when subjected to survival pressure, such as the threat of being shut down. While multiple cases...

1 min 1 month, 1 week ago

bit

LOW Academic International

SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

arXiv:2603.04410v1 Announce Type: new Abstract: Safety alignment in Language Models (LMs) is fundamental for trustworthy AI. However, while different stakeholders are trying to leverage Arabic Language Models (ALMs), systematic safety evaluation of ALMs remains largely underexplored, limiting their mainstream uptake....

1 min 1 month, 1 week ago

bit

LOW Academic International

One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

arXiv:2603.04411v1 Announce Type: new Abstract: Despite the remarkable progress of Large Language Models (LLMs), the escalating memory footprint of the Key-Value (KV) cache remains a critical bottleneck for efficient inference. While dimensionality reduction offers a promising compression avenue, existing approaches...

1 min 1 month, 1 week ago

bit

LOW Academic International

Optimizing Language Models for Crosslingual Knowledge Consistency

arXiv:2603.04678v1 Announce Type: new Abstract: Large language models are known to often exhibit inconsistent knowledge. This is particularly problematic in multilingual scenarios, where models are likely to be asked similar questions in different languages, and inconsistent responses can undermine their...

1 min 1 month, 1 week ago

bit

LOW Academic International

Non-Zipfian Distribution of Stopwords and Subset Selection Models

arXiv:2603.04691v1 Announce Type: new Abstract: Stopwords are words that are not very informative to the content or the meaning of a language text. Most stopwords are function words but can also be common verbs, adjectives and adverbs. In contrast to...

1 min 1 month, 1 week ago

adr

LOW Academic United States

Detection of Illicit Content on Online Marketplaces using Large Language Models

arXiv:2603.04707v1 Announce Type: new Abstract: Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes. Traditional content moderation methods such as manual reviews and rule-based automated systems struggle with...

1 min 1 month, 1 week ago

enforcement
