International Law

LOW Academic International

Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference

arXiv:2603.20224v1 Announce Type: new Abstract: Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks but come with substantial energy and computational costs, particularly in request-heavy scenarios. In many real-world applications, the full scale and capabilities of LLMs are often...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Do LLM-Driven Agents Exhibit Engagement Mechanisms? Controlled Tests of Information Load, Descriptive Norms, and Popularity Cues

arXiv:2603.20911v1 Announce Type: new Abstract: Large language models make agent-based simulation more behaviorally expressive, but they also sharpen a basic methodological tension: fluent, human-like output is not, by itself, evidence for theory. We evaluate what an LLM-driven simulation can credibly...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Seed1.8 Model Card: Towards Generalized Real-World Agency

arXiv:2603.20633v1 Announce Type: new Abstract: We present Seed1.8, a foundation model aimed at generalized real-world agency: going beyond single-turn prediction to multi-turn interaction, tool use, and multi-step execution. Seed1.8 keeps strong LLM and vision-language performance while supporting a unified agentic...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Expected Reward Prediction, with Applications to Model Routing

arXiv:2603.20217v1 Announce Type: new Abstract: Reward models are a standard tool to score responses from LLMs. Reward models are built to rank responses to a fixed prompt sampled from a single model, for example to choose the best of n...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Towards Intelligent Geospatial Data Discovery: a knowledge graph-driven multi-agent framework powered by large language models

arXiv:2603.20670v1 Announce Type: new Abstract: The rapid growth in the volume, variety, and velocity of geospatial data has created data ecosystems that are highly distributed, heterogeneous, and semantically inconsistent. Existing data catalogs, portals, and infrastructures still rely largely on keyword-based...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Position: Multi-Agent Algorithmic Care Systems Demand Contestability for Trustworthy AI

arXiv:2603.20595v1 Announce Type: new Abstract: Multi-agent systems (MAS) are increasingly used in healthcare to support complex decision-making through collaboration among specialized agents. Because these systems act as collective decision-makers, they raise challenges for trust, accountability, and human oversight. Existing approaches...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Knowledge Boundary Discovery for Large Language Models

arXiv:2603.21022v1 Announce Type: new Abstract: We propose Knowledge Boundary Discovery (KBD), a reinforcement learning based framework to explore the knowledge boundaries of the Large Language Models (LLMs). We define the knowledge boundary by automatically generating two types of questions: (i)...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

arXiv:2603.21341v1 Announce Type: new Abstract: Improving embodied reasoning in multimodal-large-language models (MLLMs) is essential for building vision-language-action models (VLAs) on top of them to readily translate multimodal understanding into low-level actions. Accordingly, recent work has explored enhancing embodied reasoning in...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

The AI Scientific Community: Agentic Virtual Lab Swarms

arXiv:2603.21344v1 Announce Type: new Abstract: In this short note we propose using agentic swarms of virtual labs as a model of an AI Science Community. In this paradigm, each particle in the swarm represents a complete virtual laboratory instance, enabling...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics

arXiv:2603.20260v1 Announce Type: new Abstract: The integration of Large Language Models into Multi-Agent Systems (MAS) has enabled the so-lution of complex, long-horizon tasks through collaborative reasoning. However, this collec-tive intelligence is inherently fragile, as a single logical fallacy can rapidly...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education

arXiv:2603.20255v1 Announce Type: new Abstract: Speech-based AI educational applications have gained significant interest in recent years, particularly for children. However, children speech research remains limited due to the lack of publicly available datasets, especially for low-resource languages such as Arabic.This...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

gUFO: A Gentle Foundational Ontology for Semantic Web Knowledge Graphs

arXiv:2603.20948v1 Announce Type: new Abstract: gUFO is a lightweight implementation of the Unified Foundational Ontology (UFO) suitable for Semantic Web OWL 2 DL applications. UFO is a mature foundational ontology with a rich axiomatization and that has been employed in...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Locally Coherent Parallel Decoding in Diffusion Language Models

arXiv:2603.20216v1 Announce Type: new Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive (AR) models, offering sub-linear generation latency and bidirectional capabilities that are particularly appealing for code generation and editing. Achieving sub-linear latency in discrete...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model Systems

arXiv:2603.20578v1 Announce Type: new Abstract: The prevailing approach to improving large language model (LLM) reasoning has centered on expanding context windows, implicitly assuming that more tokens yield better performance. However, empirical evidence - including the "lost in the middle" effect...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement

arXiv:2603.20270v1 Announce Type: new Abstract: Generating executable simulations from natural language specifications remains a challenging problem due to the limited reasoning capacity of large language models (LLMs) when confronted with large, interconnected codebases. This paper presents FactorSmith, a framework that...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs

arXiv:2603.20209v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) combine the linguistic strengths of LLMs with the ability to process multimodal data, enbaling them to address a broader range of visual tasks. Because MLLMs aim at more general, human-like...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Reasoning Traces Shape Outputs but Models Won't Say So

arXiv:2603.20620v1 Announce Type: new Abstract: Can we trust the reasoning traces that large reasoning models (LRMs) produce? We investigate whether these traces faithfully reflect what drives model outputs, and whether models will honestly report their influence. We introduce Thought Injection,...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse

arXiv:2603.20285v1 Announce Type: new Abstract: Cooperative multi-agent methods for embodied AI are almost universally evaluated under idealized communication: zero latency, no packet loss, and unlimited bandwidth. Real-world deployment on robots with wireless links, autonomous vehicles on congested networks, or drone...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models

arXiv:2603.20212v1 Announce Type: new Abstract: Reward models (RMs) are critical for aligning Large Language Models via Reinforcement Learning from Human Feedback (RLHF). While Generative Reward Models (GRMs) achieve superior accuracy through chain-of-thought (CoT) reasoning, they incur substantial computational costs. Conversely,...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes

arXiv:2603.20994v1 Announce Type: new Abstract: In shared autonomy, a critical tension arises when an automated assistant must choose between obeying a human's instruction and deliberately overriding it to prevent harm. This safety-critical behavior is known as intelligent disobedience. To formalize...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Compression is all you need: Modeling Mathematics

arXiv:2603.20396v1 Announce Type: new Abstract: Human mathematics (HM), the mathematics humans discover and value, is a vanishingly small subset of formal mathematics (FM), the totality of all valid deductions. We argue that HM is distinguished by its compressibility through hierarchically...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Me, Myself, and $\pi$ : Evaluating and Explaining LLM Introspection

arXiv:2603.20276v1 Announce Type: new Abstract: A hallmark of human intelligence is Introspection-the ability to assess and reason about one's own cognitive processes. Introspection has emerged as a promising but contested capability in large language models (LLMs). However, current evaluations often...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

SciNav: A General Agent Framework for Scientific Coding Tasks

arXiv:2603.20256v1 Announce Type: new Abstract: Autonomous science agents built on large language models (LLMs) are increasingly used to generate hypotheses, design experiments, and produce reports. However, prior work mainly targets open-ended scientific problems with subjective outputs that are difficult to...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Coding Agents are Effective Long-Context Processors

arXiv:2603.20432v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable progress in scaling to access massive contexts. However, the access is via the latent and uninterpretable attention mechanisms, and LLMs fail to effective process long context, exhibiting significant...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

JUBAKU: An Adversarial Benchmark for Exposing Culturally Grounded Stereotypes in Japanese LLMs

arXiv:2603.20581v1 Announce Type: new Abstract: Social biases reflected in language are inherently shaped by cultural norms, which vary significantly across regions and lead to diverse manifestations of stereotypes. Existing evaluations of social bias in large language models (LLMs) for non-English...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Hear Both Sides: Efficient Multi-Agent Debate via Diversity-Aware Message Retention

arXiv:2603.20640v1 Announce Type: new Abstract: Multi-Agent Debate has emerged as a promising framework for improving the reasoning quality of large language models through iterative inter-agent communication. However, broadcasting all agent messages at every round introduces noise and redundancy that can...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Can I guess where you are from? Modeling dialectal morphosyntactic similarities in Brazilian Portuguese

arXiv:2603.20695v1 Announce Type: new Abstract: This paper investigates morphosyntactic covariation in Brazilian Portuguese (BP) to assess whether dialectal origin can be inferred from the combined behavior of linguistic variables. Focusing on four grammatical phenomena related to pronouns, correlation and clustering...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

BenchBench: Benchmarking Automated Benchmark Generation

arXiv:2603.20807v1 Announce Type: new Abstract: Benchmarks are the de facto standard for tracking progress in large language models (LLMs), yet static test sets can rapidly saturate, become vulnerable to contamination, and are costly to refresh. Scalable evaluation of open-ended items...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

Can ChatGPT Really Understand Modern Chinese Poetry?

arXiv:2603.20851v1 Announce Type: new Abstract: ChatGPT has demonstrated remarkable capabilities on both poetry generation and translation, yet its ability to truly understand poetry remains unexplored. Previous poetry-related work merely analyzed experimental outcomes without addressing fundamental issues of comprehension. This paper...

1 min 3 weeks, 4 days ago

ear

LOW Academic International

NoveltyAgent: Autonomous Novelty Reporting Agent with Point-wise Novelty Analysis and Self-Validation

arXiv:2603.20884v1 Announce Type: new Abstract: The exponential growth of academic publications has led to a surge in papers of varying quality, increasing the cost of paper screening. Current approaches either use novelty assessment within general AI Reviewers or repurpose DeepResearch,...

1 min 3 weeks, 4 days ago

ear

Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference

Do LLM-Driven Agents Exhibit Engagement Mechanisms? Controlled Tests of Information Load, Descriptive Norms, and Popularity Cues

Seed1.8 Model Card: Towards Generalized Real-World Agency

Expected Reward Prediction, with Applications to Model Routing

Towards Intelligent Geospatial Data Discovery: a knowledge graph-driven multi-agent framework powered by large language models

Position: Multi-Agent Algorithmic Care Systems Demand Contestability for Trustworthy AI

Knowledge Boundary Discovery for Large Language Models

RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

The AI Scientific Community: Agentic Virtual Lab Swarms

ProMAS: Proactive Error Forecasting for Multi-Agent Systems Using Markov Transition Dynamics

Abjad-Kids: An Arabic Speech Classification Dataset for Primary Education

gUFO: A Gentle Foundational Ontology for Semantic Web Knowledge Graphs

Locally Coherent Parallel Decoding in Diffusion Language Models

Context Cartography: Toward Structured Governance of Contextual Space in Large Language Model Systems

FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement

Children's Intelligence Tests Pose Challenges for MLLMs? KidGym: A 2D Grid-Based Reasoning Benchmark for MLLMs

Reasoning Traces Shape Outputs but Models Won't Say So

AgentComm-Bench: Stress-Testing Cooperative Embodied AI Under Latency, Packet Loss, and Bandwidth Collapse

Fast-Slow Thinking RM: Efficient Integration of Scalar and Generative Reward Models

The Intelligent Disobedience Game: Formulating Disobedience in Stackelberg Games and Markov Decision Processes

Compression is all you need: Modeling Mathematics

Me, Myself, and $\pi$ : Evaluating and Explaining LLM Introspection

SciNav: A General Agent Framework for Scientific Coding Tasks

Coding Agents are Effective Long-Context Processors

JUBAKU: An Adversarial Benchmark for Exposing Culturally Grounded Stereotypes in Japanese LLMs

Hear Both Sides: Efficient Multi-Agent Debate via Diversity-Aware Message Retention

Can I guess where you are from? Modeling dialectal morphosyntactic similarities in Brazilian Portuguese

BenchBench: Benchmarking Automated Benchmark Generation

Can ChatGPT Really Understand Modern Chinese Poetry?

NoveltyAgent: Autonomous Novelty Reporting Agent with Point-wise Novelty Analysis and Self-Validation

Impact Distribution

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.