Deep Distance Measurement Method for Unsupervised Multivariate Time Series Similarity Retrieval
arXiv:2603.12544v1 Announce Type: new Abstract: We propose the Deep Distance Measurement Method (DDMM) to improve retrieval accuracy in unsupervised multivariate time series similarity retrieval. DDMM enables learning of minute differences within states in the entire time series and thereby recognition...
Social, Legal, Ethical, Empathetic and Cultural Norm Operationalisation for AI Agents
arXiv:2603.11864v1 Announce Type: new Abstract: As AI agents are increasingly used in high-stakes domains like healthcare and law enforcement, aligning their behaviour with social, legal, ethical, empathetic, and cultural (SLEEC) norms has become a critical engineering challenge. While international frameworks...
GPT4o-Receipt: A Dataset and Human Study for AI-Generated Document Forensics
arXiv:2603.11442v1 Announce Type: new Abstract: Can humans detect AI-generated financial documents better than machines? We present GPT4o-Receipt, a benchmark of 1,235 receipt images pairing GPT-4o-generated receipts with authentic ones from established datasets, evaluated by five state-of-the-art multimodal LLMs and a...
Examining Users' Behavioural Intention to Use OpenClaw Through the Cognition--Affect--Conation Framework
arXiv:2603.11455v1 Announce Type: new Abstract: This study examines users' behavioural intention to use OpenClaw through the Cognition--Affect--Conation (CAC) framework. The research investigates how cognitive perceptions of the system influence affective responses and subsequently shape behavioural intention. Enabling factors include perceived...
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges
arXiv:2603.11863v1 Announce Type: new Abstract: The saturation of high-quality pre-training data has shifted research focus toward evolutionary systems capable of continuously generating novel artifacts, leading to the success of AlphaEvolve. However, the progress of such systems is hindered by the...
Stop Listening to Me! How Multi-turn Conversations Can Degrade Diagnostic Reasoning
arXiv:2603.11394v1 Announce Type: new Abstract: Patients and clinicians are increasingly using chatbots powered by large language models (LLMs) for healthcare inquiries. While state-of-the-art LLMs exhibit high performance on static diagnostic reasoning benchmarks, their efficacy across multi-turn conversations, which better reflect...
Counterweights and Complementarities: The Convergence of AI and Blockchain Powering a Decentralized Future
arXiv:2603.11299v1 Announce Type: new Abstract: This editorial addresses the critical intersection of artificial intelligence (AI) and blockchain technologies, highlighting their contrasting tendencies toward centralization and decentralization, respectively. While AI, particularly with the rise of large language models (LLMs), exhibits a...
Explicit Logic Channel for Validation and Enhancement of MLLMs on Zero-Shot Tasks
arXiv:2603.11689v1 Announce Type: new Abstract: Frontier Multimodal Large Language Models (MLLMs) exhibit remarkable capabilities in Visual-Language Comprehension (VLC) tasks. However, they are often deployed as zero-shot solutions to new tasks in a black-box manner. Validating and understanding the behavior of...
ThReadMed-QA: A Multi-Turn Medical Dialogue Benchmark from Real Patient Questions
arXiv:2603.11281v1 Announce Type: new Abstract: Medical question-answering benchmarks predominantly evaluate single-turn exchanges, failing to capture the iterative, clarification-seeking nature of real patient consultations. We introduce ThReadMed-QA, a benchmark of 2,437 fully-answered patient-physician conversation threads extracted from r/AskDocs, comprising 8,204 question-answer...
AI Knows What's Wrong But Cannot Fix It: Helicoid Dynamics in Frontier LLMs Under High-Stakes Decisions
arXiv:2603.11559v1 Announce Type: new Abstract: Large language models perform reliably when their outputs can be checked: solving equations, writing code, retrieving facts. They perform differently when checking is impossible, as when a clinician chooses an irreversible treatment on incomplete data,...
QChunker: Learning Question-Aware Text Chunking for Domain RAG via Multi-Agent Debate
arXiv:2603.11650v1 Announce Type: new Abstract: The effectiveness upper bound of retrieval-augmented generation (RAG) is fundamentally constrained by the semantic integrity and information granularity of text chunks in its knowledge base. To address these challenges, this paper proposes QChunker, which restructures...
Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge
arXiv:2603.11665v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have been widely adopted as MLLM-as-a-Judges due to their strong alignment with human judgment across various visual tasks. However, most existing judge models are optimized for single-task scenarios and struggle...
SemBench: A Universal Semantic Framework for LLM Evaluation
arXiv:2603.11687v1 Announce Type: new Abstract: Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of...
Structure-Aware Epistemic Uncertainty Quantification for Neural Operator PDE Surrogates
arXiv:2603.11052v1 Announce Type: new Abstract: Neural operators (NOs) provide fast, resolution-invariant surrogates for mapping input fields to PDE solution fields, but their predictions can exhibit significant epistemic uncertainty due to finite data, imperfect optimization, and distribution shift. For practical deployment...
Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
arXiv:2603.11114v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation...
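The conditional computation this abstract refers to is the standard top-k gating of sparse MoE layers, where each token activates only its k highest-scoring experts. A minimal NumPy sketch of that generic mechanism (not the paper's routing-signature construction; shapes and names here are illustrative):

```python
import numpy as np

def topk_routing(x, W_gate, k=2):
    """Generic top-k gating for a sparse MoE layer: route each token
    to the k experts with the highest gate logits, with softmax
    weights computed over the selected logits only."""
    logits = x @ W_gate                              # (n_tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]       # indices of chosen experts
    sel = np.take_along_axis(logits, topk, axis=-1)  # their logits
    weights = np.exp(sel - sel.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # normalized gate weights
    return topk, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))    # 4 tokens, hidden size 8
W = rng.normal(size=(8, 16))   # gate projecting onto 16 experts
experts, gates = topk_routing(x, W, k=2)
```

A "routing signature" in the paper's sense would presumably be built from statistics of `experts`/`gates` across tokens of a task, but the abstract is truncated before the exact construction.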
A Learning-Based Superposition Operator for Non-Renewal Arrival Processes in Queueing Networks
arXiv:2603.11118v1 Announce Type: new Abstract: The superposition of arrival processes is a fundamental yet analytically intractable operation in queueing networks when inputs are general non-renewal streams. Classical methods either reduce merged flows to renewal surrogates, rely on computationally prohibitive Markovian...
High-resolution weather-guided surrogate modeling for data-efficient cross-location building energy prediction
arXiv:2603.11121v1 Announce Type: new Abstract: Building design optimization often depends on physics-based simulation tools such as EnergyPlus, which, although accurate, are computationally expensive and slow. Surrogate models provide a faster alternative, yet most are location-specific, and even weather-informed variants require...
Algorithmic Capture, Computational Complexity, and Inductive Bias of Infinite Transformers
arXiv:2603.11161v1 Announce Type: new Abstract: We formally define Algorithmic Capture (i.e., ``grokking'' an algorithm) as the ability of a neural network to generalize to arbitrary problem sizes ($T$) with controllable error and minimal sample adaptation, distinguishing true algorithmic learning from...
Bayesian Optimization of Partially Known Systems using Hybrid Models
arXiv:2603.11199v1 Announce Type: new Abstract: Bayesian optimization (BO) has gained attention as an efficient algorithm for black-box optimization of expensive-to-evaluate systems, where the BO algorithm iteratively queries the system and suggests new trials based on a probabilistic model fitted to...
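The query-then-refit loop described here is the textbook BO pattern: fit a probabilistic surrogate (typically a Gaussian process), maximize an acquisition function such as expected improvement, evaluate the system, repeat. A minimal self-contained sketch of that generic loop (not the paper's hybrid-model method; the objective `f`, kernel lengthscale, and grid are illustrative choices):

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-5):
    """GP posterior mean and std at test points Xs (zero prior mean)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = 1.0 - np.einsum("ij,ij->j", Ks, Kinv @ Ks)  # diag of posterior cov
    return mu, np.sqrt(var.clip(min=1e-12))

Phi = np.vectorize(lambda z: 0.5 * (1 + erf(z / sqrt(2))))  # std normal CDF
phi = lambda z: np.exp(-0.5 * z ** 2) / sqrt(2 * pi)        # std normal PDF

def expected_improvement(mu, sigma, best):
    """EI for minimization: expected gain below the incumbent `best`."""
    z = (best - mu) / sigma
    return (best - mu) * Phi(z) + sigma * phi(z)

f = lambda x: np.sin(3 * x) + 0.5 * x       # hypothetical black-box objective
X = np.array([0.1, 0.5, 0.9]); y = f(X)     # initial design
grid = np.linspace(0.0, 1.0, 200)
for _ in range(10):                         # BO loop: fit, acquire, query
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next); y = np.append(y, f(x_next))
```

The paper's "hybrid model" presumably replaces the pure GP surrogate with one that embeds the known part of the system's physics, but the abstract is truncated before the details.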
On the Robustness of Langevin Dynamics to Score Function Error
arXiv:2603.11319v1 Announce Type: new Abstract: We consider the robustness of score-based generative modeling to errors in the estimate of the score function. In particular, we show that Langevin dynamics is not robust to the L^2 errors (more generally L^p errors)...
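For context, (unadjusted) Langevin dynamics samples from a density p by following its score, the gradient of log p, plus Gaussian noise; the paper's question is what happens when that score is replaced by an L^p-accurate estimate. A minimal sketch of the sampler itself, using the exact score of a standard Gaussian (step size and chain length are illustrative):

```python
import numpy as np

def langevin_sample(score, x0, step=1e-2, n_steps=1000, seed=0):
    """Unadjusted Langevin dynamics:
    x_{t+1} = x_t + step * score(x_t) + sqrt(2 * step) * noise.
    With the exact score grad log p, the chain targets p (up to
    discretization error)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * score(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
    return x

# Exact score of a standard Gaussian: grad log p(x) = -x.
exact_score = lambda x: -x
samples = np.array([langevin_sample(exact_score, [3.0], seed=s)[0]
                    for s in range(300)])
```

Substituting an approximate `score` into this loop is how the robustness question arises: small L^2 score error need not translate into small error in the sampled distribution.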
Multilingual Financial Fraud Detection Using Machine Learning and Transformer Models: A Bangla-English Study
arXiv:2603.11358v1 Announce Type: new Abstract: Financial fraud detection has emerged as a critical research challenge amid the rapid expansion of digital financial platforms. Although machine learning approaches have demonstrated strong performance in identifying fraudulent activities, most existing research focuses exclusively...
UniHetCO: A Unified Heterogeneous Representation for Multi-Problem Learning in Unsupervised Neural Combinatorial Optimization
arXiv:2603.11456v1 Announce Type: new Abstract: Unsupervised neural combinatorial optimization (NCO) offers an appealing alternative to supervised approaches by training learning-based solvers without ground-truth solutions, directly minimizing instance objectives and constraint violations. Yet for graph node subset-selection problems (e.g., Maximum Clique...
Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning
arXiv:2603.10588v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in logical reasoning tasks, yet whether large language model (LLM) alignment requires fundamentally different approaches remains unclear. Given the apparent tolerance for multiple valid responses...
CUAAudit: Meta-Evaluation of Vision-Language Models as Auditors of Autonomous Computer-Use Agents
arXiv:2603.10577v1 Announce Type: new Abstract: Computer-Use Agents (CUAs) are emerging as a new paradigm in human-computer interaction, enabling autonomous execution of tasks in desktop environments by perceiving high-level natural-language instructions. As such agents become increasingly capable and are deployed across...
The Dunning-Kruger Effect in Large Language Models: An Empirical Study of Confidence Calibration
arXiv:2603.09985v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse tasks, yet their ability to accurately assess their own confidence remains poorly understood. We present an empirical study investigating whether LLMs exhibit patterns reminiscent of...
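Confidence calibration of the kind this abstract studies is commonly measured with expected calibration error (ECE): bin predictions by stated confidence and average the gap between confidence and empirical accuracy per bin. A minimal sketch of that standard metric (not the paper's protocol; the overconfident toy model is illustrative):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: weighted average over confidence bins of
    |mean confidence - empirical accuracy|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap   # bin weight = fraction of samples
    return ece

# Hypothetical overconfident model: claims 90% confidence, right 60% of the time.
conf = np.full(1000, 0.9)
acc = np.zeros(1000); acc[:600] = 1
print(round(expected_calibration_error(conf, acc), 3))   # 0.3
```

A Dunning-Kruger-style pattern would show up as large per-bin gaps concentrated where the model is least accurate.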
One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis
arXiv:2603.09978v1 Announce Type: cross Abstract: Large language models have recently surpassed specialized systems on code generation, yet their effectiveness on other code-analysis tasks remains less clear. At the same time, multi-task learning offers a way to unify diverse objectives within...
Quantifying Hallucinations in Large Language Models on Medical Textbooks
arXiv:2603.09986v1 Announce Type: cross Abstract: Hallucinations, the tendency of large language models to provide responses with factually incorrect and unsupported claims, are a serious problem within natural language processing for which we do not yet have an effective solution to...
A Two-Stage Architecture for NDA Analysis: LLM-based Segmentation and Transformer-based Clause Classification
arXiv:2603.09990v1 Announce Type: cross Abstract: In business-to-business relations, it is common to establish Non-Disclosure Agreements (NDAs). However, these documents exhibit significant variation in format, structure, and writing style, making manual analysis slow and error-prone. We propose an architecture based on...
Explainable LLM Unlearning Through Reasoning
arXiv:2603.09980v1 Announce Type: cross Abstract: LLM unlearning is essential for mitigating safety, copyright, and privacy concerns in pre-trained large language models (LLMs). Compared to preference alignment, it offers a more explicit way by removing undesirable knowledge characterized by specific unlearning...
Beyond the Prompt in Large Language Models: Comprehension, In-Context Learning, and Chain-of-Thought
arXiv:2603.10000v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency across diverse tasks, exhibiting emergent properties such as semantic prompt comprehension, In-Context Learning (ICL), and Chain-of-Thought (CoT) reasoning. Despite their empirical success, the theoretical mechanisms driving these...