Intellectual Property

LOW Academic International

Pitfalls in Evaluating Interpretability Agents

arXiv:2603.20101v1 Announce Type: new Abstract: Automated interpretability systems aim to reduce the need for human labor and scale analysis to increasingly large models and diverse tasks. Recent efforts toward this goal leverage large language models (LLMs) at increasing levels of...

nda

LOW Academic International

Utility-Guided Agent Orchestration for Efficient LLM Tool Use

arXiv:2603.19896v1 Announce Type: new Abstract: Tool-using large language model (LLM) agents often face a fundamental tension between answer quality and execution cost. Fixed workflows are stable but inflexible, while free-form multi-step reasoning methods such as ReAct may improve task performance...

nda

LOW Academic International

ItinBench: Benchmarking Planning Across Multiple Cognitive Dimensions with Large Language Models

arXiv:2603.19515v1 Announce Type: new Abstract: Large language models (LLMs) with advanced cognitive capabilities are emerging as agents for various reasoning and planning tasks. Traditional evaluations often focus on specific reasoning or planning questions within controlled environments. Recent studies have explored...

ip

LOW Academic European Union

Generative Active Testing: Efficient LLM Evaluation via Proxy Task Adaptation

arXiv:2603.19264v1 Announce Type: cross Abstract: With the widespread adoption of pre-trained Large Language Models (LLMs), there is high demand for task-specific test sets to benchmark their performance in domains such as healthcare and biomedicine. However, the cost of labeling...

nda

LOW Academic International

Hyperagents

arXiv:2603.19461v1 Announce Type: new Abstract: Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to self-improvement rely on fixed, handcrafted meta-level mechanisms, fundamentally limiting how fast such...

nda

LOW Academic International

GeoChallenge: A Multi-Answer Multiple-Choice Benchmark for Geometric Reasoning with Diagrams

arXiv:2603.19252v1 Announce Type: cross Abstract: Evaluating the symbolic reasoning of large language models (LLMs) calls for geometry benchmarks that require multi-step proofs grounded in both text and diagrams. However, existing benchmarks are often limited in scale and rarely provide visually...

ip

LOW Academic International

Full-Stack Domain Enhancement for Combustion LLMs: Construction and Optimization

arXiv:2603.19268v1 Announce Type: cross Abstract: Large language models (LLMs) show significant application potential when adapted to and enhanced for professional fields. Nevertheless, for complex physical systems such as combustion science, general-purpose LLMs often generate severe hallucinations...

nda

LOW Academic International

A Human-Centered Workflow for Using Large Language Models in Content Analysis

arXiv:2603.19271v1 Announce Type: cross Abstract: While many researchers use Large Language Models (LLMs) through chat-based access, their real potential lies in leveraging LLMs via application programming interfaces (APIs). This paper conceptualizes LLMs as universal text processing machines and presents a...

ip

LOW Academic European Union

Transformers are Stateless Differentiable Neural Computers

arXiv:2603.19272v1 Announce Type: cross Abstract: Differentiable Neural Computers (DNCs) were introduced as recurrent architectures equipped with an addressable external memory supporting differentiable read and write operations. Transformers, in contrast, are nominally feedforward architectures based on multi-head self-attention. In this work...

ip

LOW Academic International

CURE: A Multimodal Benchmark for Clinical Understanding and Retrieval Evaluation

arXiv:2603.19274v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) demonstrate considerable potential in clinical diagnostics, a domain that inherently requires synthesizing complex visual and textual data alongside consulting authoritative medical literature. However, existing benchmarks primarily evaluate MLLMs in end-to-end...

nda

LOW Academic European Union

CDEoH: Category-Driven Automatic Algorithm Design With Large Language Models

arXiv:2603.19284v1 Announce Type: cross Abstract: With the rapid advancement of large language models (LLMs), LLM-based heuristic search methods have demonstrated strong capabilities in automated algorithm generation. However, their evolutionary processes often suffer from instability and premature convergence. Existing approaches mainly...

ip

LOW Academic International

Generalized Stock Price Prediction for Multiple Stocks Combined with News Fusion

arXiv:2603.19286v1 Announce Type: cross Abstract: Predicting stock prices presents challenges in financial forecasting. While traditional approaches such as ARIMA and RNNs are prevalent, recent developments in Large Language Models (LLMs) offer alternative methodologies. This paper introduces an approach that integrates...

ip

LOW Academic International

Speculating Experts Accelerates Inference for Mixture-of-Experts

arXiv:2603.19289v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models have gained popularity as a means of scaling the capacity of large language models (LLMs) while maintaining sparse activations and reduced per-token compute. However, in memory-constrained inference settings, expert weights must be...

ip

LOW Academic International

Spelling Correction in Healthcare Query-Answer Systems: Methods, Retrieval Impact, and Empirical Evaluation

arXiv:2603.19249v1 Announce Type: new Abstract: Healthcare question-answering (QA) systems face a persistent challenge: users submit queries with spelling errors at rates substantially higher than those found in the professional documents they search. This paper presents the first controlled study of...

nda

LOW Academic United States

Can Structural Cues Save LLMs? Evaluating Language Models in Massive Document Streams

arXiv:2603.19250v1 Announce Type: new Abstract: Evaluating language models in streaming environments is critical, yet underexplored. Existing benchmarks either focus on single complex events or provide curated inputs for each query, and do not evaluate models under the conflicts that arise...

ip

LOW Academic International

From Comprehension to Reasoning: A Hierarchical Benchmark for Automated Financial Research Reporting

arXiv:2603.19254v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used to generate financial research reports, shifting from auxiliary analytic tools to primary content producers. Yet recent real-world deployments reveal persistent failures: factual errors, numerical inconsistencies, fabricated references, and shallow...

nda

LOW Academic International

Constraint-aware Path Planning from Natural Language Instructions Using Large Language Models

arXiv:2603.19257v1 Announce Type: new Abstract: Real-world path planning tasks typically involve multiple constraints beyond simple route optimization, such as the number of routes, maximum route length, depot locations, and task-specific requirements. Traditional approaches rely on dedicated formulations and algorithms for...

ip

LOW Academic International

Significance-Gain Pair Encoding for LLMs: A Statistical Alternative to Frequency-Based Subword Merging

arXiv:2603.19261v1 Announce Type: new Abstract: Subword tokenization is a key design choice for modern language models, including large language models (LLMs), with byte- and character-level BPE serving as a widely used baseline. Standard BPE selects merges by raw pair frequency,...

nda

LOW Academic United States

Reviewing the Reviewer: Graph-Enhanced LLMs for E-commerce Appeal Adjudication

arXiv:2603.19267v1 Announce Type: new Abstract: Hierarchical review workflows, where a second-tier reviewer (Checker) corrects first-tier (Maker) decisions, generate valuable correction signals that encode why initial judgments failed. However, learning from these signals is hindered by information asymmetry: corrections often depend...

nda

LOW Academic United States

Autonoma: A Hierarchical Multi-Agent Framework for End-to-End Workflow Automation

arXiv:2603.19270v1 Announce Type: new Abstract: The increasing complexity of user demands necessitates automation frameworks that can reliably translate open-ended instructions into robust, multi-step workflows. Current monolithic agent architectures often struggle with the challenges of scalability, error propagation, and maintaining focus...

ip

LOW Academic International

MOSAIC: Modular Opinion Summarization using Aspect Identification and Clustering

arXiv:2603.19277v1 Announce Type: new Abstract: Reviews are central to how travelers evaluate products on online marketplaces, yet existing summarization research often emphasizes end-to-end quality while overlooking benchmark reliability and the practical utility of granular insights. To address this, we propose...

nda

LOW Academic International

Automated Motif Indexing on the Arabian Nights

arXiv:2603.19283v1 Announce Type: new Abstract: Motifs are non-commonplace, recurring narrative elements, often found originally in folk stories. In addition to being of interest to folklorists, motifs appear as metaphoric devices in modern news, literature, propaganda, and other cultural texts. Finding...

nda

LOW Academic International

LLM-MRD: LLM-Guided Multi-View Reasoning Distillation for Fake News Detection

arXiv:2603.19293v1 Announce Type: new Abstract: Multimodal fake news detection is crucial for mitigating societal disinformation. Existing approaches attempt to address this by fusing multimodal features or leveraging Large Language Models (LLMs) for advanced reasoning. However, these methods suffer from serious...

nda

LOW Academic United States

PrefPO: Pairwise Preference Prompt Optimization

arXiv:2603.19311v1 Announce Type: new Abstract: Prompt engineering is effective but labor-intensive, motivating automated optimization methods. Existing methods typically require labeled datasets, which are often unavailable, and produce verbose, repetitive prompts. We introduce PrefPO, a minimal prompt optimization approach inspired by...

ip

LOW Academic International

Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs

arXiv:2603.19313v1 Announce Type: new Abstract: A core challenge for faithful LLM role-playing is sustaining consistent characterization throughout long, open-ended dialogues, as models frequently fail to recall and accurately apply their designated persona knowledge without explicit cues. To tackle this, we...

nda

LOW Academic International

Prompt-tuning with Attribute Guidance for Low-resource Entity Matching

arXiv:2603.19321v1 Announce Type: new Abstract: Entity Matching (EM) is an important task that determines the logical relationship between two entities, such as Same, Different, or Undecidable. Traditional EM approaches rely heavily on supervised learning, which requires large amounts of high-quality...

ip

LOW Academic International

Is Evaluation Awareness Just Format Sensitivity? Limitations of Probe-Based Evidence under Controlled Prompt Structure

arXiv:2603.19426v1 Announce Type: new Abstract: Prior work uses linear probes on benchmark prompts as evidence of evaluation awareness in large language models. Because evaluation context is typically entangled with benchmark format and genre, it is unclear whether probe-based signals reflect...

nda

LOW Academic International

BrainSCL: Subtype-Guided Contrastive Learning for Brain Disorder Diagnosis

arXiv:2603.19295v1 Announce Type: new Abstract: Mental disorder populations exhibit pronounced heterogeneity (that is, significant differences between samples), which poses a significant challenge to the definition of positive pairs in contrastive learning. To address this, we propose a subtype-guided...

ip

LOW Academic International

TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly

arXiv:2603.19296v1 Announce Type: new Abstract: To tackle the huge computational demand of large foundation models, activation-aware compression techniques that require no retraining have been introduced. However, since these methods rely heavily on calibration data, domain shift issues may arise for unseen downstream...

nda

LOW Academic United States

CLaRE-ty Amid Chaos: Quantifying Representational Entanglement to Predict Ripple Effects in LLM Editing

arXiv:2603.19297v1 Announce Type: new Abstract: The static knowledge representations of large language models (LLMs) inevitably become outdated or incorrect over time. While model-editing techniques offer a promising solution by modifying a model's factual associations, they often produce unpredictable ripple effects,...

ip

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.