International Law

LOW Academic International

Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning

arXiv:2603.08999v1 Announce Type: new Abstract: Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating...

1 min 1 month, 1 week ago

LOW Academic United States

Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance

arXiv:2603.08933v1 Announce Type: new Abstract: The first 72 hours of a missing-child investigation are critical for successful recovery. However, law enforcement agencies often face fragmented, unstructured data and a lack of dynamic, geospatial predictive tools. Our system, Guardian, provides an...

1 min 1 month, 1 week ago

LOW Academic European Union

Automated Thematic Analysis for Clinical Qualitative Data: Iterative Codebook Refinement with Full Provenance

arXiv:2603.08989v1 Announce Type: new Abstract: Thematic analysis (TA) is widely used in health research to extract patterns from patient interviews, yet manual TA faces challenges in scalability and reproducibility. LLM-based automation can help, but existing approaches produce codebooks with limited...

1 min 1 month, 1 week ago

LOW Academic International

Influencing LLM Multi-Agent Dialogue via Policy-Parameterized Prompts

arXiv:2603.09890v1 Announce Type: new Abstract: Large Language Models (LLMs) have emerged as a new paradigm for multi-agent systems. However, existing research on the behaviour of LLM-based multi-agents relies on ad hoc prompts and lacks a principled policy perspective. Different from...

1 min 1 month, 1 week ago

LOW Academic International

SPAR-K: Scheduled Periodic Alternating Early Exit for Spoken Language Models

arXiv:2603.09215v1 Announce Type: new Abstract: Interleaved spoken language models (SLMs) alternately generate text and speech tokens, but decoding at full transformer depth for every step becomes costly, especially due to long speech sequences. We propose SPAR-K, a modality-aware early exit...

1 min 1 month, 1 week ago

LOW Academic United States

PrivPRISM: Automatically Detecting Discrepancies Between Google Play Data Safety Declarations and Developer Privacy Policies

arXiv:2603.09214v1 Announce Type: new Abstract: End-users seldom read verbose privacy policies, leading app stores like Google Play to mandate simplified data safety declarations as a user-friendly alternative. However, these self-declared disclosures often contradict the full privacy policies, deceiving users about...

1 min 1 month, 1 week ago

LOW Academic International

Meissa: Multi-modal Medical Agentic Intelligence

arXiv:2603.09018v1 Announce Type: new Abstract: Multi-modal large language models (MM-LLMs) have shown strong performance in medical image understanding and clinical reasoning. Recent medical agent systems extend them with tool use and multi-agent collaboration, enabling complex decision-making. However, these systems rely...

1 min 1 month, 1 week ago

LOW Academic United States

Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT

arXiv:2603.09715v1 Announce Type: new Abstract: Visual instruction tuning is crucial for improving vision-language large models (VLLMs). However, many samples can be solved via linguistic patterns or common-sense shortcuts, without genuine cross-modal reasoning, limiting the effectiveness of multimodal learning. Prior data...

1 min 1 month, 1 week ago

LOW Academic International

Evaluate-as-Action: Self-Evaluated Process Rewards for Retrieval-Augmented Agents

arXiv:2603.09203v1 Announce Type: new Abstract: Retrieval-augmented agents can query external evidence, yet their reliability in multi-step reasoning remains limited: noisy retrieval may derail multi-hop question answering, while outcome-only reinforcement learning provides credit signals that are too coarse to optimize intermediate...

1 min 1 month, 1 week ago

LOW Academic United States

The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness

arXiv:2603.09200v1 Announce Type: new Abstract: Situational awareness, the capacity of an AI system to recognize its own nature, understand its training and deployment context, and reason strategically about its circumstances, is widely considered among the most dangerous emergent capabilities in...

1 min 1 month, 1 week ago

LOW Academic International

MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games

arXiv:2603.09022v1 Announce Type: new Abstract: Multi-turn, multi-agent LLM game evaluations often exhibit substantial run-to-run variance. In long-horizon interactions, small early deviations compound across turns and are amplified by multi-agent coupling. This biases win rate estimates and makes rankings unreliable across...

1 min 1 month, 1 week ago

LOW Academic European Union

LooComp: Leverage Leave-One-Out Strategy to Encoder-only Transformer for Efficient Query-aware Context Compression

arXiv:2603.09222v1 Announce Type: new Abstract: Efficient context compression is crucial for improving the accuracy and scalability of question answering. For the efficiency of Retrieval Augmented Generation, context should be delivered fast, compact, and precise to ensure clue sufficiency and budget-friendly...

1 min 1 month, 1 week ago

LOW Academic International

Context Engineering: From Prompts to Corporate Multi-Agent Architecture

arXiv:2603.09619v1 Announce Type: new Abstract: As artificial intelligence (AI) systems evolve from stateless chatbots to autonomous multi-step agents, prompt engineering (PE), the discipline of crafting individual queries, proves necessary but insufficient. This paper introduces context engineering (CE) as a standalone...

1 min 1 month, 1 week ago

LOW Academic International

Social-R1: Towards Human-like Social Reasoning in LLMs

arXiv:2603.09249v1 Announce Type: new Abstract: While large language models demonstrate remarkable capabilities across numerous domains, social intelligence - the capacity to perceive social cues, infer mental states, and generate appropriate responses - remains a critical challenge, particularly for enabling effective...

1 min 1 month, 1 week ago

LOW Academic European Union

The Confidence Gate Theorem: When Should Ranked Decision Systems Abstain?

arXiv:2603.09947v1 Announce Type: new Abstract: Ranked decision systems -- recommenders, ad auctions, clinical triage queues -- must decide when to intervene in ranked outputs and when to abstain. We study when confidence-based abstention monotonically improves decision quality, and when it...

1 min 1 month, 1 week ago

LOW Academic International

SciTaRC: Benchmarking QA on Scientific Tabular Data that Requires Language Reasoning and Complex Computation

arXiv:2603.08910v1 Announce Type: new Abstract: We introduce SciTaRC, an expert-authored benchmark of questions about tabular data in scientific papers requiring both deep language reasoning and complex computation. We show that current state-of-the-art AI models fail on at least 23% of...

1 min 1 month, 1 week ago

LOW Academic International

Reward Prediction with Factorized World States

arXiv:2603.09400v1 Announce Type: new Abstract: Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inherent to training data, limiting...

1 min 1 month, 1 week ago

LOW Academic International

One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations

arXiv:2603.08869v1 Announce Type: new Abstract: Do the features learned by Sparse Autoencoders (SAEs) represent abstract meaning, or are they tied to how text is written? We investigate this question using Serbian digraphia as a controlled testbed: Serbian is written interchangeably...

1 min 1 month, 1 week ago

LOW Academic International

ConFu: Contemplate the Future for Better Speculative Sampling

arXiv:2603.08899v1 Announce Type: new Abstract: Speculative decoding has emerged as a powerful approach to accelerate large language model (LLM) inference by employing lightweight draft models to propose candidate tokens that are subsequently verified by the target model. The effectiveness of...

1 min 1 month, 1 week ago

LOW Academic European Union

AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems

arXiv:2603.09435v1 Announce Type: new Abstract: The rapid rollout of AI in heterogeneous public and societal sectors has subsequently escalated the need for compliance with regulatory standards and frameworks. The EU AI Act has emerged as a landmark in the regulatory...

1 min 1 month, 1 week ago

LOW Academic United States

EPOCH: An Agentic Protocol for Multi-Round System Optimization

arXiv:2603.09049v1 Announce Type: new Abstract: Autonomous agents are increasingly used to improve prompts, code, and machine learning systems through iterative execution and feedback. Yet existing approaches are usually designed as task-specific optimization loops rather than as a unified protocol for...

1 min 1 month, 1 week ago

LOW Academic International

Chaotic Dynamics in Multi-LLM Deliberation

arXiv:2603.09127v1 Announce Type: new Abstract: Collective AI systems increasingly rely on multi-LLM deliberation, but their stability under repeated execution remains poorly characterized. We model five-agent LLM committees as random dynamical systems and quantify inter-run sensitivity using an empirical Lyapunov exponent...

1 min 1 month, 1 week ago

LOW Academic International

LCA: Local Classifier Alignment for Continual Learning

arXiv:2603.09888v1 Announce Type: new Abstract: A fundamental requirement for intelligent systems is the ability to learn continuously under changing environments. However, models trained in this regime often suffer from catastrophic forgetting. Leveraging pre-trained models has recently emerged as a promising...

1 min 1 month, 1 week ago

LOW Academic International

A Consensus-Driven Multi-LLM Pipeline for Missing-Person Investigations

arXiv:2603.08954v1 Announce Type: new Abstract: The first 72 hours of a missing-person investigation are critical for successful recovery. Guardian is an end-to-end system designed to support missing-child investigation and early search planning. This paper presents the Guardian LLM Pipeline, a...

1 min 1 month, 1 week ago

LOW Academic International

MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems

arXiv:2603.09909v1 Announce Type: new Abstract: While Multi-Agent Systems (MAS) show potential for complex clinical decision support, the field remains hindered by architectural fragmentation and the lack of standardized multimodal integration. Current medical MAS research suffers from non-uniform data ingestion pipelines,...

1 min 1 month, 1 week ago

LOW Academic International

You Didn't Have to Say It like That: Subliminal Learning from Faithful Paraphrases

arXiv:2603.09517v1 Announce Type: new Abstract: When language models are trained on synthetic data, they (student model) can covertly acquire behavioral traits from the data-generating model (teacher model). Subliminal learning refers to the transmission of traits from a teacher to a...

1 min 1 month, 1 week ago

LOW Academic United States

Build, Borrow, or Just Fine-Tune? A Political Scientist's Guide to Choosing NLP Models

arXiv:2603.09595v1 Announce Type: new Abstract: Political scientists increasingly face a consequential choice when adopting natural language processing tools: build a domain-specific model from scratch, borrow and adapt an existing one, or simply fine-tune a general-purpose model on task data? Each...

1 min 1 month, 1 week ago

LOW Academic United States

Surgical Repair of Collapsed Attention Heads in ALiBi Transformers

arXiv:2603.09616v1 Announce Type: new Abstract: We identify a systematic attention collapse pathology in the BLOOM family of transformer language models, where ALiBi positional encoding causes 31-44% of attention heads to attend almost entirely to the beginning-of-sequence token. The collapse follows...

1 min 1 month, 1 week ago

LOW Academic International

Understanding the Interplay between LLMs' Utilisation of Parametric and Contextual Knowledge: A keynote at ECIR 2025

arXiv:2603.09654v1 Announce Type: new Abstract: Language Models (LMs) acquire parametric knowledge from their training process, embedding it within their weights. The increasing scalability of LMs, however, poses significant challenges for understanding a model's inner workings and further for updating or...

1 min 1 month, 1 week ago

LOW Academic International

Automatic Cardiac Risk Management Classification using large-context Electronic Patients Health Records

arXiv:2603.09685v1 Announce Type: new Abstract: To overcome the limitations of manual administrative coding in geriatric Cardiovascular Risk Management, this study introduces an automated classification framework leveraging unstructured Electronic Health Records (EHRs). Using a dataset of 3,482 patients, we benchmarked three...

1 min 1 month, 1 week ago


