Intellectual Property

LOW Academic International

Do 3D Large Language Models Really Understand 3D Spatial Relationships?

arXiv:2603.23523v1 Announce Type: new Abstract: Recent 3D Large-Language Models (3D-LLMs) claim to understand 3D worlds, especially spatial relationships among objects. Yet, we find that simply fine-tuning a language model on text-only question-answer pairs can perform comparably or even surpass these...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Konkani LLM: Multi-Script Instruction Tuning and Evaluation for a Low-Resource Indian Language

arXiv:2603.23529v1 Announce Type: new Abstract: Large Language Models (LLMs) consistently under perform in low-resource linguistic contexts such as Konkani. This performance deficit stems from acute training data scarcity compounded by high script diversity across Devanagari, Romi and Kannada orthographies. To...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking

arXiv:2603.23506v1 Announce Type: new Abstract: The rapid proliferation of large language models (LLMs) in healthcare creates an urgent need for scalable and psychometrically sound evaluation methods. Conventional static benchmarks are costly to administer repeatedly, vulnerable to data contamination, and lack...

1 min 3 weeks, 6 days ago

nda

LOW Academic International

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

arXiv:2603.23516v1 Announce Type: new Abstract: Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information remains a long-standing pursuit in the field. Due to the constraints of full-attention architectures, the effective context length of large language...

1 min 3 weeks, 6 days ago

nda

LOW Academic International

Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

arXiv:2603.23521v1 Announce Type: new Abstract: Multimodal research has predominantly focused on single-image reasoning, with limited exploration of multi-image scenarios. Recent models have sought to enhance multi-image understanding through large-scale pretraining on interleaved image-text datasets. However, most Vision-Language Models (VLMs) are...

1 min 3 weeks, 6 days ago

ip

LOW Academic United States

S-Path-RAG: Semantic-Aware Shortest-Path Retrieval Augmented Generation for Multi-Hop Knowledge Graph Question Answering

arXiv:2603.23512v1 Announce Type: new Abstract: We present S-Path-RAG, a semantic-aware shortest-path Retrieval-Augmented Generation framework designed to improve multi-hop question answering over large knowledge graphs. S-Path-RAG departs from one-shot, text-heavy retrieval by enumerating bounded-length, semantically weighted candidate paths using a hybrid...

1 min 3 weeks, 6 days ago

nda

LOW Academic International

Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents

arXiv:2603.23518v1 Announce Type: new Abstract: General-purpose embedding models excel at recognizing semantic similarities but fail to capture the characteristics of texts specified by user instructions. In contrast, instruction-tuned embedders can align embeddings with textual instructions yet cannot autonomously infer latent...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models

arXiv:2603.23514v1 Announce Type: new Abstract: Large Language Models appear competent when answering general questions but often fail when pushed into domain-specific details. No existing methodology provides an out-of-the-box solution for measuring how deeply LLMs can sustain accurate responses under adaptive...

1 min 3 weeks, 6 days ago

nda

LOW Academic United States

DISCO: Document Intelligence Suite for COmparative Evaluation

arXiv:2603.23511v1 Announce Type: new Abstract: Document intelligence requires accurate text extraction and reliable reasoning over document content. We introduce \textbf{DISCO}, a \emph{Document Intelligence Suite for COmparative Evaluation}, that evaluates optical character recognition (OCR) pipelines and vision-language models (VLMs) separately on...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Navigating the Concept Space of Language Models

arXiv:2603.23524v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) trained on large language model activations output thousands of features that enable mapping to human-interpretable concepts. The current practice for analyzing these features primarily relies on inspecting top-activating examples, manually browsing individual...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems

arXiv:2603.23508v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG) is increasingly deployed in enterprise search and document-centric assistants, where responses must be grounded in long and complex source materials. In practice, verifying that generated answers faithfully reflect retrieved documents is difficult:...

1 min 3 weeks, 6 days ago

ip

LOW Academic United States

Compression Method Matters: Benchmark-Dependent Output Dynamics in LLM Prompt Compression

arXiv:2603.23527v1 Announce Type: new Abstract: Prompt compression is often evaluated by input-token reduction, but its real deployment impact depends on how compression changes output length and total inference cost. We present a controlled replication and extension study of benchmark-dependent output...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data

arXiv:2603.23515v1 Announce Type: new Abstract: Improving the accuracy and reliability of medical coding reduces clinician burnout and supports revenue cycle processes, freeing providers to focus more on patient care. However, automating the assignment of ICD-10-CM and CPT codes from clinical...

1 min 3 weeks, 6 days ago

nda

LOW Academic International

Qworld: Question-Specific Evaluation Criteria for LLMs

arXiv:2603.23522v1 Announce Type: new Abstract: Evaluating large language models (LLMs) on open-ended questions is difficult because response quality depends on the question's context. Binary scores and static rubrics fail to capture these context-dependent requirements. Existing methods define criteria at the...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Did You Forget What I Asked? Prospective Memory Failures in Large Language Models

arXiv:2603.23530v1 Announce Type: new Abstract: Large language models often fail to satisfy formatting instructions when they must simultaneously perform demanding tasks. We study this behaviour through a prospective memory inspired lens from cognitive psychology, using a controlled paradigm that combines...

1 min 3 weeks, 6 days ago

nda

LOW Academic International

Swiss-Bench SBP-002: A Frontier Model Comparison on Swiss Legal and Regulatory Tasks

arXiv:2603.23646v1 Announce Type: new Abstract: While recent work has benchmarked large language models on Swiss legal translation (Niklaus et al., 2025) and academic legal reasoning from university exams (Fan et al., 2025), no existing benchmark evaluates frontier model performance on...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Probing Ethical Framework Representations in Large Language Models: Structure, Entanglement, and Methodological Challenges

arXiv:2603.23659v1 Announce Type: new Abstract: When large language models make ethical judgments, do their internal representations distinguish between normative frameworks, or collapse ethics into a single acceptability dimension? We probe hidden representations across five ethical frameworks (deontology, utilitarianism, virtue, justice,...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

PLACID: Privacy-preserving Large language models for Acronym Clinical Inference and Disambiguation

arXiv:2603.23678v1 Announce Type: new Abstract: Large Language Models (LLMs) offer transformative solutions across many domains, but healthcare integration is hindered by strict data privacy constraints. Clinical narratives are dense with ambiguous acronyms, misinterpretation these abbreviations can precipitate severe outcomes like...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

IslamicMMLU: A Benchmark for Evaluating LLMs on Islamic Knowledge

arXiv:2603.23750v1 Announce Type: new Abstract: Large language models are increasingly consulted for Islamic knowledge, yet no comprehensive benchmark evaluates their performance across core Islamic disciplines. We introduce IslamicMMLU, a benchmark of 10,013 multiple-choice questions spanning three tracks: Quran (2,013 questions),...

1 min 3 weeks, 6 days ago

ip

LOW Academic European Union

Perturbation: A simple and efficient adversarial tracer for representation learning in language models

arXiv:2603.23821v1 Announce Type: new Abstract: Linguistic representation learning in deep neural language models (LMs) has been studied for decades, for both practical and theoretical reasons. However, finding representations in LMs remains an unsolved problem, in part due to a dilemma...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay

arXiv:2603.23841v1 Announce Type: new Abstract: While Large Language Models (LLMs) are increasingly used as primary sources of information, their potential for political bias may impact their objectivity. Existing benchmarks of LLM social bias primarily evaluate gender and racial stereotypes. When...

1 min 3 weeks, 6 days ago

nda

LOW Academic International

Language Model Planners do not Scale, but do Formalizers?

arXiv:2603.23844v1 Announce Type: new Abstract: Recent work shows overwhelming evidence that LLMs, even those trained to scale their reasoning trace, perform unsatisfactorily when solving planning problems too complex. Whether the same conclusion holds for LLM formalizers that generate solver-oriented programs...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Self-Distillation for Multi-Token Prediction

arXiv:2603.23911v1 Announce Type: new Abstract: As Large Language Models (LLMs) scale up, inference efficiency becomes a critical bottleneck. Multi-Token Prediction (MTP) could accelerate LLM inference by predicting multiple future tokens in parallel. However, existing MTP approaches still face two challenges:...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

Dialogue to Question Generation for Evidence-based Medical Guideline Agent Development

arXiv:2603.23937v1 Announce Type: new Abstract: Evidence-based medicine (EBM) is central to high-quality care, but remains difficult to implement in fast-paced primary care settings. Physicians face short consultations, increasing patient loads, and lengthy guideline documents that are impractical to consult in...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

OmniACBench: A Benchmark for Evaluating Context-Grounded Acoustic Control in Omni-Modal Models

arXiv:2603.23938v1 Announce Type: new Abstract: Most testbeds for omni-modal models assess multimodal understanding via textual outputs, leaving it unclear whether these models can properly speak their answers. To study this, we introduce OmniACBench, a benchmark for evaluating context-grounded acoustic control...

1 min 3 weeks, 6 days ago

ip

LOW Academic International

The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More

arXiv:2603.23971v1 Announce Type: new Abstract: Developers and consumers increasingly choose reasoning language models (RLMs) based on their listed API prices. However, how accurately do these prices reflect actual inference costs? We conduct the first systematic study of this question, evaluating...

1 min 3 weeks, 6 days ago

nda

LOW Academic United Kingdom

Grounding Arabic LLMs in the Doha Historical Dictionary: Retrieval-Augmented Understanding of Quran and Hadith

arXiv:2603.23972v1 Announce Type: new Abstract: Large language models (LLMs) have achieved remarkable progress in many language tasks, yet they continue to struggle with complex historical and religious Arabic texts such as the Quran and Hadith. To address this limitation, we...

1 min 3 weeks, 6 days ago

ip

LOW Academic European Union

Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning

arXiv:2603.24004v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable reasoning capabilities across modalities such as images and text. However, tabular data, despite being a critical real-world modality, remains relatively underexplored in multimodal learning. In this paper,...

1 min 3 weeks, 6 days ago

ip

LOW Academic European Union

Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation

arXiv:2603.23517v1 Announce Type: new Abstract: Accuracy-based evaluation cannot reliably distinguish genuine generalization from shortcuts like memorization, leakage, or brittle heuristics, especially in small-data regimes. In this position paper, we argue for mechanism-aware evaluation that combines task-relevant symbolic rules with mechanistic...

1 min 3 weeks, 6 days ago

nda

LOW Academic International

Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

arXiv:2603.23550v1 Announce Type: new Abstract: Multi-turn human-AI collaboration is fundamental to deploying interactive services such as adaptive tutoring, conversational recommendation, and professional consultation. However, optimizing these interactions via reinforcement learning is hindered by the sparsity of verifiable intermediate rewards and...

1 min 3 weeks, 6 days ago

nda

Do 3D Large Language Models Really Understand 3D Spatial Relationships?

Konkani LLM: Multi-Script Instruction Tuning and Evaluation for a Low-Resource Indian Language

Leveraging Computerized Adaptive Testing for Cost-effective Evaluation of Large Language Models in Medical Benchmarking

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

S-Path-RAG: Semantic-Aware Shortest-Path Retrieval Augmented Generation for Multi-Hop Knowledge Graph Question Answering

Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models

DISCO: Document Intelligence Suite for COmparative Evaluation

Navigating the Concept Space of Language Models

Fast and Faithful: Real-Time Verification for Long-Document Retrieval-Augmented Generation Systems

Compression Method Matters: Benchmark-Dependent Output Dynamics in LLM Prompt Compression

Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data

Qworld: Question-Specific Evaluation Criteria for LLMs

Did You Forget What I Asked? Prospective Memory Failures in Large Language Models

Swiss-Bench SBP-002: A Frontier Model Comparison on Swiss Legal and Regulatory Tasks

Probing Ethical Framework Representations in Large Language Models: Structure, Entanglement, and Methodological Challenges

PLACID: Privacy-preserving Large language models for Acronym Clinical Inference and Disambiguation

IslamicMMLU: A Benchmark for Evaluating LLMs on Islamic Knowledge

Perturbation: A simple and efficient adversarial tracer for representation learning in language models

PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay

Language Model Planners do not Scale, but do Formalizers?

Self-Distillation for Multi-Token Prediction

Dialogue to Question Generation for Evidence-based Medical Guideline Agent Development

OmniACBench: A Benchmark for Evaluating Context-Grounded Acoustic Control in Omni-Modal Models

The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More

Grounding Arabic LLMs in the Doha Historical Dictionary: Retrieval-Augmented Understanding of Quran and Hadith

Thinking with Tables: Enhancing Multi-Modal Tabular Understanding via Neuro-Symbolic Reasoning

Beyond Accuracy: Introducing a Symbolic-Mechanistic Approach to Interpretable Evaluation

Implicit Turn-Wise Policy Optimization for Proactive User-LLM Interaction

Impact Distribution

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.