Arbitration

LOW Academic International

DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models

arXiv:2604.02733v1 Announce Type: new Abstract: Reasoning benchmarks typically evaluate whether a model derives the correct answer from a fixed premise set, but they under-measure a closely related capability that matters in dynamic environments: belief revision under minimal evidence change. We...

1 min 1 week, 4 days ago

bit

LOW Academic International

Breakdowns in Conversational AI: Interactional Failures in Emotionally and Ethically Sensitive Contexts

arXiv:2604.02713v1 Announce Type: new Abstract: Conversational AI is increasingly deployed in emotionally charged and ethically sensitive interactions. Previous research has primarily concentrated on emotional benchmarks or static safety checks, overlooking how alignment unfolds in evolving conversation. We explore the research...

1 min 1 week, 4 days ago

bit

LOW Academic International

AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents

arXiv:2604.02947v1 Announce Type: new Abstract: Computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments. Unlike chat systems, they maintain state across interactions and translate intermediate outputs into concrete actions. This creates a...

1 min 1 week, 4 days ago

bit

LOW Academic International

Internalized Reasoning for Long-Context Visual Document Understanding

arXiv:2604.02371v1 Announce Type: cross Abstract: Visual long-document understanding is critical for enterprise, legal, and scientific applications, yet the best performing open recipes have not explored reasoning, a capability which has driven leaps in math and code performance. We introduce a...

1 min 1 week, 4 days ago

bit

LOW Academic United States

Social Meaning in Large Language Models: Structure, Magnitude, and Pragmatic Prompting

arXiv:2604.02512v1 Announce Type: new Abstract: Large language models (LLMs) increasingly exhibit human-like patterns of pragmatic and social reasoning. This paper addresses two related questions: do LLMs approximate human social meaning not only qualitatively but also quantitatively, and can prompting strategies...

1 min 1 week, 4 days ago

bit

LOW Academic International

FoE: Forest of Errors Makes the First Solution the Best in Large Reasoning Models

arXiv:2604.02967v1 Announce Type: new Abstract: Recent Large Reasoning Models (LRMs) like DeepSeek-R1 have demonstrated remarkable success in complex reasoning tasks, exhibiting human-like patterns in exploring multiple alternative solutions. Upon closer inspection, however, we uncover a surprising phenomenon: The First is...

1 min 1 week, 4 days ago

bit

LOW Academic International

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

arXiv:2604.02423v1 Announce Type: new Abstract: Large language models exhibit sycophancy: the tendency to shift outputs toward user-expressed stances, regardless of correctness or consistency. While prior work has studied this issue and its impacts, rigorous computational linguistic metrics are needed to...

1 min 1 week, 4 days ago

bit

LOW Academic International

Querying Structured Data Through Natural Language Using Language Models

arXiv:2604.03057v1 Announce Type: new Abstract: This paper presents an open-source methodology for allowing users to query structured, non-textual datasets through natural language. Unlike Retrieval-Augmented Generation (RAG), which struggles with numerical and highly structured information, our approach trains...

1 min 1 week, 4 days ago

bit

LOW Academic International

Multiple-Debias: A Full-process Debiasing Method for Multilingual Pre-trained Language Models

arXiv:2604.02772v1 Announce Type: new Abstract: Multilingual Pre-trained Language Models (MPLMs) have become essential tools for natural language processing. However, they often exhibit biases related to sensitive attributes such as gender, race, and religion. In this paper, we introduce a comprehensive...

1 min 1 week, 4 days ago

bit

LOW Academic European Union

Learning the Signature of Memorization in Autoregressive Language Models

arXiv:2604.03199v1 Announce Type: new Abstract: All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K%, reference calibration), each bounded by the designer's intuition. We introduce the first transferable learned attack, enabled by the observation...

1 min 1 week, 4 days ago

bit

LOW Academic International

AXELRAM: Quantize Once, Never Dequantize

arXiv:2604.02638v1 Announce Type: new Abstract: We propose AXELRAM, a smart SRAM macro architecture that computes attention scores directly from quantized KV cache indices without dequantization. The key enabler is a design-time fixed codebook: orthogonal-transform-based quantization concentrates each coordinate's distribution to...

1 min 1 week, 4 days ago

bit

LOW Academic International

FTimeXer: Frequency-aware Time-series Transformer with Exogenous variables for Robust Carbon Footprint Forecasting

arXiv:2604.02347v1 Announce Type: new Abstract: Accurate and up-to-date forecasting of the power grid's carbon footprint is crucial for effective product carbon footprint (PCF) accounting and informed decarbonization decisions. However, the carbon intensity of the grid exhibits high non-stationarity, and existing...

1 min 1 week, 4 days ago

bit

LOW Academic International

Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains

arXiv:2604.02343v1 Announce Type: cross Abstract: We study the compression of LLM-generated text across lossless and lossy regimes, characterizing a compression-compute frontier where more compression is possible at the cost of more compute. For lossless compression, domain-adapted LoRA adapters can improve...

1 min 1 week, 4 days ago

bit

LOW Academic European Union

Analytic Drift Resister for Non-Exemplar Continual Graph Learning

arXiv:2604.02633v1 Announce Type: new Abstract: Non-Exemplar Continual Graph Learning (NECGL) seeks to eliminate the privacy risks intrinsic to rehearsal-based paradigms by retaining solely class-level prototype representations rather than raw graph examples for mitigating catastrophic forgetting. However, this design choice inevitably...

1 min 1 week, 4 days ago

adr

LOW Academic International

Principled and Scalable Diversity-Aware Retrieval via Cardinality-Constrained Binary Quadratic Programming

arXiv:2604.02554v1 Announce Type: new Abstract: Diversity-aware retrieval is essential for Retrieval-Augmented Generation (RAG), yet existing methods lack theoretical guarantees and face scalability issues as the number of retrieved passages k increases. We propose a principled formulation of diversity retrieval as...

1 min 1 week, 4 days ago

adr

LOW Academic International

Failing to Falsify: Evaluating and Mitigating Confirmation Bias in Language Models

arXiv:2604.02485v1 Announce Type: new Abstract: Confirmation bias, the tendency to seek evidence that supports rather than challenges one's belief, hinders one's reasoning ability. We examine whether large language models (LLMs) exhibit confirmation bias by adapting the rule-discovery study from human...

1 min 1 week, 4 days ago

bit

LOW Academic European Union

Differentiable Symbolic Planning: A Neural Architecture for Constraint Reasoning with Learned Feasibility

arXiv:2604.02350v1 Announce Type: cross Abstract: Neural networks excel at pattern recognition but struggle with constraint reasoning -- determining whether configurations satisfy logical or physical constraints. We introduce Differentiable Symbolic Planning (DSP), a neural architecture that performs discrete symbolic reasoning while...

1 min 1 week, 4 days ago

bit

LOW Academic International

Revealing the Learning Dynamics of Long-Context Continual Pre-training

arXiv:2604.02650v1 Announce Type: new Abstract: Existing studies on Long-Context Continual Pre-training (LCCP) mainly focus on small-scale models and limited data regimes (tens of billions of tokens). We argue that directly migrating these small-scale settings to industrial-grade models risks insufficient adaptation...

1 min 1 week, 4 days ago

bit

LOW Academic International

Fast NF4 Dequantization Kernels for Large Language Model Inference

arXiv:2604.02556v1 Announce Type: new Abstract: Large language models (LLMs) have grown beyond the memory capacity of single GPU devices, necessitating quantization techniques for practical deployment. While NF4 (4-bit NormalFloat) quantization enables 4× memory reduction, inference on current NVIDIA GPUs (e.g.,...

1 min 1 week, 4 days ago

bit

LOW Academic International

An Empirical Study of Many-Shot In-Context Learning for Machine Translation of Low-Resource Languages

arXiv:2604.02596v1 Announce Type: new Abstract: In-context learning (ICL) allows large language models (LLMs) to adapt to new tasks from a few examples, making it promising for languages underrepresented in pre-training. Recent work on many-shot ICL suggests that modern LLMs can...

1 min 1 week, 4 days ago

bit

LOW Academic European Union

Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy

arXiv:2604.02709v1 Announce Type: new Abstract: The formal reasoning capabilities of LLMs are crucial for advancing automated software engineering. However, existing benchmarks for LLMs lack systematic evaluation based on computation and complexity, leaving a critical gap in understanding their formal reasoning...

1 min 1 week, 4 days ago

bit

LOW Academic United States

Mitigating LLM biases toward spurious social contexts using direct preference optimization

arXiv:2604.02585v1 Announce Type: new Abstract: LLMs are increasingly used for high-stakes decision-making, yet their sensitivity to spurious contextual information can introduce harmful biases. This is a critical concern when models are deployed for tasks like evaluating teachers' instructional quality, where...

1 min 1 week, 4 days ago

bit

LOW Academic International

Too Polite to Disagree: Understanding Sycophancy Propagation in Multi-Agent Systems

arXiv:2604.02668v1 Announce Type: new Abstract: Large language models (LLMs) often exhibit sycophancy: agreement with user stance even when it conflicts with the model's opinion. While prior work has mostly studied this in single-agent settings, it remains underexplored in collaborative multi-agent...

1 min 1 week, 4 days ago

bit

LOW Academic United States

Ambig-IaC: Multi-level Disambiguation for Interactive Cloud Infrastructure-as-Code Synthesis

arXiv:2604.02382v1 Announce Type: cross Abstract: The scale and complexity of modern cloud infrastructure have made Infrastructure-as-Code (IaC) essential for managing deployments. While large language models (LLMs) are increasingly being used to generate IaC configurations from natural language, user requests are...

1 min 1 week, 4 days ago

bit

LOW Academic International

Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control

arXiv:2604.03147v1 Announce Type: new Abstract: We present a method to identify a valence-arousal (VA) subspace within large language model representations. From 211k emotion-labeled texts, we derive emotion steering vectors, then learn VA axes as linear combinations of their top PCA...

1 min 1 week, 4 days ago

bit

LOW News International

Can orbital data centers help justify a massive valuation for SpaceX?

On the latest episode of TechCrunch’s Equity podcast, we debated Elon Musk's vision for data centers in space.

1 min 1 week, 4 days ago

bit

LOW Academic International

MSA-Thinker: Discrimination-Calibration Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis

arXiv:2604.00013v1 Announce Type: cross Abstract: Multimodal sentiment analysis aims to understand human emotions by integrating textual, auditory, and visual modalities. Although Multimodal Large Language Models (MLLMs) have achieved state-of-the-art performance via supervised fine-tuning (SFT), their end-to-end "black-box" nature limits interpretability....

1 min 2 weeks ago

bit

LOW Academic European Union

Signals: Trajectory Sampling and Triage for Agentic Interactions

arXiv:2604.00356v1 Announce Type: new Abstract: Agentic applications based on large language models increasingly rely on multi-step interaction loops involving planning, action execution, and environment feedback. While such systems are now deployed at scale, improving them post-deployment remains challenging. Agent trajectories...

1 min 2 weeks ago

bit

LOW Academic International

Quantifying Gender Bias in Large Language Models: When ChatGPT Becomes a Hiring Manager

arXiv:2604.00011v1 Announce Type: cross Abstract: The growing prominence of large language models (LLMs) in daily life has heightened concerns that LLMs exhibit many of the same gender-related biases as their creators. In the context of hiring decisions, we quantify the...

1 min 2 weeks ago

bit

LOW Academic International

Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms

arXiv:2604.00012v1 Announce Type: cross Abstract: Despite the impressive performance of general-purpose large language models (LLMs), they often require fine-tuning or post-training to excel at specific tasks. For instance, large reasoning models (LRMs), such as the DeepSeek-R1 series, demonstrate strong reasoning...

1 min 2 weeks ago

bit

