When Models Know More Than They Say: Probing Analogical Reasoning in LLMs
arXiv:2604.03877v1 Announce Type: new Abstract: Analogical reasoning is a core cognitive faculty essential for narrative understanding. While LLMs perform well when surface and structural cues align, they struggle in cases where an analogy is not apparent on the surface but...
Position: Science of AI Evaluation Requires Item-level Benchmark Data
arXiv:2604.03244v1 Announce Type: new Abstract: AI evaluations have become the primary evidence for deploying generative AI systems across high-stakes domains. However, current evaluation paradigms often exhibit systemic validity failures. These issues, ranging from unjustified design choices to misaligned metrics, remain...
Scaling DPPs for RAG: Density Meets Diversity
arXiv:2604.03240v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding generation in external knowledge, yielding relevant responses that are aligned with factual evidence and evolving corpora. Standard RAG pipelines construct context through relevance ranking, performing...
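The abstract above contrasts pure relevance ranking with DPP-style selection, which trades off relevance against diversity. A minimal greedy sketch of the general idea (not the paper's algorithm; the kernel construction and function names here are illustrative assumptions):

```python
import numpy as np

def greedy_dpp_select(candidate_emb, scores, k):
    """Greedily pick k passages balancing relevance (scores) against
    diversity, using a determinantal-point-process-style kernel.
    Generic sketch only, not the method proposed in the paper."""
    # Quality-weighted similarity kernel: L[i,j] = s_i * cos(i,j) * s_j
    emb = candidate_emb / np.linalg.norm(candidate_emb, axis=1, keepdims=True)
    L = np.outer(scores, scores) * (emb @ emb.T)
    selected, remaining = [], list(range(len(scores)))
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            # det of the kernel submatrix = "volume" spanned by the subset;
            # near-duplicate items shrink it toward zero
            gain = np.linalg.det(L[np.ix_(idx, idx)])
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two near-duplicate high-scoring candidates, this selector keeps one of them and then jumps to a dissimilar passage, whereas a pure relevance ranker would return both duplicates.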
Towards the AI Historian: Agentic Information Extraction from Primary Sources
arXiv:2604.03553v1 Announce Type: new Abstract: AI is supporting, accelerating, and automating scientific discovery across a diverse set of fields. However, AI adoption in historical research remains limited due to the lack of solutions designed for historians. In this technical progress...
'Layer su Layer': Identifying and Disambiguating the Italian NPN Construction in BERT's family
arXiv:2604.03673v1 Announce Type: new Abstract: Interpretability research has highlighted the importance of evaluating Pretrained Language Models (PLMs) and in particular contextual embeddings against explicit linguistic theories to determine what linguistic information they encode. This study focuses on the Italian NPN...
Beyond Predefined Schemas: TRACE-KG for Context-Enriched Knowledge Graphs from Complex Documents
arXiv:2604.03496v1 Announce Type: new Abstract: Knowledge graph construction typically relies either on predefined ontologies or on schema-free extraction. Ontology-driven pipelines enforce consistent typing but require costly schema design and maintenance, whereas schema-free methods often produce fragmented graphs with weak global...
Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents
arXiv:2604.04157v1 Announce Type: new Abstract: Theory of Mind (ToM) -- the ability to model others' mental states -- is fundamental to human social cognition. Whether large language models (LLMs) can develop ToM has been tested exclusively through static vignettes, leaving...
When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling
arXiv:2604.03562v1 Announce Type: new Abstract: Adaptive reward design for deep reinforcement learning (DRL) in multi-beam LEO satellite scheduling is motivated by the intuition that regime-aware reward weights should outperform static ones. We systematically test this intuition and uncover a switching-stability...
Testing the Limits of Truth Directions in LLMs
arXiv:2604.03754v1 Announce Type: new Abstract: Large language models (LLMs) have been shown to encode truth of statements in their activation space along a linear truth direction. Previous studies have argued that these directions are universal in certain aspects, while more...
MultiPress: A Multi-Agent Framework for Interpretable Multimodal News Classification
arXiv:2604.03586v1 Announce Type: new Abstract: With the growing prevalence of multimodal news content, effective news topic classification demands models capable of jointly understanding and reasoning over heterogeneous data such as text and images. Existing methods often process modalities independently or...
The Tool Illusion: Rethinking Tool Use in Web Agents
arXiv:2604.03465v1 Announce Type: new Abstract: As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools,...
CODE-GEN: A Human-in-the-Loop RAG-Based Agentic AI System for Multiple-Choice Question Generation
arXiv:2604.03926v1 Announce Type: new Abstract: We present CODE-GEN, a human-in-the-loop, retrieval-augmented generation (RAG)-based agentic AI system for generating context-aligned multiple-choice questions to develop student code reasoning and comprehension abilities. CODE-GEN employs an agentic AI architecture in which a Generator agent...
FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
arXiv:2604.04074v1 Announce Type: new Abstract: Peer review in machine learning is under growing pressure from rising submission volume and limited reviewer time. Most LLM-based reviewing systems read only the manuscript and generate comments from the paper's own narrative. This makes...
AdaptFuse: Training-Free Sequential Preference Learning via Externalized Bayesian Inference
arXiv:2604.03925v1 Announce Type: new Abstract: Large language models struggle to accumulate evidence across multiple rounds of user interaction, failing to update their beliefs in a manner consistent with Bayesian inference. Existing solutions require fine-tuning on sensitive user interaction data, limiting...
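The abstract above describes keeping Bayesian belief updates outside the model so evidence accumulates correctly across rounds. A toy sketch of that externalized update over a discrete hypothesis space (the paper's actual mechanism is not shown here; this is the textbook posterior-becomes-prior recursion, kept in log space for stability):

```python
import math

def bayes_update(log_prior, log_likelihoods):
    """One round of externalized Bayesian inference: posterior ∝
    prior × likelihood over a discrete hypothesis space, normalized
    with log-sum-exp. Illustrative sketch, not the paper's method."""
    unnorm = [lp + ll for lp, ll in zip(log_prior, log_likelihoods)]
    m = max(unnorm)
    log_z = m + math.log(sum(math.exp(u - m) for u in unnorm))
    return [u - log_z for u in unnorm]

# Two hypothetical preference hypotheses, uniform prior; each round of
# interaction supplies per-hypothesis likelihoods of the observed feedback.
belief = [math.log(0.5), math.log(0.5)]
for _ in range(2):
    belief = bayes_update(belief, [math.log(0.8), math.log(0.2)])
```

After two consistent rounds the first hypothesis holds posterior mass 0.64/0.68 = 16/17, exactly what repeated Bayesian conditioning prescribes; the point of externalizing the update is that the LLM never has to perform this arithmetic implicitly.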
Announcing the ICML 2026 Workshops and Affinity Workshops
DRAFT: Task Decoupled Latent Reasoning for Agent Safety
arXiv:2604.03242v1 Announce Type: new Abstract: The advent of tool-using LLM agents shifts safety monitoring from output moderation to auditing long, noisy interaction trajectories, where risk-critical evidence is sparse, making standard binary supervision poorly suited for credit assignment. To address this, we...
Automated Conjecture Resolution with Formal Verification
arXiv:2604.03789v1 Announce Type: new Abstract: Recent advances in large language models have significantly improved their ability to perform mathematical reasoning, extending from elementary problem solving to increasingly capable performance on research-level problems. However, reliably solving and verifying such problems remains...
Automated Attention Pattern Discovery at Scale in Large Language Models
arXiv:2604.03764v1 Announce Type: new Abstract: Large language models have found success by scaling up capabilities to work in general settings. The same can unfortunately not be said for interpretability methods. The current trend in mechanistic interpretability is to provide precise...
GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces
arXiv:2604.04017v1 Announce Type: new Abstract: Deep research agents integrate fragmented evidence through multi-step tool use. BrowseComp offers a text-only testbed for such agents, but existing multimodal benchmarks rarely require both the composition of weak visual cues and BrowseComp-style multi-hop verification. Geolocation is...
Don't Blink: Evidence Collapse during Multimodal Reasoning
arXiv:2604.04207v1 Announce Type: new Abstract: Reasoning VLMs can become more accurate while progressively losing visual grounding as they think. This creates task-conditional danger zones where low-entropy predictions are confident but ungrounded, a failure mode text-only monitoring cannot detect. Evaluating three...
POEMetric: The Last Stanza of Humanity
arXiv:2604.03695v1 Announce Type: new Abstract: Large Language Models (LLMs) can compose poetry, but how far are they from human poets? In this paper, we introduce POEMetric, the first comprehensive framework for poetry evaluation, examining 1) basic instruction-following abilities in generating...
From Model-Based Screening to Data-Driven Surrogates: A Multi-Stage Workflow for Exploring Stochastic Agent-Based Models
arXiv:2604.03350v1 Announce Type: new Abstract: Systematic exploration of Agent-Based Models (ABMs) is challenged by the curse of dimensionality and their inherent stochasticity. We present a multi-stage pipeline integrating the systematic design of experiments with machine learning surrogates. Using a predator-prey...
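The abstract above outlines a pipeline: a designed set of experiments over the parameter space, repeated stochastic simulation runs, and a cheap data-driven surrogate fit to the results. A minimal sketch of that shape, using a stand-in stochastic model and a quadratic least-squares surrogate (the paper's predator-prey ABM and surrogate choices are not reproduced here; all names and the model below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_abm(params, n_rep=20):
    """Stand-in for an expensive stochastic ABM: the mean outcome over
    replicates of a noisy response. Purely illustrative."""
    a, b = params
    return np.mean(a * 2.0 + b ** 2 + rng.normal(0.0, 0.1, n_rep))

# Stage 1: space-filling design over the 2-D parameter space,
# with replicated runs per design point to tame stochasticity.
design = rng.uniform(0.0, 1.0, size=(50, 2))
y = np.array([toy_abm(p) for p in design])

# Stage 2: fit a cheap surrogate (quadratic least squares) to the runs.
X = np.column_stack([np.ones(len(design)), design, design ** 2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def surrogate(p):
    """Millisecond-scale approximation replacing further ABM calls."""
    a, b = p
    return np.array([1.0, a, b, a ** 2, b ** 2]) @ coef
```

Once fitted, the surrogate can be queried densely for sensitivity analysis or calibration at a fraction of the simulation cost, which is the payoff the multi-stage workflow is after.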
Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison
arXiv:2604.04064v1 Announce Type: new Abstract: Small language models (SLMs) in the 100M-10B parameter range increasingly power production systems, yet whether they possess the internal emotion representations recently discovered in frontier models remains unknown. We present the first comparative analysis of...
Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics
arXiv:2604.03911v1 Announce Type: new Abstract: Generating molecular dynamics (MD) trajectories using deep generative models has attracted increasing attention, yet remains inherently challenging due to the limited availability of MD data and the complexities involved in modeling high-dimensional MD distributions. To...
Which English Do LLMs Prefer? Triangulating Structural Bias Towards American English in Foundation Models
arXiv:2604.04204v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed in high-stakes domains, yet they expose only limited language settings, most notably "English (US)," despite the global diversity and colonial history of English. Through a postcolonial framing to...
FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
arXiv:2604.03893v1 Announce Type: new Abstract: Breakthroughs in frontier theory often depend on the combination of concrete diagrammatic notations with rigorous logic. While multimodal large language models (MLLMs) show promise in general scientific tasks, current benchmarks often focus on local information...
Unmasking Hallucinations: A Causal Graph-Attention Perspective on Factual Reliability in Large Language Models
arXiv:2604.04020v1 Announce Type: new Abstract: This paper focuses on hallucinations produced by large language models (LLMs). LLMs have shown extraordinary language understanding and generation capabilities, yet they suffer from a major disadvantage: hallucinations, outputs that are factually incorrect...
IC3-Evolve: Proof-/Witness-Gated Offline LLM-Driven Heuristic Evolution for IC3 Hardware Model Checking
arXiv:2604.03232v1 Announce Type: new Abstract: IC3, also known as property-directed reachability (PDR), is a commonly-used algorithm for hardware safety model checking. It checks if a state transition system complies with a given safety property. IC3 either returns UNSAFE (indicating property...
Compliance-by-Construction Argument Graphs: Using Generative AI to Produce Evidence-Linked Formal Arguments for Certification-Grade Accountability
arXiv:2604.04103v1 Announce Type: new Abstract: High-stakes decision systems increasingly require structured justification, traceability, and auditability to ensure accountability and regulatory compliance. Formal arguments commonly used in the certification of safety-critical systems provide a mechanism for structuring claims, reasoning, and evidence...
Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization
arXiv:2604.03656v1 Announce Type: new Abstract: Generative Engine Optimization (GEO) is rapidly reshaping digital marketing paradigms in the era of Large Language Models (LLMs). However, current GEO strategies predominantly rely on Retrieval-Augmented Generation (RAG), which inherently suffers from probabilistic hallucinations and...