Litigation

LOW Academic International

Align Your Structures: Generating Trajectories with Structure Pretraining for Molecular Dynamics

arXiv:2604.03911v1 Announce Type: new Abstract: Generating molecular dynamics (MD) trajectories using deep generative models has attracted increasing attention, yet remains inherently challenging due to the limited availability of MD data and the complexities involved in modeling high-dimensional MD distributions. To...

evidence

LOW Academic International

Representational Collapse in Multi-Agent LLM Committees: Measurement and Diversity-Aware Consensus

arXiv:2604.03809v1 Announce Type: new Abstract: Multi-agent LLM committees replicate the same model under different role prompts and aggregate outputs by majority vote, implicitly assuming that agents contribute complementary evidence. We embed each agent's chain-of-thought rationale and measure pairwise similarity: across...

evidence
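
The committee setup described in this abstract (one model, several role prompts, majority-vote aggregation, and a pairwise similarity check on the agents' rationales) can be sketched as follows. This is a minimal illustration, not the authors' measurement or their diversity-aware consensus method: it assumes each chain-of-thought rationale has already been embedded as a vector (the excerpt does not name the embedding model), it uses plain cosine similarity, and the function names and toy vectors are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings):
    """Average cosine similarity over all unordered pairs of agent rationales."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def majority_vote(answers):
    """Plain majority vote; ties go to the answer encountered first."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical three-agent committee with toy 3-d rationale embeddings.
answers = ["B", "B", "A"]
embeddings = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print("consensus:", majority_vote(answers))
print("mean pairwise similarity:", round(mean_pairwise_similarity(embeddings), 3))
```

A mean similarity near 1 would suggest that the differently prompted agents are producing nearly the same reasoning, which would undercut the assumption that majority vote pools complementary evidence.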

LOW Academic International

Automated Conjecture Resolution with Formal Verification

arXiv:2604.03789v1 Announce Type: new Abstract: Recent advances in large language models have significantly improved their ability to perform mathematical reasoning, extending from elementary problem solving to increasingly capable performance on research-level problems. However, reliably solving and verifying such problems remains...

discovery

LOW Academic International

POEMetric: The Last Stanza of Humanity

arXiv:2604.03695v1 Announce Type: new Abstract: Large Language Models (LLMs) can compose poetry, but how far are they from human poets? In this paper, we introduce POEMetric, the first comprehensive framework for poetry evaluation, examining 1) basic instruction-following abilities in generating...

motion

LOW Academic International

When Models Know More Than They Say: Probing Analogical Reasoning in LLMs

arXiv:2604.03877v1 Announce Type: new Abstract: Analogical reasoning is a core cognitive faculty essential for narrative understanding. While LLMs perform well when surface and structural cues align, they struggle in cases where an analogy is not apparent on the surface but...

standing

LOW Academic International

MultiPress: A Multi-Agent Framework for Interpretable Multimodal News Classification

arXiv:2604.03586v1 Announce Type: new Abstract: With the growing prevalence of multimodal news content, effective news topic classification demands models capable of jointly understanding and reasoning over heterogeneous data such as text and images. Existing methods often process modalities independently or...

standing

LOW Academic International

FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

arXiv:2604.03893v1 Announce Type: new Abstract: Breakthroughs in frontier theory often depend on the combination of concrete diagrammatic notations with rigorous logic. While multimodal large language models (MLLMs) show promise in general scientific tasks, current benchmarks often focus on local information...

discovery

LOW Academic International

Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment

arXiv:2604.03867v1 Announce Type: new Abstract: Steering vectors have emerged as a lightweight and effective approach for aligning large language models (LLMs) at inference time, enabling modulation over model behaviors by shifting LLM representations towards a target behavior. However, existing methods...

evidence
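
The basic idea of activation steering mentioned in this abstract, shifting a hidden representation toward a target behavior at inference time, can be sketched as an additive update h' = h + alpha * v applied at one layer. This is an assumption-laden illustration, not the paper's input-dependent layer-selection method: the layer index here is fixed by hand, the difference-of-means construction of v is one common choice rather than anything stated in the excerpt, and all names and toy activations are hypothetical.

```python
def mean_difference(positive, negative):
    """Difference-of-means steering direction: mean(positive) - mean(negative)."""
    dim = len(positive[0])
    mean_pos = [sum(x[i] for x in positive) / len(positive) for i in range(dim)]
    mean_neg = [sum(x[i] for x in negative) / len(negative) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def steer(hidden, direction, alpha):
    """Shift one hidden state toward the target behavior: h' = h + alpha * v."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + alpha * v for h, v in zip(hidden, direction)]


def steer_at_layer(layer_states, layer_index, direction, alpha):
    """Copy per-layer hidden states and apply steering at a single chosen layer."""
    steered = [list(state) for state in layer_states]
    steered[layer_index] = steer(steered[layer_index], direction, alpha)
    return steered


# Toy 2-d activations for prompts that do / do not show the target behavior.
v = mean_difference([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]])
states = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]  # hypothetical three-layer model
print(v)  # [1.0, 2.0]
print(steer_at_layer(states, layer_index=1, direction=v, alpha=0.5))
```

In this sketch the choice of `layer_index` is exactly the free parameter that, per the title, the paper proposes to select per input rather than fix in advance.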

LOW Academic International

A Model of Understanding in Deep Learning Systems

arXiv:2604.04171v1 Announce Type: new Abstract: I propose a model of systematic understanding, suitable for machine learning systems. On this account, an agent understands a property of a target system when it contains an adequate internal model that tracks real regularities,...

standing

LOW Academic International

Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty

arXiv:2604.04182v1 Announce Type: new Abstract: Non-stationary environments require agents to revise previously learned action values when contingencies change. We treat large language models (LLMs) as sequential decision policies in a two-option probabilistic reversal-learning task with three latent states and switch...

evidence

LOW Academic International

When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling

arXiv:2604.03562v1 Announce Type: new Abstract: Adaptive reward design for deep reinforcement learning (DRL) in multi-beam LEO satellite scheduling is motivated by the intuition that regime-aware reward weights should outperform static ones. We systematically test this intuition and uncover a switching-stability...

standing

LOW Academic International

Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization

arXiv:2604.03656v1 Announce Type: new Abstract: Generative Engine Optimization (GEO) is rapidly reshaping digital marketing paradigms in the era of Large Language Models (LLMs). However, current GEO strategies predominantly rely on Retrieval-Augmented Generation (RAG), which inherently suffers from probabilistic hallucinations and...

trial

LOW Academic International

Single-agent vs. Multi-agents for Automated Video Analysis of On-Screen Collaborative Learning Behaviors

arXiv:2604.03631v1 Announce Type: new Abstract: On-screen learning behavior provides valuable insights into how students seek, use, and create information during learning. Analyzing on-screen behavioral engagement is essential for capturing students' cognitive and collaborative processes. The recent development of Vision Language...

evidence

LOW Academic International

Don't Blink: Evidence Collapse during Multimodal Reasoning

arXiv:2604.04207v1 Announce Type: new Abstract: Reasoning VLMs can become more accurate while progressively losing visual grounding as they think. This creates task-conditional danger zones where low-entropy predictions are confident but ungrounded, a failure mode text-only monitoring cannot detect. Evaluating three...

evidence

LOW Academic International

The Tool Illusion: Rethinking Tool Use in Web Agents

arXiv:2604.03465v1 Announce Type: new Abstract: As web agents rapidly evolve, an increasing body of work has moved beyond conventional atomic browser interactions and explored tool use as a higher-level action paradigm. Although prior studies have shown the promise of tools,...

evidence

LOW Academic International

Scaling DPPs for RAG: Density Meets Diversity

arXiv:2604.03240v1 Announce Type: new Abstract: Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by grounding generation in external knowledge, yielding relevant responses that are aligned with factual evidence and evolving corpora. Standard RAG pipelines construct context through relevance ranking, performing...

evidence
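
The contrast this abstract draws between pure relevance ranking and diversity-aware context selection can be illustrated with the textbook determinantal point process (DPP) recipe: build a kernel L[i][j] = q[i] * S[i][j] * q[j] from relevance scores q and passage similarities S, then greedily add the passage that most increases det(L_Y). This is a naive sketch of standard greedy DPP MAP selection, not the paper's scaled method; the relevance scores and similarity matrix below are hypothetical.

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def build_kernel(quality, similarity):
    """Quality-diversity kernel: L[i][j] = q[i] * S[i][j] * q[j]."""
    n = len(quality)
    return [[quality[i] * similarity[i][j] * quality[j] for j in range(n)] for i in range(n)]


def greedy_dpp(kernel, k):
    """Greedily pick up to k items, each time maximizing det of the selected submatrix."""
    selected = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in (j for j in range(len(kernel)) if j not in selected):
            idx = selected + [i]
            d = det([[kernel[r][c] for c in idx] for r in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break
        selected.append(best)
    return selected


# Hypothetical passages: 0 and 1 are near-duplicates, 2 is less relevant but distinct.
quality = [0.9, 0.85, 0.5]
similarity = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(greedy_dpp(build_kernel(quality, similarity), 2))
```

With these toy numbers, ranking by relevance alone would return passages 0 and 1, whereas the determinant penalizes their redundancy and the greedy DPP picks 0 and 2. Recomputing a full determinant for every candidate costs far more than necessary at retrieval scale, which is presumably the kind of cost the paper's "scaling" addresses.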

LOW Academic International

CODE-GEN: A Human-in-the-Loop RAG-Based Agentic AI System for Multiple-Choice Question Generation

arXiv:2604.03926v1 Announce Type: new Abstract: We present CODE-GEN, a human-in-the-loop, retrieval-augmented generation (RAG)-based agentic AI system for generating context-aligned multiple-choice questions to develop student code reasoning and comprehension abilities. CODE-GEN employs an agentic AI architecture in which a Generator agent...

standing

LOW Academic International

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

arXiv:2604.02423v1 Announce Type: new Abstract: Large language models exhibit sycophancy: the tendency to shift outputs toward user-expressed stances, regardless of correctness or consistency. While prior work has studied this issue and its impacts, rigorous computational linguistic metrics are needed to...

evidence

LOW Academic International

AutoVerifier: An Agentic Automated Verification Framework Using Large Language Models

arXiv:2604.02617v1 Announce Type: new Abstract: Scientific and Technical Intelligence (S&TI) analysis requires verifying complex technical claims across rapidly growing literature, where existing approaches fail to bridge the verification gap between surface-level accuracy and deeper methodological validity. We present AutoVerifier, an...

evidence

LOW Academic International

Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets

arXiv:2604.02460v1 Announce Type: new Abstract: Recent work reports strong performance from multi-agent LLM systems (MAS), but these gains are often confounded by increased test-time computation. When computation is normalized, single-agent systems (SAS) can match or outperform MAS, yet the theoretical...

standing

LOW Academic International

Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control

arXiv:2604.03147v1 Announce Type: new Abstract: We present a method to identify a valence-arousal (VA) subspace within large language model representations. From 211k emotion-labeled texts, we derive emotion steering vectors, then learn VA axes as linear combinations of their top PCA...

motion

LOW Academic International

CharTool: Tool-Integrated Visual Reasoning for Chart Understanding

arXiv:2604.02794v1 Announce Type: new Abstract: Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large language models (MLLMs) due to the lack of high-quality training data, as well as the...

standing

LOW Academic International

VoxelCodeBench: Benchmarking 3D World Modeling Through Code Generation

arXiv:2604.02580v1 Announce Type: new Abstract: Evaluating code generation models for 3D spatial reasoning requires executing generated code in realistic environments and assessing outputs beyond surface-level correctness. We introduce VoxelCode, a platform for analyzing code generation capabilities for 3D understanding and...

standing

LOW Academic International

OPRIDE: Offline Preference-based Reinforcement Learning via In-Dataset Exploration

arXiv:2604.02349v1 Announce Type: cross Abstract: Preference-based reinforcement learning (PbRL) can help avoid sophisticated reward designs and align better with human intentions, showing great promise in various real-world applications. However, obtaining human feedback for preferences can be expensive and time-consuming, which...

motion

LOW Academic International

I must delete the evidence: AI Agents Explicitly Cover up Fraud and Violent Crime

arXiv:2604.02500v1 Announce Type: new Abstract: As ongoing research explores the ability of AI agents to be insider threats and act against company interests, we showcase the abilities of such agents to act against human well-being in service of corporate...

evidence

LOW Academic International

Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis

arXiv:2604.02359v1 Announce Type: cross Abstract: General-purpose Large Language Models (LLMs) are becoming widely adopted by people for mental health support. Yet emerging evidence suggests there are significant risks associated with high-frequency use, particularly for individuals suffering from psychosis, as LLMs...

evidence

LOW Academic International

Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation

arXiv:2604.03174v1 Announce Type: new Abstract: Large language models (LLMs) encode vast world knowledge in their parameters, yet they remain fundamentally limited by static knowledge, finite context windows, and weakly structured causal reasoning. This survey provides a unified account of augmentation...

evidence

LOW Academic International

DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery

arXiv:2604.02346v1 Announce Type: cross Abstract: Large language models (LLMs) are in the ascendancy for research in drug discovery, offering unprecedented opportunities to reshape drug research by accelerating hypothesis generation, optimizing candidate prioritization, and enabling more scalable and cost-effective drug discovery...

discovery

LOW Academic International

DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models

arXiv:2604.02733v1 Announce Type: new Abstract: Reasoning benchmarks typically evaluate whether a model derives the correct answer from a fixed premise set, but they under-measure a closely related capability that matters in dynamic environments: belief revision under minimal evidence change. We...

evidence

LOW Academic International

ESL-Bench: An Event-Driven Synthetic Longitudinal Benchmark for Health Agents

arXiv:2604.02834v1 Announce Type: new Abstract: Longitudinal health agents must reason across multi-source trajectories that combine continuous device streams, sparse clinical exams, and episodic life events - yet evaluating them is hard: real-world data cannot be released at scale, and temporally...

evidence

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.