Enhancing Safety of Large Language Models via Embedding Space Separation
arXiv:2603.20206v1 Announce Type: new Abstract: Large language models (LLMs) have achieved impressive capabilities, yet ensuring their safety against harmful prompts remains a critical challenge. Recent work has revealed that the latent representations (embeddings) of harmful and safe queries in LLMs...
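The abstract refers to the latent representations (embeddings) of harmful and safe queries. As a generic illustration only (not the paper's method, which the truncated abstract does not describe), one way to quantify how separated two such groups are in embedding space is the cosine distance between their cluster centroids; all vectors below are toy values.

```python
import numpy as np

def centroid_separation(emb_a, emb_b):
    """Cosine distance between the centroids of two embedding clusters.
    0 means identical directions; 2 means opposite directions."""
    c_a = np.mean(emb_a, axis=0)
    c_b = np.mean(emb_b, axis=0)
    cos = np.dot(c_a, c_b) / (np.linalg.norm(c_a) * np.linalg.norm(c_b))
    return 1.0 - cos

# Toy 2-D "embeddings": two well-separated clusters.
safe = np.array([[1.0, 0.1], [0.9, 0.0], [1.1, -0.1]])
harmful = np.array([[-1.0, 0.1], [-0.9, 0.2], [-1.1, 0.0]])
print(centroid_separation(safe, harmful))  # close to 2.0: near-opposite centroids
```

In practice the embeddings would come from a model's hidden states rather than hand-written vectors; the centroid distance is just one of several separation measures one could apply.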
FinReflectKG -- HalluBench: GraphRAG Hallucination Benchmark for Financial Question Answering Systems
arXiv:2603.20252v1 Announce Type: new Abstract: As organizations increasingly integrate AI-powered question-answering systems into financial information systems for compliance, risk assessment, and decision support, ensuring the factual accuracy of AI-generated outputs becomes a critical engineering challenge. Current Knowledge Graph (KG)-augmented QA...
A Framework for Low-Latency, LLM-driven Multimodal Interaction on the Pepper Robot
arXiv:2603.21013v1 Announce Type: new Abstract: Despite recent advances in integrating Large Language Models (LLMs) into social robotics, two weaknesses persist. First, existing implementations on platforms like Pepper often rely on cascaded Speech-to-Text (STT) -> LLM -> Text-to-Speech (TTS) pipelines, resulting in high latency and...
Profit is the Red Team: Stress-Testing Agents in Strategic Economic Interactions
arXiv:2603.20925v1 Announce Type: new Abstract: As agentic systems move into real-world deployments, their decisions increasingly depend on external inputs such as retrieved content, tool outputs, and information provided by other actors. When these inputs can be strategically shaped by adversaries,...
NeurIPS Datasets & Benchmarks Track: From Art to Science in AI Evaluations
Weber's Law in Transformer Magnitude Representations: Efficient Coding, Representational Geometry, and Psychophysical Laws in Language Models
arXiv:2603.20642v1 Announce Type: new Abstract: How do transformer language models represent magnitude? Recent work disagrees: some find logarithmic spacing, others linear encoding, others per-digit circular representations. We apply the formal tools of psychophysics to resolve this. Using four converging paradigms...
Large Neighborhood Search meets Iterative Neural Constraint Heuristics
arXiv:2603.20801v1 Announce Type: new Abstract: Neural networks are being increasingly used as heuristics for constraint satisfaction. These neural methods are often recurrent, learning to iteratively refine candidate assignments. In this work, we make explicit the connection between such iterative neural...
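The abstract connects iterative neural refinement to large neighborhood search (LNS). As background for readers unfamiliar with LNS, here is a minimal destroy-and-repair loop on a toy graph-coloring instance; this is a textbook sketch of the classical heuristic, not the neural variant the paper studies.

```python
import random

def violations(colors, edges):
    """Number of edges whose endpoints share a color."""
    return sum(colors[u] == colors[v] for u, v in edges)

def lns_color(n, edges, k, iters=300, destroy=2, seed=0):
    """Minimal LNS for graph k-coloring: un-assign a few vertices
    ('destroy'), greedily re-assign each to its least-conflicting
    color ('repair'), and keep the best assignment found."""
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = [rng.randrange(k) for _ in range(n)]
    best_cost = violations(best, edges)
    for _ in range(iters):
        cand = list(best)
        for v in rng.sample(range(n), destroy):          # destroy step
            conflicts = [sum(cand[u] == c for u in adj[v]) for c in range(k)]
            cand[v] = conflicts.index(min(conflicts))    # repair step
        cost = violations(cand, edges)
        if cost <= best_cost:
            best, best_cost = cand, cost
    return best, best_cost

# A triangle plus a pendant vertex is 3-colorable.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
best, cost = lns_color(4, edges, k=3)
print(cost)
```

In the neural setting described by the abstract, a learned model would replace the greedy repair heuristic, iteratively proposing refined assignments.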
A Subgoal-driven Framework for Improving Long-Horizon LLM Agents
arXiv:2603.19685v1 Announce Type: new Abstract: Large language model (LLM)-based agents have emerged as powerful autonomous controllers for digital environments, including mobile interfaces, operating systems, and web browsers. Web navigation, for example, requires handling dynamic content and long sequences of actions,...
Stepwise: Neuro-Symbolic Proof Search for Automated Systems Verification
arXiv:2603.19715v1 Announce Type: new Abstract: Formal verification via interactive theorem proving is increasingly used to ensure the correctness of critical systems, yet constructing large proof scripts remains highly manual and limits scalability. Advances in large language models (LLMs), especially in...
Improving Automatic Summarization of Radiology Reports through Mid-Training of Large Language Models
arXiv:2603.19275v1 Announce Type: cross Abstract: Automatic summarization of radiology reports is an essential application to reduce the burden on physicians. Previous studies have widely used the "pre-training, fine-tuning" strategy to adapt large language models (LLMs) for summarization. This study proposed...
Autonoma: A Hierarchical Multi-Agent Framework for End-to-End Workflow Automation
arXiv:2603.19270v1 Announce Type: new Abstract: The increasing complexity of user demands necessitates automation frameworks that can reliably translate open-ended instructions into robust, multi-step workflows. Current monolithic agent architectures often struggle with the challenges of scalability, error propagation, and maintaining focus...
PrefPO: Pairwise Preference Prompt Optimization
arXiv:2603.19311v1 Announce Type: new Abstract: Prompt engineering is effective but labor-intensive, motivating automated optimization methods. Existing methods typically require labeled datasets, which are often unavailable, and produce verbose, repetitive prompts. We introduce PrefPO, a minimal prompt optimization approach inspired by...
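PrefPO is described as built on pairwise preferences between prompts. As a generic, self-contained sketch of the pairwise-preference idea (the function names and the toy "judge" are illustrative, not the paper's algorithm), one can rank candidate prompts by tournament wins under any pairwise judge:

```python
def pairwise_tournament(prompts, prefer):
    """Rank candidate prompts by pairwise-preference wins.
    `prefer(a, b)` returns True if prompt a is preferred over b;
    in practice this would be an LLM judge, here it is any callable."""
    wins = {p: 0 for p in prompts}
    for i, a in enumerate(prompts):
        for b in prompts[i + 1:]:
            if prefer(a, b):
                wins[a] += 1
            else:
                wins[b] += 1
    return max(prompts, key=lambda p: wins[p])

# Toy judge: prefer the shorter, less verbose prompt.
candidates = [
    "Summarize the text.",
    "Please kindly provide a very detailed summary of the text below.",
]
print(pairwise_tournament(candidates, lambda a, b: len(a) < len(b)))
# → "Summarize the text."
```

A real optimizer would also generate new candidates between tournament rounds; this sketch only shows the selection step.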
Scalable Prompt Routing via Fine-Grained Latent Task Discovery
arXiv:2603.19415v1 Announce Type: new Abstract: Prompt routing dynamically selects the most appropriate large language model from a pool of candidates for each query, optimizing performance while managing costs. As model pools scale to include dozens of frontier models with narrow...
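The abstract describes routing each query to the most appropriate model from a pool. A minimal nearest-centroid sketch of this idea follows; the task centroids, model names, and 2-D embeddings are invented for illustration, and the paper's latent-task discovery mechanism is not shown.

```python
import numpy as np

# Hypothetical setup: each discovered latent task has a centroid in
# embedding space and a designated model (names are illustrative).
task_centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
task_models = ["code-model", "chat-model"]

def route(query_emb, centroids, models):
    """Send the query to the model of the nearest task centroid."""
    dists = np.linalg.norm(centroids - query_emb, axis=1)
    return models[int(np.argmin(dists))]

print(route(np.array([0.9, 0.2]), task_centroids, task_models))  # → code-model
print(route(np.array([0.1, 0.8]), task_centroids, task_models))  # → chat-model
```

Scaling this to dozens of models mainly changes how the centroids are discovered and how cost is traded against expected quality, which is the problem the paper targets.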
FDARxBench: Benchmarking Regulatory and Clinical Reasoning on FDA Generic Drug Assessment
arXiv:2603.19539v1 Announce Type: new Abstract: We introduce an expert curated, real-world benchmark for evaluating document-grounded question-answering (QA) motivated by generic drug assessment, using the U.S. Food and Drug Administration (FDA) drug label documents. Drug labels contain rich but heterogeneous clinical...
Scalable Cross-Facility Federated Learning for Scientific Foundation Models on Multiple Supercomputers
arXiv:2603.19544v1 Announce Type: new Abstract: Artificial Intelligence for scientific applications increasingly requires training large models on data that cannot be centralized due to privacy constraints, data sovereignty, or the sheer volume of data generated. Federated learning (FL) addresses this by...
Wearable Foundation Models Should Go Beyond Static Encoders
arXiv:2603.19564v1 Announce Type: new Abstract: Wearable foundation models (WFMs), trained on large volumes of data collected by affordable, always-on devices, have demonstrated strong performance on short-term, well-defined health monitoring tasks, including activity recognition, fitness tracking, and cardiovascular signal assessment. However,...
Scale-Dependent Radial Geometry and Metric Mismatch in Wasserstein Propagation for Reverse Diffusion
arXiv:2603.19670v1 Announce Type: new Abstract: Existing analyses of reverse diffusion often propagate sampling error in the Euclidean geometry underlying \(W_2\) along the entire reverse trajectory. Under weak log-concavity, however, Gaussian smoothing can create contraction first at large separations while short...
AlignMamba-2: Enhancing Multimodal Fusion and Sentiment Analysis with Modality-Aware Mamba
arXiv:2603.18462v1 Announce Type: new Abstract: In the era of large-scale pre-trained models, effectively adapting general knowledge to specific affective computing tasks remains a challenge, particularly regarding computational efficiency and multimodal heterogeneity. While Transformer-based methods have excelled at modeling inter-modal dependencies,...
The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition
arXiv:2603.18294v1 Announce Type: new Abstract: Background: Clinical trials rely on transparent inclusion criteria to ensure generalizability. In contrast, benchmarks validating health-related large language models (LLMs) rarely characterize the "patient" or "query" populations they contain. Without defined composition, aggregate performance metrics...
Balanced Thinking: Improving Chain of Thought Training in Vision Language Models
arXiv:2603.18656v1 Announce Type: new Abstract: Multimodal reasoning in vision-language models (VLMs) typically relies on a two-stage process: supervised fine-tuning (SFT) and reinforcement learning (RL). In standard SFT, all tokens contribute equally to the loss, even though reasoning data are inherently...
How Psychological Learning Paradigms Shaped and Constrained Artificial Intelligence
arXiv:2603.18203v1 Announce Type: new Abstract: The dominant paradigms of artificial intelligence were shaped by learning theories from psychology: behaviorism inspired reinforcement learning, cognitivism gave rise to deep learning and memory-augmented architectures, and constructivism influenced curriculum learning and compositional approaches. This...
Beyond Passive Aggregation: Active Auditing and Topology-Aware Defense in Decentralized Federated Learning
arXiv:2603.18538v1 Announce Type: new Abstract: Decentralized Federated Learning (DFL) remains highly vulnerable to adaptive backdoor attacks designed to bypass traditional passive defense metrics. To address this limitation, we shift the defensive paradigm toward a novel active, interventional auditing framework. First,...
Personalized Fall Detection by Balancing Data with Selective Feedback Using Contrastive Learning
arXiv:2603.17148v1 Announce Type: new Abstract: Personalized fall detection models can significantly improve accuracy by adapting to individual motion patterns, yet their effectiveness is often limited by the scarcity of real-world fall data and the dominance of non-fall feedback samples. This...
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild
arXiv:2603.17187v1 Announce Type: new Abstract: Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and...
Optimizing Hospital Capacity During Pandemics: A Dual-Component Framework for Strategic Patient Relocation
arXiv:2603.15960v1 Announce Type: new Abstract: The COVID-19 pandemic has placed immense strain on hospital systems worldwide, leading to critical capacity challenges. This research proposes a two-part framework to optimize hospital capacity through patient relocation strategies. The first component involves developing...
AsgardBench -- Evaluating Visually Grounded Interactive Planning Under Minimal Feedback
arXiv:2603.15888v1 Announce Type: new Abstract: With AsgardBench we aim to evaluate visually grounded, high-level action sequence generation and interactive planning, focusing specifically on plan adaptation during execution based on visual observations rather than navigation or low-level manipulation. In the landscape...
POLAR: A Per-User Association Test in Embedding Space
arXiv:2603.15950v1 Announce Type: new Abstract: Most intrinsic association probes operate at the word, sentence, or corpus level, obscuring author-level variation. We present POLAR (Per-user On-axis Lexical Association Report), a per-user lexical association test that runs in the embedding space of...
Quantum-Secure-By-Construction (QSC): A Paradigm Shift For Post-Quantum Agentic Intelligence
arXiv:2603.15668v1 Announce Type: new Abstract: As agentic artificial intelligence systems scale across globally distributed and long-lived infrastructures, secure and policy-compliant communication becomes a fundamental systems challenge. This challenge grows more serious in the quantum era, where the cryptographic...
An Agentic Evaluation Framework for AI-Generated Scientific Code in PETSc
arXiv:2603.15976v1 Announce Type: new Abstract: While large language models have significantly accelerated scientific code generation, comprehensively evaluating the generated code remains a major challenge. Traditional benchmarks reduce evaluation to test-case matching, an approach insufficient for library code in HPC where...
RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation
arXiv:2603.16002v1 Announce Type: new Abstract: Radiology report annotation is essential for clinical NLP, yet manual labeling is slow and costly. We present RadAnnotate, an LLM-based framework that studies retrieval-augmented synthetic reports and confidence-based selective automation to reduce expert effort for...