International Law

LOW Academic International

SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training

arXiv:2603.18079v1 Announce Type: new Abstract: Large Language Model (LLM) agents have shown strong results on multi-turn tool-use tasks, yet they operate in isolation during training, failing to leverage experiences accumulated across episodes. Existing experience-augmented methods address this by organizing trajectories...

1 min 4 weeks, 2 days ago

LOW Academic International

Enhancing Reinforcement Learning Fine-Tuning with an Online Refiner

arXiv:2603.18088v1 Announce Type: new Abstract: Constraints are essential for stabilizing reinforcement learning fine-tuning (RFT) and preventing degenerate outputs, yet they inherently conflict with the optimization objective because stronger constraints limit the ability of a fine-tuned model to discover better solutions....

1 min 4 weeks, 2 days ago

LOW Academic International

AGRI-Fidelity: Evaluating the Reliability of Listenable Explanations for Poultry Disease Detection

arXiv:2603.18247v1 Announce Type: new Abstract: Existing XAI metrics measure faithfulness for a single model, ignoring model multiplicity where near-optimal classifiers rely on different or spurious acoustic cues. In noisy farm environments, stationary artifacts such as ventilation noise can produce explanations...

1 min 4 weeks, 2 days ago

LOW Academic International

Discovering What You Can Control: Interventional Boundary Discovery for Reinforcement Learning

arXiv:2603.18257v1 Announce Type: new Abstract: Selecting relevant state dimensions in the presence of confounded distractors is a causal identification problem: observational statistics alone cannot reliably distinguish dimensions that correlate with actions from those that actions cause. We formalize this as...

1 min 4 weeks, 2 days ago

LOW Academic International

Learning to Reason with Curriculum I: Provable Benefits of Autocurriculum

arXiv:2603.18325v1 Announce Type: new Abstract: Chain-of-thought reasoning, where language models expend additional computation by producing thinking tokens prior to final responses, has driven significant advances in model capabilities. However, training these reasoning models is extremely costly in terms of both...

1 min 4 weeks, 2 days ago

LOW Academic International

Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration

arXiv:2603.18326v1 Announce Type: new Abstract: While offline reinforcement learning provides reliable policies for real-world deployment, its inherent pessimism severely restricts an agent's ability to explore and collect novel data online. Drawing inspiration from safe reinforcement learning, exploring near the boundary...

1 min 4 weeks, 2 days ago

LOW Academic International

RE-SAC: Disentangling aleatoric and epistemic risks in bus fleet control: A stable and robust ensemble DRL approach

arXiv:2603.18396v1 Announce Type: new Abstract: Bus holding control is challenging due to stochastic traffic and passenger demand. While deep reinforcement learning (DRL) shows promise, standard actor-critic algorithms suffer from Q-value instability in volatile environments. A key source of this instability...

1 min 4 weeks, 2 days ago

LOW Academic International

FlowMS: Flow Matching for De Novo Structure Elucidation from Mass Spectra

arXiv:2603.18397v1 Announce Type: new Abstract: Mass spectrometry (MS) stands as a cornerstone analytical technique for molecular identification, yet de novo structure elucidation from spectra remains challenging due to the combinatorial complexity of chemical space and the inherent ambiguity of spectral...

1 min 4 weeks, 2 days ago

LOW Academic International

Towards Noise-Resilient Quantum Multi-Armed and Stochastic Linear Bandits

arXiv:2603.18431v1 Announce Type: new Abstract: Quantum multi-armed bandits (MAB) and stochastic linear bandits (SLB) have recently attracted significant attention, as their quantum counterparts can achieve quadratic speedups over classical MAB and SLB. However, most existing quantum MAB algorithms assume ideal...

1 min 4 weeks, 2 days ago

LOW Academic International

Discounted Beta-Bernoulli Reward Estimation for Sample-Efficient Reinforcement Learning with Verifiable Rewards

arXiv:2603.18444v1 Announce Type: new Abstract: Reinforcement learning with verifiable rewards (RLVR) has emerged as an effective post-training paradigm for improving the reasoning capabilities of large language models. However, existing group-based RLVR methods often suffer from severe sample inefficiency. This inefficiency...

1 min 4 weeks, 2 days ago

LOW Academic International

AcceRL: A Distributed Asynchronous Reinforcement Learning and World Model Framework for Vision-Language-Action Models

arXiv:2603.18464v1 Announce Type: new Abstract: Reinforcement learning (RL) for large-scale Vision-Language-Action (VLA) models faces significant challenges in computational efficiency and data acquisition. We propose AcceRL, a fully asynchronous and decoupled RL framework designed to eliminate synchronization barriers by physically isolating...

1 min 4 weeks, 2 days ago

LOW Academic International

Data-efficient pre-training by scaling synthetic megadocs

arXiv:2603.18534v1 Announce Type: new Abstract: Synthetic data augmentation has emerged as a promising solution when pre-training is constrained by data rather than compute. We study how to design synthetic data algorithms that achieve better loss scaling: not only lowering loss...

1 min 4 weeks, 2 days ago

LOW News International

FBI started buying Americans' location data again, Kash Patel confirms

Tom Cotton supports FBI data purchasing, compares it to searching people's trash.

1 min 4 weeks, 2 days ago

LOW News International

DoorDash launches a new ‘Tasks’ app that pays couriers to submit videos to train AI

Delivery couriers will be able to earn money by completing activities like filming everyday tasks or recording themselves speaking in another language.

1 min 4 weeks, 2 days ago

LOW Conference International

On Violations of LLM Review Policies

5 min 1 month ago

LOW Academic International

A foundation model for electrodermal activity data

arXiv:2603.16878v1 Announce Type: new Abstract: Foundation models have recently extended beyond natural language and vision to time-series domains, including physiological signals. However, progress in electrodermal activity (EDA) modeling is hindered by the absence of large-scale, curated, and openly accessible datasets....

1 min 1 month ago

LOW Academic International

Multi-Agent Reinforcement Learning for Dynamic Pricing: Balancing Profitability, Stability and Fairness

arXiv:2603.16888v1 Announce Type: new Abstract: Dynamic pricing in competitive retail markets requires strategies that adapt to fluctuating demand and competitor behavior. In this work, we present a systematic empirical evaluation of multi-agent reinforcement learning (MARL) approaches, specifically MAPPO and MADDPG, for dynamic...

1 min 1 month ago

LOW Academic International

MHPO: Modulated Hazard-aware Policy Optimization for Stable Reinforcement Learning

arXiv:2603.16929v1 Announce Type: new Abstract: Regulating the importance ratio is critical for the training stability of Group Relative Policy Optimization (GRPO) based frameworks. However, prevailing ratio control methods, such as hard clipping, suffer from non-differentiable boundaries and vanishing gradient regions,...

1 min 1 month ago
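The abstract's point about hard clipping can be illustrated with the standard PPO-style clipped surrogate that GRPO-based methods commonly build on. The sketch below is a minimal illustration under that assumption, not MHPO's modulated objective; the function names and the default eps = 0.2 are hypothetical choices. For a single token it evaluates min(r·A, clip(r, 1 - eps, 1 + eps)·A) and its derivative with respect to the importance ratio r, which is exactly zero once r leaves the clip range in the direction the advantage A favors.

```python
def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-token PPO/GRPO-style hard-clipped surrogate (illustrative, not MHPO)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


def surrogate_grad(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Analytic d(surrogate)/d(ratio).

    Outside the clip range, on the side the advantage favors, the clipped term
    is selected by min() and is constant in the ratio, so the gradient vanishes.
    At the exact boundary the objective has a kink (it is non-differentiable);
    this sketch returns the one-sided derivative from inside the clip range.
    """
    lower, upper = 1.0 - eps, 1.0 + eps
    if advantage > 0 and ratio > upper:
        return 0.0  # positive advantage, ratio already too large: no signal
    if advantage < 0 and ratio < lower:
        return 0.0  # negative advantage, ratio already too small: no signal
    return advantage  # unclipped term is active, gradient equals the advantage


if __name__ == "__main__":
    for r in (0.5, 1.0, 1.5):
        # With A = 1.0, prints 1.0, 1.0, 0.0: the r = 1.5 token is in a dead zone.
        print(r, surrogate_grad(r, advantage=1.0))
```

The dead zones returned as 0.0 are the "vanishing gradient regions" the abstract mentions, and the kink at 1 ± eps is the non-differentiable boundary.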

LOW Academic International

Integrating Explainable Machine Learning and Mixed-Integer Optimization for Personalized Sleep Quality Intervention

arXiv:2603.16937v1 Announce Type: new Abstract: Sleep quality is influenced by a complex interplay of behavioral, environmental, and psychosocial factors, yet most computational studies focus mainly on predictive risk identification rather than actionable intervention design. Although machine learning models can accurately...

1 min 1 month ago

LOW Academic International

Formal verification of tree-based machine learning models for lateral spreading

arXiv:2603.16983v1 Announce Type: new Abstract: Machine learning models for geotechnical hazard prediction can achieve high accuracy while learning physically inconsistent relationships from sparse or biased training data. Current remedies (post-hoc explainability, such as SHAP and LIME, and training-time constraints) either...

1 min 1 month ago

LOW Academic International

Do Understanding and Generation Fight? A Diagnostic Study of DPO for Unified Multimodal Models

arXiv:2603.17044v1 Announce Type: new Abstract: Unified multimodal models share a language model backbone for both understanding and generating images. Can DPO align both capabilities simultaneously? We present the first systematic study of this question, applying DPO to Janus-Pro at 1B...

1 min 1 month ago

LOW Academic International

Early Quantization Shrinks Codebook: A Simple Fix for Diversity-Preserving Tokenization

arXiv:2603.17052v1 Announce Type: new Abstract: Vector quantization is a technique in machine learning that discretizes continuous representations into a set of discrete vectors. It is widely employed in tokenizing data representations for large language models, diffusion models, and other generative...

1 min 1 month ago
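The discretization step described in the abstract can be sketched as a nearest-neighbour lookup into a codebook. This is a minimal, generic illustration of vector quantization (plain Python, squared Euclidean distance), not the paper's proposed fix; the helper names are hypothetical. The utilization helper counts how many codebook entries are actually selected, one simple way to quantify the kind of codebook shrinkage the title refers to.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(x: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook vector closest to x in squared Euclidean distance."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(x, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(
    xs: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each continuous vector to its nearest discrete code (index and vector)."""
    indices = [nearest_code(x, codebook) for x in xs]
    quantized = [list(codebook[i]) for i in indices]
    return indices, quantized


def codebook_utilization(indices: Sequence[int], codebook_size: int) -> float:
    """Fraction of codebook entries used at least once; low values mean a shrunk codebook."""
    return len(set(indices)) / codebook_size


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [-3.0, 2.0]]
    xs = [[0.1, -0.2], [0.9, 1.2], [1.1, 0.8], [0.2, 0.1]]
    idx, _ = quantize(xs, codebook)
    print(idx)                                      # [0, 1, 1, 0]
    print(codebook_utilization(idx, len(codebook)))  # 0.5: half the codes are unused
```

Real tokenizers do the same lookup on batched tensors and train the codebook jointly; the loop here only makes the discretization and the utilization measure explicit.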

LOW Academic International

PRISM: Demystifying Retention and Interaction in Mid-Training

arXiv:2603.17074v1 Announce Type: new Abstract: We present PRISM, a comprehensive empirical study of mid-training design choices for large language models. Through controlled experiments across seven base models spanning four families (Granite, LLaMA, Mistral, Nemotron-H), two architecture types (dense Transformer and...

1 min 1 month ago

LOW Academic International

CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning

arXiv:2603.17075v1 Announce Type: new Abstract: Motivated by auto-proof generation and Valiant's VP vs. VNP conjecture, we study the problem of discovering efficient arithmetic circuits to compute polynomials, using addition and multiplication gates. We formulate this problem as a single-player game,...

1 min 1 month ago

LOW Academic International

Topology-Preserving Deep Joint Source-Channel Coding for Semantic Communication

arXiv:2603.17126v1 Announce Type: new Abstract: Many wireless vision applications, such as autonomous driving, require preservation of global structural information rather than only per-pixel fidelity. However, existing deep joint source-channel coding (DeepJSCC) schemes mainly optimize pixel-wise losses and provide no explicit...

1 min 1 month ago

LOW Academic International

REAL: Regression-Aware Reinforcement Learning for LLM-as-a-Judge

arXiv:2603.17145v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed as automated evaluators that assign numeric scores to model outputs, a paradigm known as LLM-as-a-Judge. However, standard Reinforcement Learning (RL) methods typically rely on binary rewards (e.g., 0-1...

1 min 1 month ago

LOW Academic International

Noise-Response Calibration: A Causal Intervention Protocol for LLM-Judges

arXiv:2603.17172v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as automated judges and synthetic labelers, especially in low-label settings. Yet these systems are stochastic and often overconfident, which makes deployment decisions difficult when external ground truth is...

1 min 1 month ago

LOW Academic International

Domain-informed explainable boosting machines for trustworthy lateral spread predictions

arXiv:2603.17175v1 Announce Type: new Abstract: Explainable Boosting Machines (EBMs) provide transparent predictions through additive shape functions, enabling direct inspection of feature contributions. However, EBMs can learn non-physical relationships that reduce their reliability in natural hazard applications. This study presents a...

1 min 1 month ago

LOW Academic International

Self-Conditioned Denoising for Atomistic Representation Learning

arXiv:2603.17196v1 Announce Type: new Abstract: The success of large-scale pretraining in NLP and computer vision has catalyzed growing efforts to develop analogous foundation models for the physical sciences. However, pretraining strategies using atomistic data remain underexplored. To date, large-scale supervised...

1 min 1 month ago

LOW Academic International

Abstraction as a Memory-Efficient Inductive Bias for Continual Learning

arXiv:2603.17198v1 Announce Type: new Abstract: The real world is non-stationary and infinitely complex, requiring intelligent agents to learn continually without the prohibitive cost of retraining from scratch. While online continual learning offers a framework for this setting, learning new information...

1 min 1 month ago

Related Practice Areas

JCG, PC

HSOLLC Co., Ltd.