Immigration Law

LOW Academic United States

Revisiting Sharpness-Aware Minimization: A More Faithful and Effective Implementation

arXiv:2603.10048v1 Announce Type: new Abstract: Sharpness-Aware Minimization (SAM) enhances generalization by minimizing the maximum training loss within a predefined neighborhood around the parameters. However, its practical implementation approximates this as gradient ascent(s) followed by applying the gradient at the ascent...

1 min 1 month, 1 week ago

ead

LOW Academic International

InFusionLayer: a CFA-based ensemble tool to generate new classifiers for learning and modeling

arXiv:2603.10049v1 Announce Type: new Abstract: Ensemble learning is a well established body of methods for machine learning to enhance predictive performance by combining multiple algorithms/models. Combinatorial Fusion Analysis (CFA) has provided method and practice for combining multiple scoring systems, using...

1 min 1 month, 1 week ago

tps

LOW Academic International

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

arXiv:2603.10067v1 Announce Type: new Abstract: Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes...

1 min 1 month, 1 week ago

tps

LOW Academic International

Improving Search Agent with One Line of Code

arXiv:2603.10069v1 Announce Type: new Abstract: Tool-based Agentic Reinforcement Learning (TARL) has emerged as a promising paradigm for training search agents to interact with external tools for a multi-turn information-seeking process autonomously. However, we identify a critical training instability that leads...

1 min 1 month, 1 week ago

ead

LOW Academic United States

Marginals Before Conditionals

arXiv:2603.10074v1 Announce Type: new Abstract: We construct a minimal task that isolates conditional learning in neural networks: a surjective map with K-fold ambiguity, resolved by a selector token z, so H(A | B) = log K while H(A | B,...

1 min 1 month, 1 week ago

ead

LOW Academic United States

ES-dLLM: Efficient Inference for Diffusion Large Language Models by Early-Skipping

arXiv:2603.10088v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) are emerging as a promising alternative to autoregressive models (ARMs) due to their ability to capture bidirectional context and the potential for parallel generation. Despite the advantages, dLLM inference remains...

1 min 1 month, 1 week ago

tps

LOW Academic European Union

A Survey of Weight Space Learning: Understanding, Representation, and Generation

arXiv:2603.10090v1 Announce Type: new Abstract: Neural network weights are typically viewed as the end product of training, while most deep learning research focuses on data, features, and architectures. However, recent advances show that the set of all possible weight values...

1 min 1 month, 1 week ago

tps

LOW Academic United States

Equivariant Asynchronous Diffusion: An Adaptive Denoising Schedule for Accelerated Molecular Conformation Generation

arXiv:2603.10093v1 Announce Type: new Abstract: Recent 3D molecular generation methods primarily use asynchronous auto-regressive or synchronous diffusion models. While auto-regressive models build molecules sequentially, they're limited by a short horizon and a discrepancy between training and inference. Conversely, synchronous diffusion...

1 min 1 month, 1 week ago

ead

LOW Academic United States

Rethinking Adam for Time Series Forecasting: A Simple Heuristic to Improve Optimization under Distribution Shifts

arXiv:2603.10095v1 Announce Type: new Abstract: Time-series forecasting often faces challenges from non-stationarity, particularly distributional drift, where the data distribution evolves over time. This dynamic behavior can undermine the effectiveness of adaptive optimizers, such as Adam, which are typically designed for...

1 min 1 month, 1 week ago

tps

LOW Academic International

Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias

arXiv:2603.10123v1 Announce Type: new Abstract: The "Lost in the Middle" phenomenon -- a U-shaped performance curve where LLMs retrieve well from the beginning and end of a context but fail in the middle -- is widely attributed to learned Softmax...

1 min 1 month, 1 week ago

ead

LOW Academic European Union

Mashup Learning: Faster Finetuning by Remixing Past Checkpoints

arXiv:2603.10156v1 Announce Type: new Abstract: Finetuning on domain-specific data is a well-established method for enhancing LLM performance on downstream tasks. Training on each dataset produces a new set of model weights, resulting in a multitude of checkpoints saved in-house or...

1 min 1 month, 1 week ago

ead

LOW Academic International

DT-BEHRT: Disease Trajectory-aware Transformer for Interpretable Patient Representation Learning

arXiv:2603.10180v1 Announce Type: new Abstract: The growing adoption of electronic health record (EHR) systems has provided unprecedented opportunities for predictive modeling to guide clinical decision making. Structured EHRs contain longitudinal observations of patients across hospital visits, where each visit is...

1 min 1 month, 1 week ago

tps

LOW Academic United States

Discovery of a Hematopoietic Manifold in scGPT Yields a Method for Extracting Performant Algorithms from Biological Foundation Model Internals

arXiv:2603.10261v1 Announce Type: new Abstract: We report the discovery and extraction of a compact hematopoietic algorithm from the single-cell foundation model scGPT, to our knowledge the first biologically useful, competitive algorithm extracted from a foundation model via mechanistic interpretability. We...

1 min 1 month, 1 week ago

ead

LOW Academic United States

Taming Score-Based Denoisers in ADMM: A Convergent Plug-and-Play Framework

arXiv:2603.10281v1 Announce Type: new Abstract: While score-based generative models have emerged as powerful priors for solving inverse problems, directly integrating them into optimization algorithms such as ADMM remains nontrivial. Two central challenges arise: i) the mismatch between the noisy data...

1 min 1 month, 1 week ago

ead

LOW Academic United States

How to make the most of your masked language model for protein engineering

arXiv:2603.10302v1 Announce Type: new Abstract: A plethora of protein language models have been released in recent years. Yet comparatively little work has addressed how to best sample from them to optimize desired biological properties. We fill this gap by proposing...

1 min 1 month, 1 week ago

ead

LOW Academic European Union

Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design

arXiv:2603.10379v1 Announce Type: new Abstract: This paper presents a novel extension of neural scaling laws to Mixture-of-Experts (MoE) models, focusing on the optimal allocation of compute between expert and attention sub-layers. As MoE architectures have emerged as an efficient method...

1 min 1 month, 1 week ago

ead

LOW Academic International

Variance-Aware Adaptive Weighting for Diffusion Model Training

arXiv:2603.10391v1 Announce Type: new Abstract: Diffusion models have recently achieved remarkable success in generative modeling, yet their training dynamics across different noise levels remain highly imbalanced, which can lead to inefficient optimization and unstable learning behavior. In this work, we...

1 min 1 month, 1 week ago

ead

LOW Academic International

On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

arXiv:2603.10397v1 Announce Type: new Abstract: One crucial factor behind the success of deep learning lies in the implicit bias induced by noise inherent in gradient-based training algorithms. Motivated by empirical observations that training with noisy labels improves model generalization, we...

1 min 1 month, 1 week ago

tps

LOW News United States

The 14th Amendment’s citizenship clause does not codify English principles of subjectship

Critics and supporters of President Donald Trump’s executive order on birthright citizenship often focus on the order’s barring of automatic citizenship to children born to individuals unlawfully present in the […]

1 min 1 month, 1 week ago

citizenship

LOW News International

Zendesk acquires agentic customer service startup Forethought

Forethought was years ahead of its time and the 2018 winner of TechCrunch Battlefield.

1 min 1 month, 1 week ago

ead

LOW Law Review International

What is a Tort?

What is a tort, and what is tort law for? On one leading scholarly account, torts are legal liability rules that seek to promote the welfare of society at large by disincentivizing socially suboptimal behavior and distributing the costs of...

1 min 1 month, 1 week ago

ead

LOW Academic International

Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought Reasoning

arXiv:2603.08999v1 Announce Type: new Abstract: Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating...

1 min 1 month, 1 week ago

ead

LOW Academic International

Let's Verify Math Questions Step by Step

arXiv:2505.13903v1 Announce Type: cross Abstract: Large Language Models (LLMs) have recently achieved remarkable progress in mathematical reasoning. To enable such capabilities, many existing works distill strong reasoning models into long chains of thought or design algorithms to construct high-quality math...

1 min 1 month, 1 week ago

tps

LOW Academic United States

Common Sense vs. Morality: The Curious Case of Narrative Focus Bias in LLMs

arXiv:2603.09434v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed across diverse real-world applications and user communities. As such, it is crucial that these models remain both morally grounded and knowledge-aware. In this work, we uncover a critical...

1 min 1 month, 1 week ago

ead

LOW Academic International

SPAR-K: Scheduled Periodic Alternating Early Exit for Spoken Language Models

arXiv:2603.09215v1 Announce Type: new Abstract: Interleaved spoken language models (SLMs) alternately generate text and speech tokens, but decoding at full transformer depth for every step becomes costly, especially due to long speech sequences. We propose SPAR-K, a modality-aware early exit...

1 min 1 month, 1 week ago

ead

LOW Academic International

DataFactory: Collaborative Multi-Agent Framework for Advanced Table Question Answering

arXiv:2603.09152v1 Announce Type: new Abstract: Table Question Answering (TableQA) enables natural language interaction with structured tabular data. However, existing large language model (LLM) approaches face critical limitations: context length constraints that restrict data handling capabilities, hallucination issues that compromise answer...

1 min 1 month, 1 week ago

ead

LOW Academic International

Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs

arXiv:2603.09095v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) can process text presented as images, yet they often perform worse than when the same content is provided as textual tokens. We systematically diagnose this "modality gap" by evaluating seven...

1 min 1 month, 1 week ago

ead

LOW Academic International

Reward Prediction with Factorized World States

arXiv:2603.09400v1 Announce Type: new Abstract: Agents must infer action outcomes and select actions that maximize a reward signal indicating how close the goal is to being reached. Supervised learning of reward models could introduce biases inherent to training data, limiting...

1 min 1 month, 1 week ago

tps

LOW Academic European Union

An Empirical Study and Theoretical Explanation on Task-Level Model-Merging Collapse

arXiv:2603.09463v1 Announce Type: new Abstract: Model merging unifies independently fine-tuned LLMs from the same base, enabling reuse and integration of parallel development efforts without retraining. However, in practice we observe that merging does not always succeed: certain combinations of task-specialist...

1 min 1 month, 1 week ago

adjustment

LOW Academic International

MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems

arXiv:2603.09909v1 Announce Type: new Abstract: While Multi-Agent Systems (MAS) show potential for complex clinical decision support, the field remains hindered by architectural fragmentation and the lack of standardized multimodal integration. Current medical MAS research suffers from non-uniform data ingestion pipelines,...

1 min 1 month, 1 week ago

tps
