NERdME: a Named Entity Recognition Dataset for Indexing Research Artifacts in Code Repositories
arXiv:2603.05750v1 Announce Type: new Abstract: Existing scholarly information extraction (SIE) datasets focus on scientific papers and overlook implementation-level details in code repositories. README files describe datasets, source code, and other implementation-level artifacts; however, their free-form Markdown offers little semantic structure,...
Tutor Move Taxonomy: A Theory-Aligned Framework for Analyzing Instructional Moves in Tutoring
arXiv:2603.05778v1 Announce Type: new Abstract: Understanding what makes tutoring effective requires methods for systematically analyzing tutors' instructional actions during learning interactions. This paper presents a tutor move taxonomy designed to support large-scale analysis of tutoring dialogue within the National Tutoring...
Addressing the Ecological Fallacy in Larger LMs with Human Context
arXiv:2603.05928v1 Announce Type: new Abstract: Language model training and inference ignore a fundamental linguistic fact -- there is a dependence between multiple sequences of text written by the same person. Prior work has shown that addressing this form of ecological...
Track-SQL: Enhancing Generative Language Models with Dual-Extractive Modules for Schema and Context Tracking in Multi-turn Text-to-SQL
arXiv:2603.05996v1 Announce Type: new Abstract: Generative language models have shown significant potential in single-turn Text-to-SQL. However, their performance does not extend equivalently to multi-turn Text-to-SQL. This is primarily due to generative language models' inadequacy in handling the complexities of context...
Diffusion Language Models Are Natively Length-Aware
arXiv:2603.06123v1 Announce Type: new Abstract: Unlike autoregressive language models, which terminate variable-length generation upon predicting an End-of-Sequence (EoS) token, Diffusion Language Models (DLMs) operate over a fixed maximum-length context window for a predetermined number of denoising steps. However, this process...
FlashPrefill: Instantaneous Pattern Discovery and Thresholding for Ultra-Fast Long-Context Prefilling
arXiv:2603.06199v1 Announce Type: new Abstract: Long-context modeling is a pivotal capability for Large Language Models, yet the quadratic complexity of attention remains a critical bottleneck, particularly during the compute-intensive prefilling phase. While various sparse attention mechanisms have been explored, they...
SPOT: Span-level Pause-of-Thought for Efficient and Interpretable Latent Reasoning in Large Language Models
arXiv:2603.06222v1 Announce Type: new Abstract: Explicit Chain-of-Thought improves the reasoning performance of large language models but often incurs high inference cost due to verbose token-level traces. While recent approaches reduce this overhead via concise prompting or step pruning, they largely...
FuseDiff: Symmetry-Preserving Joint Diffusion for Dual-Target Structure-Based Drug Design
arXiv:2603.05567v1 Announce Type: new Abstract: Dual-target structure-based drug design aims to generate a single ligand together with two pocket-specific binding poses, each compatible with a corresponding target pocket, enabling polypharmacological therapies with improved efficacy and reduced resistance. Existing approaches typically...
Why Depth Matters in Parallelizable Sequence Models: A Lie Algebraic View
arXiv:2603.05573v1 Announce Type: new Abstract: Scalable sequence models, such as Transformer variants and structured state-space models, often trade expressive power for sequence-level parallelism, which enables efficient training. Here we examine the bounds on error and how error scales when models...
Reinforcement Learning for Power-Flow Network Analysis
arXiv:2603.05673v1 Announce Type: new Abstract: The power flow equations are non-linear multivariate equations that describe the relationship between power injections and bus voltages of electric power networks. Given a network topology, we are interested in finding network parameters with many...
Revisiting the (Sub)Optimality of Best-of-N for Inference-Time Alignment
arXiv:2603.05739v1 Announce Type: new Abstract: Best-of-N (BoN) sampling is a widely used inference-time alignment method for language models, whereby N candidate responses are sampled from a reference model and the one with the highest predicted reward according to a learned...
MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation
arXiv:2603.05760v1 Announce Type: new Abstract: Multi-objective reinforcement learning (MORL) is effective for multi-echelon combinatorial supply chain optimisation, where tasks involve high dimensionality, uncertainty, and competing objectives. However, its deployment in dynamic environments is hindered by the need for task-specific retraining...
Score-Guided Proximal Projection: A Unified Geometric Framework for Rectified Flow Editing
arXiv:2603.05761v1 Announce Type: new Abstract: Rectified Flow (RF) models achieve state-of-the-art generation quality, yet controlling them for precise tasks -- such as semantic editing or blind image recovery -- remains a challenge. Current approaches bifurcate into inversion-based guidance, which suffers...
Bridging Domains through Subspace-Aware Model Merging
arXiv:2603.05768v1 Announce Type: new Abstract: Model merging integrates multiple task-specific models into a single consolidated one. Recent research has made progress in improving merging performance for in-distribution or multi-task scenarios, but domain generalization in model merging remains underexplored. We investigate...
Sparse Crosscoders for diffing MoEs and Dense models
arXiv:2603.05805v1 Announce Type: new Abstract: Mixture of Experts (MoE) models achieve parameter-efficient scaling through sparse expert routing, yet their internal representations remain poorly understood compared to dense models. We present a systematic comparison of MoE and dense model internals using crosscoders,...
MoE Lens -- An Expert Is All You Need
arXiv:2603.05806v1 Announce Type: new Abstract: Mixture of Experts (MoE) models enable parameter-efficient scaling through sparse expert activations, yet optimizing their inference and memory costs remains challenging due to limited understanding of their specialization behavior. We present a systematic analysis of...
Self-Auditing Parameter-Efficient Fine-Tuning for Few-Shot 3D Medical Image Segmentation
arXiv:2603.05822v1 Announce Type: new Abstract: Adapting foundation models to new clinical sites remains challenging in practice. Domain shift and scarce annotations must be handled by experts, yet many clinical groups do not have ready access to skilled AI engineers to...
Omni-Masked Gradient Descent: Memory-Efficient Optimization via Mask Traversal with Improved Convergence
arXiv:2603.05960v1 Announce Type: new Abstract: Memory-efficient optimization methods have recently gained increasing attention for scaling full-parameter training of large language models under the GPU-memory bottleneck. Existing approaches either lack clear convergence guarantees, or only achieve the standard ${\mathcal{O}}(\epsilon^{-4})$ iteration complexity...
Preventing Learning Stagnation in PPO by Scaling to 1 Million Parallel Environments
arXiv:2603.06009v1 Announce Type: new Abstract: Plateaus, where an agent's performance stagnates at a suboptimal level, are a common problem in deep on-policy RL. Focusing on PPO due to its widespread adoption, we show that plateaus in certain regimes arise not...
Latent Diffusion-Based 3D Molecular Recovery from Vibrational Spectra
arXiv:2603.06113v1 Announce Type: new Abstract: Infrared (IR) spectroscopy, a type of vibrational spectroscopy, is widely used for molecular structure determination and provides critical structural information for chemists. However, existing approaches for recovering molecular structures from IR spectra typically rely on...
DC-Merge: Improving Model Merging with Directional Consistency
arXiv:2603.06242v1 Announce Type: new Abstract: Model merging aims to integrate multiple task-adapted models into a unified model that preserves the knowledge of each task. In this paper, we identify that the key to this knowledge retention lies in maintaining the...
Gradient Flow Polarizes Softmax Outputs towards Low-Entropy Solutions
arXiv:2603.06248v1 Announce Type: new Abstract: Understanding the intricate non-convex training dynamics of softmax-based models is crucial for explaining the empirical success of transformers. In this article, we analyze the gradient flow dynamics of the value-softmax model, defined as ${L}(\mathbf{V} \sigma(\mathbf{a}))$,...
Owner of ICE detention facility sees big opportunity in AI man camps
AI data center developers are increasingly relying on a style of camp popularized as housing for men working in remote oil fields.
A roadmap for AI, if anyone will listen
The Pro-Human Declaration was finalized before last week's Pentagon-Anthropic standoff, but the collision of the two events wasn’t lost on anyone involved.
Google just gave Sundar Pichai a $692M pay package
Most of it is tied to performance, including new stock incentives linked to Waymo and Wing, its drone delivery venture.
Grammarly’s ‘expert review’ is just missing the actual experts
A recently-added feature in Grammarly purports to improve users’ writing with help from the world's great writers and thinkers — and some tech journalists, too.
NLP2024 Theme Session “NLP in the Legal Domain”
The Demise of the Functionality Doctrine in Design Patent Law
Perry J. Saidman. The so-called doctrine of functionality arises in both design patent validity and infringement analyses. Broadly stated, the doctrine seeks to ensure that design patents do not...
The potential of AI-driven truth technologies: opportunities, risks and governance