Efficient Decoder Scaling Strategy for Neural Routing Solvers
arXiv:2603.00430v1 Announce Type: new Abstract: Construction-based neural routing solvers, typically composed of an encoder and a decoder, have emerged as a promising approach for solving vehicle routing problems. While recent studies suggest that shifting parameters from the encoder to the...
Déjà vu all over again
The Relist Watch column examines cert petitions that the Supreme Court has “relisted” for its upcoming conference. A short explanation of relists is available here. The Supreme Court is continuing to […] The post Déjà vu all over again appeared first on SCOTUSblog.
Episode 41: Thinking through Rupture in International Economic Law: Views from Latin America - EJIL: The Podcast!
FCC chair calls Paramount/WBD merger "a lot cleaner" than defunct Netflix deal
FCC to review foreign debt, but Carr indicates it will be a formality.
Why AI startups are selling the same equity at two different prices
Some AI founders are using a novel valuation mechanism to manufacture unicorn status.
Alibaba’s Qwen tech lead steps down after major AI push
Reactions rippled through Alibaba's Qwen team after tech lead Junyang Lin stepped down following a major model launch.
AI companies are spending millions to thwart this former tech exec’s congressional bid
A tech billionaire-backed super PAC is spending $125 million to undercut candidates pushing for AI regulation. New York's Alex Bores, a former tech executive himself, is one of them.
Claude Code rolls out a voice mode capability
Anthropic is stepping up its game in the AI coding space with the rollout of Voice Mode in Claude Code.
X says it will suspend creators from revenue-sharing program for unlabeled AI posts of ‘armed conflict’
Creators who break the rules will get a three-month suspension, and if they continue to violate the policy, they'll be permanently banned.
France or Spain or Germany or France: A Neural Account of Non-Redundant Redundant Disjunctions
arXiv:2602.23547v1 Announce Type: new Abstract: Sentences like "She will go to France or Spain, or perhaps to Germany or France." appear formally redundant, yet become acceptable in contexts such as "Mary will go to a philosophy program in France or...
Structured Prompt Optimization for Few-Shot Text Classification via Semantic Alignment in Latent Space
arXiv:2602.23753v1 Announce Type: new Abstract: This study addresses the issues of semantic entanglement, unclear label structure, and insufficient feature representation in few-shot text classification, and proposes an optimization framework based on structured prompts to enhance semantic understanding and task adaptation...
GLUScope: A Tool for Analyzing GLU Neurons in Transformer Language Models
arXiv:2602.23826v1 Announce Type: new Abstract: We present GLUScope, an open-source tool for analyzing neurons in Transformer-based language models, intended for interpretability researchers. We focus on more recent models than previous tools do; specifically we consider gated activation functions such as...
Benchmarking BERT-based Models for Sentence-level Topic Classification in Nepali Language
arXiv:2602.23940v1 Announce Type: new Abstract: Transformer-based models such as BERT have significantly advanced Natural Language Processing (NLP) across many languages. However, Nepali, a low-resource language written in Devanagari script, remains relatively underexplored. This study benchmarks multilingual, Indic, Hindi, and Nepali...
EDDA-Coordinata: An Annotated Dataset of Historical Geographic Coordinates
arXiv:2602.23941v1 Announce Type: new Abstract: This paper introduces a dataset of enriched geographic coordinates retrieved from Diderot and d'Alembert's eighteenth-century Encyclopedie. Automatically recovering geographic coordinates from historical texts is a complex task, as they are expressed in a variety of...
MemEmo: Evaluating Emotion in Memory Systems of Agents
arXiv:2602.23944v1 Announce Type: new Abstract: Memory systems address the challenge of context loss in Large Language Models during prolonged interactions. However, compared to human cognition, the efficacy of these systems in processing emotion-related information remains inconclusive. To address this gap,...
The GRADIEND Python Package: An End-to-End System for Gradient-Based Feature Learning
arXiv:2602.23993v1 Announce Type: new Abstract: We present gradiend, an open-source Python package that operationalizes the GRADIEND method for learning feature directions from factual-counterfactual MLM and CLM gradients in language models. The package provides a unified workflow for feature-related data creation,...
Task-Centric Acceleration of Small-Language Models
arXiv:2602.24174v1 Announce Type: new Abstract: Small language models (SLMs) have emerged as efficient alternatives to large language models for task-specific applications. However, they are often employed in high-volume, low-latency settings, where efficiency is crucial. We propose TASC, Task-Adaptive Sequence Compression,...
Controllable Reasoning Models Are Private Thinkers
arXiv:2602.24210v1 Announce Type: new Abstract: AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the unintended leakage of private information to external parties. We propose...
NAU-QMUL: Utilizing BERT and CLIP for Multi-modal AI-Generated Image Detection
arXiv:2602.23863v1 Announce Type: cross Abstract: With the aim of detecting AI-generated images and identifying the specific models responsible for their generation, we propose a multi-modal multi-task model. The model leverages pre-trained BERT and CLIP Vision encoders for text and image...
Global Interpretability via Automated Preprocessing: A Framework Inspired by Psychiatric Questionnaires
arXiv:2602.23459v1 Announce Type: new Abstract: Psychiatric questionnaires are highly context sensitive and often only weakly predict subsequent symptom severity, which makes the prognostic relationship difficult to learn. Although flexible nonlinear models can improve predictive accuracy, their limited interpretability can erode...
FedDAG: Clustered Federated Learning via Global Data and Gradient Integration for Heterogeneous Environments
arXiv:2602.23504v1 Announce Type: new Abstract: Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data are heterogeneous. Clustered FL tackles this by grouping similar clients. However,...
Neural Operators Can Discover Functional Clusters
arXiv:2602.23528v1 Announce Type: new Abstract: Operator learning is reshaping scientific computing by amortizing inference across infinite families of problems. While neural operators (NOs) are increasingly well understood for regression, far less is known for classification and its unsupervised analogue: clustering....
SDMixer: Sparse Dual-Mixer for Time Series Forecasting
arXiv:2602.23581v1 Announce Type: new Abstract: Multivariate time series forecasting is widely applied in fields such as transportation, energy, and finance. However, the data commonly suffers from issues of multi-scale characteristics, weak correlations, and noise interference, which limit the predictive performance...
FedRot-LoRA: Mitigating Rotational Misalignment in Federated LoRA
arXiv:2602.23638v1 Announce Type: new Abstract: Federated LoRA provides a communication-efficient mechanism for fine-tuning large language models on decentralized data. In practice, however, a discrepancy between the factor-wise averaging used to preserve low rank and the mathematically correct aggregation of local...
Selective Denoising Diffusion Model for Time Series Anomaly Detection
arXiv:2602.23662v1 Announce Type: new Abstract: Time series anomaly detection (TSAD) has been an important area of research for decades, with reconstruction-based methods, mostly based on generative models, gaining popularity and demonstrating success. Diffusion models have recently attracted attention due to...
Disentangled Mode-Specific Representations for Tensor Time Series via Contrastive Learning
arXiv:2602.23663v1 Announce Type: new Abstract: Multi-mode tensor time series (TTS) can be found in many domains, such as search engines and environmental monitoring systems. Learning representations of a TTS benefits various applications, but it is also challenging since the complexities...
Optimizer-Induced Low-Dimensional Drift and Transverse Dynamics in Transformer Training
arXiv:2602.23696v1 Announce Type: new Abstract: We study the geometry of training trajectories in small transformer models and find that parameter updates organize into a dominant drift direction with transverse residual dynamics. Using uncentered, row-normalized trajectory PCA, we show that a...
Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning
arXiv:2602.23737v1 Announce Type: new Abstract: Cross-domain reinforcement learning (RL) aims to learn transferable policies under dynamics shifts between source and target domains. A key challenge lies in the lack of target-domain environment interaction and reward supervision, which prevents direct policy...
TradeFM: A Generative Foundation Model for Trade-flow and Market Microstructure
arXiv:2602.23784v1 Announce Type: new Abstract: Foundation models have transformed domains from language to genomics by learning general-purpose representations from large-scale, heterogeneous data. We introduce TradeFM, a 524M-parameter generative Transformer that brings this paradigm to market microstructure, learning directly from billions...
Provable Subspace Identification of Nonlinear Multi-view CCA
arXiv:2602.23785v1 Announce Type: new Abstract: We investigate the identifiability of nonlinear Canonical Correlation Analysis (CCA) in a multi-view setup, where each view is generated by an unknown nonlinear map applied to a linear mixture of shared latents and view-private noise....