SSKG Hub: An Expert-Guided Platform for LLM-Empowered Sustainability Standards Knowledge Graphs
arXiv:2603.00669v1 Announce Type: new Abstract: Sustainability disclosure standards (e.g., GRI, SASB, TCFD, IFRS S2) are comprehensive yet lengthy, terminology-dense, and highly cross-referential, hindering structured analysis and downstream use. We present SSKG Hub (Sustainability Standards Knowledge Graph Hub), a research prototype...
Polynomial Mixing for Efficient Self-supervised Speech Encoders
arXiv:2603.00683v1 Announce Type: new Abstract: State-of-the-art speech-to-text models typically employ Transformer-based encoders that model token dependencies via self-attention mechanisms. However, the quadratic complexity of self-attention in both memory and computation imposes significant constraints on scalability. In this work, we propose...
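The quadratic cost this abstract refers to is a standard property of self-attention: every token attends to every other token, so the score matrix has shape (n, n). A minimal numpy sketch of that scaling (illustrative only, not the paper's method):

```python
import numpy as np

def attention_weights(x):
    """Self-attention weights for a sequence of n token embeddings.

    x: (n, d) array. The score matrix is (n, n), hence O(n^2) memory/compute.
    """
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                 # (n, n): quadratic in length
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

# Doubling the sequence length quadruples the score matrix.
small = attention_weights(np.random.randn(128, 64)).size  # 128 * 128
large = attention_weights(np.random.randn(256, 64)).size  # 256 * 256
```

This quadratic growth in `size` is the scalability constraint the abstract names as motivation.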
RAVEL: Reasoning Agents for Validating and Evaluating LLM Text Synthesis
arXiv:2603.00686v1 Announce Type: new Abstract: Large Language Models have evolved from single-round generators into long-horizon agents, capable of complex text synthesis scenarios. However, current evaluation frameworks lack the ability to assess the actual synthesis operations, such as outlining, drafting, and...
RLAR: An Agentic Reward System for Multi-task Reinforcement Learning on Large Language Models
arXiv:2603.00724v1 Announce Type: new Abstract: Large language model alignment via reinforcement learning depends critically on reward function quality. However, static, domain-specific reward models are often costly to train and exhibit poor generalization in out-of-distribution scenarios encountered during RL iterations. We...
Constitutional Black-Box Monitoring for Scheming in LLM Agents
arXiv:2603.00829v1 Announce Type: new Abstract: Safe deployment of Large Language Model (LLM) agents in autonomous settings requires reliable oversight mechanisms. A central challenge is detecting scheming, where agents covertly pursue misaligned goals. One approach to mitigating such risks is LLM-based...
Learning Nested Named Entity Recognition from Flat Annotations
arXiv:2603.00840v1 Announce Type: new Abstract: Nested named entity recognition identifies entities contained within other entities, but requires expensive multi-level annotation. While flat NER corpora exist abundantly, nested resources remain scarce. We investigate whether models can learn nested structure from flat...
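The nested-vs-flat distinction the abstract builds on can be made concrete with a hypothetical example (the sentence, spans, and flattening convention below are illustrative, not from the paper): nested annotation keeps overlapping entities at multiple levels, while flat annotation keeps a single non-overlapping layer, typically the outermost spans.

```python
sentence = "The University of California Board of Regents met today."

# Nested annotation: inner entities sit inside outer ones
# (character offsets, end-exclusive).
nested = [
    (4, 45, "ORG"),   # "University of California Board of Regents"
    (4, 28, "ORG"),   # "University of California"
    (18, 28, "GPE"),  # "California"
]

def flatten(spans):
    """Flat annotation under one simple convention: keep the longest span
    from each overlapping group, discarding everything nested inside it."""
    kept = []
    for s in sorted(spans, key=lambda t: t[1] - t[0], reverse=True):
        if all(s[1] <= k[0] or s[0] >= k[1] for k in kept):
            kept.append(s)
    return sorted(kept)

flat = flatten(nested)  # only the outermost ORG survives
```

Learning nested structure from flat corpora means recovering the discarded inner spans from data annotated like `flat`.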
KVSlimmer: Theoretical Insights and Practical Optimizations for Asymmetric KV Merging
arXiv:2603.00907v1 Announce Type: new Abstract: The growing computational and memory demands of the Key-Value (KV) cache significantly limit the scalability of Large Language Models (LLMs). While KV merging has emerged as a promising solution, existing methods that rely on empirical...
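The KV-cache pressure this abstract describes is easy to quantify: each layer stores one key and one value vector per KV head per token. A back-of-the-envelope estimator (model dimensions below are illustrative, not from the paper):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch,
                   bytes_per_elem=2):
    """Memory for a dense KV cache: 2 tensors (K and V) per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# A hypothetical 32-layer model with 32 KV heads of dim 128 in fp16,
# holding one 8192-token sequence:
gib = kv_cache_bytes(32, 32, 128, 8192, 1) / 2**30  # 4.0 GiB
```

KV merging attacks exactly this term by collapsing cached entries, shrinking the effective `seq_len` factor.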
Thoth: Mid-Training Bridges LLMs to Time Series Understanding
arXiv:2603.01042v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable success in general-purpose reasoning. However, they still struggle to understand and reason about time series data, which limits their effectiveness in decision-making scenarios that depend on temporal dynamics....
How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning
arXiv:2603.01070v1 Announce Type: new Abstract: Solving complex geometric problems inherently requires interleaved reasoning: a tight alternation between constructing diagrams and performing logical deductions. Although recent Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities in visual generation and plotting, we...
StaTS: Spectral Trajectory Schedule Learning for Adaptive Time Series Forecasting with Frequency Guided Denoiser
arXiv:2603.00037v1 Announce Type: new Abstract: Diffusion models have been used for probabilistic time series forecasting and show strong potential. However, fixed noise schedules often produce intermediate states that are hard to invert and a terminal state that deviates from the...
CARE: Confounder-Aware Aggregation for Reliable LLM Evaluation
arXiv:2603.00039v1 Announce Type: new Abstract: LLM-as-a-judge ensembles are the standard paradigm for scalable evaluation, but their aggregation mechanisms suffer from a fundamental flaw: they implicitly assume that judges provide independent estimates of true quality. However, in practice, LLM judges exhibit...
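The independence flaw the abstract highlights has a clean statistical form: for n judges with common score variance sigma^2 and pairwise correlation rho, the variance of their mean is sigma^2/n + (n-1)*rho*sigma^2/n, which floors at rho*sigma^2 no matter how many judges are added. A small check with hypothetical numbers:

```python
def var_of_mean(sigma2, n, rho):
    """Variance of the mean of n equi-correlated judge scores."""
    return sigma2 / n + (n - 1) * rho * sigma2 / n

independent = var_of_mean(1.0, 100, 0.0)  # shrinks like 1/n -> 0.01
correlated  = var_of_mean(1.0, 100, 0.5)  # floored near rho * sigma2 = 0.5
```

Averaging 100 correlated judges is barely better than trusting one shared bias, which is the motivation for confounder-aware aggregation.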
Econometric vs. Causal Structure-Learning for Time-Series Policy Decisions: Evidence from the UK COVID-19 Policies
arXiv:2603.00041v1 Announce Type: new Abstract: Causal machine learning (ML) recovers graphical structures that inform us about potential cause-and-effect relationships. Most progress has focused on cross-sectional data with no explicit time order, whereas recovering causal structures from time series data remains...
Maximizing the Spectral Energy Gain in Sub-1-Bit LLMs via Latent Geometry Alignment
arXiv:2603.00042v1 Announce Type: new Abstract: We identify the Spectral Energy Gain in extreme model compression, where low-rank binary approximations outperform tiny-rank floating-point baselines for heavy-tailed spectra. However, prior attempts fail to realize this potential, trailing state-of-the-art 1-bit methods. We attribute...
REMIND: Rethinking Medical High-Modality Learning under Missingness--A Long-Tailed Distribution Perspective
arXiv:2603.00046v1 Announce Type: new Abstract: Medical multi-modal learning is critical for integrating information from a large set of diverse modalities. However, when leveraging a high number of modalities in real clinical applications, it is often impractical to obtain full-modality observations...
BiJEPA: Bi-directional Joint Embedding Predictive Architecture for Symmetric Representation Learning
arXiv:2603.00049v1 Announce Type: new Abstract: Self-Supervised Learning (SSL) has shifted from pixel-level reconstruction to latent space prediction, spearheaded by the Joint Embedding Predictive Architecture (JEPA). While effective, standard JEPA models typically rely on a uni-directional prediction mechanism (e.g. Context $\to$...
Expert Divergence Learning for MoE-based Language Models
arXiv:2603.00054v1 Announce Type: new Abstract: The Mixture-of-Experts (MoE) architecture is a powerful technique for scaling language models, yet it often suffers from expert homogenization, where experts learn redundant functionalities, thereby limiting MoE's full potential. To address this, we introduce Expert...
M3-AD: Reflection-aware Multi-modal, Multi-category, and Multi-dimensional Benchmark and Framework for Industrial Anomaly Detection
arXiv:2603.00055v1 Announce Type: new Abstract: Although multimodal large language models (MLLMs) have advanced industrial anomaly detection toward a zero-shot paradigm, they still tend to produce high-confidence yet unreliable decisions in fine-grained and structurally complex industrial scenarios, and lack effective self-corrective...
A Representation-Consistent Gated Recurrent Framework for Robust Medical Time-Series Classification
arXiv:2603.00067v1 Announce Type: new Abstract: Medical time-series data are characterized by irregular sampling, high noise levels, missing values, and strong inter-feature dependencies. Recurrent neural networks (RNNs), particularly gated architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU),...
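For reference, the gating the abstract alludes to is the standard GRU cell: an update gate z and a reset gate r interpolate between the previous hidden state and a candidate state. A minimal numpy sketch of the textbook equations (weights below are random placeholders, not learned parameters):

```python
import numpy as np

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU step: h' = (1 - z) * h + z * h_tilde."""
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(Wz @ x + Uz @ h)                  # update gate
    r = sig(Wr @ x + Ur @ h)                  # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 3, 4
W = [rng.standard_normal((d_h, d_in)) for _ in range(3)]
U = [rng.standard_normal((d_h, d_h)) for _ in range(3)]
h1 = gru_cell(rng.standard_normal(d_in), np.zeros(d_h),
              W[0], U[0], W[1], U[1], W[2], U[2])
```

The gates let the cell skip noisy or missing steps by keeping `z` near 0, which is why gated RNNs are a natural fit for irregular medical time series.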
SEval-NAS: A Search-Agnostic Evaluation for Neural Architecture Search
arXiv:2603.00099v1 Announce Type: new Abstract: Neural architecture search (NAS) automates the discovery of neural networks that meet specified criteria, yet its evaluation procedures are often hardcoded, limiting the ability to introduce new metrics. This issue is especially pronounced in hardware-aware...
Bridging Policy and Real-World Dynamics: LLM-Augmented Rebalancing for Shared Micromobility Systems
arXiv:2603.00176v1 Announce Type: new Abstract: Shared micromobility services such as e-scooters and bikes have become an integral part of urban transportation, yet their efficiency critically depends on effective vehicle rebalancing. Existing methods either optimize for average demand patterns or employ...
OSF: On Pre-training and Scaling of Sleep Foundation Models
arXiv:2603.00190v1 Announce Type: new Abstract: Polysomnography (PSG) provides the gold standard for sleep assessment but suffers from substantial heterogeneity across recording devices and cohorts. There have been growing efforts to build general-purpose foundation models (FMs) for sleep physiology, but lack...
Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare
arXiv:2603.00192v1 Announce Type: new Abstract: In healthcare, predictive models increasingly inform patient-level decisions, yet little attention is paid to the variability in individual risk estimates and its impact on treatment decisions. For overparameterized models, now standard in machine learning, a...
Scalable Gaussian process modeling of parametrized spatio-temporal fields
arXiv:2603.00290v1 Announce Type: new Abstract: We introduce a scalable Gaussian process (GP) framework with deep product kernels for data-driven learning of parametrized spatio-temporal fields over fixed or parameter-dependent domains. The proposed framework learns a continuous representation, enabling predictions at arbitrary...
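A product kernel of the kind the abstract mentions factors covariance across axes, e.g. k((x,t),(x',t')) = k_x(x,x') * k_t(t,t'). A minimal sketch with squared-exponential factors (the hyperparameters are illustrative, and this is plain separability, not the paper's deep product construction):

```python
import numpy as np

def rbf(a, b, length_scale):
    """Squared-exponential kernel between two inputs."""
    d2 = np.sum((np.asarray(a) - np.asarray(b)) ** 2)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def product_kernel(x, t, x2, t2, ls_space=1.0, ls_time=0.5):
    """Separable spatio-temporal covariance: spatial factor times temporal factor."""
    return rbf(x, x2, ls_space) * rbf(t, t2, ls_time)

# Identical inputs give covariance 1; it decays with distance along either axis.
same = product_kernel([0.0, 0.0], 0.0, [0.0, 0.0], 0.0)  # 1.0
far  = product_kernel([0.0, 0.0], 0.0, [3.0, 0.0], 0.0)  # exp(-4.5)
```

Because the kernel is a continuous function of (x, t), the resulting GP can be evaluated at arbitrary space-time points rather than only on a fixed grid.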
Improving Full Waveform Inversion in Large Model Era
arXiv:2603.00377v1 Announce Type: new Abstract: Full Waveform Inversion (FWI) is a highly nonlinear and ill-posed problem that aims to recover subsurface velocity maps from surface-recorded seismic waveform data. Existing data-driven FWI typically uses small models, as available datasets have limited...
SCOTUStoday for Tuesday, March 3
As we’ve noted before, we read a lot of legal news in the process of preparing this newsletter. Here’s a headline we saw recently that we won’t soon forget: References […] The post SCOTUStoday for Tuesday, March 3 appeared first on SCOTUSblog.
Episode 41: Reading Recommendations - EJIL: The Podcast!
Episode 41: Thinking through Rupture in International Economic Law: Views from Latin America - EJIL: The Podcast!
Alibaba’s Qwen tech lead steps down after major AI push
Reactions rippled through Alibaba's Qwen team after tech lead Junyang Lin stepped down following a major model launch.
AI companies are spending millions to thwart this former tech exec’s congressional bid
A tech billionaire-backed super PAC is spending $125 million to undercut candidates pushing for AI regulation. New York's Alex Bores, a former tech executive himself, is one of them.
France or Spain or Germany or France: A Neural Account of Non-Redundant Redundant Disjunctions
arXiv:2602.23547v1 Announce Type: new Abstract: Sentences like "She will go to France or Spain, or perhaps to Germany or France." appear formally redundant, yet become acceptable in contexts such as "Mary will go to a philosophy program in France or...