Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
arXiv:2603.11114v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation...
Group Resonance Network: Learnable Prototypes and Multi-Subject Resonance for EEG Emotion Recognition
arXiv:2603.11119v1 Announce Type: new Abstract: Electroencephalography (EEG)-based emotion recognition remains challenging in cross-subject settings due to severe inter-subject variability. Existing methods mainly learn subject-invariant features, but often under-exploit stimulus-locked group regularities shared across subjects. To address this issue, we propose the Group...
High-resolution weather-guided surrogate modeling for data-efficient cross-location building energy prediction
arXiv:2603.11121v1 Announce Type: new Abstract: Building design optimization often depends on physics-based simulation tools such as EnergyPlus, which, although accurate, are computationally expensive and slow. Surrogate models provide a faster alternative, yet most are location-specific, and even weather-informed variants require...
Higher-Order Modular Attention: Fusing Pairwise and Triadic Interactions for Protein Sequences
arXiv:2603.11133v1 Announce Type: new Abstract: Transformer self-attention computes pairwise token interactions, yet protein sequence to phenotype relationships often involve cooperative dependencies among three or more residues that dot product attention does not capture explicitly. We introduce Higher-Order Modular Attention, HOMA,...
Scaling Reasoning Efficiently via Relaxed On-Policy Distillation
arXiv:2603.11137v1 Announce Type: new Abstract: On-policy distillation is pivotal for transferring reasoning capabilities to capacity-constrained models, yet remains prone to instability and negative transfer. We show that on-policy distillation can be interpreted, both theoretically and empirically, as a form of...
Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT
arXiv:2603.11142v1 Announce Type: new Abstract: The paper explores how video models trained for classification tasks represent nuanced, hidden semantic information that may not affect the final outcome, a key challenge for Trustworthy AI models. Through Explainable and Interpretable AI methods,...
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
arXiv:2603.11149v1 Announce Type: new Abstract: Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales with attacker effort across methods, model families, and harm types. We initiate a scaling-law framework...
Huntington Disease Automatic Speech Recognition with Biomarker Supervision
arXiv:2603.11168v1 Announce Type: new Abstract: Automatic speech recognition (ASR) for pathological speech remains underexplored, especially for Huntington's disease (HD), where irregular timing, unstable phonation, and articulatory distortion challenge current models. We present a systematic HD-ASR study using a high-fidelity clinical...
Representation Finetuning for Continual Learning
arXiv:2603.11201v1 Announce Type: new Abstract: The world is inherently dynamic, and continual learning aims to enable models to adapt to ever-evolving data streams. While pre-trained models have shown powerful performance in continual learning, they still require finetuning to adapt effectively...
Reference-Guided Machine Unlearning
arXiv:2603.11210v1 Announce Type: new Abstract: Machine unlearning aims to remove the influence of specific data from trained models while preserving general utility. Existing approximate unlearning methods often rely on performance-degradation heuristics, such as loss maximization or random labeling. However, these...
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models
arXiv:2603.11269v1 Announce Type: new Abstract: Out-of-distribution (OOD) detection methods perform well on multi-domain benchmarks, yet many practical systems are trained on single-domain data. We show that this regime induces a geometric failure mode, Domain-Sensitivity Collapse (DSC): supervised training compresses features...
Single molecule localization microscopy challenge: a biologically inspired benchmark for long-sequence modeling
arXiv:2603.11296v1 Announce Type: new Abstract: State space models (SSMs) have recently achieved strong performance on long sequence modeling tasks while offering improved memory and computational efficiency compared to transformer based architectures. However, their evaluation has been largely limited to synthetic...
Client-Conditional Federated Learning via Local Training Data Statistics
arXiv:2603.11307v1 Announce Type: new Abstract: Federated learning (FL) under data heterogeneity remains challenging: existing methods either ignore client differences (FedAvg), require costly cluster discovery (IFCA), or maintain per-client models (Ditto). All degrade when data is sparse or heterogeneity is multi-dimensional....
Heavy-Tailed Principle Component Analysis
arXiv:2603.11308v1 Announce Type: new Abstract: Principal Component Analysis (PCA) is a cornerstone of dimensionality reduction, yet its classical formulation relies critically on second-order moments and is therefore fragile in the presence of heavy-tailed data and impulsive noise. While numerous robust...
On the Robustness of Langevin Dynamics to Score Function Error
arXiv:2603.11319v1 Announce Type: new Abstract: We consider the robustness of score-based generative modeling to errors in the estimate of the score function. In particular, we show that Langevin dynamics is not robust to the L^2 errors (more generally L^p errors)...
Ensuring Safety in Automated Mechanical Ventilation through Offline Reinforcement Learning and Digital Twin Verification
arXiv:2603.11372v1 Announce Type: new Abstract: Mechanical ventilation (MV) is a life-saving intervention for patients with acute respiratory failure (ARF) in the ICU. However, inappropriate ventilator settings could cause ventilator-induced lung injury (VILI). Also, clinicians' workload is shown to be directly...
Harnessing Data Asymmetry: Manifold Learning in the Finsler World
arXiv:2603.11396v1 Announce Type: new Abstract: Manifold learning is a fundamental task at the core of data analysis and visualisation. It aims to capture the simple underlying structure of complex high-dimensional data by preserving pairwise dissimilarities in low-dimensional embeddings. Traditional methods...
A Stable Neural Statistical Dependence Estimator for Autoencoder Feature Analysis
arXiv:2603.11428v1 Announce Type: new Abstract: Statistical dependence measures like mutual information are ideal for analyzing autoencoders, but they can be ill-posed for deterministic, static, noise-free networks. We adopt the variational (Gaussian) formulation that makes dependence among inputs, latents, and reconstructions...
UniHetCO: A Unified Heterogeneous Representation for Multi-Problem Learning in Unsupervised Neural Combinatorial Optimization
arXiv:2603.11456v1 Announce Type: new Abstract: Unsupervised neural combinatorial optimization (NCO) offers an appealing alternative to supervised approaches by training learning-based solvers without ground-truth solutions, directly minimizing instance objectives and constraint violations. Yet for graph node subset-selection problems (e.g., Maximum Clique...
Bridging Discrete Marks and Continuous Dynamics: Dual-Path Cross-Interaction for Marked Temporal Point Processes
arXiv:2603.11462v1 Announce Type: new Abstract: Predicting irregularly spaced event sequences with discrete marks poses significant challenges due to the complex, asynchronous dependencies embedded within continuous-time data streams. Existing sequential approaches capture dependencies among event tokens but ignore the continuous evolution between...
Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks
arXiv:2603.11487v1 Announce Type: new Abstract: Transformers often display an attention sink: probability mass concentrates on a fixed, content-agnostic position. We prove that computing a simple trigger-conditional behavior necessarily induces a sink in softmax self-attention models. Our results formalize a familiar...
When presidents attack the Supreme Court
During a roundtable at the White House on Friday, March 6, President Donald Trump returned to what has become a familiar refrain in the weeks since the Supreme Court struck […] The post "When presidents attack the Supreme Court" appeared first on SCOTUSblog.
SCOTUStoday for Thursday, March 12
On this day in 1804, the House of Representatives voted to impeach Justice Samuel Chase, who had been accused of abusing his power by refusing to dismiss biased jurors and […] The post "SCOTUStoday for Thursday, March 12" appeared first on SCOTUSblog.
How to watch Jensen Huang’s Nvidia GTC 2026 keynote
GTC — which stands for GPU Technology Conference — is Nvidia's flagship annual event, where the chipmaker typically uses the spotlight to announce new products, champion partnerships, and lay out its vision for the future of computing. Huang's keynote will...
Sales automation startup Rox AI hits $1.2B valuation, sources say
Rox, founded in 2024 by the former chief growth officer of New Relic, offers an AI-native alternative to CRM tools.
Facebook Marketplace now lets Meta AI respond to buyers’ messages
When buyers inquire about an item’s availability, sellers can use Meta AI to automatically draft replies using information from their listing, such as the description, availability, pickup location, and price.
Tinder tries to lure people back to online dating with IRL events, virtual speed dating
Tinder just got a major revamp as it attempts to reengage its user base and attract younger daters. This includes in-person events, AI enhancements, and even virtual speed dating.
Atlassian follows Block’s footsteps and cuts staff in the name of AI
Atlassian laid off 10% of its workforce, around 1,600 people, as the company looks to funnel more funds to AI.
Bumble introduces an AI dating assistant, ‘Bee’
Bumble's new AI assistant Bee will move the dating app beyond the swipe by matching people based on compatibility and goals.
A writer is suing Grammarly for turning her and other authors into ‘AI editors’ without consent
Journalist Julia Angwin is leading a class action lawsuit against Grammarly for violating her privacy and publicity rights.