CLAA: Cross-Layer Attention Aggregation for Accelerating LLM Prefill
arXiv:2602.16054v1 Announce Type: new Abstract: The prefill stage in long-context LLM inference remains a computational bottleneck. Recent token-ranking heuristics accelerate inference by selectively processing a subset of semantically relevant tokens. However, existing methods suffer from unstable token importance estimation, often...
Investigating GNN Convergence on Large Randomly Generated Graphs with Realistic Node Feature Correlations
arXiv:2602.16145v1 Announce Type: new Abstract: There are a number of existing studies analysing the convergence behaviour of graph neural networks on large random graphs. Unfortunately, the majority of these studies do not model correlations between node features, which would naturally...
ASPEN: Spectral-Temporal Fusion for Cross-Subject Brain Decoding
arXiv:2602.16147v1 Announce Type: new Abstract: Cross-subject generalization in EEG-based brain-computer interfaces (BCIs) remains challenging due to individual variability in neural signals. We investigate whether spectral representations offer more stable features for cross-subject transfer than temporal waveforms. Through correlation analyses across...
Rethinking Input Domains in Physics-Informed Neural Networks via Geometric Compactification Mappings
arXiv:2602.16193v1 Announce Type: new Abstract: Several complex physical systems are governed by multi-scale partial differential equations (PDEs) that exhibit both smooth low-frequency components and localized high-frequency structures. Existing physics-informed neural network (PINN) methods typically train with fixed coordinate system inputs,...
Prescriptive Scaling Reveals the Evolution of Language Model Capabilities
arXiv:2602.15327v1 Announce Type: cross Abstract: For deploying foundation models, practitioners increasingly need prescriptive scaling laws: given a pre training compute budget, what downstream accuracy is attainable with contemporary post training practice, and how stable is that mapping as the field...
Size Transferability of Graph Transformers with Convolutional Positional Encodings
arXiv:2602.15239v1 Announce Type: new Abstract: Transformers have achieved remarkable success across domains, motivating the rise of Graph Transformers (GTs) as attention-based architectures for graph-structured data. A key design choice in GTs is the use of Graph Neural Network (GNN)-based positional...
Scaling Laws for Masked-Reconstruction Transformers on Single-Cell Transcriptomics
arXiv:2602.15253v1 Announce Type: new Abstract: Neural scaling laws -- power-law relationships between loss, model size, and data -- have been extensively documented for language and vision transformers, yet their existence in single-cell genomics remains largely unexplored. We present the first...
ExLipBaB: Exact Lipschitz Constant Computation for Piecewise Linear Neural Networks
arXiv:2602.15499v1 Announce Type: new Abstract: It has been shown that a neural network's Lipschitz constant can be leveraged to derive robustness guarantees, to improve generalizability via regularization or even to construct invertible networks. Therefore, a number of methods varying in...
On the Geometric Coherence of Global Aggregation in Federated GNN
arXiv:2602.15510v1 Announce Type: new Abstract: Federated Learning (FL) enables distributed training across multiple clients without centralized data sharing, while Graph Neural Networks (GNNs) model relational data through message passing. In federated GNN settings, client graphs often exhibit heterogeneous structural and...
A unified theory of feature learning in RNNs and DNNs
arXiv:2602.15593v1 Announce Type: new Abstract: Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight sharing, as can be shown through unrolling in time. How does this structural similarity fit...
Out-of-Support Generalisation via Weight Space Sequence Modelling
arXiv:2602.13550v1 Announce Type: new Abstract: As breakthroughs in deep learning transform key industries, models are increasingly required to extrapolate on datapoints found outside the range of the training set, a challenge we coin as out-of-support (OoS) generalisation. However, neural networks...
On the Sparsifiability of Correlation Clustering: Approximation Guarantees under Edge Sampling
arXiv:2602.13684v1 Announce Type: new Abstract: Correlation Clustering (CC) is a fundamental unsupervised learning primitive whose strongest LP-based approximation guarantees require $\Theta(n^3)$ triangle inequality constraints and are prohibitive at scale. We initiate the study of \emph{sparsification--approximation trade-offs} for CC, asking how...