From Scarcity to Scale: A Release-Level Analysis of the Pashto Common Voice Dataset
arXiv:2602.14062v1 Announce Type: new Abstract: Large, openly licensed speech datasets are essential for building automatic speech recognition (ASR) systems, yet many widely spoken languages remain underrepresented in public resources. Pashto, spoken by more than 60 million people, has historically lacked...
Directional Concentration Uncertainty: A representational approach to uncertainty quantification for generative models
arXiv:2602.13264v1 Announce Type: new Abstract: In the critical task of making generative models trustworthy and robust, methods for Uncertainty Quantification (UQ) have begun to show encouraging potential. However, many of these methods rely on rigid heuristics that fail to generalize...
Federated Learning of Nonlinear Temporal Dynamics with Graph Attention-based Cross-Client Interpretability
arXiv:2602.13485v1 Announce Type: new Abstract: Networks of modern industrial systems are increasingly monitored by distributed sensors, where each system comprises multiple subsystems generating high dimensional time series data. These subsystems are often interdependent, making it important to understand how temporal...
Optimization-Free Graph Embedding via Distributional Kernel for Community Detection
arXiv:2602.13634v1 Announce Type: new Abstract: Neighborhood Aggregation Strategy (NAS) is a widely used approach in graph embedding, underpinning both Graph Neural Networks (GNNs) and Weisfeiler-Lehman (WL) methods. However, NAS-based methods are identified to be prone to over-smoothing, the loss of node...
GREPO: A Benchmark for Graph Neural Networks on Repository-Level Bug Localization
arXiv:2602.13921v1 Announce Type: new Abstract: Repository-level bug localization, the task of identifying where code must be modified to fix a bug, is a critical software engineering challenge. Standard Large Language Models (LLMs) are often unsuitable for this task due to context window...