Enhancing Multilingual Embeddings via Multi-Way Parallel Text Alignment
arXiv:2602.21543v1 Announce Type: new Abstract: Multilingual pretraining typically lacks explicit alignment signals, leading to suboptimal cross-lingual alignment in the representation space. In this work, we show that training standard pretrained models for cross-lingual alignment with a multi-way parallel corpus in...
MixSarc: A Bangla-English Code-Mixed Corpus for Implicit Meaning Identification
arXiv:2602.21608v1 Announce Type: new Abstract: Bangla-English code-mixing is widespread across South Asian social media, yet resources for implicit meaning identification in this setting remain scarce. Existing sentiment and sarcasm models largely focus on monolingual English or high-resource languages and struggle...
Scalable Multilingual Multimodal Machine Translation with Speech-Text Fusion
arXiv:2602.21646v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have achieved notable success in enhancing translation performance by integrating multimodal information. However, existing research primarily focuses on image-guided methods, whose applicability is constrained by the scarcity of multilingual image-text...
Sparsity Induction for Accurate Post-Training Pruning of Large Language Models
arXiv:2602.21652v1 Announce Type: new Abstract: Large language models have demonstrated capabilities in text generation, while their increasing parameter scales present challenges in computational and memory efficiency. Post-training sparsity (PTS), which reduces model cost by removing weights from dense networks, is...
Evaluating the relationship between regularity and learnability in recursive numeral systems using Reinforcement Learning
arXiv:2602.21720v1 Announce Type: new Abstract: Human recursive numeral systems (i.e., counting systems such as English base-10 numerals), like many other grammatical systems, are highly regular. Following prior work that relates cross-linguistic tendencies to biases in learning, we ask whether regular...
D-COT: Disciplined Chain-of-Thought Learning for Efficient Reasoning in Small Language Models
arXiv:2602.21786v1 Announce Type: new Abstract: Chain-of-Thought (CoT) distillation from Large Language Models (LLMs) often induces "overthinking" in Small Language Models (SLMs), leading to performance degradation and excessive token consumption. In this study, we propose Disciplined Chain-of-Thought (D-CoT), a novel framework...
FewMMBench: A Benchmark for Multimodal Few-Shot Learning
arXiv:2602.21854v1 Announce Type: new Abstract: As multimodal large language models (MLLMs) advance in handling interleaved image-text data, assessing their few-shot learning capabilities remains an open challenge. In this paper, we introduce FewMMBench, a comprehensive benchmark designed to evaluate MLLMs under...
ExpLang: Improved Exploration and Exploitation in LLM Reasoning with On-Policy Thinking Language Selection
arXiv:2602.21887v1 Announce Type: new Abstract: Current large reasoning models (LRMs) have shown strong ability on challenging tasks after reinforcement learning (RL) based post-training. However, previous work mainly focuses on English reasoning in expectation of the strongest performance, despite the demonstrated...
MERRY: Semantically Decoupled Evaluation of Multimodal Emotional and Role Consistencies of Role-Playing Agents
arXiv:2602.21941v1 Announce Type: new Abstract: Multimodal Role-Playing Agents (MRPAs) are attracting increasing attention due to their ability to deliver more immersive multimodal emotional interactions. However, existing studies still rely on pure textual benchmarks to evaluate the text responses of MRPAs,...
Large Language Models are Algorithmically Blind
arXiv:2602.21947v1 Announce Type: new Abstract: Large language models (LLMs) demonstrate remarkable breadth of knowledge, yet their ability to reason about computational processes remains poorly understood. Closing this gap matters for practitioners who rely on LLMs to guide algorithm selection and...
Neural network optimization strategies and the topography of the loss landscape
arXiv:2602.21276v1 Announce Type: new Abstract: Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue...
Robust AI Evaluation through Maximal Lotteries
arXiv:2602.21297v1 Announce Type: new Abstract: The standard way to evaluate language models on subjective tasks is through pairwise comparisons: an annotator chooses the "better" of two responses to a prompt. Leaderboards aggregate these comparisons into a single Bradley-Terry (BT) ranking,...
SymTorch: A Framework for Symbolic Distillation of Deep Neural Networks
arXiv:2602.21307v1 Announce Type: new Abstract: Symbolic distillation replaces neural networks, or components thereof, with interpretable, closed-form mathematical expressions. This approach has shown promise in discovering physical laws and mathematical relationships directly from trained deep learning models, yet adoption remains limited...
Interleaved Head Attention
arXiv:2602.21371v1 Announce Type: new Abstract: Multi-Head Attention (MHA) is the core computational primitive underlying modern Large Language Models (LLMs). However, MHA suffers from a fundamental linear scaling limitation: $H$ attention heads produce exactly $H$ independent attention matrices, with no communication...
Proximal-IMH: Proximal Posterior Proposals for Independent Metropolis-Hastings with Approximate Operators
arXiv:2602.21426v1 Announce Type: new Abstract: We consider the problem of sampling from a posterior distribution arising in Bayesian inverse problems in science, engineering, and imaging. Our method belongs to the family of independence Metropolis-Hastings (IMH) sampling algorithms, which are common...
MINAR: Mechanistic Interpretability for Neural Algorithmic Reasoning
arXiv:2602.21442v1 Announce Type: new Abstract: The recent field of neural algorithmic reasoning (NAR) studies the ability of graph neural networks (GNNs) to emulate classical algorithms like Bellman-Ford, a phenomenon known as algorithmic alignment. At the same time, recent advances in...
When Learning Hurts: Fixed-Pole RNN for Real-Time Online Training
arXiv:2602.21454v1 Announce Type: new Abstract: Recurrent neural networks (RNNs) can be interpreted as discrete-time state-space models, where the state evolution corresponds to an infinite-impulse-response (IIR) filtering operation governed by both feedforward weights and recurrent poles. While, in principle, all parameters...
Effects of Training Data Quality on Classifier Performance
arXiv:2602.21462v1 Announce Type: new Abstract: We describe extensive numerical experiments assessing and quantifying how classifier performance depends on the quality of the training data, a frequently neglected component of the analysis of classifiers. More specifically, in the scientific context of...
Asymptotically Fast Clebsch-Gordan Tensor Products with Vector Spherical Harmonics
arXiv:2602.21466v1 Announce Type: new Abstract: $E(3)$-equivariant neural networks have proven to be effective in a wide range of 3D modeling tasks. A fundamental operation of such networks is the tensor product, which allows interaction between different feature types. Because this...
GradAlign: Gradient-Aligned Data Selection for LLM Reinforcement Learning
arXiv:2602.21492v1 Announce Type: new Abstract: Reinforcement learning (RL) has become a central post-training paradigm for large language models (LLMs), but its performance is highly sensitive to the quality of training problems. This sensitivity stems from the non-stationarity of RL: rollouts...
WaterVIB: Learning Minimal Sufficient Watermark Representations via Variational Information Bottleneck
arXiv:2602.21508v1 Announce Type: new Abstract: Robust watermarking is critical for intellectual property protection, whereas existing methods face a severe vulnerability against regeneration-based AIGC attacks. We identify that existing methods fail because they entangle the watermark with high-frequency cover texture, which...
Muon+: Towards Better Muon via One Additional Normalization Step
arXiv:2602.21545v1 Announce Type: new Abstract: The Muon optimizer has demonstrated promising performance in pre-training large language models through gradient (or momentum) orthogonalization. In this work, we propose a simple yet effective enhancement to Muon, namely Muon+, which introduces an additional...
Training-free Composition of Pre-trained GFlowNets for Multi-Objective Generation
arXiv:2602.21565v1 Announce Type: new Abstract: Generative Flow Networks (GFlowNets) learn to sample diverse candidates in proportion to a reward function, making them well-suited for scientific discovery, where exploring multiple promising solutions is crucial. Further extending GFlowNets to multi-objective settings has...
ABM-UDE: Developing Surrogates for Epidemic Agent-Based Models via Scientific Machine Learning
arXiv:2602.21588v1 Announce Type: new Abstract: Agent-based epidemic models (ABMs) encode behavioral and policy heterogeneity but are too slow for nightly hospital planning. We develop county-ready surrogates that learn directly from exascale ABM trajectories using Universal Differential Equations (UDEs): mechanistic SEIR-family...
Deep Clustering based Boundary-Decoder Net for Inter and Intra Layer Stress Prediction of Heterogeneous Integrated IC Chip
arXiv:2602.21601v1 Announce Type: new Abstract: High stress occurs when 3D heterogeneous IC packages are subjected to thermal cycling at extreme temperatures. Stress mainly occurs at the interface between different materials. We investigate stress image using latent space representation which is...
How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence
Legal Artificial Intelligence (LegalAI) focuses on applying the technology of artificial intelligence, especially natural language processing, to benefit tasks in the legal domain. In recent years, LegalAI has drawn increasing attention rapidly from both AI researchers and legal professionals, as...
How can the Supreme Court protect electoral integrity?
Justice, Democracy, and Law is a recurring series by Edward B. Foley that focuses on election law and the relationship of law and democracy. The court has already confronted cases […]
SCOTUStoday for Thursday, February 26
A new Economist/YouGov poll found that 57% of Americans strongly or somewhat approve of the tariffs ruling and 23% disapprove. For more on the survey, see the Morning Reads section […] The post SCOTUStoday for Thursday, February 26 appeared first on SCOTUSblog.
Anthropic CEO stands firm as Pentagon deadline looms
Anthropic CEO Dario Amodei said Thursday that he "cannot in good conscience accede" to the Pentagon's demands to give the military unrestricted access to its AI systems.
Read AI launches an email-based ‘digital twin’ to help you with schedules and answers
Read AI is launching Ada, which can reply with your availability and extract answers from the company knowledge base and the web.