Unmasking Biases and Reliability Concerns in Convolutional Neural Networks Analysis of Cancer Pathology Images
arXiv:2603.12445v1 Announce Type: cross Abstract: Convolutional Neural Networks have shown promising effectiveness in identifying different types of cancer from radiographs. However, the opaque nature of CNNs makes it difficult to fully understand the way they operate, limiting their assessment to...
Shattering the Shortcut: A Topology-Regularized Benchmark for Multi-hop Medical Reasoning in LLMs
arXiv:2603.12458v1 Announce Type: cross Abstract: While Large Language Models (LLMs) achieve expert-level performance on standard medical benchmarks through single-hop factual recall, they severely struggle with the complex, multi-hop diagnostic reasoning required in real-world clinical settings. A primary obstacle is "shortcut...
One-Step Flow Policy: Self-Distillation for Fast Visuomotor Policies
arXiv:2603.12480v1 Announce Type: cross Abstract: Generative flow and diffusion models provide the continuous, multimodal action distributions needed for high-precision robotic policies. However, their reliance on iterative sampling introduces severe inference latency, degrading control frequency and harming performance in time-sensitive manipulation....
TRACE: Temporal Rule-Anchored Chain-of-Evidence on Knowledge Graphs for Interpretable Stock Movement Prediction
arXiv:2603.12500v1 Announce Type: cross Abstract: We present a Temporal Rule-Anchored Chain-of-Evidence (TRACE) on knowledge graphs for interpretable stock movement prediction that unifies symbolic relational priors, dynamic graph exploration, and LLM-guided decision making in a single end-to-end pipeline. The approach performs...
Na\"ive PAINE: Lightweight Text-to-Image Generation Improvement with Prompt Evaluation
arXiv:2603.12506v1 Announce Type: cross Abstract: Text-to-Image (T2I) generation is primarily driven by Diffusion Models (DM) which rely on random Gaussian noise. Thus, like playing the slots at a casino, a DM will produce different results given the same user-defined inputs....
ELLA: Generative AI-Powered Social Robots for Early Language Development at Home
arXiv:2603.12508v1 Announce Type: cross Abstract: Early language development shapes children's later literacy and learning, yet many families have limited access to scalable, high-quality support at home. Recent advances in generative AI make it possible for social robots to move beyond...
Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies
arXiv:2603.12510v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have significant potential to enable general-purpose robotic systems for a range of vision-language tasks. However, the performance of VLA-based robots is highly sensitive to the precise wording of language instructions, and it...
LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation
arXiv:2603.12522v1 Announce Type: cross Abstract: As large language models (LLMs) are deployed widely, detecting and understanding bias in their outputs is critical. We present LLM BiasScope, a web application for side-by-side comparison of LLM outputs with real-time bias analysis. The...
ActTail: Global Activation Sparsity in Large Language Models
arXiv:2603.12272v1 Announce Type: new Abstract: Activation sparsity is a promising approach for accelerating large language model (LLM) inference by reducing computation and memory movement. However, existing activation sparsity methods typically apply uniform sparsity across projections, ignoring the heterogeneous statistical properties...
LLM-Augmented Therapy Normalization and Aspect-Based Sentiment Analysis for Treatment-Resistant Depression on Reddit
arXiv:2603.12343v1 Announce Type: new Abstract: Treatment-resistant depression (TRD) is a severe form of major depressive disorder in which patients do not achieve remission despite multiple adequate treatment trials. Evidence across pharmacologic options for TRD remains limited, and trials often do...
TASTE-Streaming: Towards Streamable Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
arXiv:2603.12350v1 Announce Type: new Abstract: Text-speech joint spoken language modeling (SLM) aims at natural and intelligent speech-based interactions, but developing such a system may suffer from modality mismatch: speech unit sequences are much longer than text tokens. Prior work reduces...
Interpreting Negation in GPT-2: Layer- and Head-Level Causal Analysis
arXiv:2603.12423v1 Announce Type: new Abstract: Negation remains a persistent challenge for modern language models, often causing reversed meanings or factual errors. In this work, we conduct a causal analysis of how GPT-2 Small internally processes such linguistic transformations. We examine...
Marked Pedagogies: Examining Linguistic Biases in Personalized Automated Writing Feedback
arXiv:2603.12471v1 Announce Type: new Abstract: Effective personalized feedback is critical to students' literacy development. Though LLM-powered tools now promise to automate such feedback at scale, LLMs are not language-neutral: they privilege standard academic English and reproduce social stereotypes, raising concerns...
LMEB: Long-horizon Memory Embedding Benchmark
arXiv:2603.12572v1 Announce Type: new Abstract: Memory embeddings are crucial for memory-augmented systems, such as OpenClaw, but their evaluation is underexplored in current text embedding benchmarks, which narrowly focus on traditional passage retrieval and fail to assess models' ability to handle...
RTD-Guard: A Black-Box Textual Adversarial Detection Framework via Replacement Token Detection
arXiv:2603.12582v1 Announce Type: new Abstract: Textual adversarial attacks pose a serious security threat to Natural Language Processing (NLP) systems by introducing imperceptible perturbations that mislead deep learning models. While adversarial example detection offers a lightweight alternative to robust training, existing...
Using a Human-AI Teaming Approach to Create and Curate Scientific Datasets with the SCILIRE System
arXiv:2603.12638v1 Announce Type: new Abstract: The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this challenge, we introduce SCILIRE, a system for creating datasets from scientific literature. SCILIRE has been designed around...
98$\times$ Faster LLM Routing Without a Dedicated GPU: Flash Attention, Prompt Compression, and Near-Streaming for the vLLM Semantic Router
arXiv:2603.12646v1 Announce Type: new Abstract: System-level routers that intercept LLM requests for safety classification, domain routing, and PII detection must be both fast and operationally lightweight: they should add minimal latency to every request, yet not require a dedicated GPU...
Continual Learning in Large Language Models: Methods, Challenges, and Opportunities
arXiv:2603.12658v1 Announce Type: new Abstract: Continual learning (CL) has emerged as a pivotal paradigm to enable large language models (LLMs) to dynamically adapt to evolving knowledge and sequential tasks while mitigating catastrophic forgetting-a critical limitation of the static pre-training paradigm...
SteerRM: Debiasing Reward Models via Sparse Autoencoders
arXiv:2603.12795v1 Announce Type: new Abstract: Reward models (RMs) are critical components of alignment pipelines, yet they exhibit biases toward superficial stylistic cues, preferring better-presented responses over semantically superior ones. Existing debiasing methods typically require retraining or architectural modifications, while direct...
Rethinking Multiple-Choice Questions for RLVR: Unlocking Potential via Distractor Design
arXiv:2603.12826v1 Announce Type: new Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) significantly enhances the reasoning capabilities of Large Language Models. When applied to RLVR, Multiple-Choice Questions (MCQs) offer a scalable source of verifiable data but risk inducing reward hacking, where...
Learning from Child-Directed Speech in Two-Language Scenarios: A French-English Case Study
arXiv:2603.12906v1 Announce Type: new Abstract: Research on developmentally plausible language models has largely focused on English, leaving open questions about multilingual settings. We present a systematic study of compact language models by extending BabyBERTa to English-French scenarios under strictly size-matched...
HMS-BERT: Hybrid Multi-Task Self-Training for Multilingual and Multi-Label Cyberbullying Detection
arXiv:2603.12920v1 Announce Type: new Abstract: Cyberbullying on social media is inherently multilingual and multi-faceted, where abusive behaviors often overlap across multiple categories. Existing methods are commonly limited by monolingual assumptions or single-task formulations, which restrict their effectiveness in realistic multilingual...
Interpretable Semantic Gradients in SSD: A PCA Sweep Approach and a Case Study on AI Discourse
arXiv:2603.13038v1 Announce Type: new Abstract: Supervised Semantic Differential (SSD) is a mixed quantitative-interpretive method that models how text meaning varies with continuous individual-difference variables by estimating a semantic gradient in an embedding space and interpreting its poles through clustering and...
No More DeLuLu: Physics-Inspired Kernel Networks for Geometrically-Grounded Neural Computation
arXiv:2603.12276v1 Announce Type: new Abstract: We introduce the yat-product, a kernel operator combining quadratic alignment with inverse-square proximity. We prove it is a Mercer kernel, analytic, Lipschitz on bounded domains, and self-regularizing, admitting a unique RKHS embedding. Neural Matter Networks...
Overcoming the Modality Gap in Context-Aided Forecasting
arXiv:2603.12451v1 Announce Type: new Abstract: Context-aided forecasting (CAF) holds promise for integrating domain knowledge and forward-looking information, enabling AI systems to surpass traditional statistical methods. However, recent empirical studies reveal a puzzling gap: multimodal models often fail to outperform their...
Modal Logical Neural Networks for Financial AI
arXiv:2603.12487v1 Announce Type: new Abstract: The financial industry faces a critical dichotomy in AI adoption: deep learning often delivers strong empirical performance, while symbolic logic offers interpretability and rule adherence expected in regulated settings. We use Modal Logical Neural Networks...
Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching
arXiv:2603.12517v1 Announce Type: new Abstract: Timestep sampling $p(t)$ is a central design choice in Flow Matching models, yet common practice increasingly favors static middle-biased distributions (e.g., Logit-Normal). We show that this choice induces a speed--quality trade-off: middle-biased sampling accelerates early...
A Reduction Algorithm for Markovian Contextual Linear Bandits
arXiv:2603.12530v1 Announce Type: new Abstract: Recent work shows that when contexts are drawn i.i.d., linear contextual bandits can be reduced to single-context linear bandits. This ``contexts are cheap" perspective is highly advantageous, as it allows for sharper finite-time analyses and...
CALF: Communication-Aware Learning Framework for Distributed Reinforcement Learning
arXiv:2603.12543v1 Announce Type: new Abstract: Distributed reinforcement learning policies face network delays, jitter, and packet loss when deployed across edge devices and cloud servers. Standard RL training assumes zero-latency interaction, causing severe performance degradation under realistic network conditions. We introduce...
Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE
arXiv:2603.12552v1 Announce Type: new Abstract: The InfoNCE loss in contrastive learning depends critically on a temperature parameter, yet its dynamics under fixed versus annealed schedules remain poorly understood. We provide a theoretical analysis by modeling embedding evolution under Langevin dynamics...