The Prediction-Measurement Gap: Toward Meaning Representations as Scientific Instruments
arXiv:2603.10130v1 Announce Type: new Abstract: Text embeddings have become central to computational social science and psychology, enabling scalable measurement of meaning and mixed-method inference. Yet most representation learning is optimized and evaluated for prediction and retrieval, yielding a prediction-measurement gap:...
ViDia2Std: A Parallel Corpus and Methods for Low-Resource Vietnamese Dialect-to-Standard Translation
arXiv:2603.10211v1 Announce Type: new Abstract: Vietnamese exhibits extensive dialectal variation, posing challenges for NLP systems trained predominantly on standard Vietnamese. Such systems often underperform on dialectal inputs, especially from underrepresented Central and Southern regions. Previous work on dialect normalization has...
Mitigating Translationese Bias in Multilingual LLM-as-a-Judge via Disentangled Information Bottleneck
arXiv:2603.10351v1 Announce Type: new Abstract: Large language models (LLMs) have become a standard for multilingual evaluation, yet they exhibit a severe systematic translationese bias. In this paper, translationese bias is characterized as LLMs systematically favoring machine-translated text over human-authored references,...
Gated Adaptation for Continual Learning in Human Activity Recognition
arXiv:2603.10046v1 Announce Type: new Abstract: Wearable sensors in Internet of Things (IoT) ecosystems increasingly support applications such as remote health monitoring, elderly care, and smart home automation, all of which rely on robust human activity recognition (HAR). Continual learning systems...
Training Language Models via Neural Cellular Automata
arXiv:2603.10055v1 Announce Type: new Abstract: Pre-training is crucial for large language models (LLMs), as it is when most representations and capabilities are acquired. However, natural language pre-training has problems: high-quality text is finite, it contains human biases, and it entangles...
A Survey of Weight Space Learning: Understanding, Representation, and Generation
arXiv:2603.10090v1 Announce Type: new Abstract: Neural network weights are typically viewed as the end product of training, while most deep learning research focuses on data, features, and architectures. However, recent advances show that the set of all possible weight values...
Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs
arXiv:2603.10100v1 Announce Type: new Abstract: Modern CNNs' high computational demands hinder edge deployment, as traditional ``hard'' sparsity (skipping mathematical zeros) loses effectiveness in deep layers or with smooth activations like Tanh. We propose a ``soft sparsity'' paradigm using a hardware...
Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias
arXiv:2603.10123v1 Announce Type: new Abstract: The ``Lost in the Middle'' phenomenon -- a U-shaped performance curve where LLMs retrieve well from the beginning and end of a context but fail in the middle -- is widely attributed to learned Softmax...
SiMPO: Measure Matching for Online Diffusion Reinforcement Learning
arXiv:2603.10250v1 Announce Type: new Abstract: A commonly used family of RL algorithms for diffusion policies conducts softmax reweighting over the behavior policy, which usually induces an over-greedy policy and fails to leverage feedback from negative samples. In this work, we...
GSVD for Geometry-Grounded Dataset Comparison: An Alignment Angle Is All You Need
arXiv:2603.10283v1 Announce Type: new Abstract: Geometry-grounded learning asks models to respect structure in the problem domain rather than treating observations as arbitrary vectors. Motivated by this view, we revisit a classical but underused primitive for comparing datasets: linear relations between...
What crackdown? Trump's EPA enforcement claims don't pass sniff test.
75% of the criminal cases closed last fiscal year originated before Trump took office.
United States v. Johnson
Drug detection dogs are critical tools in the fight against drug trafficking. However, law enforcement canines are imperfect: They sometimes incorrectly alert when performing...The post<em>United States v. Johnson</em>appeared first onHarvard Law Review.
PrivPRISM: Automatically Detecting Discrepancies Between Google Play Data Safety Declarations and Developer Privacy Policies
arXiv:2603.09214v1 Announce Type: new Abstract: End-users seldom read verbose privacy policies, leading app stores like Google Play to mandate simplified data safety declarations as a user-friendly alternative. However, these self-declared disclosures often contradict the full privacy policies, deceiving users about...
Abundant Intelligence and Deficient Demand: A Macro-Financial Stress Test of Rapid AI Adoption
arXiv:2603.09209v1 Announce Type: new Abstract: We formalize a macro-financial stress test for rapid AI adoption. Rather than a productivity bust or existential risk, we identify a distribution-and-contract mismatch: AI-generated abundance coexists with demand deficiency because economic institutions are anchored to...
Robust Regularized Policy Iteration under Transition Uncertainty
arXiv:2603.09344v1 Announce Type: new Abstract: Offline reinforcement learning (RL) enables data-efficient and safe policy learning without online exploration, but its performance often degrades under distribution shift. The learned policy may visit out-of-distribution state-action pairs where value estimates and learned dynamics...
MEMO: Memory-Augmented Model Context Optimization for Robust Multi-Turn Multi-Agent LLM Games
arXiv:2603.09022v1 Announce Type: new Abstract: Multi-turn, multi-agent LLM game evaluations often exhibit substantial run-to-run variance. In long-horizon interactions, small early deviations compound across turns and are amplified by multi-agent coupling. This biases win rate estimates and makes rankings unreliable across...
AI Act Evaluation Benchmark: An Open, Transparent, and Reproducible Evaluation Dataset for NLP and RAG Systems
arXiv:2603.09435v1 Announce Type: new Abstract: The rapid rollout of AI in heterogeneous public and societal sectors has subsequently escalated the need for compliance with regulatory standards and frameworks. The EU AI Act has emerged as a landmark in the regulatory...
Interpretable Markov-Based Spatiotemporal Risk Surfaces for Missing-Child Search Planning with Reinforcement Learning and LLM-Based Quality Assurance
arXiv:2603.08933v1 Announce Type: new Abstract: The first 72 hours of a missing-child investigation are critical for successful recovery. However, law enforcement agencies often face fragmented, unstructured data and a lack of dynamic, geospatial predictive tools. Our system, Guardian, provides an...
Curveball Steering: The Right Direction To Steer Isn't Always Linear
arXiv:2603.09313v1 Announce Type: new Abstract: Activation steering is a widely used approach for controlling large language model (LLM) behavior by intervening on internal representations. Existing methods largely rely on the Linear Representation Hypothesis, assuming behavioral attributes can be manipulated using...
Logos: An evolvable reasoning engine for rational molecular design
arXiv:2603.09268v1 Announce Type: new Abstract: The discovery and design of functional molecules remain central challenges across chemistry,biology, and materials science. While recent advances in machine learning have accelerated molecular property prediction and candidate generation, existing models tend to excel either...
MedMASLab: A Unified Orchestration Framework for Benchmarking Multimodal Medical Multi-Agent Systems
arXiv:2603.09909v1 Announce Type: new Abstract: While Multi-Agent Systems (MAS) show potential for complex clinical decision support, the field remains hindered by architectural fragmentation and the lack of standardized multimodal integration. Current medical MAS research suffers from non-uniform data ingestion pipelines,...
Quantifying the Necessity of Chain of Thought through Opaque Serial Depth
arXiv:2603.09786v1 Announce Type: new Abstract: Large language models (LLMs) tend to externalize their reasoning in their chain of thought, making the chain of thought a good target for monitoring. This is partially an inherent feature of the Transformer architecture: sufficiently...
Bioalignment: Measuring and Improving LLM Disposition Toward Biological Systems for AI Safety
arXiv:2603.09154v1 Announce Type: new Abstract: Large language models (LLMs) trained on internet-scale corpora can exhibit systematic biases that increase the probability of unwanted behavior. In this study, we examined potential biases towards synthetic vs. biological technological solutions across four domains...
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
arXiv:2603.09095v1 Announce Type: new Abstract: Multimodal large language models (MLLMs) can process text presented as images, yet they often perform worse than when the same content is provided as textual tokens. We systematically diagnose this "modality gap" by evaluating seven...
One Language, Two Scripts: Probing Script-Invariance in LLM Concept Representations
arXiv:2603.08869v1 Announce Type: new Abstract: Do the features learned by Sparse Autoencoders (SAEs) represent abstract meaning, or are they tied to how text is written? We investigate this question using Serbian digraphia as a controlled testbed: Serbian is written interchangeably...
Modelling the Diachronic Emergence of Phoneme Frequency Distributions
arXiv:2603.09503v1 Announce Type: new Abstract: Phoneme frequency distributions exhibit robust statistical regularities across languages, including exponential-tailed rank-frequency patterns and a negative relationship between phonemic inventory size and the relative entropy of the distribution. The origin of these patterns remains largely...
ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling
arXiv:2603.09691v1 Announce Type: new Abstract: Existing end-to-end modeling methods for modular task-oriented dialog systems are typically tailored to specific datasets, making it challenging to adapt to new dialog scenarios. In this work, we propose ESAinsTOD, a unified End-to-end Schema-Aware Instruction-tuning...
Benchmarking Political Persuasion Risks Across Frontier Large Language Models
arXiv:2603.09884v1 Announce Type: new Abstract: Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campaign practices, the recent rise of frontier...
Generalized Reduction to the Isotropy for Flexible Equivariant Neural Fields
arXiv:2603.08758v1 Announce Type: new Abstract: Many geometric learning problems require invariants on heterogeneous product spaces, i.e., products of distinct spaces carrying different group actions, where standard techniques do not directly apply. We show that, when a group $G$ acts transitively...
Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models
arXiv:2603.08859v1 Announce Type: new Abstract: Hybrid sequence models--combining Transformer and state-space model layers--seek to gain the expressive versatility of attention as well as the computational efficiency of state-space model layers. Despite burgeoning interest in hybrid models, we lack a basic...