TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting
arXiv:2603.11352v1 Announce Type: new Abstract: Transformer-based time series foundation models face a fundamental trade-off in the choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may...
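The trade-off this abstract describes can be made concrete with a toy comparison: point-wise tokenization yields one token per sample, fixed patching cuts the token count uniformly, and a dynamic patcher adapts patch boundaries to the signal. The variance-threshold heuristic below is a hypothetical illustration, not the TimeSqueeze method.

```python
# Token-count comparison for three tokenization schemes on a toy series.
# dynamic_patches uses a simple variance threshold (an assumption for
# illustration only): grow a patch while the signal is flat, cut it early
# where the signal becomes volatile.

def fixed_patches(series, patch_len):
    """Uniform, non-overlapping patches (the last one may be shorter)."""
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]

def dynamic_patches(series, max_len, var_threshold):
    """Grow a patch until its variance exceeds var_threshold or it hits max_len."""
    patches, current = [], []
    for x in series:
        current.append(x)
        mean = sum(current) / len(current)
        var = sum((v - mean) ** 2 for v in current) / len(current)
        if len(current) >= max_len or var > var_threshold:
            patches.append(current)
            current = []
    if current:
        patches.append(current)
    return patches

series = [0.0] * 16 + [0.0, 5.0, -5.0, 5.0]   # long flat region, then a volatile tail
print(len(series))                             # 20 point-wise tokens
print(len(fixed_patches(series, 4)))           # 5 fixed patches
print(len(dynamic_patches(series, 8, 1.0)))    # 4 patches: long ones on the flat region
```

The dynamic scheme spends few tokens on the flat stretch and short patches on the volatile tail, which is the efficiency/fidelity balance the abstract gestures at.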
MaterialFigBENCH: benchmark dataset with figures for evaluating college-level materials science problem-solving abilities of multimodal large language models
arXiv:2603.11414v1 Announce Type: new Abstract: We present MaterialFigBench, a benchmark dataset designed to evaluate the ability of multimodal large language models (LLMs) to solve university-level materials science problems that require accurate interpretation of figures. Unlike existing benchmarks that primarily rely...
A Semi-Decentralized Approach to Multiagent Control
arXiv:2603.11802v1 Announce Type: new Abstract: We introduce an expressive framework and algorithms for the semi-decentralized control of cooperative agents in environments with communication uncertainty. Whereas semi-Markov control admits a distribution over time for agent actions, semi-Markov communication, or what we...
FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles
arXiv:2603.11339v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly applied to financial analysis, yet their ability to audit structured financial statements under explicit accounting principles remains poorly explored. Existing benchmarks primarily evaluate question answering, numerical reasoning, or anomaly...
Understanding Wikidata Qualifiers: An Analysis and Taxonomy
arXiv:2603.11767v1 Announce Type: new Abstract: This paper presents an in-depth analysis of Wikidata qualifiers, focusing on their semantics and actual usage, with the aim of developing a taxonomy that addresses the challenges of selecting appropriate qualifiers, querying the graph, and...
Adversarial Reinforcement Learning for Detecting False Data Injection Attacks in Vehicular Routing
arXiv:2603.11433v1 Announce Type: new Abstract: In modern transportation networks, adversaries can manipulate routing algorithms using false data injection attacks, such as simulating heavy traffic with multiple devices running crowdsourced navigation applications, to mislead vehicles toward suboptimal routes and increase congestion....
DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
arXiv:2603.11076v1 Announce Type: new Abstract: Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace this brittleness to insufficient diversity in synthesized tasks. Scaling diversity is...
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges
arXiv:2603.11863v1 Announce Type: new Abstract: The saturation of high-quality pre-training data has shifted research focus toward evolutionary systems capable of continuously generating novel artifacts, leading to the success of AlphaEvolve. However, the progress of such systems is hindered by the...
Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge
arXiv:2603.11665v1 Announce Type: new Abstract: Multimodal Large Language Models (MLLMs) have been widely adopted as MLLM-as-a-Judges due to their strong alignment with human judgment across various visual tasks. However, most existing judge models are optimized for single-task scenarios and struggle...
SemBench: A Universal Semantic Framework for LLM Evaluation
arXiv:2603.11687v1 Announce Type: new Abstract: Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of...
Compression Favors Consistency, Not Truth: When and Why Language Models Prefer Correct Information
arXiv:2603.11749v1 Announce Type: new Abstract: Why do language models sometimes prefer correct statements even when trained on mixed-quality data? We introduce the Compression--Consistency Principle: next-token prediction favors hypotheses that allow shorter and more internally consistent descriptions of the training data....
Large Language Models for Biomedical Article Classification
arXiv:2603.11780v1 Announce Type: new Abstract: This work presents a systematic and in-depth investigation of the utility of large language models as text classifiers for biomedical article classification. The study uses several small and mid-size open source models, as well as...
DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining
arXiv:2603.11838v1 Announce Type: new Abstract: In financial backtesting, large language models pretrained on internet-scale data risk introducing lookahead bias that undermines their forecasting validity, as they may have already seen the true outcome during training. To address this, we present...
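The core idea of lookahead-bias prevention in backtesting is a temporal cutoff: a model evaluated "as of" some date may only train on documents timestamped at or before that date. A minimal sketch of such a filter follows; the document schema is hypothetical, not DatedGPT's actual pipeline.

```python
# Time-aware corpus filtering: exclude any document published after the
# backtest's "as of" date, so the model cannot have seen the outcome.
from datetime import date

corpus = [
    {"text": "Q1 earnings beat estimates", "published": date(2021, 4, 10)},
    {"text": "Merger announced",           "published": date(2022, 1, 5)},
    {"text": "Stock split approved",       "published": date(2023, 6, 2)},
]

def training_slice(docs, as_of):
    """Keep only documents a model could plausibly have seen by `as_of`."""
    return [d for d in docs if d["published"] <= as_of]

visible = training_slice(corpus, date(2022, 12, 31))
print(len(visible))  # 2 -- the 2023 document is excluded from training
```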
PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents
arXiv:2603.11955v1 Announce Type: new Abstract: Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse...
BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs
arXiv:2603.11991v1 Announce Type: new Abstract: Zero-shot text classification (ZSC) offers the promise of eliminating costly task-specific annotation by matching texts directly to human-readable label descriptions. While early approaches have predominantly relied on cross-encoder models fine-tuned for natural language inference (NLI),...
To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times
arXiv:2603.12105v1 Announce Type: new Abstract: Large Language Models (LLMs) have recently been shown to produce estimates of psycholinguistic norms, such as valence, arousal, or concreteness, for words and multiword expressions, that correlate with human judgments. These estimates are obtained by...
Long-Context Encoder Models for Polish Language Understanding
arXiv:2603.12191v1 Announce Type: new Abstract: While decoder-only Large Language Models (LLMs) have recently dominated the NLP landscape, encoder-only architectures remain a cost-effective and parameter-efficient standard for discriminative tasks. However, classic encoders like BERT are limited by a short context window,...
Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration
arXiv:2603.12226v1 Announce Type: new Abstract: Despite interdisciplinary research leading to larger and longer-term impact, most work remains confined to single-domain academic silos. Recent AI-based approaches to scientific discovery show promise for interdisciplinary research, but many prioritize rapidly designing experiments and...
Procedural Fairness via Group Counterfactual Explanation
arXiv:2603.11140v1 Announce Type: new Abstract: Fairness in machine learning research has largely focused on outcome-oriented fairness criteria such as Equalized Odds, while comparatively less attention has been given to procedural-oriented fairness, which addresses how a model arrives at its predictions....
Attention Gathers, MLPs Compose: A Causal Analysis of an Action-Outcome Circuit in VideoViT
arXiv:2603.11142v1 Announce Type: new Abstract: The paper explores how video models trained for classification tasks represent nuanced, hidden semantic information that may not affect the final outcome, a key challenge for Trustworthy AI models. Through Explainable and Interpretable AI methods,...
Systematic Scaling Analysis of Jailbreak Attacks in Large Language Models
arXiv:2603.11149v1 Announce Type: new Abstract: Large language models remain vulnerable to jailbreak attacks, yet we still lack a systematic understanding of how jailbreak success scales with attacker effort across methods, model families, and harm types. We initiate a scaling-law framework...
Huntington Disease Automatic Speech Recognition with Biomarker Supervision
arXiv:2603.11168v1 Announce Type: new Abstract: Automatic speech recognition (ASR) for pathological speech remains underexplored, especially for Huntington's disease (HD), where irregular timing, unstable phonation, and articulatory distortion challenge current models. We present a systematic HD-ASR study using a high-fidelity clinical...
Representation Finetuning for Continual Learning
arXiv:2603.11201v1 Announce Type: new Abstract: The world is inherently dynamic, and continual learning aims to enable models to adapt to ever-evolving data streams. While pre-trained models have shown powerful performance in continual learning, they still require finetuning to adapt effectively...
Duration Aware Scheduling for ASR Serving Under Workload Drift
arXiv:2603.11273v1 Announce Type: new Abstract: Scheduling policies in large-scale Automatic Speech Recognition (ASR) serving pipelines play a key role in determining end-to-end (E2E) latency. Yet, widely used serving engines rely on first-come-first-served (FCFS) scheduling, which ignores variability in request duration...
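The cost of ignoring request duration is easy to see in a toy single-worker queue: under FCFS, short requests wait behind long ones, while a duration-aware order such as shortest-job-first serves them sooner. SJF here is just one simple duration-aware policy for illustration; the paper's scheduler may differ.

```python
# Toy single-worker simulation: mean end-to-end latency under FCFS vs
# shortest-job-first, with all requests arriving at t=0.

def mean_latency(durations):
    """Latency of each request = queueing wait + its own service time."""
    t, total = 0.0, 0.0
    for d in durations:
        t += d          # this request finishes at time t
        total += t      # its end-to-end latency
    return total / len(durations)

requests = [30.0, 2.0, 2.0, 2.0]        # one long audio file, three short clips
fcfs = mean_latency(requests)           # (30 + 32 + 34 + 36) / 4 = 33.0
sjf = mean_latency(sorted(requests))    # (2 + 4 + 6 + 36) / 4 = 12.0
print(fcfs, sjf)
```

Serving the short requests first leaves the long request's latency nearly unchanged while sharply reducing the mean, which is why duration-aware ordering helps under mixed workloads.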
Single molecule localization microscopy challenge: a biologically inspired benchmark for long-sequence modeling
arXiv:2603.11296v1 Announce Type: new Abstract: State space models (SSMs) have recently achieved strong performance on long sequence modeling tasks while offering improved memory and computational efficiency compared to transformer based architectures. However, their evaluation has been largely limited to synthetic...
abx_amr_simulator: A simulation environment for antibiotic prescribing policy optimization under antimicrobial resistance
arXiv:2603.11369v1 Announce Type: new Abstract: Antimicrobial resistance (AMR) poses a global health threat, reducing the effectiveness of antibiotics and complicating clinical decision-making. To address this challenge, we introduce abx_amr_simulator, a Python-based simulation package designed to model antibiotic prescribing and AMR...
Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks
arXiv:2603.11487v1 Announce Type: new Abstract: Transformers often display an attention sink: probability mass concentrates on a fixed, content-agnostic position. We prove that computing a simple trigger-conditional behavior necessarily induces a sink in softmax self-attention models. Our results formalize a familiar...
How to watch Jensen Huang’s Nvidia GTC 2026 keynote
GTC — which stands for GPU Technology Conference — is Nvidia's flagship annual event, where the chipmaker typically uses the spotlight to announce new products, champion partnerships, and lay out its vision for the future of computing. Huang's keynote will...
Facebook Marketplace now lets Meta AI respond to buyers’ messages
When buyers inquire about an item’s availability, sellers can use Meta AI to automatically draft replies using information from their listing, such as the description, availability, pickup location, and price.
Bumble introduces an AI dating assistant, ‘Bee’
Bumble's new AI assistant Bee will move the dating app beyond the swipe by matching people based on compatibility and goals.