Design and evaluation of an agentic workflow for crisis-related synthetic tweet datasets
arXiv:2603.13625v1 Announce Type: new Abstract: Twitter (now X) has become an important source of social media data for situational awareness during crises. Crisis informatics research has widely used tweets from Twitter to develop and evaluate artificial intelligence (AI) systems for...
GRPO and Reflection Reward for Mathematical Reasoning in Large Language Models
arXiv:2603.14041v1 Announce Type: new Abstract: The enhancement of reasoning capabilities in large language models (LLMs) has garnered significant attention, with supervised fine-tuning (SFT) and reinforcement learning emerging as dominant paradigms. While recent studies recognize the importance of reflection in reasoning...
Privacy Preserving Topic-wise Sentiment Analysis of the Iran Israel USA Conflict Using Federated Transformer Models
arXiv:2603.13655v1 Announce Type: new Abstract: The recent escalation of the Iran Israel USA conflict in 2026 has triggered widespread global discussions across social media platforms. As people increasingly use these platforms for expressing opinions, analyzing public sentiment from these discussions...
MeTok: An Efficient Meteorological Tokenization with Hyper-Aligned Group Learning for Precipitation Nowcasting
arXiv:2603.13752v1 Announce Type: new Abstract: Recently, Transformer-based architectures have advanced meteorological prediction. However, this position-centric tokenizer conflicts with the core principle of meteorological systems, where the weather phenomena undoubtedly involve synergistic interactions among multiple elements while positional information constitutes merely...
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
arXiv:2603.13594v1 Announce Type: new Abstract: Large language models are shifting from passive information providers to active agents intended for complex workflows. However, their deployment as reliable AI workers in enterprise is stalled by benchmarks that fail to capture the intricacies...
APEX-Searcher: Augmenting LLMs' Search Capabilities through Agentic Planning and Execution
arXiv:2603.13853v1 Announce Type: new Abstract: Retrieval-augmented generation (RAG), based on large language models (LLMs), serves as a vital approach to retrieving and leveraging external knowledge in various domain applications. When confronted with complex multi-hop questions, single-round retrieval is often insufficient...
GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent
arXiv:2603.13875v1 Announce Type: new Abstract: Many large language model applications require conditioning on long contexts. Transformers typically support this by storing a large per-layer KV-cache of past activations, which incurs substantial memory overhead. A desirable alternative is ompressive memory: read...
Large Language Models Reproduce Racial Stereotypes When Used for Text Annotation
arXiv:2603.13891v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for automated text annotation in tasks ranging from academic research to content moderation and hiring. Across 19 LLMs and two experiments totaling more than 4 million annotation judgments,...
Beyond Explicit Edges: Robust Reasoning over Noisy and Sparse Knowledge Graphs
arXiv:2603.14006v1 Announce Type: new Abstract: GraphRAG is increasingly adopted for converting unstructured corpora into graph structures to enable multi-hop reasoning. However, standard graph algorithms rely heavily on static connectivity and explicit edges, often failing in real-world scenarios where knowledge graphs...
CMHL: Contrastive Multi-Head Learning for Emotionally Consistent Text Classification
arXiv:2603.14078v1 Announce Type: new Abstract: Textual Emotion Classification (TEC) is one of the most difficult NLP tasks. State of the art approaches rely on Large language models (LLMs) and multi-model ensembles. In this study, we challenge the assumption that larger...
OasisSimp: An Open-source Asian-English Sentence Simplification Dataset
arXiv:2603.14111v1 Announce Type: new Abstract: Sentence simplification aims to make complex text more accessible by reducing linguistic complexity while preserving the original meaning. However, progress in this area remains limited for mid-resource and low-resource languages due to the scarcity of...
Selective Fine-Tuning of GPT Architectures for Parameter-Efficient Clinical Text Classification
arXiv:2603.14183v1 Announce Type: new Abstract: The rapid expansion of electronic health record (EHR) systems has generated large volumes of unstructured clinical narratives that contain valuable information for disease identification, patient cohort discovery, and clinical decision support. Extracting structured knowledge from...
Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring
arXiv:2603.14251v1 Announce Type: new Abstract: Large Reasoning Language Models (LRLMs) demonstrate impressive capabilities on complex tasks by utilizing long Chain-of-Thought reasoning. However, they are prone to overthinking, which generates redundant reasoning steps that degrade both performance and efficiency. Recently, early-exit...
Exposing Long-Tail Safety Failures in Large Language Models through Efficient Diverse Response Sampling
arXiv:2603.14355v1 Announce Type: new Abstract: Safety tuning through supervised fine-tuning and reinforcement learning from human feedback has substantially improved the robustness of large language models (LLMs). However, it often suppresses rather than eliminates unsafe behaviors, leaving rare but critical failures...
Extending Minimal Pairs with Ordinal Surprisal Curves and Entropy Across Applied Domains
arXiv:2603.14400v1 Announce Type: new Abstract: The minimal pairs paradigm of comparing model probabilities for contrasting completions has proven useful for evaluating linguistic knowledge in language models, yet its application has largely been confined to binary grammaticality judgments over syntactic phenomena....
BiT-MCTS: A Theme-based Bidirectional MCTS Approach to Chinese Fiction Generation
arXiv:2603.14410v1 Announce Type: new Abstract: Generating long-form linear fiction from open-ended themes remains a major challenge for large language models, which frequently fail to guarantee global structure and narrative diversity when using premise-based or linear outlining approaches. We present BiT-MCTS,...
Creative Convergence or Imitation? Genre-Specific Homogeneity in LLM-Generated Chinese Literature
arXiv:2603.14430v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in narrative generation. However, they often produce structurally homogenized stories, frequently following repetitive arrangements and combinations of plot events along with stereotypical resolutions. In this paper, we...
Echoes Across Centuries: Phonetic Signatures of Persian Poets
arXiv:2603.14443v1 Announce Type: new Abstract: This study examines phonetic texture in Persian poetry as a literary-historical phenomenon rather than a by-product of meter or a feature used only for classification. The analysis draws on a large corpus of 1,116,306 mesras...
PARSA-Bench: A Comprehensive Persian Audio-Language Model Benchmark
arXiv:2603.14456v1 Announce Type: new Abstract: Persian poses unique audio understanding challenges through its classical poetry, traditional music, and pervasive code-switching - none captured by existing benchmarks. We introduce PARSA-Bench (Persian Audio Reasoning and Speech Assessment Benchmark), the first benchmark for...
Translational Gaps in Graph Transformers for Longitudinal EHR Prediction: A Critical Appraisal of GT-BEHRT
arXiv:2603.13231v1 Announce Type: new Abstract: Transformer-based models have improved predictive modeling on longitudinal electronic health records through large-scale self-supervised pretraining. However, most EHR transformer architectures treat each clinical encounter as an unordered collection of codes, which limits their ability to...
Continual Fine-Tuning with Provably Accurate and Parameter-Free Task Retrieval
arXiv:2603.13235v1 Announce Type: new Abstract: Continual fine-tuning aims to adapt a pre-trained backbone to new tasks sequentially while preserving performance on earlier tasks whose data are no longer available. Existing approaches fall into two categories which include input- and parameter-adaptation....
Beyond Attention: True Adaptive World Models via Spherical Kernel Operator
arXiv:2603.13263v1 Announce Type: new Abstract: The pursuit of world model based artificial intelligence has predominantly relied on projecting high-dimensional observations into parameterized latent spaces, wherein transition dynamics are subsequently learned. However, this conventional paradigm is mathematically flawed: it merely displaces...
Federated Personal Knowledge Graph Completion with Lightweight Large Language Models for Personalized Recommendations
arXiv:2603.13264v1 Announce Type: new Abstract: Personalized recommendation increasingly relies on private user data, motivating approaches that can adapt to individuals without centralizing their information. We present Federated Targeted Recommendations with Evolving Knowledge graphs and Language Models (FedTREK-LM), a framework that...
CAMEL-CLIP: Channel-aware Multimodal Electroencephalography-text Alignment for Generalizable Brain Foundation Models
arXiv:2603.13272v1 Announce Type: new Abstract: Electroencephalography (EEG) foundation models have shown promise for learning generalizable representations, yet they remain sensitive to channel heterogeneity, such as changes in channel composition or ordering. We propose channel-aware multimodal EEG-text alignment contrastive language-image pretraining...
Learning from Partial Chain-of-Thought via Truncated-Reasoning Self-Distillation
arXiv:2603.13274v1 Announce Type: new Abstract: Reasoning-oriented language models achieve strong performance by generating long chain-of-thought traces at inference time. However, this capability comes with substantial and often excessive computational cost, which can materialize in redundant or inefficient reasoning. We study...
PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation
arXiv:2603.13275v1 Announce Type: new Abstract: Accurate prediction of surgical duration is pivotal for hospital resource management. Although recent supervised learning approaches-from machine learning (ML) to fine-tuned large language models (LLMs)-have shown strong performance, they remain constrained by the need for...
FastODT: A tree-based framework for efficient continual learning
arXiv:2603.13276v1 Announce Type: new Abstract: Machine learning models deployed in real-world settings must operate under evolving data distributions and constrained computational resources. This challenge is particularly acute in non-stationary domains such as energy time series, weather monitoring, and environmental sensing....
Learning Retrieval Models with Sparse Autoencoders
arXiv:2603.13277v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) provide a powerful mechanism for decomposing the dense representations produced by Large Language Models (LLMs) into interpretable latent features. We posit that SAEs constitute a natural foundation for Learned Sparse Retrieval (LSR),...
Demand Acceptance using Reinforcement Learning for Dynamic Vehicle Routing Problem with Emission Quota
arXiv:2603.13279v1 Announce Type: new Abstract: This paper introduces and formalizes the Dynamic and Stochastic Vehicle Routing Problem with Emission Quota (DS-QVRP-RR), a novel routing problems that integrates dynamic demand acceptance and routing with a global emission constraint. A key contribution...
FedTreeLoRA: Reconciling Statistical and Functional Heterogeneity in Federated LoRA Fine-Tuning
arXiv:2603.13282v1 Announce Type: new Abstract: Federated Learning (FL) with Low-Rank Adaptation (LoRA) has become a standard for privacy-preserving LLM fine-tuning. However, existing personalized methods predominantly operated under a restrictive Flat-Model Assumption: they addressed client-side \textit{statistical heterogeneity} but treated the model...