Democratising Clinical AI through Dataset Condensation for Classical Clinical Models
arXiv:2603.09356v1 Announce Type: new Abstract: Dataset condensation (DC) learns a compact synthetic dataset that enables models to match the performance of full-data training, prioritising utility over distributional fidelity. While typically explored for computational efficiency, DC also holds promise for healthcare...
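The snippet above only states the goal of dataset condensation (utility over distributional fidelity). As a toy illustration, here is a minimal distribution-matching sketch in which each class is condensed to a single synthetic point; the data, classifier, and objective are entirely hypothetical and are not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical "full" dataset: two Gaussian classes, 500 points each
X0 = rng.normal(loc=-2.0, scale=1.0, size=(500, 2))
X1 = rng.normal(loc=+2.0, scale=1.0, size=(500, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 500 + [1] * 500)

# Condense each class to one synthetic point by matching the
# class-conditional feature mean (a distribution-matching objective)
X_syn = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
y_syn = np.array([0, 1])

def nearest_mean_predict(centroids, pts):
    # Assign each point to the nearest synthetic centroid
    d = ((pts[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# A nearest-centroid classifier "trained" on the 2-point synthetic set
# recovers full-data accuracy on this well-separated toy problem
acc = (nearest_mean_predict(X_syn, X) == y).mean()
print(round(acc, 3))
```

The synthetic set here is 2 points standing in for 1,000, which captures the spirit of condensation: the compact set need not look like the data, only train a comparable model.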
From Representation to Clusters: A Contrastive Learning Approach for Attributed Hypergraph Clustering
arXiv:2603.09370v1 Announce Type: new Abstract: Contrastive learning has demonstrated strong performance in attributed hypergraph clustering. Typically, existing methods based on contrastive learning first learn node embeddings and then apply clustering algorithms, such as k-means, to these embeddings to obtain the...
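The two-step pipeline the abstract describes (learn embeddings, then run k-means on them) can be sketched with synthetic embeddings and a tiny Lloyd's-iteration k-means; the encoder output here is simulated, not produced by any contrastive model:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical node embeddings from a contrastive encoder:
# two well-separated blocks of 50 nodes each
Z = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 8)),
    rng.normal(3.0, 0.3, size=(50, 8)),
])

def kmeans2(Z, iters=20):
    # Two clusters; farthest-point initialisation keeps this toy run stable
    centers = np.stack([Z[0], Z[((Z - Z[0]) ** 2).sum(1).argmax()]])
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(iters):
        labels = ((Z[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.stack([Z[labels == j].mean(0) for j in range(2)])
    return labels

labels = kmeans2(Z)
# Nodes from the same ground-truth block should share a cluster id
same_block = bool((labels[:50] == labels[0]).all() and (labels[50:] == labels[50]).all())
print(same_block)
```

The abstract's point is that this decoupled embed-then-cluster recipe is the typical baseline the paper positions itself against.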
The how and why of gun control
A Second Opinion is a recurring series by Haley Proctor on the Second Amendment and constitutional litigation. Last Monday, the Supreme Court heard argument in United States v. Hemani. In […]
SCOTUSblog’s new podcast partners
SCOTUSblog is excited to announce the addition of podcasts Amarica’s Constitution and Divided Argument to its podcast lineup, joining Advisory Opinions. While both podcasts will maintain their editorial and creative independence, […]
Birthright citizenship: legal takeaways of mice and men and elephants and dogs
Brothers in Law is a recurring series by brothers Akhil and Vikram Amar, with special emphasis on measuring what the Supreme Court says against what the Constitution itself says. For more content from […]
SCOTUStoday for Tuesday, March 10
SCOTUSblog is excited to announce the addition of podcasts Amarica’s Constitution and Divided Argument to its podcast lineup, joining Advisory Opinions. In a new, jam-packed episode, the hosts of all […]
AI Now Co-ED Amba Kak Gives Remarks Before the UN General Assembly on AI Governance - AI Now Institute
AI-powered apps struggle with long-term retention, new report shows
AI can drive stronger early monetization for apps, but sustaining value remains the challenge, RevenueCat's latest report finds.
ChatGPT can now create interactive visuals to help you understand math and science concepts
Instead of just reading an explanation or looking at a static diagram, users can now engage directly with interactive visuals.
AgentMail raises $6M to build an email service for AI agents
AgentMail provides an API platform that lets you give AI agents their own email inboxes, with support for two-way conversations, parsing, threading, labeling, searching, and replying.
Thinking Machines Lab inks massive compute deal with Nvidia
The multi-year deal involves at least a gigawatt of compute power and also includes a strategic investment from Nvidia.
Google gives in to users’ complaints over AI-powered ‘Ask Photos’ search feature
The option appears on the Google Photos Search screen and lets users pick which experience they want.
Sandbar secures $23M Series A for its AI note-taking ring
Sandbar aims to ship the Stream this summer; the ring can be used to take notes, chat with an AI assistant, and play back media.
Elaborating a Human Rights-Friendly Copyright Framework for Generative AI
Hierarchical Latent Structures in Data Generation Process Unify Mechanistic Phenomena across Scale
arXiv:2603.06592v1 Announce Type: new Abstract: Contemporary studies have uncovered many puzzling phenomena in the neural information processing of Transformer-based language models. Building a robust, unified understanding of these phenomena requires disassembling a model within the scope of its training. While...
A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness
arXiv:2603.06594v1 Announce Type: new Abstract: Automated “LLM-as-a-Judge” frameworks have become the de facto standard for scalable evaluation across natural language processing. For instance, in safety evaluation, these judges are relied upon to evaluate harmfulness in order to benchmark the robustness...
Validation of a Small Language Model for DSM-5 Substance Category Classification in Child Welfare Records
arXiv:2603.06836v1 Announce Type: new Abstract: Background: Recent studies have demonstrated that large language models (LLMs) can perform binary classification tasks on child welfare narratives, detecting the presence or absence of constructs such as substance-related problems, domestic violence, and firearms involvement....
Reforming the Mechanism: Editing Reasoning Patterns in LLMs with Circuit Reshaping
arXiv:2603.06923v1 Announce Type: new Abstract: Large language models (LLMs) often exhibit flawed reasoning ability that undermines reliability. Existing approaches to improving reasoning typically treat it as a general and monolithic skill, applying broad training which is inefficient and unable to...
Hierarchical Embedding Fusion for Retrieval-Augmented Code Generation
arXiv:2603.06593v1 Announce Type: new Abstract: Retrieval-augmented code generation often conditions the decoder on large retrieved code snippets. This ties online inference cost to repository size and introduces noise from long contexts. We present Hierarchical Embedding Fusion (HEF), a two-stage approach...
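The cost argument in the HEF snippet (decouple inference cost from repository size via a two-stage lookup) can be illustrated with a generic coarse-to-fine retrieval sketch; the embeddings and the file/snippet hierarchy are invented for illustration and this is not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8

# Hypothetical repository: 5 files, each holding 4 snippet embeddings
snippets = rng.normal(size=(5, 4, d))
files = snippets.mean(axis=1)                  # coarse file-level embeddings
query = files[3] + 0.01 * rng.normal(size=d)   # query close to file 3

def top1(mat, q):
    # Cosine-similarity argmax over rows of `mat`
    sims = mat @ q / (np.linalg.norm(mat, axis=-1) * np.linalg.norm(q) + 1e-9)
    return int(sims.argmax())

# Stage 1: pick the best file by its coarse embedding.
# Stage 2: search snippets inside that one file only, so online cost
# scales with snippets-per-file rather than total snippet count.
f = top1(files, query)
s = top1(snippets[f], query)
print(f, s)
```

With F files of S snippets each, the two-stage lookup compares against F + S vectors instead of F × S, which is the efficiency the abstract alludes to.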
Counting on Consensus: Selecting the Right Inter-annotator Agreement Metric for NLP Annotation and Evaluation
arXiv:2603.06865v1 Announce Type: new Abstract: Human annotation remains the foundation of reliable and interpretable data in Natural Language Processing (NLP). As annotation and evaluation tasks continue to expand, from categorical labelling to segmentation, subjective judgment, and continuous rating, measuring agreement...
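For categorical labelling, the kind of chance-corrected agreement the abstract discusses is classically measured with Cohen's kappa; a minimal stdlib implementation (the annotator data below is made up for illustration):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators on categorical labels."""
    assert len(a) == len(b)
    n = len(a)
    # Observed agreement: fraction of items with identical labels
    p_obs = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under independent annotators with these marginals
    ca, cb = Counter(a), Counter(b)
    p_exp = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Two hypothetical annotators labelling 10 items; they agree on 8
ann1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
ann2 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "pos", "neg"]
print(round(cohens_kappa(ann1, ann2), 3))
```

Here raw agreement is 0.8 but kappa is 0.6, because balanced marginals already yield 0.5 agreement by chance; choosing between such metrics as tasks move beyond categorical labels is exactly the question the paper takes up.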
Deep Research, Shallow Evaluation: A Case Study in Meta-Evaluation for Long-Form QA Benchmarks
arXiv:2603.06942v1 Announce Type: new Abstract: Recent advances have made long-form report-generating systems widely available. This has prompted evaluation frameworks that use LLM-as-judge protocols and claim verification, along with meta-evaluation frameworks that seek to validate these methods. Many of the meta-evaluations...
AutoChecklist: Composable Pipelines for Checklist Generation and Scoring with LLM-as-a-Judge
arXiv:2603.07019v1 Announce Type: new Abstract: Checklists have emerged as a popular approach for interpretable and fine-grained evaluation, particularly with LLM-as-a-Judge. Beyond evaluation, these structured criteria can serve as signals for model alignment, reinforcement learning, and self-correction. To support these use...
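The evaluation style the abstract names (decompose judging into explicit checklist criteria, then aggregate) can be sketched in miniature; the criteria and predicates below are invented stand-ins, with simple string checks playing the role an LLM judge would play:

```python
# Hypothetical checklist: each criterion is a predicate over the response;
# the per-criterion verdicts aggregate into a fine-grained score
checklist = [
    ("mentions units", lambda r: "kg" in r or "m/s" in r),
    ("gives a number", lambda r: any(c.isdigit() for c in r)),
    ("stays concise", lambda r: len(r.split()) <= 30),
]

def score(response):
    # Record a boolean verdict per criterion, then average into [0, 1]
    verdicts = {name: bool(pred(response)) for name, pred in checklist}
    return verdicts, sum(verdicts.values()) / len(checklist)

verdicts, s = score("The ball falls at 9.8 m/s^2, reaching 20 m/s after 2 s.")
print(s)
```

The interpretability payoff is that a low score comes with named failing criteria rather than a single opaque number, which is also what makes such verdicts usable as alignment or self-correction signals.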
Hit-RAG: Learning to Reason with Long Contexts via Preference Alignment
arXiv:2603.07023v1 Announce Type: new Abstract: Despite the promise of Retrieval-Augmented Generation in grounding Multimodal Large Language Models with external knowledge, the transition to extensive contexts often leads to significant attention dilution and reasoning hallucinations. The surge in information density causes...
Enhancing Consistency of Werewolf AI through Dialogue Summarization and Persona Information
arXiv:2603.07111v1 Announce Type: new Abstract: The Werewolf Game is a communication game where players' reasoning and discussion skills are essential. In this study, we present a Werewolf AI agent developed for the AIWolfDial 2024 shared task, co-hosted with the 17th...
Emotion Transcription in Conversation: A Benchmark for Capturing Subtle and Complex Emotional States through Natural Language
arXiv:2603.07138v1 Announce Type: new Abstract: Emotion Recognition in Conversation (ERC) is critical for enabling natural human-machine interactions. However, existing methods predominantly employ categorical or dimensional emotion annotations, which often fail to adequately represent complex, subtle, or culturally specific emotional nuances....
Scaling Self-Supervised Speech Models Uncovers Deep Linguistic Relationships: Evidence from the Pacific Cluster
arXiv:2603.07238v1 Announce Type: new Abstract: Similarities between language representations derived from Self-Supervised Speech Models (S3Ms) have been observed to primarily reflect geographic proximity or surface typological similarities driven by recent expansion or contact, potentially missing deeper genealogical signals. We investigate...
Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin
arXiv:2603.07286v1 Announce Type: new Abstract: Global safety models exhibit strong performance across widely used benchmarks, yet their training data rarely captures the cultural and linguistic nuances of Taiwanese Mandarin. This limitation results in systematic blind spots when interpreting region-specific risks...
RILEC: Detection and Generation of L1 Russian Interference Errors in English Learner Texts
arXiv:2603.07366v1 Announce Type: new Abstract: Many errors in student essays can be explained by influence from the native language (L1). L1 interference refers to errors influenced by a speaker's first language, such as using stadion instead of stadium, reflecting lexical...
Domain-Specific Quality Estimation for Machine Translation in Low-Resource Scenarios
arXiv:2603.07372v1 Announce Type: new Abstract: Quality Estimation (QE) is essential for assessing machine translation quality in reference-less settings, particularly for domain-specific and low-resource language scenarios. In this paper, we investigate sentence-level QE for English to Indic machine translation across four...
The Dual-Stream Transformer: Channelized Architecture for Interpretable Language Modeling
arXiv:2603.07461v1 Announce Type: new Abstract: Standard transformers entangle all computation in a single residual stream, obscuring which components perform which functions. We introduce the Dual-Stream Transformer, which decomposes the residual stream into two functionally distinct components: a token stream updated...
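The structural idea in the abstract (split the single residual stream into two separately updated streams so contributions stay attributable) can be caricatured in a few lines of numpy; the update rule below is an invented toy, not the paper's Dual-Stream Transformer:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 4  # width per stream, sequence length

# Two separate residual streams instead of one entangled stream
tok = rng.normal(size=(n, d))   # token stream
aux = np.zeros((n, d))          # second stream, written only by attention here

def attn(x, Wq, Wk, Wv):
    # Plain single-head softmax attention
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    s = q @ k.T / np.sqrt(d)
    w = np.exp(s - s.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ v

Wq, Wk, Wv = (rng.normal(scale=0.1, size=(d, d)) for _ in range(3))
W1, W2 = rng.normal(scale=0.1, size=(d, d)), rng.normal(scale=0.1, size=(d, d))

# One toy block: attention accumulates in `aux`; the MLP reads both
# streams but writes `tok`, so each component's output lands in its
# own stream and remains separately inspectable
aux = aux + attn(tok, Wq, Wk, Wv)
tok = tok + np.maximum(0, np.concatenate([tok, aux], -1) @ np.vstack([W1, W2]))
print(tok.shape, aux.shape)
```

In a standard block both updates would be added to the same stream, which is the entanglement the paper's decomposition is meant to undo.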