Will the mystery of the Dobbs leak ever be solved?
Justice Clarence Thomas’ virtual appearance last week at a legal conference in Washington, D.C. brought renewed attention to court security. Thomas had originally planned to attend in person, but he […]
SCOTUStoday for Friday, March 6
On this day in 1857, the Supreme Court released its opinion in Dred Scott v. Sandford, holding that Scott, an enslaved man who spent time in free territory, was not […]
Syrian nationals urge Supreme Court to keep ruling in place allowing them to stay in the United States
A group of Syrian nationals urged the Supreme Court on Thursday to leave in place a ruling by a federal judge in New York City that allows them to remain […]
Court grapples with whether federal law supersedes negligent hiring claims against freight brokers
Updated on March 6 at 10:50 a.m. The Supreme Court on Wednesday heard argument in Montgomery v. Caribe Transport II, LLC, a case on whether federal law prevents state law […]
Supreme Court rules that New Jersey Transit can be sued in other states
The Supreme Court on Wednesday ruled in Galette v. New Jersey Transit Corporation that two men who were seriously injured in New York and Pennsylvania by buses operated by New […]
AI Now Institute
The AI Now Institute produces diagnosis and actionable policy research on artificial intelligence.
Musk fails to block California data disclosure law he fears will ruin xAI
Musk can't convince judge public doesn’t care about where AI training data comes from.
Tech industry is in tariff hell, even if refunds are automated
Trade groups urge court to create a simple blueprint for tariff refunds.
Trump gets data center companies to pledge to pay for power generation
With no enforcement and questionable economics, it may not make a difference.
Anthropic’s Pentagon deal is a cautionary tale for startups chasing federal contracts
The Pentagon has officially designated Anthropic a supply-chain risk after the two failed to agree on how much control the military should have over its AI models, including its use in autonomous weapons and mass domestic surveillance. As Anthropic’s $200...
Anthropic vs. the Pentagon, the SaaSpocalypse, and why competition is good, actually
DiligenceSquared uses AI, voice agents to make M&A research affordable
Instead of relying on expensive management consultants, the startup uses AI voice agents to conduct interviews with customers of the companies the PE firms are considering buying.
Anthropic CEO Dario Amodei could still be trying to make a deal with the Pentagon
Anthropic's $200 million contract with the Department of Defense broke down due to disagreements over giving the military unrestricted access to its AI.
A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
arXiv:2603.04390v1 Announce Type: new Abstract: WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-session forgetting, stochasticity, instruction failure, and adaptation rigidity. We propose a dual-helix governance framework reframing these...
One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models
arXiv:2603.03291v1 Announce Type: cross Abstract: Reward Models (RMs) are crucial for online alignment of language models (LMs) with human preferences. However, RM-based preference-tuning is vulnerable to reward hacking, whereby LM policies learn undesirable behaviors from flawed RMs. By systematically measuring...
Language Model Goal Selection Differs from Humans' in an Open-Ended Task
arXiv:2603.03295v1 Announce Type: cross Abstract: As large language models (LLMs) get integrated into human decision-making, they are increasingly choosing goals autonomously rather than only completing human-defined ones, assuming they will reflect human preferences. However, human-LLM similarity in goal selection remains...
TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement
arXiv:2603.03297v1 Announce Type: cross Abstract: Test-time Training enables model adaptation using only test questions and offers a promising paradigm for improving the reasoning ability of large language models (LLMs). However, it faces two major challenges: test questions are often highly...
TATRA: Training-Free Instance-Adaptive Prompting Through Rephrasing and Aggregation
arXiv:2603.03298v1 Announce Type: cross Abstract: Large Language Models (LLMs) have improved substantially in alignment, yet their behavior remains highly sensitive to prompt phrasing. This brittleness has motivated automated prompt engineering, but most existing methods (i) require a task-specific training set, (ii)...
Developing an AI Assistant for Knowledge Management and Workforce Training in State DOTs
arXiv:2603.03302v1 Announce Type: cross Abstract: Effective knowledge management is critical for preserving institutional expertise and improving the efficiency of workforce training in state transportation agencies. Traditional approaches, such as static documentation, classroom-based instruction, and informal mentorship, often lead to fragmented...
HumanLM: Simulating Users with State Alignment Beats Response Imitation
arXiv:2603.03303v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used to simulate how specific users respond to a given context, enabling more user-centric applications that rely on user feedback. However, existing user simulators mostly imitate surface-level patterns and...
Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
arXiv:2603.03306v1 Announce Type: cross Abstract: Recently presented Token-Oriented Object Notation (TOON) aims to replace JSON as a serialization format for passing structured data to LLMs with significantly reduced token usage. While showing solid accuracy in LLM comprehension, there is a...
How does fine-tuning improve sensorimotor representations in large language models?
arXiv:2603.03313v1 Announce Type: cross Abstract: Large Language Models (LLMs) exhibit a significant "embodiment gap", where their text-based representations fail to align with human sensorimotor experiences. This study systematically investigates whether and how task-specific fine-tuning can bridge this gap. Utilizing Representational...
Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
arXiv:2603.03314v1 Announce Type: cross Abstract: Large language models (LLMs) have demonstrated remarkable and steadily improving performance across a wide range of tasks. However, LLM performance may be highly sensitive to prompt variations especially in scenarios with limited openness or strict...
M-QUEST -- Meme Question-Understanding Evaluation on Semantics and Toxicity
arXiv:2603.03315v1 Announce Type: cross Abstract: Internet memes are a powerful form of online communication, yet their nature and reliance on commonsense knowledge make toxicity detection challenging. Identifying key features for meme interpretation and understanding is a crucial task. Previous work...
The Influence of Iconicity in Transfer Learning for Sign Language Recognition
arXiv:2603.03316v1 Announce Type: cross Abstract: Most sign language recognition research relies on Transfer Learning (TL) from vision-based datasets such as ImageNet. Some extend this to alternatively available language datasets, often focusing on signs with cross-linguistic similarities. This body of work...
Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery
arXiv:2603.03322v1 Announce Type: cross Abstract: Recent advancements in Large Language Model (LLM) agents have demonstrated remarkable potential in automatic knowledge discovery. However, rigorously evaluating an AI's capacity for knowledge discovery remains a critical challenge. Existing benchmarks predominantly rely on static...
Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
arXiv:2603.03323v1 Announce Type: cross Abstract: Large language models (LLMs) aligned for safety often suffer from over-refusal, the tendency to reject seemingly toxic or benign prompts by misclassifying them as toxic. This behavior undermines models' helpfulness and restricts usability in sensitive...
Controlling Chat Style in Language Models via Single-Direction Editing
arXiv:2603.03324v1 Announce Type: cross Abstract: Controlling stylistic attributes in large language models (LLMs) remains challenging, with existing approaches relying on either prompt engineering or post-training alignment. This paper investigates this challenge through the lens of representation engineering, testing the hypothesis...
IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference
arXiv:2603.03325v1 Announce Type: cross Abstract: Large language models (LLMs) have become integral to modern Human-AI collaboration workflows, where accurately understanding user intent serves as a crucial step for generating satisfactory responses. Context-aware intent understanding, which involves inferring user intentions from...