Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment
arXiv:2603.11388v1 Announce Type: new Abstract: Safety alignment aims to ensure that large language models (LLMs) refuse harmful requests by post-training on harmful queries paired with refusal answers. Although safety alignment is widely adopted in industry, the overrefusal problem where aligned...
Multilingual Financial Fraud Detection Using Machine Learning and Transformer Models: A Bangla-English Study
arXiv:2603.11358v1 Announce Type: new Abstract: Financial fraud detection has emerged as a critical research challenge amid the rapid expansion of digital financial platforms. Although machine learning approaches have demonstrated strong performance in identifying fraudulent activities, most existing research focuses exclusively...
Assessing Cognitive Biases in LLMs for Judicial Decision Support: Virtuous Victim and Halo Effects
arXiv:2603.10016v1 Announce Type: cross Abstract: We investigate whether large language models (LLMs) display human-like cognitive biases, focusing on potential implications for assistance in judicial sentencing, a decision-making system where fairness is paramount. Two of the most relevant biases were chosen:...
Taiwan Safety Benchmark and Breeze Guard: Toward Trustworthy AI for Taiwanese Mandarin
arXiv:2603.07286v1 Announce Type: new Abstract: Global safety models exhibit strong performance across widely used benchmarks, yet their training data rarely captures the cultural and linguistic nuances of Taiwanese Mandarin. This limitation results in systematic blind spots when interpreting region-specific risks...
Few Tokens, Big Leverage: Preserving Safety Alignment by Constraining Safety Tokens during Fine-tuning
arXiv:2603.07445v1 Announce Type: new Abstract: Large language models (LLMs) often require fine-tuning (FT) to perform well on downstream tasks, but FT can induce safety-alignment drift even when the training dataset contains only benign data. Prior work shows that introducing a...
Leakage Safe Graph Features for Interpretable Fraud Detection in Temporal Transaction Networks
arXiv:2603.06632v1 Announce Type: new Abstract: Illicit transaction detection is often driven by transaction-level attributes; however, fraudulent behavior may also manifest through network structure such as central hubs, high-flow intermediaries, and coordinated neighborhoods. This paper presents a time-respecting,...
The Judicial Demand for Explainable Artificial Intelligence
A recurrent concern about machine learning algorithms is that they operate as “black boxes,” making it difficult to identify how and why the algorithms reach particular decisions, recommendations, or predictions. Yet judges will confront machine learning algorithms with increasing frequency,...
Fourth Amendment Equilibrium Adjustment in an Age of Technological Upheaval
The Digital Fourth Amendment is written by Professor Orin Kerr, one of the country’s foremost authorities on the Fourth Amendment, electronic privacy, and criminal procedure. Kerr’s work has been deeply influential in shaping how courts are looking at and deciding...
The Semantics of Jury Nullification: How Terminology Shapes (and Misshapes) the Jury’s Role
Sometimes what we call a practice can matter just as much as the practice itself. Jury nullification has a storied history dating back to...
Submit to The Georgetown Law Journal
Survey of Text Mining Techniques Applied to Judicial Decisions Prediction
This paper reviews the most recent literature on experiments with different Machine Learning, Deep Learning and Natural Language Processing techniques applied to predict judicial and administrative decisions. Among the most notable findings, the most used data mining...
Protecting Noncitizens’ Liberty When the Executive Seeks to Punish
On March 15, 2025, the White House announced that President Trump had invoked an eighteenth-century wartime authority to order the summary removal of noncitizens who were believed to be members of the Venezuelan gang Tren de Aragua. Proclamation No. 10,903, 90...
Experto Crede - Minnesota Law Review
Experto Crede is the official Minnesota Law Review podcast. Listen to the latest episodes on Soundcloud, Spotify, or iTunes! Season 5. 5.1: How the Liberal First Amendment Under-Protects Democracy with Professor Tabatha Abu El-Haj. The guest for this episode is...
Counsel Fees and Procedural Justice
Introduction: Imagine you are charged with a felony. You are indigent, so the judge appoints a lawyer to represent you. Several months later, you are convicted and sentenced to almost nine years in prison. To your surprise, however, you are...
Legal Framework For The Use Of Artificial Intelligence (AI) Technology In The Canadian Criminal Justice System
Defining and Regulating Criminal Legal Risks of AI-Generated Content
The Invisible Prison: Pathways and Prevention
Margaret F. Brinig & Marsha Garrison. In this Article, we propose a new strategy for curbing crime and delinquency and demonstrate the inadequacy of current reform efforts. Our analysis relies on our own,...
In Defense of Empiricism in Family Law
Elizabeth S. Scott. It is fitting to include an essay defending the application of empirical research to family law and policy in a symposium honoring the scholarly career of Peg Brinig, who...
Anti-Domination and Administration
The foundations of the administrative state are being reshaped, both by the continuing transformations of administrative law doctrine by the courts and by the ambitions for restructuring the executive branch among the current presidential administration. But at the same time,...
Predictive Policing for Reform? Indeterminacy and Intervention in Big Data Policing
Predictive analytics and artificial intelligence are applied widely across law enforcement agencies and the criminal justice system. Despite criticism that such tools reinforce inequality and structural discrimination, proponents insist that they will nonetheless improve the equality and fairness of outcomes...
Academic Programs
Branstetter Litigation & Dispute Resolution Program; Criminal Justice Program; Energy, Environment, & Land Use Program; George Barrett Social Justice Program; Intellectual Property Program
Consistency of Large Reasoning Models Under Multi-Turn Attacks
arXiv:2602.13093v2 Announce Type: new Abstract: Large reasoning models achieve state-of-the-art performance on complex tasks, but their robustness under multi-turn adversarial pressure remains underexplored. We evaluate nine frontier reasoning models under adversarial attacks. Our findings reveal that reasoning...
Sparse Autoencoders are Capable LLM Jailbreak Mitigators
arXiv:2602.12418v1 Announce Type: cross Abstract: Jailbreak attacks remain a persistent threat to large language model safety. We propose Context-Conditioned Delta Steering (CC-Delta), an SAE-based defense that identifies jailbreak-relevant sparse features by comparing token-level representations of the same harmful request with...
Paris AI Safety Breakfast #3: Yoshua Bengio
The third of our 'AI Safety Breakfasts' event series, featuring Yoshua Bengio on the evolution of AI capabilities, loss-of-control scenarios, and proactive vs reactive defense.
There Can Be Only Two (Verdicts): The Presumption of Innocence and Jury Verdicts in Criminal Trials
Named Entity Recognition for Payment Data Using NLP
arXiv:2602.14009v1 Announce Type: new Abstract: Named Entity Recognition (NER) has emerged as a critical component in automating financial transaction processing, particularly in extracting structured information from unstructured payment data. This paper presents a comprehensive analysis of state-of-the-art NER algorithms specifically...
An Agentic LLM Framework for Adverse Media Screening in AML Compliance
arXiv:2602.23373v1 Announce Type: new Abstract: Adverse media screening is a critical component of anti-money laundering (AML) and know-your-customer (KYC) compliance processes in financial institutions. Traditional approaches rely on keyword-based searches that generate high false-positive rates or require extensive manual review....
The Non-Punishment Principle and Restorative Justice
The non-punishment principle is a legal norm that has increasingly gained legitimacy over the past quarter-century within international, regional, and domestic law on human trafficking. At its core, this principle opposes the punishment of human trafficking victims for unlawful conduct...
The Constitutionality of Indiscriminate Data Surveillance
Soon enough, the police will have the capacity to know almost everything about everyone. Not because most of us are suspected of doing anything wrong, but because indiscriminate data surveillance—“indiscriminate” meaning precisely that it is not driven by individualized suspicion...