Philosophy Fellowship 2023 | CAIS Project
The Center for AI Safety is offering grants for philosophers to pursue research in conceptual AI safety.
Applications for the 2023 fellowship are now closed. Thanks to everyone who applied.

The Program

As AI capabilities continue to improve dramatically, the need for safety research has become increasingly apparent. But given the relative youth of the field, much of the conceptual groundwork has yet to be done. The CAIS Philosophy Fellowship invites philosophers from a variety of backgrounds to acquire an in-depth understanding of the current state of AI safety and to contribute novel, field-orienting research directions.

Philosophy Fellowship Motivation

How the work of philosophers contributes to the broader sociotechnical AI safety community:

1. Conceptual Problem: Identify a lack of conceptual clarity in the existing AI safety literature.
2. Conceptual Clarification: Dissect the problem using rigorous conceptual analysis and the relevant philosophical literature.
3. Sociotechnical Orientation: Publish conceptual research to inform sociotechnical strategy.

1. Conceptual Problem

Advanced AI creates unique conceptual difficulties. Artificial intelligence is reshaping many aspects of day-to-day life. As AI systems continue on a trajectory to outperform humans on a wide range of cognitive tasks, questions about their properties and potential harms grow increasingly urgent.

Conceptual examples:
- How can we build systems that are more likely to behave ethically in the face of a rapidly changing world?
- What processes might shape the behavior of advanced AI systems?
- Could advanced AI systems pose an existential risk, and if so, how?

2. Conceptual Clarification

Academic philosophers are particularly well positioned to address these conceptual difficulties. Philosophers are experts at thinking hard about abstract problems with no clear answers. Their experience working with imprecise concepts makes them ideal candidates to address the conceptual issues that are characteristic of AI safety.

3. Sociotechnical Orientation

Conceptual clarity orients the broader sociotechnical landscape. Frameworks for analyzing which concerns are the most urgent, which are the most likely candidates for serious harm, and how to navigate these risks enable researchers and key decision-makers to reassess their strategies.

Goals & Outcomes

This fellowship addresses the need for conceptual clarification through research and field-building efforts.

Research: Our team of philosophers critiques and builds on the existing conceptual AI safety literature, producing new conceptual frameworks to guide technical research. Thus far, our fellows have collectively produced eighteen original papers, soon to be published, covering topics including interpretability, corrigibility, and multipolar scenarios.

Field-building: We aim for the influence of this fellowship to extend beyond our current cohort, promoting and incentivizing conceptual AI safety research within the broader academic philosophy community. To date, our fellows have received $50,000 in funding to run a workshop connecting technical and conceptual AI safety researchers, organized numerous workshops, and created a special issue of Philosophical Studies.

2023 Fellows

- Simon Goldstein is an Associate Professor of Philosophy at Australian Catholic University.
- Jacqueline Harding is a PhD student in Symbolic Systems at Stanford University.
- Cameron Domenico Kirk-Giannini is an Assistant Professor of Philosophy at Rutgers University–Newark.
- Nick Laskowski is an Assistant Professor in the Philosophy Department at the University of Maryland, College Park.
- Nate Sharadin is an Assistant Professor of Philosophy at the University of Hong Kong.
- Dmitri Gallow is a Senior Research Fellow at the Dianoia Institute of Philosophy at the Australian Catholic University.
- Mitchell Barrington is a PhD student in Philosophy at the University of Michigan, Ann Arbor.
- Harry R. Lloyd is a PhD student in Philosophy at Yale University.
- Frank Hong received his PhD in Philosophy from USC and is an incoming postdoc at the University of Hong Kong.
- William D'Alessandro is a Postdoctoral Fellow at the Munich Center for Mathematical Philosophy and will be a Marie Curie/UKRI Postdoctoral Fellow at the University of Oxford.
- Elliott Thornley is a Postdoctoral Research Fellow in Philosophy at the University of Oxford, working on coherence and corrigibility.
- Robert Long recently completed his PhD in Philosophy at New York University, during which he also worked as a Research Fellow at the Future of Humanity Institute.

2023 Guest Speakers

- Peter Railton, Gregory S. Kavka Distinguished Professor of Philosophy at the University of Michigan, Ann Arbor
- Hilary Greaves, Professor of Philosophy at the University of Oxford and former Director of the Global Priorities Institute
- Shelly Kagan, Clark Professor of Philosophy at Yale University
- Vincent Müller, Alexander von Humboldt Professor of Ethics and Philosophy of AI at the University of Erlangen-Nuremberg
- L. A. Paul, Millstone Family Professor of Philosophy and Professor of Cognitive Science at Yale University
- Victoria Krakovna, AI Research Scientist at DeepMind
- Jacob Steinhardt, Assistant Professor of Computer Science and AI at UC Berkeley
- David Krueger, Assistant Professor of Computer Science and AI at the University of Cambridge
- Walter Sinnott-Armstrong, Chauncey Stillman Professor of Ethics at Duke University
- Lara Buchak, Professor of Philosophy at Princeton University
- Johann Frick, Associate Professor of Philosophy at the University of California, Berkeley
- Wendell Wallach, senior advisor at The Hastings Center, ethicist, and scholar at Yale's Center for Bioethics
- Rohin Shah, Research Scientist at DeepMind

2023 Fellowship News

October 24, 2023: Draft Published
Executive Summary
The article discusses the CAIS Philosophy Fellowship 2023, which aims to engage philosophers in addressing conceptual challenges in AI safety. The fellowship seeks to bridge the gap between philosophical inquiry and practical AI safety research, fostering novel conceptual frameworks to guide technical advancements. The program is structured around identifying conceptual problems, providing rigorous philosophical analysis, and orienting sociotechnical strategies. The fellowship has already produced significant research outputs, including eighteen original papers on topics such as interpretability and corrigibility, demonstrating its potential impact on the broader AI safety community.
Key Points
- ▸ The fellowship targets the need for conceptual clarity in AI safety research.
- ▸ Philosophers are uniquely positioned to address abstract and complex conceptual problems in AI.
- ▸ The fellowship aims to influence the broader academic and technical AI safety landscape through research and field-building efforts.
Merits
Interdisciplinary Approach
The fellowship effectively bridges the gap between philosophy and AI safety, leveraging philosophical expertise to address complex conceptual issues in AI.
Field-Building Impact
The program not only produces research but also aims to incentivize and promote conceptual AI safety research within the broader academic community.
Practical Outcomes
The fellowship has already yielded eighteen original papers, demonstrating tangible contributions to the field of AI safety.
Demerits
Limited Scope
The fellowship's focus on conceptual problems may overlook practical and technical aspects of AI safety that require immediate attention.
Selective Participation
The fellowship may not fully represent the diversity of philosophical perspectives, potentially limiting the breadth of conceptual analysis.
Long-Term Impact Uncertainty
While the fellowship has produced initial outputs, the long-term impact on AI safety strategies and policies remains to be seen.
Expert Commentary
The CAIS Philosophy Fellowship 2023 represents a significant step toward integrating philosophical inquiry into the field of AI safety. By addressing conceptual problems through rigorous philosophical analysis, the fellowship aims to provide a foundation for more informed and effective AI safety strategies. Its interdisciplinary character is particularly noteworthy, highlighting the value of collaboration across academic disciplines in tackling complex challenges. That said, a program centered on conceptual questions risks leaving aside the practical and technical work AI safety also demands, and it is too early to judge how much the fellows' research will shape safety strategies and policies. Overall, the fellowship's contributions are promising, and its potential to influence the broader academic and technical communities is substantial.
Recommendations
- ✓ Expand the fellowship to include a broader range of philosophical perspectives and interdisciplinary collaborations to enhance the diversity of conceptual analysis.
- ✓ Incorporate practical and technical aspects of AI safety into the fellowship's research agenda to ensure a more comprehensive approach to AI safety.