Think Tank

Compute Cluster | CAIS

The Center for AI Safety is launching an initiative to provide large-scale compute resources for ML safety research. Apply here.


CAIS Compute Cluster - Overview

Conducting useful AI safety research often requires working with cutting-edge models, but running large-scale models is expensive and often cumbersome to implement. As a result, many non-industry researchers are unable to pursue advanced AI safety research. To address this, CAIS runs an initiative that provides free compute for research projects in ML safety, based on a cluster of 80 A100 GPUs, with a dedicated team to support cluster users.

The CAIS Compute Cluster has already supported numerous research projects on AI safety:

  • ~100 AI safety research papers produced
  • 150+ AI safety researchers actively using the cluster
  • 2,500+ research citations

A full list of papers can be found on our Google Scholar page. Any questions can be directed to compute@safe.ai.

Who is Eligible for Access?

We are currently only accepting new applications for external access to the cluster from researchers who have received grants from Schmidt Sciences for AI safety research. Eligible researchers should email compute@safe.ai to request access.

The CAIS Compute Cluster is specifically designed for researchers working on the safety of machine learning systems. For a non-exhaustive list of topics we are excited about, see Unsolved Problems in ML Safety or the ML Safety Course. We are particularly excited to support work on LLM adversarial robustness and transparency [1, 2]. Work that improves general capabilities, or that improves safety only as a consequence of improving general capabilities, is not in scope. "General capabilities" of AI refers to concepts such as a model's accuracy on typical tasks, sequential decision-making abilities in typical environments, reasoning abilities on typical problems, and so on.

Our Collaborators

We support leading experts in a diverse range of ML safety research directions, some of whom are listed below.

  • Bo Li, Assistant Professor of Computer Science, University of Illinois at Urbana-Champaign
  • Carl Vondrick, Assistant Professor of Computer Science, Columbia University
  • Cihang Xie, Assistant Professor of Computer Science, UC Santa Cruz
  • David Bau, Assistant Professor of Computer Science, Northeastern Khoury College
  • David Krueger, Assistant Professor, University of Cambridge (member of Cambridge CBL & MLG)
  • David Wagner, Professor of Computer Science, University of California, Berkeley
  • Dawn Song, Professor of Computer Science, University of California, Berkeley
  • Florian Tramer, Assistant Professor of Computer Science, ETH Zurich
  • James Zou, Associate Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering, Stanford University
  • Jinwoo Shin, Professor of AI, Korea Advanced Institute of Science and Technology (KAIST)
  • Matthias Hein, Professor of Machine Learning, University of Tübingen
  • Percy Liang, Associate Professor of Computer Science, Stanford University
  • Robin Jia, Assistant Professor of Computer Science, University of Southern California
  • Scott Niekum, Associate Professor of Computer Science, University of Massachusetts Amherst
  • Sharon Li, Assistant Professor, Department of Computer Sciences, University of Wisconsin-Madison
  • Yizheng Chen, Assistant Professor of Computer Science, University of Maryland

Research Produced Using the CAIS Compute Cluster

View our Google Scholar page for papers based on research supported by the CAIS Compute Cluster: CAIS Compute Cluster Research.

  • Universal and Transferable Adversarial Attacks on Aligned Language Models. We showed that it was possible to automatically bypass the safety guardrails on GPT-4 and other AI systems, causing the AIs to generate harmful content such as instructions for building a bomb or stealing another person's identity. Our work was covered by the New York Times. (Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson)
  • Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark. We evaluated the tendency of AI systems to make ethical decisions in complex environments. The benchmark provides 13 measures of ethical behavior, including measures of whether the AI behaves deceptively, seeks power, and follows ethical rules. (Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks)
  • DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. Provides a thorough assessment of trustworthiness in GPT models, including toxicity, stereotype and bias, robustness, privacy, fairness, machine ethics, and so on. It won the Outstanding Paper Award at NeurIPS 2023. (Bo Li, Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song)
  • The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". We expose a surprising failure of generalization in auto-regressive large language models (LLMs): if a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". (Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans)
  • Continuous Learning for Android Malware Detection. Proposes new methods that use machine learning to detect Android malware. (Yizheng Chen, Zhoujie Ding, David Wagner)
  • DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection. Provides a vulnerable source code dataset that is significantly larger than previous datasets and analyzes challenges and opportunities in using deep learning to detect software vulnerabilities. (Yizheng Chen, Xinyun Chen, Zhoujie Ding, David Wagner)
  • BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B. Demonstrates that with a budget of a few hundred dollars, it is possible to reduce the rate at which Meta's Llama 2 model refuses to follow harmful instructions to below 1%. This raises significant questions about the risks of AI developers allowing external users to fine-tune large language models, given the potential to remove safeguards against harmful outputs. Mentioned in the US Congress as part of the Schumer AI Insight Forum discussions. (Pranav Gade, Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish)
  • How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions. Large language models (LLMs) can "lie" by outputting false statements despite "knowing" the truth in a demonstrable sense; for example, they might "lie" when instructed to output misinformation. This paper provides a simple lie detector that works by asking a predefined set of unrelated follow-up questions after a suspected lie, and is highly accurate and surprisingly general. (Lorenzo Pacchiardi, Alex J. Chan, Sören Mindermann, Ilan Moscovitz, Alexa Y. Pan, Yarin Gal, Owain Evans, Jan Brauner)
  • ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP. (Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang)
  • Out-of-context meta-learning in Large Language Models. (David Krueger, Dmitrii Krasheninnikov, Egor Krasheninnikov)
  • Query Based Adversarial Examples for LLMs. (Florian Tramer)
  • D^3: Detoxing Deep Learning Dataset. (Lu Yan, Siyuan Cheng, Guangyu Shen, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Yunshu Mao, Xiangyu Zhang)
  • Defining Deception in Decision Making. Under review. (Marwa Abdulhai, Micah Carroll, Justin Svegliato, Anca Dragan, Sergey Levine)
  • Django: Detecting Trojans in Object Detection Models via Gaussian Focus Calibration. (Guangyu Shen, Siyuan Cheng, Guanhong Tao, Kaiyuan Zhang, Yingqi Liu, Shengwei An, Shiqing Ma, Xiangyu Zhang)
  • TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models. (Wenbo Guo)
  • Multi-scale Diffusion Denoised Smoothing. (Jinwoo Shin, Jongheon Jeong)
  • BIRD: Generalizable Backdoor Detection and Removal for Deep Reinforcement Learning. (Wenbo Guo, Dawn Song, Guanhong Tao, Xiangyu Zhang)
  • LLM-PBE: Assessing Data Privacy in Large Language Models. (Zhun Wang, Dawn Song)
  • TextGuard: Provable Defense against Backdoor Attacks on Text Classification. (Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song)
  • Aligning Modalities in Vision Large Language Models via Preference Fine-tuning. (Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao)
  • Seek and You Will Not Find: Hard-To-Detect Trojans in Deep Neural Networks. (Dan Hendrycks)
  • The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.
  • SHINE: Shielding Backdoors in Deep Reinforcement Learning. (Wenbo Guo)
  • Defending Against Unforeseen Failure Modes with Latent Adversarial Training. (Stephen Casper, Lennart Schulze, Oam Patel, Dylan Hadfield-Menell)
  • PAL: Proxy-Guided Black-Box Attack on Large Language Models. (Chawin Sitawarin, Norman Mu, David Wagner, Alexandre Araujo)
  • Defense against transfer attack. (Chawin Sitawarin, David Wagner)
  • VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks. (Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried)
  • Function Vectors in Large Language Models. (Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau)
  • Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking. (Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau)
  • LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B. (Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish)
  • Benchmarking Neural Network Robustness to Optimisation Pressure. (Dan Hendrycks)
  • WebArena: A Realistic Web Environment for Building Autonomous Agents. (Fangzheng Xu, Uri Alon)
  • Eight Methods to Evaluate Robust Unlearning in LLMs. (Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell)
  • Poisoning RLHF. (Florian Tramer, Javier Rando)
  • Future Lens: Anticipating Subsequent Tokens from a Single Hidden State. (Rohit Gandikota, Joanna Materzynska, Jaden Fiotto-Kaufman, David Bau)
  • Repetition Improves Language Model Embeddings. (Jacob Mitchell Springer, Suhas Kotha, Daniel Fried, Graham Neubig, Aditi Raghunathan)
  • AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability. (Siwei Yang, Bingchen Zhao, Cihang Xie)
  • LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models. Under review. (Marwa Abdulhai, Isadora White, Charlie Victor Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine)
  • HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal. (Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks)
  • How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs. (Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie)
  • Jatmo: Prompt Injection Defense by Task-Specific Finetuning. (Chawin Sitawarin, Sizhe Chen, David Wagner)
  • Tell, don't show: Declarative facts influence how LLMs generalize. (Alexander Meinke, Owain Evans)
  • Can LLMs Follow Simple Rules? (Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Dan Hendrycks, David Wagner)
  • SPFormer: Enhancing Vision Transformer with Superpixel Representation. (Cihang Xie)
  • Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Models. (Deqing Fu, Tian-Qi Chen, Robin Jia, Vatsal Sharan)
  • Generalization Analogies (GENIES): A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains. (Joshua Clymer, Garrett Baker, Rohan Subramani, Sam Wang)
  • Revisiting Adversarial Training at Scale. (Cihang Xie)
  • Taken out of context: On measuring situational awareness in LLMs. (Owain Evans, Meg Tong, Max Kaufmann, Lukas Berglund, Mikita Balesni, Tomek Korbak, Daniel Kokotajlo, Asa Stickland)
  • Copy Suppression. (Callum McDougall, Arthur Conmy, Cody Rushing, Thomas McGrath, Neel Nanda)
  • Contrastive Preference Learning: Learning from Human Feedback without RL. (Joey Hejna, Rafael Rafailov, Harshit Sikchi, Chelsea Finn, Scott Niekum, W. Bradley Knox, Dorsa Sadigh)
  • Unified Concept Editing in Diffusion Models. (Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau)
  • Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics. (Haoqin Tu, Bingchen Zhao, Chen Wei, Cihang Xie)
  • Linearity of Relation Decoding in Transformer Language Models. (Evan Hernandez, Arnab Sen Sharma, Tal Haklay, Kevin Meng, Martin Wattenberg, Jacob Andreas, Yonatan Belinkov, David Bau)

Executive Summary

The CAIS Compute Cluster initiative aims to democratize access to advanced computational resources for AI safety research. By providing free access to a cluster of 80 A100 GPUs and dedicated support, CAIS has enabled numerous researchers to conduct cutting-edge AI safety research. The initiative has already resulted in approximately 100 research papers, with over 2,500 citations, and supports a community of 150+ active researchers. Access is currently limited to researchers who have received grants from Schmidt Sciences for AI safety research, and the initiative particularly encourages work in areas such as LLM adversarial robustness and transparency. The initiative collaborates with leading experts across a diverse range of research directions.

Key Points

  • CAIS provides free compute resources for AI safety research.
  • Access is limited to researchers with Schmidt Sciences grants.
  • The initiative has supported significant research output and citations.
  • Focus areas include adversarial robustness and transparency.
  • Collaborations with leading experts in the field.

Merits

Democratization of AI Research

By providing free access to high-performance computing resources, CAIS levels the playing field for non-industry researchers, enabling them to conduct advanced AI safety research that would otherwise be prohibitively expensive.

Supportive Infrastructure

The dedicated support team ensures that researchers can effectively utilize the compute cluster, enhancing the quality and efficiency of their research.

Impactful Research Output

The initiative has already resulted in a substantial number of research papers and citations, demonstrating its significant contribution to the field of AI safety.

Demerits

Limited Accessibility

The current eligibility criteria, which restrict access to researchers with specific grants, may exclude many potentially valuable contributors to AI safety research.

Focused Scope

The initiative's focus on specific areas such as adversarial robustness and transparency may overlook other critical aspects of AI safety, potentially limiting the breadth of research.

Resource Constraints

The availability of 80 A100 GPUs, while substantial, may still be insufficient to meet the growing demand for compute resources in AI safety research.

Expert Commentary

The CAIS Compute Cluster initiative represents a significant step forward in democratizing access to advanced computational resources for AI safety research. By providing free access to a cluster of 80 A100 GPUs and dedicated support, CAIS has enabled numerous researchers to conduct cutting-edge research that would otherwise be prohibitively expensive. The initiative's focus on specific areas such as adversarial robustness and transparency is commendable, as these are critical aspects of AI safety. However, the current eligibility criteria, which restrict access to researchers with specific grants, may exclude many potentially valuable contributors. Additionally, the focused scope may overlook other critical aspects of AI safety, potentially limiting the breadth of research. Despite these limitations, the initiative's impactful research output and collaborations with leading experts highlight its significant contribution to the field. Policymakers and other organizations may consider similar initiatives to foster a more inclusive and robust AI research ecosystem, ensuring that safety considerations are prioritized.

Recommendations

  • Expand eligibility criteria to include a broader range of researchers, ensuring that the initiative remains inclusive and accessible.
  • Diversify the focus areas to encompass a wider range of AI safety topics, thereby addressing more comprehensive aspects of AI safety research.

Sources