We are excited to support work that addresses the technical and governance aspects of reducing AI risk.

The Alignment Problem

“The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions...”
- Dr. Stuart Russell

A growing number of experts believe that ensuring AI systems have the objectives we want them to have will be a challenging technical problem. This is a concern of existential proportions because:
1. There are strong economic incentives to improve AI capabilities quickly, which could cause safety to be neglected.
2. Any highly capable system will be motivated to preserve itself and acquire resources in order to meet its objectives, which could result in our extinction if it isn't carefully aligned. Learn more.

The Governance Angle

Ensuring AI systems are aligned will come at a cost (called the alignment tax). Technical research can help lower that tax, but we still need to ensure that companies and governments will pay it. AI governance involves creating and implementing strategies to do this. It also aims to more generally make the transition to a world with advanced AI systems go well. This includes considering concerns about power centralization, coordination failure, and other risks that could be introduced by highly capable AI. Learn more.

Past Projects

Summer 2021
Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts
Reward learning methods intended for use in multi-agent settings with realistic human actors must account for suboptimal human reasoning and model social dynamics and outcomes which accurately reflect this. Multi-agent inverse reinforcement learning (MIRL) can be used to learn rewards from agents in social environments, but to do so realistically, must break from the conceptually simple and computationally tractable formalisms of game theory.
Cheaper language model alignment from human feedback
To better align with human preferences, recent text generation algorithms leverage human feedback on examples of the task at hand. However, high-quality feedback data is prohibitively expensive. We propose a methodology in which meaningful binary comparisons are drawn from noisy user feedback to advice given in an online forum.
China's National Team Approach to AI Policy Making
Scientists play an increasingly crucial role in policy-making on complex issues such as artificial intelligence, but the role of Chinese scientists in policy-making is significantly less studied. This article argues that existing frameworks for studying the roles of Chinese scientists in policy-making are not yet fit for purpose because the effect of political regime type on scientists' influence is underexamined. The Chinese regime and political environment shape the role of scientists in the following primary ways:
Self-Enforcing Treaties Reduce Risk from Technology Races
Certain treaties aim to reduce risk from dangerous technologies, such as nuclear weapons and ozone-depleting chemicals. This paper examines how treaties can reduce risk from future races to develop powerful technologies such as artificial intelligence. Technology races risk loss of life or other catastrophes when safe development slows a project’s progress in the race. This tradeoff gives competitors an incentive to skimp on safety, which increases the risk of catastrophe to all competitors. I model a treaty that uses side payments to persuade projects to develop a technology safely, reducing risk from the race. I find that a treaty only occurs if the race includes many competitors; otherwise a treaty cannot be negotiated. In contrast to earlier work, I show that openness about technical capabilities can reduce risk by allowing states to negotiate a treaty.