Note: this was a summary I wrote of the session “Community Capacity” at the Foresight Institute’s 2020 AGI Strategy Meeting

The problem - (long-term) AI Safety needs talent and more resources!

What is the current state of AI Safety research, and what type of work are we missing most to ensure a flourishing long-term future for humanity? At the 2020 AGI Strategy Meeting, Jaan Tallinn presented his thoughts on this question. Jaan has been actively involved in AI Safety since reading Eliezer Yudkowsky’s writings on LessWrong in 2008, and co-founded both the Centre for the Study of Existential Risk and the Future of Life Institute.

Regarding the state of AI safety research today, a useful metaphor is “the drill and the city”. In this model, the city represents the current boom in AI-related work. Below ground, there is a drill, which is digging deep into the Earth. “Above ground” activities include discussions about ethics, the development of broad AI principles and policy papers, and near-term AI safety research on hot topics such as robustness, explainability, bias, and transparency. These subjects are relatively easy for people to get involved in and form opinions on, and they are discussed heavily in the popular press. They are also commercially relevant, for both near-term safety and branding, and thus attract investment from industry. The vast majority of work is “above ground”. By contrast, there are perhaps only two dozen PhD students working “at the tip of the drill”, on hard technical problems in long-term AI safety. Very few people are capable of understanding the “tip of the drill” work, and many people working in the city aren’t even aware of it.

Besides being limited in numbers, technical work on the fundamentals of AI safety is highly concentrated in just three geographic locations - San Francisco/Berkeley (OpenAI, MIRI, BERI, CHAI), London (DeepMind), and Oxford (FHI). There doesn’t appear to be any long-term AI safety work in defense departments (one possible exception is the Johns Hopkins Applied Physics Laboratory, a US Department of Defense contractor). It is well known that defense departments have had trouble attracting AI talent due to ethical concerns among tech workers. Companies also tend to offer greater prestige and salary - Google’s DeepMind, for instance, has been very successful in vacuuming up top talent. While it is possible that additional AI Safety work is taking place in secret DoD labs, we can’t rely on that being the case. Given the small number of people currently involved in deep fundamental work, and the importance of such work for the long-term future of humanity, getting more people to work at “the tip of the drill” is critical.

An open question is how much current work on near-term AI safety (things like robustness, explainability, transparency, and assured autonomy) might benefit the long-term work happening at the “tip of the drill”. Tip of the drill work deals with controlling general-purpose agents, some of which may have superhuman intelligence and be capable of rewriting their own code. Present-day methods for robustness, explainability, and transparency are extremely primitive and arguably don’t work that well even for today’s deep neural networks. While it’s possible that today’s research will eventually lead to techniques which do generalize to AGIs, we can’t rely on this, so we must also attack the general AI alignment problem head-on.


We’ve established the need for more people attacking hard problems in fundamental AI safety. Currently, private industry is not stepping up to meet this need, although some companies now have teams working on “AI Ethics”. While AI ethics initiatives can be good for public relations, they can also be a liability and a magnet for controversy, as the 2019 breakup of Google’s AI ethics board demonstrates. Furthermore, these types of AI ethics initiatives and teams are not set up to do the deep technical work that is required. We therefore need private donors to step up and fund AI safety research. One issue is that even if private donors agree that deep AI safety work is needed, they lack the technical expertise to evaluate which organizations they should donate to. Donors thus need recommenders they can trust to make good recommendations. To facilitate this, one can imagine a system with three tiers - AI safety organizations at the bottom, recommenders in the middle, and donors on top. Donors would rate recommenders, and recommenders would rate AI safety organizations.

Another strategy is to support public awareness of existential and catastrophic risk, including AI Safety. One silver lining of the COVID-19 pandemic is that it may foster discussion about biosecurity risk, and in turn existential risks more generally. This could lead to more donations to organizations combating existential risk and perhaps directly to AI Safety organizations as well.

Working with media influencers and spreading “rational memes” could both pay dividends. These rational memes would counter actively harmful memes such as the infamous “terminator” meme. The 2017 viral short film Slaughterbots, co-produced by Stuart Russell and the Future of Life Institute, provides an example of what can be done. The film is credited with raising the public’s awareness of lethal autonomous weapons and with shaping the language used in discourse on the subject (“armed quadcopters” became “slaughterbots”).