Key research and resources

Literature Reviews by Dan Hendrycks

Dan Hendrycks is the founding director of the Center for AI Safety, an organisation that rose to prominence after its co-signed statement on existential risks resulted in the UN Secretary-General calling for an “international AI watchdog”.

Prior to focusing on AI safety specifically, Hendrycks was an accomplished AI researcher in his own right; he earned a PhD from UC Berkeley and quickly made significant contributions to the field, including an algorithm widely used in foundation models such as GPT-3.

An Overview of Catastrophic AI Risks

“This paper provides an overview of the main sources of catastrophic AI risks, which we organize into four categories: malicious use, in which individuals or groups intentionally use AIs to cause harm; AI race, in which competitive environments compel actors to deploy unsafe AIs or cede control to AIs; organizational risks, highlighting how human factors and complex systems can increase the chances of catastrophic accidents; and rogue AIs, describing the inherent difficulty in controlling agents far more intelligent than humans.”

Link: https://arxiv.org/pdf/2306.12001.pdf

Unsolved Problems in ML Safety

“Deploying and monitoring powerful machine learning systems will require high caution, similar to the caution observed for modern nuclear power plants, military aircraft carriers, air traffic control, and other high-risk systems. These complex and hazardous systems are now operated by high reliability organizations (HROs) which are relatively successful at avoiding catastrophes.”

Link: https://arxiv.org/pdf/2109.13916.pdf


Yoshua Bengio’s recent statements on AI risks

Recognised worldwide as one of the leading experts in artificial intelligence, Yoshua Bengio is best known for his pioneering work in deep learning, which earned him the 2018 A.M. Turing Award, “the Nobel Prize of Computing,” alongside Geoffrey Hinton and Yann LeCun.

In 2019, he was awarded the prestigious Killam Prize, and in 2022 he became the computer scientist with the highest h-index in the world. He is a Fellow of both the Royal Society of London and the Royal Society of Canada, a Knight of the Legion of Honor of France and an Officer of the Order of Canada.

FAQ on Catastrophic AI Risks

“My current estimate places a 95% confidence interval for the time horizon of superhuman intelligence at 5 to 20 years.”

Link: https://yoshuabengio.org/2023/06/24/faq-on-catastrophic-ai-risks/

How Rogue AIs may Arise

“Much more research in AI safety is needed, both at the technical level and at the policy level. For example, banning powerful AI systems (say beyond the abilities of GPT-4) that are given autonomy and agency would be a good start. This would entail both national regulation and international agreements.”

Link: https://yoshuabengio.org/2023/05/22/how-rogue-ais-may-arise/


Research from the Centre for the Governance of AI (GovAI)

Formed from Oxford University’s Future of Humanity Institute, GovAI has been the unofficial global headquarters for AI governance since it was founded in 2018. Its research agenda helped shape the field, and it collaborates closely with governments, think tanks and leading AI labs such as OpenAI and DeepMind.

Below are some of the most important research papers released by GovAI and its alumni:

Model evaluation for extreme risks

“Developers must be able to identify dangerous capabilities (through “dangerous capability evaluations”) and the propensity of models to apply their capabilities for harm (through “alignment evaluations”). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.”

Link: https://arxiv.org/pdf/2305.15324.pdf

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

“Software Mechanisms and Recommendations: … 5. Standards setting bodies should work with academia and industry to develop audit trail requirements for safety-critical applications of AI systems. 6. Organizations developing AI and funding bodies should support research into the interpretability of AI systems, with a focus on supporting risk assessment and auditing.”

Link: https://arxiv.org/pdf/2004.07213.pdf

Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted? 

“To prevent some misuses of AI, we argue that targeted interventions on certain capabilities will be warranted. These restrictions may include controlling who can access certain types of AI models, what they can be used for, whether outputs are filtered or can be traced back to their user, and the resources needed to develop them.”

Link: https://arxiv.org/pdf/2303.09377.pdf

Frontier AI Regulation: Managing Emerging Risks to Public Safety

“Frontier AI models pose a distinct regulatory challenge: dangerous capabilities can arise unexpectedly; it is difficult to robustly prevent a deployed model from being misused; and, it is difficult to stop a model’s capabilities from proliferating broadly. To address these challenges, at least three building blocks for the regulation of frontier models are needed:

  1. Standard-setting processes to identify appropriate requirements for frontier AI developers

  2. Registration and reporting requirements to provide regulators with visibility into frontier AI development processes, and;

  3. Mechanisms to ensure compliance with safety standards for the development and deployment of frontier AI models.

Industry self-regulation is an important first step. However, wider societal discussions and government intervention will be needed to create standards and to ensure compliance with them.”

Link: https://arxiv.org/pdf/2307.03718.pdf

Towards Best Practices in AGI Safety and Governance (results from a survey of experts)

“Respondents agreed especially strongly that AGI labs should conduct pre-deployment risk assessments, dangerous capabilities evaluations, third-party model audits, safety restrictions on model usage, and red teaming. 98% of respondents somewhat or strongly agreed that these practices should be implemented.”

Link: https://www.governance.ai/research-paper/towards-best-practices-in-agi-safety-and-governance


Other notable contributions to the field

Strategies and challenges for monitoring AI systems

Why and how governments should monitor artificial intelligence

“In this paper we outline a proposal for improving the governance of artificial intelligence (AI) by investing in government capacity to systematically measure and monitor the capabilities and impacts of AI systems”

Link: https://arxiv.org/abs/2108.12427

Introduction to Compute Governance

Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring

“As advanced machine learning systems’ capabilities begin to play a significant role in geopolitics and societal order, it may become imperative that (1) governments be able to enforce rules on the development of advanced ML systems within their borders, and (2) countries be able to verify each other’s compliance with potential future international agreements on advanced ML development.

This work analyzes one mechanism to achieve this, by monitoring the computing hardware used for large-scale NN training. The framework’s primary goal is to provide governments high confidence that no actor uses large quantities of specialized ML chips to execute a training run in violation of agreed rules.”

Link: https://arxiv.org/pdf/2303.11341.pdf

Regulating AI to prevent disinformation campaigns

Generative Language Models and Automated Influence Operations: Emerging Threats and Potential Mitigations

“For each of these stages, we can think about how an influence operation might be disrupted by using the following sets of questions as starting points:

  • Model Design and Construction: How could AI models be built so they are robust against being misused to create disinformation? Could governments, civil society, or AI producers limit the proliferation of models capable of generating misinformation?

  • Model Access: How could AI models become more difficult for bad actors to access for influence operations? What steps could AI providers and governments take?

  • Content Dissemination: What steps can be taken to deter, monitor, or limit the spread of AI-generated content on social media platforms or news sites? How might the “rules of engagement” on the internet be altered to make the spread of AI-generated disinformation more difficult?”

Link: https://arxiv.org/pdf/2301.04246.pdf

Please note: this research was compiled by volunteers for Australians for AI Safety. Its inclusion is not endorsement by the signatories.