Free Resources for Understanding AI Safety
My time at the Recurse Center has sparked a deep interest in AI safety. I believe that how we handle both the opportunities and risks of AI systems will largely shape our future, and I want to share resources that can help me and others learn more about the field.
Below you’ll find materials I’ve personally used and recommendations I’ve gathered along the way. This is an evolving list that I will edit over time. Consider this post a roadmap for building foundational knowledge in AI Safety, an area that has captured my curiosity and where I hope to work in the future.
This roadmap is organized into two main parts:
- Core Path: Essential prerequisites, fundamentals, and must-read papers. My goal with this section is to identify the core prerequisites for an engineer who wants to understand and work in AI safety.
- Expansion Areas: Additional resources for diving deeper into specific areas of interest. Once you’ve completed the Core Path, you will be able to go further with the resources here.
Since I’m creating this roadmap for myself, it assumes a software engineering background but no prior AI knowledge. All resources are free.
Core Path
1. Technical Foundation
1a. Mathematics
- Linear algebra review
- Key focus: matrix operations, eigenvectors, vector spaces
- 3Blue1Brown’s “Essence of Linear Algebra” video series
- Multivariate calculus review
- Key focus: gradients, chain rule, optimization
- 3Blue1Brown’s “Essence of Calculus” series
- Khan Academy is also very helpful for this topic
- Statistics review
- Key focus: probability distributions, hypothesis testing, Bayesian inference
1b. Programming
- Basic Python proficiency, with some familiarity with NumPy and pandas (see the short self-check sketch below)
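As a rough self-check on the math and programming prerequisites above, here is a minimal sketch of my own (not taken from any of the listed resources). If the eigenvector check, the finite-difference gradient, and the pandas one-liner all read easily, sections 1a and 1b are probably in good shape:

```python
import numpy as np
import pandas as pd

# Linear algebra: an eigenvector v of A satisfies A @ v = lambda * v.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
v = eigenvectors[:, 0]
assert np.allclose(A @ v, eigenvalues[0] * v)

# Calculus: the gradient of f(x, y) = x**2 + 3*y at (1, 2) is (2, 3).
# Verify it with a central finite-difference approximation.
def f(p):
    x, y = p
    return x**2 + 3 * y

def numerical_gradient(func, p, eps=1e-6):
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (func(p + step) - func(p - step)) / (2 * eps)
    return grad

assert np.allclose(numerical_gradient(f, np.array([1.0, 2.0])), [2.0, 3.0])

# Pandas: basic grouping and aggregation on a small table.
df = pd.DataFrame({"run": ["a", "b", "a"], "loss": [0.9, 0.7, 0.4]})
print(df.groupby("run")["loss"].mean())
```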
1c. Machine Learning Basics
- Core ML Concepts
- FastAI Part 1 and Andrej Karpathy’s Zero to Hero are good places to start. By the end of these courses, you should feel comfortable with the following topics at a high level (a short from-scratch training-loop sketch follows this list):
- Basic ML concepts
- Supervised vs unsupervised learning
- Model training workflow
- Overfitting and underfitting
- Neural Networks
- Basic architectures (CNNs, RNNs, Transformers)
- Activation functions and layers
- Backpropagation
- Training & Optimization
- Loss functions
- Gradient descent variants
- Hyperparameter tuning
- Model Evaluation
- Metrics and validation
- Testing strategies
- Error analysis
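To make those concepts concrete, here is a toy training loop written from scratch in NumPy. It is my own illustrative sketch rather than anything from FastAI or Zero to Hero, but it exercises most of the vocabulary above: a loss function, backpropagation via the chain rule, plain gradient descent updates, and a train/validation split for spotting overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = sin(3x) + noise, split into train/validation sets.
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X) + 0.1 * rng.normal(size=(200, 1))
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

# Parameters of a tiny network: 1 input -> 16 tanh hidden units -> 1 output.
W1 = rng.normal(scale=0.5, size=(1, 16))
b1 = np.zeros((1, 16))
W2 = rng.normal(scale=0.5, size=(16, 1))
b2 = np.zeros((1, 1))

def forward(X):
    h = np.tanh(X @ W1 + b1)         # hidden activations
    return h, h @ W2 + b2            # (activations, predictions)

def mse(pred, target):               # mean squared error loss
    return np.mean((pred - target) ** 2)

lr = 0.1                             # learning rate (a hyperparameter)
for epoch in range(2001):
    h, pred = forward(X_train)

    # Backpropagation: apply the chain rule layer by layer, from loss back to weights.
    d_pred = 2 * (pred - y_train) / len(X_train)   # dLoss/dPred
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0, keepdims=True)
    d_h = (d_pred @ W2.T) * (1 - h ** 2)           # back through tanh
    dW1 = X_train.T @ d_h
    db1 = d_h.sum(axis=0, keepdims=True)

    # Vanilla gradient descent update (Adam, momentum, etc. are variants of this step).
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

    if epoch % 500 == 0:
        val_loss = mse(forward(X_val)[1], y_val)
        # Validation loss rising while training loss keeps falling is a sign of overfitting.
        print(f"epoch {epoch}: train={mse(pred, y_train):.4f} val={val_loss:.4f}")
```

In practice you would reach for a framework like PyTorch, which the courses above cover, but writing backpropagation by hand once makes the framework versions much easier to follow.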
2. Essential AI Safety Material
2a. Must-Read Papers
- “Concrete Problems in AI Safety” (2016)
- “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation” (revised 2024)
- “AI Governance: A Research Agenda”
2b. Key Lectures
- “CAIS AI Safety Lecture Series”
- This lecture series complements the AI Safety, Ethics and Society textbook
Congratulations! You have completed the Core Path of the roadmap. Now you get to home in on the areas within AI safety that interest you most.
Expansion Areas
AI safety is an extremely broad field, and there are many areas to explore. The following topics are borrowed from the research areas described in the Wikipedia entry on AI safety as an initial stab at categorizing interest areas. I hope to add specific papers and resources for each focus area as I go.
- Technical safety and alignment. Includes topics like:
  - interpretability and transparency
  - reliability
  - alignment
  - adversarial robustness
- Governance and policy. Includes topics like:
  - international cooperation
  - ethics
  - standards and benchmarks
- Systems safety. Includes topics like:
  - monitoring and auditing
  - testing and validation
- Societal impacts and long-term safety. Includes topics like:
  - environmental consequences
  - socioeconomic impacts
  - power concentration
  - long-term risk assessments
Additional Resources
AI safety classes and long-form lectures
Shorter videos and podcasts
Hands-on ML courses
Andrew Ng’s Coursera courses, like the Deep Learning Specialization, have also been recommended to me. I’m not including them here because they are not free as of this post’s publication date.
Organizations
- AI Safety Awareness Foundation
- CHAI (UC Berkeley Center for Human-Compatible AI)
- Future of Life Institute
- CAIS (Center for AI Safety)
- Alignment Research Center
Newsletters, Communities and other references
Published on: 2024-02-10