Free Resources for Understanding AI Safety
My time at the Recurse Center has sparked a deep interest in AI safety. I believe that how we handle both the opportunities and risks of AI systems will largely shape our future, and I want to share resources that can help me and others learn more about the field.
Below you’ll find materials I’ve personally used and recommendations I’ve gathered along the way. This is an evolving list that I will edit over time. Consider this post a roadmap for building foundational knowledge in AI Safety, an area that has captured my curiosity and where I hope to work in the future.
This roadmap is organized into two main parts:
- Core Path: Essential prerequisites, fundamentals, and must-read papers. My goal with this section is to identify the core prerequisites for an engineer who wants to understand and work in AI safety.
- Expansion Areas: Additional resources for diving deeper into specific areas of interest. Once you’ve completed the Core Path, you will be able to go further with the resources here.
Since I’m creating this roadmap for myself, it assumes a software engineering background but no prior AI knowledge. All resources are free.
Core Path
1. Technical Foundation
1a. Mathematics
- Linear algebra review
- Key focus: matrix operations, eigenvectors, vector spaces
- 3Blue1Brown’s “Essence of Linear Algebra” video series
- Multivariate calculus review
- Key focus: gradients, chain rule, optimization
- 3Blue1Brown’s “Essence of Calculus” series
- Khan Academy is also very helpful for this topic
- Statistics review
- Key focus: probability distributions, hypothesis testing, Bayesian inference
1b. Programming
- Basic Python proficiency, with some familiarity with NumPy and pandas (see the short self-check sketch below)
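As a rough self-check on the math and programming prerequisites above, here is a minimal sketch of my own (not taken from any of the listed resources). If the eigenvector check, the finite-difference gradient, and the pandas one-liner all read easily, sections 1a and 1b are probably in good shape:

```python
import numpy as np
import pandas as pd

# Linear algebra: an eigenvector v of A satisfies A @ v = lambda * v.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
v = eigenvectors[:, 0]
assert np.allclose(A @ v, eigenvalues[0] * v)

# Calculus: the gradient of f(x, y) = x**2 + 3*y at (1, 2) is (2, 3).
# Verify it with a central finite-difference approximation.
def f(p):
    x, y = p
    return x**2 + 3 * y

def numerical_gradient(func, p, eps=1e-6):
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        grad[i] = (func(p + step) - func(p - step)) / (2 * eps)
    return grad

assert np.allclose(numerical_gradient(f, np.array([1.0, 2.0])), [2.0, 3.0])

# Pandas: basic grouping and aggregation on a small table.
df = pd.DataFrame({"run": ["a", "b", "a"], "loss": [0.9, 0.7, 0.4]})
print(df.groupby("run")["loss"].mean())
```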
1c. Machine Learning Basics
- Core ML Concepts
- FastAI Part 1 and Andrej Karpathy’s Zero to Hero are good places to start. By the end of these courses, you should feel comfortable with the following topics at a high level (a short from-scratch training-loop sketch follows this list):
- Basic ML concepts
- Supervised vs unsupervised learning
- Model training workflow
- Overfitting and underfitting
- Neural Networks
- Basic architectures (CNNs, RNNs, Transformers)
- Activation functions and layers
- Backpropagation
- Training & Optimization
- Loss functions
- Gradient descent variants
- Hyperparameter tuning
- Model Evaluation
- Metrics and validation
- Testing strategies
- Error analysis
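To make those concepts concrete, here is a toy training loop written from scratch in NumPy. It is my own illustrative sketch rather than anything from FastAI or Zero to Hero, but it exercises most of the vocabulary above: a loss function, backpropagation via the chain rule, plain gradient descent updates, and a train/validation split for spotting overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = sin(3x) + noise, split into train/validation sets.
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X) + 0.1 * rng.normal(size=(200, 1))
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

# Parameters of a tiny network: 1 input -> 16 tanh hidden units -> 1 output.
W1 = rng.normal(scale=0.5, size=(1, 16))
b1 = np.zeros((1, 16))
W2 = rng.normal(scale=0.5, size=(16, 1))
b2 = np.zeros((1, 1))

def forward(X):
    h = np.tanh(X @ W1 + b1)         # hidden activations
    return h, h @ W2 + b2            # (activations, predictions)

def mse(pred, target):               # mean squared error loss
    return np.mean((pred - target) ** 2)

lr = 0.1                             # learning rate (a hyperparameter)
for epoch in range(2001):
    h, pred = forward(X_train)

    # Backpropagation: apply the chain rule layer by layer, from loss back to weights.
    d_pred = 2 * (pred - y_train) / len(X_train)   # dLoss/dPred
    dW2 = h.T @ d_pred
    db2 = d_pred.sum(axis=0, keepdims=True)
    d_h = (d_pred @ W2.T) * (1 - h ** 2)           # back through tanh
    dW1 = X_train.T @ d_h
    db1 = d_h.sum(axis=0, keepdims=True)

    # Vanilla gradient descent update (Adam, momentum, etc. are variants of this step).
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

    if epoch % 500 == 0:
        val_loss = mse(forward(X_val)[1], y_val)
        # Validation loss rising while training loss keeps falling is a sign of overfitting.
        print(f"epoch {epoch}: train={mse(pred, y_train):.4f} val={val_loss:.4f}")
```

In practice you would reach for a framework like PyTorch, which the courses above cover, but writing backpropagation by hand once makes the framework versions much easier to follow.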
2. Essential AI Safety Material
2a. Must-Read Papers
- “Concrete Problems in AI Safety” (2016)
- “The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation” (revised 2024)
- “AI Governance: A Research Agenda”
2b. Key Lectures
- “CAIS AI Safety Lecture Series”
- This lecture series complements the AI Safety, Ethics and Society textbook
Congratulations! You have completed the Core Path of the roadmap. Now you get to home in on the areas within AI safety that interest you most.
Expansion Areas
AI safety is an extremely broad field, and there are many areas to explore. The following topics are borrowed from the research areas described in the Wikipedia entry on AI safety as an initial stab at categorizing interest areas. I hope to add specific papers and resources for each focus area as I go.
- Technical safety and alignment. Includes topics like:
  - interpretability and transparency
  - reliability
  - alignment
  - adversarial robustness
- Governance and policy. Includes topics like:
  - international cooperation
  - ethics
  - standards and benchmarks
- Systems safety. Includes topics like:
  - monitoring and auditing
  - testing and validation
- Societal impacts and long-term safety. Includes topics like:
  - environmental consequences
  - socioeconomic impacts
  - power concentration
  - long-term risk assessments
Additional Resources
AI safety classes and long-form lectures
Shorter videos and podcasts
Hands-on ML courses
Andrew Ng’s Coursera courses, like the Deep Learning Specialization, have also been recommended to me. I’m not including them here because they are not free as of this post’s publication date.
Organizations
- AI Safety Awareness Foundation
- CHAI (UC Berkeley Center for Human-Compatible AI)
- Future of Life Institute
- CAIS (Center for AI Safety)
- Alignment Research Center
Newsletters, Communities and other references
Published on: 2024-02-10