My favorite papers

Some of my favorite papers.

Interpretability

2021 · Elhage et al.
2022 · Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov
2022 · Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt

AI Safety

2012 · Nick Bostrom

Causality

2021 · Elias Bareinboim, Juan D. Correa, Duligur Ibeling, Thomas Icard

Classics

2003 · Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin
2017 · Vaswani et al.