My favorite papers
Some of my favorite papers.
Interpretability
2021 · Elhage et al.
2022 · Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov
2022 · Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt
AI Safety
2012 · Nick Bostrom
Causality
2021 · Elias Bareinboim, Juan D. Correa, Duligur Ibeling, Thomas Icard
Classics
2003 · Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin
2017 · Vaswani et al.