Posts | Andy Arditi

Papers

Jun 17, 2024
Refusal in Language Models Is Mediated by a Single Direction

Blog posts

Mar 22, 2025
Do models say what they learn?
Jan 13, 2025
Finding features causally upstream of refusal
Dec 21, 2024
AI as systems, not just models
Jul 23, 2024
Unlearning via RMU is mostly shallow
Apr 27, 2024
Refusal in LLMs is mediated by a single direction
Dec 8, 2023
Refusal mechanisms: initial experiments with Llama-2-7b-chat
Dec 14, 2022
The anatomy of proof generation
Oct 25, 2022
KZG in practice: polynomial commitment schemes and their usage in scaling Ethereum
Aug 17, 2022
Zero-knowledge: theoretical foundations II
Jul 9, 2022
Zero-knowledge: theoretical foundations I
May 24, 2022
Stablecoins