Saev

saev is a package for training sparse autoencoders (SAEs) on vision transformers (ViTs) in PyTorch. It also includes an interactive webapp for looking through a trained SAE's features.
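For readers new to SAEs, the basic idea can be sketched in a few lines. The example below is a toy illustration only, not the saev API: `TinySAE`, `sae_loss`, and the `l1_coeff` parameter are hypothetical names, and it uses plain Python lists to stay dependency-free, where real training would use PyTorch tensors on ViT activations. It assumes the common formulation in which an activation vector is encoded into a wider, non-negative feature vector and decoded back, with a reconstruction term plus an L1 sparsity penalty as the loss.

```python
import random


def matvec(w, v):
    # w is a list of rows; returns the matrix-vector product w @ v.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


class TinySAE:
    """Hypothetical toy sparse autoencoder; not the saev implementation."""

    def __init__(self, d_model, d_sae, seed=0):
        rng = random.Random(seed)
        # Encoder maps d_model -> d_sae, decoder maps d_sae -> d_model.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_sae)]
        self.b_enc = [0.0] * d_sae
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_sae)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        # Linear map plus bias, followed by ReLU so features are non-negative.
        pre = [p + b for p, b in zip(matvec(self.w_enc, x), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, f):
        # Linear map from feature space back to activation space.
        return [r + b for r, b in zip(matvec(self.w_dec, f), self.b_dec)]


def sae_loss(sae, x, l1_coeff=1e-3):
    # Mean squared reconstruction error plus an L1 penalty on the features.
    f = sae.encode(x)
    x_hat = sae.decode(f)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(f)  # features are non-negative, so the sum is the L1 norm
    return mse + l1_coeff * l1, f, x_hat


# Stand-in for a single ViT activation vector (assumed 4-dimensional here).
x = [0.5, -1.0, 2.0, 0.0]
sae = TinySAE(d_model=4, d_sae=8)
loss, features, reconstruction = sae_loss(sae, x)
```

The L1 term is what pushes most features toward zero for any given input, which is why individual features tend to be easier to inspect than raw activations.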

API reference docs are listed below, and the source code is available on GitHub.

My logbook is a set of notes that might also be useful.

Package Docs

saev

Package to train SAEs for vision models.

probing

Package for probing for individual features in trained SAEs.

faithfulness

Package for measuring SAE feature faithfulness through feature manipulation.
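As a rough illustration of what "feature manipulation" can mean, the sketch below ablates (zeroes out) one SAE feature and compares the decoded activation before and after. This shows the general idea only, under the assumption that manipulation means editing feature values before decoding; `decode`, `ablate`, and the tiny hand-written decoder matrix are hypothetical and are not taken from the faithfulness package.

```python
def decode(features, w_dec):
    # w_dec[j] is the decoder direction (length d_model) for feature j.
    d_model = len(w_dec[0])
    out = [0.0] * d_model
    for value, direction in zip(features, w_dec):
        for i in range(d_model):
            out[i] += value * direction[i]
    return out


def ablate(features, index):
    # Return a copy of the feature vector with one feature set to zero.
    edited = list(features)
    edited[index] = 0.0
    return edited


# Hand-written decoder with 3 features and a 2-dimensional activation space.
w_dec = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
features = [0.5, 0.0, 2.0]

original = decode(features, w_dec)
edited = decode(ablate(features, 2), w_dec)

# How much of the reconstruction is attributable to feature 2.
delta = [a - b for a, b in zip(original, edited)]
```

In a real experiment the edited reconstruction would typically be passed back through the rest of the ViT, and the change in the model's output would be compared against what the feature is believed to represent.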