Sam Stevens


Blog

About: Arbitrary thoughts, sometimes with links to particularly interesting articles. A # indicates a permalink to the specific “post.”

# Why Saev?

Why did I build saev instead of using an existing library like Overcomplete? First, SAE training has different computational bottlenecks than ViT or LLM training. Second, SAE training has different complexity bottlenecks, which lead to different abstractions in the code.

The big computational bottleneck in SAE training is loading model activations from disk, since SAEs are pretty small compared to other models. In comparison, most LLM training is bottlenecked by matrix multiplication rather than dataloading. Because of this, things like torch.compile, custom kernels, etc. are less important than building efficient dataloaders. Saev includes a bunch of specialized activation dataloaders for different purposes (ShuffledDataLoader for training, OrderedDataLoader for inference, IndexedDataset for random access) to help with this.
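To make the dataloading point concrete, here is a minimal sketch of a shuffle-buffer loader over on-disk activation shards. This is illustrative PyTorch, not saev's actual API; the .npy shard layout and buffer size are assumptions.

```python
# Illustrative sketch, not saev's actual API: stream pre-computed activations
# from .npy shards through a bounded shuffle buffer, so training never waits
# on a full-dataset shuffle and never holds a whole shard in RAM.
import random
from pathlib import Path

import numpy as np
import torch
from torch.utils.data import IterableDataset


class ShuffledActivations(IterableDataset):
    def __init__(self, shard_dir: str, buffer_size: int = 65_536):
        self.shards = sorted(Path(shard_dir).glob("*.npy"))  # assumed layout
        self.buffer_size = buffer_size

    def __iter__(self):
        buffer = []
        for shard in self.shards:
            acts = np.load(shard, mmap_mode="r")  # memory-map, don't copy
            for row in acts:
                buffer.append(np.array(row))  # copy one row out of the mmap
                if len(buffer) >= self.buffer_size:
                    # Swap a random element to the end and pop it: an
                    # approximate global shuffle with bounded memory.
                    i = random.randrange(len(buffer))
                    buffer[i], buffer[-1] = buffer[-1], buffer[i]
                    yield torch.from_numpy(buffer.pop())
        random.shuffle(buffer)
        yield from (torch.from_numpy(x) for x in buffer)
```

The expensive part stays sequential disk reads; the shuffle happens in a small in-memory buffer.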

The other thing is code organization. SAE papers often follow a fixed pattern:

  1. Record activations from a ViT or other model.
  2. Train lots of SAEs on the activations.
  3. Run SAE inference on the training or validation activations.
  4. Explore the SAE predictions, either with summary statistics or looking at individual predictions.

The saev/framework module implements this pattern flexibly enough to serve different projects, yet simply enough to actually speed you up, and its pieces are designed to fit together. For example, the different dataloaders expect ViT activations on disk in a particular format, and saev/framework/shards.py saves activations in exactly that format. Another example: saev/framework/inference.py saves SAE predictions as a sparse matrix, which makes them much easier to analyze flexibly without loading ViT activations or SAE checkpoints.
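For intuition on why the sparse format helps, here is a minimal scipy sketch (my own illustration, not saev's code): with k active latents out of d_sae, a CSR matrix stores only the nonzeros, and downstream analyses need neither the ViT activations nor the SAE checkpoint.

```python
# Illustration only, not saev's code: store SAE codes as a CSR matrix.
# With 32 active latents out of 16,384, CSR keeps ~0.2% of the entries.
import numpy as np
import scipy.sparse

d_sae, k = 16_384, 32
rng = np.random.default_rng(0)

rows = []
for _ in range(4):  # stand-in for inference batches over the dataset
    batch = rng.random((1024, d_sae)).astype(np.float32)
    # Fake top-k sparsity: zero everything below each row's k-th largest value.
    thresh = np.sort(batch, axis=1)[:, [-k]]
    batch[batch < thresh] = 0.0
    rows.append(scipy.sparse.csr_matrix(batch))

codes = scipy.sparse.vstack(rows).tocsr()
scipy.sparse.save_npz("sae_codes.npz", codes)

# Later analysis needs neither ViT activations nor the SAE checkpoint:
codes = scipy.sparse.load_npz("sae_codes.npz")
firing_rate = codes.getnnz(axis=0) / codes.shape[0]  # per-latent firing rate
```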

November 23, 2025

# Positive and Negative Features in SAEs.

Two recent papers (Fel et al. and Zhu et al.) imply that models represent “opposite” features (like vertical vs horizontal or male vs female) as opposite directions. From Fel: “Despite their opposition, the vectors are nearly colinear with opposite signs, suggesting the model uses polarity to encode meaning.” Zhu et al. deliberately build this into their SAE architecture by keeping high magnitude features instead of strictly positive features.
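To make the architectural difference concrete, here is a minimal PyTorch sketch of the two activation rules as I read them; the function names are mine, not Zhu et al.'s.

```python
import torch


def relu_topk(pre: torch.Tensor, k: int) -> torch.Tensor:
    """Standard top-k SAE: keep the k largest *positive* pre-activations."""
    pre = torch.relu(pre)
    vals, idx = pre.topk(k, dim=-1)
    return torch.zeros_like(pre).scatter_(-1, idx, vals)


def magnitude_topk(pre: torch.Tensor, k: int) -> torch.Tensor:
    """Signed variant: keep the k largest-*magnitude* pre-activations,
    positive or negative, so one latent can encode both poles."""
    _, idx = pre.abs().topk(k, dim=-1)
    return torch.zeros_like(pre).scatter_(-1, idx, pre.gather(-1, idx))
```

Under the signed rule, a single “orientation” latent could fire positive for vertical and negative for horizontal, instead of spending two nearly-antipodal dictionary directions on the pair.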

Anthropic discussed this a little in Toy Models of Superposition, but not empirically. I think it’s also hard to build a toy model of this phenomenon because the “opposite-ness” is more qualitative than quantitative. “Horizontal” and “vertical” appear in many of the same contexts, but they represent different ideas within those contexts. So in one sense they are the same, and in another sense they are opposites.

November 18, 2025

# Personal Computing with AI.

Tiny Corp sells a computer called the tinybox to “commoditize the petaflop and enable AI for everyone.” I love the idea of tinygrad and the tinybox, but I don’t think the tinybox will ever be able to run the best LLMs. For many years to come, I think it will be infeasible for me to afford sufficient compute (VRAM and FLOP/s) to run the best LLMs. Thus, I have to outsource that to model providers (OpenAI and Anthropic, but also third parties like Groq or Cerebras).

In contrast, I can spend compute to provide more parallel environments for the LLMs. Rather than use Codex Cloud or Google’s Jules, I can run coding environments (Docker images) on my personal compute in parallel. I will still be compute-bound, but by more general-purpose compute (CPU) rather than matrix multiplies.
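As a sketch of what this looks like in practice (the image name and entrypoint below are placeholders, not any specific product’s CLI), running N isolated coding environments is ordinary process-level parallelism:

```python
# Minimal sketch: N isolated coding environments in parallel on one machine.
# The image name and task command are placeholders.
import subprocess

N = 8
tasks = [f"task-{i}" for i in range(N)]

procs = [
    subprocess.Popen([
        "docker", "run", "--rm",
        "--name", f"agent-{task}",
        "my-coding-env:latest",       # placeholder image
        "run-agent", "--task", task,  # placeholder entrypoint
    ])
    for task in tasks
]

# CPU-bound on the host, not GPU-bound: wait for all environments to finish.
for p in procs:
    p.wait()
```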

November 15, 2025



Sam Stevens, 2024