Module contrib.semseg.manipulation
Manipulate representations by increasing or decreasing the presence of a feature in a ViT activation, then use the linear probe for inference.
Record class-specific scores before and after manipulation to see that you can directly manipulate abilities to complete downstream tasks.
Functions
def main(cfg: Manipulation)
def manipulate(cfg: Manipulation,
sae: SparseAutoencoder,
acts_BWHD: jaxtyping.Float[Tensor, 'batch width height d_vit']) ‑> tuple[jaxtyping.Float[Tensor, 'batch width height d_vit'], jaxtyping.Float[Tensor, 'batch width height d_vit']]