Activation Patching

2 articles tagged “Activation Patching”

Local MI Lab: How Controls Broke My First Induction Story

I built Local MI Lab because SELF-GROUND had become too heavy for the amount of mechanistic interpretability practice I actually had. The negation SAE work taught me a lot, but it also wrapped every …