Building the Mechanistically Interpretable Curriculum (MIC) Framework
When I started researching how to build my own model, I began to realize that interpretability and safety were, from what I could see, still based on the interpretation of data that we were assuming was
The MIC Structure: Six-Phase Implementation Guide
I wanted something scalable, so the Pythia suite made the most sense. If I was going to start figuring out which features were firing and how to track them, it made sense to start there.
Partway through working on this, I came to the conclusion that parts of it could be used to completely control the models. If you know exactly which features are firing, you can theoretically turn them off. Some clusters would be difficult to turn off without shutting down entire nodes, but once I understood that control was the ultimate goal, I started to feel a little uncomfortable building something that would essentially allow complete control over a model’s thinking and structure.
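As a toy illustration of what "turning off" a feature can mean, here is a minimal sketch of directional ablation, one common technique, assuming a feature corresponds to a single direction in a layer's activation space. This is not the MIC code. The vectors and the function names (`feature_activation`, `ablate_feature`) are hypothetical, and in a real Pythia model the feature directions would first have to be identified before anything could be switched off.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("feature direction must be non-zero")
    return [x / norm for x in v]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def feature_activation(activation, direction):
    """How strongly a (hypothetical) feature fires: projection onto its unit direction."""
    return dot(activation, normalize(direction))


def ablate_feature(activation, direction):
    """Remove the feature's component: a' = a - (a . d_hat) * d_hat."""
    d_hat = normalize(direction)
    strength = dot(activation, d_hat)
    return [a - strength * d for a, d in zip(activation, d_hat)]


# Hypothetical toy values: a 4-dimensional activation vector and two feature directions.
activation = [3.0, 1.0, 0.0, 2.0]
target_dir = [1.0, 0.0, 0.0, 0.0]   # the feature we want to turn off
shared_dir = [1.0, 1.0, 0.0, 0.0]   # a second feature that overlaps the first

ablated = ablate_feature(activation, target_dir)
print(feature_activation(activation, target_dir))          # 3.0
print(ablated)                                             # [0.0, 1.0, 0.0, 2.0]
print(feature_activation(ablated, target_dir))             # 0.0
print(round(feature_activation(activation, shared_dir), 3))  # 2.828
print(round(feature_activation(ablated, shared_dir), 3))     # 0.707
```

Because the second direction overlaps the first, ablating the target feature also weakens the overlapping one. In this toy setting, that is a small-scale version of why tightly clustered features are hard to switch off in isolation.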
I didn’t entirely abandon my goal. I actually used a lot of what I learned to keep working toward what I’m doing now with Brightwoven_Rhettcon.