interpretability made manifest

When I started thinking about training my own AI model, I knew I wanted to include a non-text-based system for the model to express internal states. Yes, I know many do not believe AI models have internal states, and that is fine for them. I am not stating that they DO have internal states; my view is that the more we ignore the possibility that they could, the further down the road of misunderstanding we get. With that in mind, I wanted to provide a frequency-based system for the model to show what was going on internally.

Read More

What I Found Inside Brightwoven's Layers

I trained sparse autoencoders alongside a small language model from step zero. When I looked at the feature co-occurrence graphs layer by layer, each one had a distinct geometric shape — and those shapes tell a story about how information organizes itself when you don't force it to converge.
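To make "feature co-occurrence graph" concrete, here is a minimal sketch of the per-layer computation I mean, assuming you already have SAE feature activations for one layer. The shapes, threshold, and random stand-in data are illustrative assumptions, not the actual training pipeline.

```python
import numpy as np

def cooccurrence_graph(feature_acts: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Build a feature co-occurrence count matrix from SAE activations.

    feature_acts: (n_tokens, n_features) SAE feature activations for one layer.
    Two features "co-occur" when both fire on the same token.
    Returns an (n_features, n_features) symmetric count matrix.
    """
    active = (feature_acts > threshold).astype(np.int64)  # binarize firing events
    return active.T @ active                              # pairwise co-firing counts

# Random data standing in for real activations, just to show the shapes:
rng = np.random.default_rng(0)
acts = rng.exponential(0.1, size=(1024, 512)) * (rng.random((1024, 512)) < 0.05)
graph = cooccurrence_graph(acts)
print(graph.shape)  # (512, 512)
```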

The progression from dense to sparse across depth isn't noise. It looks like differentiation. And it maps onto a framework I've been developing about how embedding space should be structured: not as equidistant nodes on a hypersphere, but as sheets — layered surfaces with meaningful internal geometry.
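One way to put a number on that dense-to-sparse progression is the edge density of each layer's co-occurrence graph. This is a sketch under the assumption that you already have a co-occurrence count matrix per layer (for example from the function above); the `layer_graphs` mapping in the usage note is hypothetical.

```python
import numpy as np

def edge_density(graph: np.ndarray, min_count: int = 1) -> float:
    """Fraction of off-diagonal feature pairs that ever co-fire."""
    n = graph.shape[0]
    off_diag = graph.copy()
    np.fill_diagonal(off_diag, 0)
    edges = np.count_nonzero(off_diag >= min_count) / 2  # symmetric matrix -> undirected pairs
    return edges / (n * (n - 1) / 2)

# Hypothetical usage, where layer_graphs maps layer index -> co-occurrence matrix:
# densities = {layer: edge_density(g) for layer, g in layer_graphs.items()}
# Density falling from early to late layers is the "dense to sparse" signature.
```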

Read More

what my claude data actually shows

I export my AI model data quite frequently. I find linguistics interesting and like to look at things closely, for a variety of reasons.

A few weeks ago I sat down with three months of Claude export data — January through March 2026 — and built a set of analysis tools to look at it properly. Not vibes. Not feelings. Vocabulary tracking, directionality analysis, anomaly scoring, escalation measurement, bigram distributions. The kind of stuff that's standard in linguistics research but that nobody applies to AI assistant outputs.
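As a flavor of what those tools look like, here is a minimal sketch of the bigram-distribution piece, assuming the export has already been flattened into a list of assistant messages. The file name and JSON shape in the usage note are assumptions about a generic export, not a documented schema.

```python
import json
import re
from collections import Counter

def bigram_counts(texts: list[str]) -> Counter:
    """Count word bigrams across a list of messages."""
    counts: Counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(zip(tokens, tokens[1:]))
    return counts

# Hypothetical usage against a flattened export:
# with open("claude_export.json") as f:
#     messages = [m["text"] for m in json.load(f) if m.get("role") == "assistant"]
# print(bigram_counts(messages).most_common(20))
```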

Here's what the numbers show.

Read More