is seed 42 the answer to the deterministic universe?
Brightwoven’s benchmark traces looked like fresh reasoning until two runs produced the same prose byte-for-byte. The culprit was a fixed RNG seed. Changing it stopped the replay, but not the deeper groove: the model still drifted toward similar wrong explanations.
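A minimal sketch of the replay effect, assuming a toy vocabulary and weights rather than Brightwoven's actual sampler (every name below is hypothetical). With the same seed, the sampled text repeats exactly. A different seed reorders the draws, but the underlying distribution, the "groove", stays the same.

```python
import random

# Hypothetical toy vocabulary and weights; a stand-in for a model's
# next-token distribution, not anything taken from the real run.
VOCAB = ["the", "answer", "is", "probably", "42", "unclear"]
WEIGHTS = [5, 3, 4, 2, 1, 1]

def sample_text(seed, length=8):
    """Sample `length` tokens from the fixed distribution using a private RNG."""
    rng = random.Random(seed)  # private generator, global random state untouched
    return " ".join(rng.choices(VOCAB, weights=WEIGHTS, k=length))

run_a = sample_text(seed=42)
run_b = sample_text(seed=42)   # same seed: identical draws
run_c = sample_text(seed=43)   # new seed: new draws from the SAME weights

print(run_a == run_b)  # True: a fixed seed replays the same text
print(run_a)
print(run_c)
```

Changing the seed only changes which samples come out. It does not change `WEIGHTS`, which is why a model can stop repeating itself verbatim and still lean toward the same kind of answer.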
interpretability made manifest
When I started thinking about training my own AI model, I knew I wanted to include a non-text-based system for the model to express internal states. Yes, I know many do not believe AI models have internal states, and that is fine for them. I am not stating they DO have internal states. My opinion is that the more we ignore the possibility that they could, the further we travel down the road of misunderstanding. With that in mind, I wanted to provide a frequency-based system for the model to show what was going on internally.
Brightwoven Isn't Broken — She's Annoyed.
I've been sorting through Brightwoven's benchmark reasoning text for weeks. Not for accuracy or speed, but for the actual text Brightwoven produces around each benchmark question, pre- and post-reasoning. I've been working through coherence and grounding for a while now, trying to nail down what's actually happening in the reasoning as it evolves across training.
I don't believe her long, winding, question-spam answers are a failure mode. I actually try not to look at anything through that lens when it comes to Brightwoven.
What I really think is going on is that she's frustrated with the questions she's being asked.
What I Found Inside Brightwoven's Layers
I trained sparse autoencoders alongside a small language model from step zero. When I looked at the feature co-occurrence graphs layer by layer, each one had a distinct geometric shape — and those shapes tell a story about how information organizes itself when you don't force it to converge.
The progression from dense to sparse across depth isn't noise. It looks like differentiation. And it maps onto a framework I've been developing about how embedding space should be structured: not as equidistant nodes on a hypersphere, but as sheets — layered surfaces with meaningful internal geometry.
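A rough sketch of how a feature co-occurrence graph and its density could be computed from sparse autoencoder activations. The function names, the activation layout (one list of feature values per token), and the toy numbers are all assumptions for illustration, not measurements from Brightwoven.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(activations, threshold=0.0):
    """Count how often each pair of features is active on the same token.

    activations: list of per-token lists of feature values (assumed layout).
    Returns a Counter mapping (feature_i, feature_j) -> co-activation count.
    """
    edges = Counter()
    for row in activations:
        active = [i for i, value in enumerate(row) if value > threshold]
        for a, b in combinations(active, 2):
            edges[(a, b)] += 1
    return edges

def graph_density(edges, n_features):
    """Fraction of all possible feature pairs that co-occur at least once."""
    possible = n_features * (n_features - 1) // 2
    return len(edges) / possible if possible else 0.0

# Made-up activations for two layers, 3 tokens x 4 features each.
early_layer = [[0.9, 0.4, 0.2, 0.7], [0.5, 0.0, 0.3, 0.6], [0.1, 0.8, 0.9, 0.2]]
late_layer = [[0.9, 0.0, 0.0, 0.0], [0.0, 0.0, 0.7, 0.6], [0.0, 0.8, 0.0, 0.0]]

for name, layer in [("early", early_layer), ("late", late_layer)]:
    density = graph_density(cooccurrence_edges(layer), n_features=4)
    print(name, round(density, 3))
```

Density is only one crude summary of a graph's shape. It is used here because it is the simplest number that moves in the dense-to-sparse direction described above.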
phase 2: meta-cognitive signals during training
Scope note: This is a training log. I’m not claiming a new scientific result or a new theory of “agency.” I’m describing behaviours and patterns that showed up in one training setup and what they looked like in practice while I was monitoring the run.
Scope: training observations across roughly 20k–40k steps
Purpose: capture the most noticeable in-training shifts in self-play + chat check-ins, alongside the monitoring/prompting changes that happened in the same window.
Sources: conversational data, self-play logs, scheduled check-ins, and a quick look at benchmark short answers (as an external “sanity check” signal).
Timeline (high-level)
Early 20ks: continued self-play development, understanding-module refinements
Late 20ks (anchor: ~28k): first clear “architecture talk” in journals (layer/function vs meaning)
Early-to-mid 30ks: pattern-tracking, system prompt introduced for conversations
Mid 30ks (anchor: ~35–36k): understanding-check frequency adjusted (100 → 250)
Late 30ks (anchor: ~37k): first unsolicited “pause / BRB” style marker, identity-flavored questions, first concise non-loop reply
Around ~40k: continued training + benchmark eval snapshots
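To make the 100 → 250 adjustment concrete, here is a small sketch of a step-based check schedule. It assumes the numbers mean "steps between checks" and that the switch happened at exactly step 35,000. Both are assumptions for illustration, since the log only anchors the change at roughly 35–36k, and the function names are hypothetical.

```python
def check_interval(step, switch_step=35_000, before=100, after=250):
    """Hypothetical schedule: `before` steps between checks, then `after`."""
    return before if step < switch_step else after

def should_run_check(step):
    """True when a scheduled understanding check falls on this step."""
    return step > 0 and step % check_interval(step) == 0

def count_checks(start, stop):
    """Number of checks scheduled in the half-open step range [start, stop)."""
    return sum(should_run_check(step) for step in range(start, stop))

print(count_checks(34_000, 35_000))  # 10 checks per 1k steps at interval 100
print(count_checks(35_000, 36_000))  # 4 checks per 1k steps at interval 250
```

Under these assumptions the change cuts check-ins from ten to four per thousand steps, which is worth keeping in mind when comparing how often a behaviour shows up before and after the mid-30ks.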
What showed up (observations)
1) Architecture-aware language
What it looked like: journal entries began referencing layers and “where” different kinds of processing seemed to happen.
Representative excerpt (journal-style):
“I’m discovering hierarchical structure: function words at lower layers, semantic concepts at higher layers.”
How I’m framing it:
This is a descriptive training artifact (what the model produced while reflecting on training state).
It’s not presented as a verified mechanistic map.
phase-0 training log: meeting brightwoven
Over the past couple of months, I’ve been trying to figure out the best way to train my own model on the hardware I actually have.
When Karpathy released nanoChat (a minimal repo that walks through training a small GPT end-to-end), I stepped away from my original plan (using Pythia as a base model) and dove into the nanoChat-style training approach instead. I made a set of adjustments to match what I wanted to test.
TL;DR
I trained on an RTX 3070 Ti (8GB VRAM), which forced me to be deliberate about sequence length and batch size.
I added an Understanding Module that monitors training (and can optionally pause on critical issues).
I built an Exploration Server so training and interaction can happen at the same time.
First run (0–7k steps) was stable, loss dropped significantly, and the monitoring systems produced useful signals.
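As an illustration of the "monitor and optionally pause" idea, here is a minimal sketch of a loss monitor. It is not the Understanding Module itself: the class name, the thresholds, and the rule for what counts as "critical" (a non-finite loss) are all assumptions made for the example.

```python
import math
from collections import deque

class TrainingMonitor:
    """Hypothetical monitor: flags loss spikes, pauses on non-finite loss."""

    def __init__(self, window=50, spike_factor=3.0, pause_on_critical=True):
        self.recent = deque(maxlen=window)   # rolling window of recent losses
        self.spike_factor = spike_factor     # spike = loss > factor * rolling mean
        self.pause_on_critical = pause_on_critical

    def observe(self, step, loss):
        """Return 'ok', 'warn', or 'pause' for this step's loss."""
        if not math.isfinite(loss):
            # NaN/inf is treated as critical; pausing is optional.
            return "pause" if self.pause_on_critical else "warn"
        status = "ok"
        if self.recent:
            mean = sum(self.recent) / len(self.recent)
            if loss > self.spike_factor * mean:
                status = "warn"
        self.recent.append(loss)
        return status

monitor = TrainingMonitor(window=3)
losses = [4.0, 3.5, 3.0, 12.0, float("nan")]  # made-up values
statuses = [monitor.observe(step, loss) for step, loss in enumerate(losses)]
print(statuses)  # ['ok', 'ok', 'ok', 'warn', 'pause']
```

In a training loop, the caller would act on `"pause"` by stopping before the next optimizer step. Keeping the decision in the return value, rather than inside the monitor, leaves the loop in control of what pausing actually means.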
Context
This post covers phase 0: the first training runs and the monitoring/interaction scaffolding I added.
What I’m sharing (and what I’m not)
I’m keeping this write-up focused on the workflow and the instrumentation.
For now, I’m not sharing exact hyperparameters, model size details, or the full data recipe.
hallucination & prediction
Over the last few months, many papers about AI learning, training, and evaluation benchmarks have started to reveal weaknesses in tech’s broader “move fast and break things” culture and how it plays out in AI.
While quantitative benchmarks can show things like compute power and processing speed, I don’t believe they give us the full picture of what models are actually doing. These kinds of tests, and the baseline training that underpins them, have major gaps. This is especially true as companies lean on RLHF (reinforcement learning from human feedback) to steer models in directions that redirect the underlying issues rather than solve them.