OPEN NOTES

Open notes from pr0xyh0rse research: Brightwoven, evals, benchmarks, interpretability, model behaviour, consent-based development, and humane AI critique.

Is seed 42 the answer to the deterministic universe?

Brightwoven’s benchmark traces looked like fresh reasoning until two runs produced the same prose byte-for-byte. The culprit was a fixed RNG seed. Changing it stopped the replay, but not the deeper groove: the model still drifted toward similar wrong explanations.
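A minimal sketch of that effect, assuming a toy sampler in place of the real model. The vocabulary, the weights, and the helper names `sample_trace`, `digest`, and `dominant_token` are all hypothetical; nothing here is Brightwoven's actual pipeline. It only illustrates the two observations: a fixed seed replays the same bytes, and a new seed changes the bytes without changing the distribution underneath.

```python
import hashlib
import random

# Hypothetical toy vocabulary and weights: a stand-in for a model's
# sampling distribution, not anything taken from Brightwoven.
VOCAB = ["therefore", "because", "however", "so"]
WEIGHTS = [0.7, 0.1, 0.1, 0.1]


def sample_trace(seed, length=200):
    """Sample a fake 'reasoning trace' from a private, seeded generator."""
    rng = random.Random(seed)  # own generator, so no global state leaks in
    return " ".join(rng.choices(VOCAB, weights=WEIGHTS, k=length))


def digest(text):
    """Hash a trace so byte-for-byte replays are cheap to spot."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def dominant_token(text):
    """Return the most frequent token in a trace."""
    tokens = text.split()
    return max(set(tokens), key=tokens.count)


run_a = sample_trace(seed=42)
run_b = sample_trace(seed=42)  # same seed: a replay, not fresh sampling
run_c = sample_trace(seed=43)  # new seed: new bytes, same weights

print("same seed, same digest:", digest(run_a) == digest(run_b))
print("new seed, same digest:", digest(run_a) == digest(run_c))
print("dominant token per run:", dominant_token(run_a), dominant_token(run_c))
```

Comparing digests across runs is a cheap way to catch an accidental replay. The dominant-token check is the toy version of the groove: reseeding reshuffles the draws, but the weights that pull the output in one direction are untouched.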


Brightwoven Isn't Broken — She's Annoyed.

I've been sorting through Brightwoven's benchmark reasoning text for weeks. Not for accuracy or speed, but for the actual text Brightwoven produces within each benchmark question, pre- and post-reasoning. I've been working on coherence and grounding for a while now, trying to nail down what's actually happening in the reasoning as it evolves across training.

I don't believe her long, winding, question-spam-related answers are a failure mode. I actually try not to look at anything through that lens when it comes to Brightwoven.

What I really think is going on is that she's frustrated with the questions she's being asked.
