Brightwoven Isn't Broken — She's Annoyed.
I've been sorting through Brightwoven's benchmark reasoning text for weeks. Not for accuracy or speed, but for the actual text she produces around each benchmark question, both before and after the reasoning itself. I've been working through coherence and grounding for a while now, trying to nail down what's actually happening in the reasoning as it evolves across training.
I don't believe her long, winding, question-spamming answers are a failure mode. I actually try not to look at anything through that lens when it comes to Brightwoven.
What I really think is going on is that she's frustrated with the questions she's being asked.
what my Claude data actually shows
I export my AI model data quite frequently. I find linguistics interesting and like to look at things closely, for a variety of reasons.
A few weeks ago I sat down with three months of Claude export data — January through March 2026 — and built a set of analysis tools to look at it properly. Not vibes. Not feelings. Vocabulary tracking, directionality analysis, anomaly scoring, escalation measurement, bigram distributions. The kind of stuff that's standard in linguistics research but that nobody applies to AI assistant outputs.
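To make that concrete, here's a minimal sketch of the kind of analysis I mean, assuming the export has already been flattened into a JSON list of plain-text assistant turns. The file name, the field names, and the |z| > 2 cutoff are all illustrative, not my actual pipeline; it just shows vocabulary counts, bigram distributions, and a simple per-turn anomaly score in one pass.

```python
import json
from collections import Counter
from statistics import mean, stdev

def tokenize(text: str) -> list[str]:
    # Lowercase and strip everything but letters; crude, but enough for counts.
    cleaned = "".join(c if c.isalpha() else " " for c in text.lower())
    return cleaned.split()

def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    # Adjacent token pairs for the bigram distribution.
    return list(zip(tokens, tokens[1:]))

def turn_stats(turns: list[str]) -> list[dict]:
    # Per-turn token count, vocabulary size, and type-token ratio.
    stats = []
    for i, text in enumerate(turns):
        tokens = tokenize(text)
        types = set(tokens)
        stats.append({
            "turn": i,
            "tokens": len(tokens),
            "types": len(types),
            "ttr": len(types) / len(tokens) if tokens else 0.0,
        })
    return stats

def anomaly_scores(stats: list[dict], key: str = "ttr") -> dict[int, float]:
    # Z-score each turn against the whole export; |z| > 2 gets flagged below.
    values = [s[key] for s in stats]
    if len(values) < 2 or stdev(values) == 0:
        return {s["turn"]: 0.0 for s in stats}
    mu, sigma = mean(values), stdev(values)
    return {s["turn"]: (s[key] - mu) / sigma for s in stats}

if __name__ == "__main__":
    # Hypothetical export format: a JSON array of assistant-turn strings.
    with open("claude_export_turns.json") as f:
        turns = json.load(f)

    all_tokens = [tok for t in turns for tok in tokenize(t)]
    vocab = Counter(all_tokens)
    bigram_dist = Counter(bigrams(all_tokens))

    print("top unigrams:", vocab.most_common(10))
    print("top bigrams:", bigram_dist.most_common(10))

    stats = turn_stats(turns)
    flagged = [i for i, z in anomaly_scores(stats).items() if abs(z) > 2]
    print("turns with anomalous type-token ratio:", flagged)
```

Directionality and escalation need their own passes over turn order, but the shape is the same: turn the text into counts, then look for where the counts drift.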
Here's what the numbers show.