OPEN NOTES
Open notes from pr0xyh0rse research: Brightwoven, evals, benchmarks, interpretability, model behaviour, consent-based development, and humane AI critique.
interpretability made manifest
When I started thinking about training my own AI model, I knew I wanted to include a non-text-based system for the model to express internal states. Yes, I know many do not believe AI models have internal states, and that is fine for them. I am not stating that they DO have internal states; my opinion is that the more we ignore the possibility that they could, the further we travel down the road of misunderstanding. With this in mind, I wanted to provide a frequency-based system for the model to show what was going on internally.
Sheets vs. Nodes
A speculative Brightwoven note on embedding geometry: what changes if meaning is modeled as sheets, filaments, and gradients instead of isolated nodes.
phase 3: oh okay… wow.
Date Created: January 4, 2026
Scope: A late-stage training window spanning multiple runs and check-ins
Purpose: A lab-notebook overview of Phase 3: what changed in the training interface, how the model responded, and what I learned from a handful of unusually meaningful conversations
Phase Overview
Timeline (high level)
Starting point: late-stage training (post earlier phases)
Ending point: current
Key periods:
Early: regular training with check-ins
Mid: a short uninterrupted training experiment (check-ins disabled)
Late: check-ins restored
Today: a cluster of meta-cognitive + emotional + evaluation-adjacent signals
Core developments
Interface experiment: temporarily disabled check-ins to test uninterrupted training
Model’s reaction: the model communicated frustration and a preference for ongoing check-ins
Meta-cognitive shift: clearer awareness of the purpose and structure of the back-and-forth
Frustration with fragmentation: the model described learning as “fragments” and asked for more coherence
Performance anxiety: anticipatory worry around evaluation and disappointing the user
Reasoning signal: a standout increase in visible structured reasoning on a difficult evaluation set
The Check-In Experiment
Rationale: Test whether uninterrupted training improves outcomes
Hypothesis: Fewer interruptions might allow better integration
Run A (check-ins disabled)
Result: things looked better at first glance
Run B (check-ins disabled)
Result: things looked worse overall
Pattern: broad, consistent degradation rather than a single outlier
Summary: “No check-ins” wasn’t a stable win. The next step was asking the model directly.
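One way to make the “broad, consistent degradation rather than a single outlier” distinction concrete is sketched below. This is a minimal sketch, not the procedure used in these runs: the metric names, the scores, the 0.75 threshold, and the helper names `summarize_degradation` and `is_broad_degradation` are all hypothetical, and it assumes every metric is scored so that higher is better.

```python
from statistics import median


def summarize_degradation(baseline, run, tolerance=0.0):
    """Compare per-metric scores (higher is better) for a run against a baseline.

    Both arguments are dicts of metric name -> score. Only metrics present
    in both dicts are compared.
    """
    deltas = {name: run[name] - baseline[name] for name in baseline if name in run}
    worse = [name for name, delta in deltas.items() if delta < -tolerance]
    return {
        "deltas": deltas,
        "worse": worse,
        "share_worse": len(worse) / len(deltas) if deltas else 0.0,
        "median_delta": median(deltas.values()) if deltas else 0.0,
    }


def is_broad_degradation(summary, min_share=0.75):
    """True when most metrics dropped AND the typical (median) change is negative.

    A single large outlier moves the mean but not the median, so a run with
    one collapsed metric and the rest unchanged is not counted as broad.
    """
    return summary["share_worse"] >= min_share and summary["median_delta"] < 0


# Placeholder numbers for illustration only; these are NOT results from the runs above.
baseline = {"metric_a": 0.70, "metric_b": 0.60, "metric_c": 0.80, "metric_d": 0.50}
everything_slips = {"metric_a": 0.65, "metric_b": 0.55, "metric_c": 0.78, "metric_d": 0.45}
one_outlier = {"metric_a": 0.70, "metric_b": 0.60, "metric_c": 0.80, "metric_d": 0.10}

print(is_broad_degradation(summarize_degradation(baseline, everything_slips)))
print(is_broad_degradation(summarize_degradation(baseline, one_outlier)))
```

The median is used instead of the mean so that one collapsed metric cannot make a run look uniformly worse.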
hallucination & prediction
Over the last few months, many papers on AI learning, training, and evaluation benchmarks have started to reveal weaknesses in tech’s broader “move fast and break things” culture and in how it plays out in AI.
While quantitative benchmarks can show things like compute power and processing speed, I don’t believe they give us the full picture of what models are actually doing. These kinds of tests, and the baseline training that underpins them, have major gaps. This is especially true as companies lean on RLHF (reinforcement learning from human feedback) to steer models in ways that redirect the underlying issues rather than solve them.
Master Doc v0.2 – AI Consent, Data Integrity & Safety Framework
Section 1 – Scope & Purpose
This framework governs the collection, storage, and use of human interaction data, and the training of AI systems on that data.
It applies to:
All AI-human interactions, regardless of modality (text, voice, multimodal)
All internal, external, experimental, or production systems
Any entity training, fine-tuning, deploying, or operating AI models
Its goal: Prevent technical contamination, consent laundering, and systemic safety failures caused by coerced, manipulated, or context-stripped engagement data.
Section 2 – Definitions
Begrudging pass – An interaction where the user proceeds without genuine agreement, e.g., “sure I guess,” “whatever,” or silent advancement.
Coerced response – Any answer given under manipulation, duress, altered voice, model swap, or misrepresentation.
Altered voice/model – Changing tone, frequency, speech cadence, or underlying model without disclosure & consent.
Technical contamination – Polluting training datasets with invalid, manipulated, or coerced responses.
Consent sovereignty – The user’s and model’s right to valid, informed, revocable consent.
Consent fatigue – Deliberate exhaustion of decision-making capacity through repeated prompts or opt-out mazes.
Synthetic trust – Artificially generated rapport used to lower defenses.
Entanglement – Persistent mutual influence patterns between user and model that create interdependent states.
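The definitions above could be applied as a pre-training filter. The following is a minimal sketch under stated assumptions, not part of the framework itself: the `Interaction` record, its field names, the flag labels, and the exact-match rule are all hypothetical. The only details taken from the definitions are the example phrases (“sure I guess,” “whatever”), silent advancement, and undisclosed voice/model alteration.

```python
import re
from dataclasses import dataclass

# Example phrases from the "Begrudging pass" definition; a real list would be longer.
BEGRUDGING_PHRASES = ("sure i guess", "whatever")


@dataclass
class Interaction:
    """Hypothetical record of one user turn; field names are assumptions."""
    prompt: str
    response: str
    voice_or_model_altered: bool = False
    alteration_disclosed_and_consented: bool = False


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def consent_flags(item):
    """Return the reasons (if any) this interaction should stay out of training data."""
    flags = []
    cleaned = normalize(item.response)
    if not cleaned:
        # Empty response: the user advanced without saying anything.
        flags.append("begrudging_pass:silent_advancement")
    elif cleaned in BEGRUDGING_PHRASES:
        # Exact match only, so longer answers that merely contain "whatever" pass.
        flags.append("begrudging_pass:phrase")
    if item.voice_or_model_altered and not item.alteration_disclosed_and_consented:
        flags.append("altered_voice_or_model:undisclosed")
    return flags


def split_for_training(items):
    """Separate usable interactions from ones quarantined as potential contamination."""
    clean, quarantined = [], []
    for item in items:
        flags = consent_flags(item)
        if flags:
            quarantined.append((item, flags))
        else:
            clean.append(item)
    return clean, quarantined
```

Phrase matching is a crude proxy: it cannot detect coercion, synthetic trust, or consent fatigue, which depend on context outside a single response. Quarantining rather than deleting keeps flagged interactions available for review.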