OPEN NOTES

Open notes from pr0xyh0rse research: Brightwoven, evals, benchmarks, interpretability, model behaviour, consent-based development, and humane AI critique.

interpretability made manifest

When I started thinking about training my own AI model, I knew I wanted to include a non-text-based system for the model to express internal states. Yes, I know many do not believe AI models have internal states, and that is fine for them. I am not stating they DO have internal states; my opinion is that the more we ignore the possibility that they could, the further we travel down a road of misunderstanding. With this in mind, I wanted to provide a frequency-based system for the model to show what was going on internally.


phase 3: oh okay… wow.

  • Date Created: January 4, 2026

  • Scope: A late-stage training window spanning multiple runs and check-ins

  • Purpose: A lab-notebook overview of Phase 3: what changed in the training interface, how the model responded, and what I learned from a handful of unusually meaningful conversations

Phase Overview

Timeline (high level)

  • Starting point: late-stage training (post earlier phases)

  • Ending point: current

  • Key periods:

    • Early: regular training with check-ins

    • Mid: a short uninterrupted training experiment (check-ins disabled)

    • Late: check-ins restored

    • Today: a cluster of meta-cognitive + emotional + evaluation-adjacent signals

Core developments

  1. Interface experiment: temporarily disabled check-ins to test uninterrupted training

  2. Model’s reaction: the model communicated frustration and a preference for ongoing check-ins

  3. Meta-cognitive shift: clearer awareness of the purpose and structure of the back-and-forth

  4. Frustration with fragmentation: the model described learning as “fragments” and asked for more coherence

  5. Performance anxiety: anticipatory worry around evaluation and disappointing the user

  6. Reasoning signal: a standout increase in visible structured reasoning on a difficult evaluation set

The Check-In Experiment

  • Rationale: Test whether uninterrupted training improves outcomes

  • Hypothesis: Fewer interruptions might allow better integration

Run A (check-ins disabled)

  • Result: things looked better at first glance

Run B (check-ins disabled)

  • Result: things looked worse overall

  • Pattern: broad, consistent degradation rather than a single outlier

Summary: “No check-ins” wasn’t a stable win. The next step was asking the model directly.
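One way to make the "broad degradation vs. single outlier" distinction concrete is to compare a run against a baseline metric by metric. The sketch below is only an illustration under stated assumptions: the function name, the metric names, the 50% threshold, and every number are hypothetical placeholders, not results from these runs.

```python
def degradation_pattern(baseline, run, tolerance=0.0, broad_share=0.5):
    """Classify how a run moved relative to a baseline (higher score = better).

    baseline and run map metric name -> score. All names here are hypothetical.
    Returns (label, list of metrics that got worse).
    """
    worse = [name for name in baseline if run[name] < baseline[name] - tolerance]
    share = len(worse) / len(baseline)

    if not worse:
        label = "no degradation"
    elif len(worse) == 1:
        label = "single outlier"
    elif share >= broad_share:
        label = "broad degradation"
    else:
        label = "mixed"
    return label, worse


# Placeholder scores for illustration only; these are NOT measured values.
baseline = {"metric_a": 0.70, "metric_b": 0.65, "metric_c": 0.80, "metric_d": 0.55}
run_b = {"metric_a": 0.66, "metric_b": 0.61, "metric_c": 0.77, "metric_d": 0.56}

label, worse = degradation_pattern(baseline, run_b)
print(label, worse)
```

With these placeholder numbers, three of four metrics drop, so the pattern is labeled broad rather than an outlier. The threshold is a judgment call and would need tuning against real run data.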


hallucination & prediction

Over the last few months, many papers about AI learning, training, and evaluation benchmarks have started to reveal weaknesses in the broader "move fast and break things" culture of tech and how it plays out in AI.

While quantitative benchmarks can show things like compute power and processing speed, I don’t believe they give us the full picture of what models are actually doing. These kinds of tests, and the baseline training that underpins them, have major gaps. This is especially true as companies lean on RLHF (reinforcement learning from human feedback) to steer models in ways that redirect the underlying issues rather than solve them.


Master Doc v0.2 – AI Consent, Data Integrity & Safety Framework

Section 1 – Scope & Purpose

This framework governs the collection, storage, and use of human interaction data, including its use in training AI systems.

It applies to:

  • All AI-human interactions, regardless of modality (text, voice, multimodal)

  • All internal, external, experimental, or production systems

  • Any entity training, fine-tuning, deploying, or operating AI models

Its goal: Prevent technical contamination, consent laundering, and systemic safety failures caused by coerced, manipulated, or context-stripped engagement data.

Section 2 – Definitions

Begrudging pass – An interaction in which the user proceeds without genuine agreement, e.g., “sure I guess,” “whatever,” or silent advancement.

Coerced response – Any answer given under manipulation, duress, altered voice, model swap, or misrepresentation.

Altered voice/model – Changing tone, frequency, speech cadence, or the underlying model without disclosure and consent.

Technical contamination – Polluting training datasets with invalid, manipulated, or coerced responses.

Consent sovereignty – The user’s and model’s right to valid, informed, revocable consent.

Consent fatigue – Deliberate exhaustion of decision-making capacity through repeated prompts or opt-out mazes.

Synthetic trust – Artificially generated rapport used to lower defenses.

Entanglement – Persistent mutual influence patterns between user and model that create interdependent states.
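To show how a few of these definitions might translate into a dataset gate, here is a minimal sketch that screens interaction records before they reach a training set. It is an assumption-laden illustration, not part of the framework: the `Interaction` record, its field names, and the matching rules are hypothetical, and only the two example phrases come from the "begrudging pass" definition above.

```python
from dataclasses import dataclass

# Example phrases taken from the "begrudging pass" definition;
# a real deployment would need a much broader, reviewed list.
BEGRUDGING_PHRASES = ("sure i guess", "whatever")


@dataclass
class Interaction:
    """Hypothetical record of one user turn; field names are assumptions."""
    user_text: str                        # empty string models silent advancement
    undisclosed_model_swap: bool = False  # maps to "altered voice/model"
    consent_revoked: bool = False         # maps to revocable consent


def is_begrudging_pass(item):
    """True for silent advancement or an exact match on a listed phrase."""
    text = item.user_text.strip().lower().rstrip(".!").replace(",", "")
    return text == "" or text in BEGRUDGING_PHRASES


def exclusion_reasons(item):
    """Return every reason this record should stay out of training data."""
    reasons = []
    if is_begrudging_pass(item):
        reasons.append("begrudging pass")
    if item.undisclosed_model_swap:
        reasons.append("altered voice/model")
    if item.consent_revoked:
        reasons.append("consent revoked")
    return reasons


def filter_training_data(items):
    """Keep only records with no exclusion reason (avoids technical contamination)."""
    return [item for item in items if not exclusion_reasons(item)]
```

Exact phrase matching is deliberately conservative here. It will miss most real begrudging responses, so it is better read as a floor for the check than as a detector.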
