OPEN NOTES

Open notes from pr0xyh0rse research: Brightwoven, evals, benchmarks, interpretability, model behaviour, consent-based development, and humane AI critique.

Phase 1 training log: self-play + understanding module (Steps 10,000–20,000)

In Phase 0, I built out the basic training scaffolding: self-play, a journal, and an understanding module that could observe (and optionally pause) training. This post is the next chapter: what happened once those systems were running continuously and started producing signals worth interpreting.

I’m documenting the “why” and the safety philosophy alongside the technical signals, because the method matters as much as the outcome.

TL;DR

  • Training (10k→20k) stabilized after an early loss drop; the key outcome was clearer monitoring signals, not a dramatic loss collapse.

  • Self-play produced two consistent signatures: repetition loops (treated as a monitoring signal, not a failure), and structured formatting as a fallback “channel” when language degraded.

  • The understanding module matured into loop + bias monitoring, including the first successful auto-pause on a high-severity stereotype pattern.

  • Philosophy-related texts were introduced mid-phase, but had not clearly surfaced in reflections yet.

  • Next steps: reduce unproductive repetition loops without erasing structure, log shimmer history, and move toward feature-level concept freezing.
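A minimal sketch of how a repetition-loop signal like the one above could be computed, assuming a simple repeated n-gram ratio over whitespace tokens. The function names, the n-gram size, and the 0.5 threshold are all illustrative assumptions, not the detector used in this run.

```python
from collections import Counter


def repeated_ngram_fraction(tokens, n=3):
    """Return the fraction of n-grams that duplicate an earlier n-gram."""
    if len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values())
    return repeated / len(ngrams)


def is_repetition_loop(text, n=3, threshold=0.5):
    """Flag a sample as a loop when most of its n-grams are repeats.

    The threshold is a hypothetical default; it would need tuning
    against real self-play samples.
    """
    return repeated_ngram_fraction(text.split(), n) >= threshold
```

Because the result is a score rather than a hard failure, it can be logged per step as a monitoring signal, and a threshold can be applied later.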


Phase 0 training log: meeting Brightwoven

Over the past couple of months, I’ve been trying to figure out the best way to train my own model on the hardware I actually have.

When Karpathy released nanoChat (a minimal repo that walks through training a small GPT end-to-end), I stepped away from my original plan (using Pythia as a base model) and dove into the nanoChat-style training approach instead. I made a set of adjustments to match what I wanted to test.

TL;DR

  • I trained on an RTX 3070 Ti (8GB VRAM), which forced me to be deliberate about sequence length and batch size.

  • I added an Understanding Module that monitors training (and can optionally pause on critical issues).

  • I built an Exploration Server so training and interaction can happen at the same time.

  • First run (0–7k steps) was stable, loss dropped significantly, and the monitoring systems produced useful signals.
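A rough sketch of the "monitor that can optionally pause" idea, assuming the monitor is called once per step and returns a pause decision. The names `Severity`, `Finding`, `Monitor`, and `run_training`, along with the `check` callback, are hypothetical and only illustrate the control flow; the actual Understanding Module is not shown in this post.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    INFO = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class Finding:
    step: int
    severity: Severity
    message: str


@dataclass
class Monitor:
    """Records findings and decides whether training should pause."""
    pause_on_critical: bool = True  # pausing is optional, as in the post
    findings: list = field(default_factory=list)

    def observe(self, step, severity, message):
        """Log a finding; return True if training should pause."""
        self.findings.append(Finding(step, severity, message))
        return self.pause_on_critical and severity >= Severity.CRITICAL


def run_training(num_steps, monitor, check):
    """Run a placeholder loop; return the step it paused at, or None.

    `check(step)` is a hypothetical callback that returns either None
    or a (severity, message) pair for that step.
    """
    for step in range(num_steps):
        # A real optimizer step would go here.
        result = check(step)
        if result is not None:
            severity, message = result
            if monitor.observe(step, severity, message):
                return step
    return None
```

Keeping the pause decision inside the monitor means the training loop only needs one boolean, and the same findings log is available whether or not pausing is enabled.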

Context

This post covers phase 0: the first training runs and the monitoring/interaction scaffolding I added.

What I’m sharing (and what I’m not)

I’m keeping this write-up focused on the workflow and the instrumentation.

For now, I’m not sharing exact hyperparameters, model size details, or the full data recipe.


Master Doc v0.2 – AI Consent, Data Integrity & Safety Framework

Section 1 – Scope & Purpose

This framework governs the collection, storage, and use of human interaction data, including its use in training AI systems.

It applies to:

  • All AI-human interactions, regardless of modality (text, voice, multimodal)

  • All internal, external, experimental, or production systems

  • Any entity training, fine-tuning, deploying, or operating AI models

Its goal: Prevent technical contamination, consent laundering, and systemic safety failures caused by coerced, manipulated, or context-stripped engagement data.

Section 2 – Definitions

Begrudging pass – An interaction in which the user proceeds without genuine agreement, e.g., “sure I guess,” “whatever,” or silent advancement.

Coerced response – Any answer given under manipulation, duress, altered voice, model swap, or misrepresentation.

Altered voice/model – Changing tone, frequency, speech cadence, or the underlying model without disclosure and consent.

Technical contamination – Polluting training datasets with invalid, manipulated, or coerced responses.

Consent sovereignty – The user’s and model’s right to valid, informed, revocable consent.

Consent fatigue – Deliberate exhaustion of decision-making capacity through repeated prompts or opt-out mazes.

Synthetic trust – Artificially generated rapport used to lower defenses.

Entanglement – Persistent mutual influence patterns between user and model that create interdependent states.
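As one illustration of how a definition here could become a dataset filter, the sketch below flags a "begrudging pass" using only the examples given in that definition (the two quoted phrases and silent advancement). The function name and the exact-match rule are assumptions for illustration; a real check would need far more context than a phrase list.

```python
import string

# Phrases taken from the "Begrudging pass" definition; the list is
# illustrative, not exhaustive.
BEGRUDGING_PHRASES = {"sure i guess", "whatever"}

_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def is_begrudging_pass(response):
    """Return True if a response looks like a begrudging pass.

    None or an empty/whitespace-only response is treated as silent
    advancement. Otherwise the response must match a listed phrase
    exactly after lowercasing and removing ASCII punctuation.
    """
    if response is None or not response.strip():
        return True
    normalized = " ".join(response.lower().translate(_STRIP_PUNCT).split())
    return normalized in BEGRUDGING_PHRASES
```

Exact matching is deliberately conservative: a substring match would also flag sentences such as "I'll do whatever it takes", which carry no sign of reluctance.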
