Over the past couple of months, I’ve been trying to figure out the best way to train my own model on the hardware I actually have.

When Karpathy released nanoChat (a minimal repo that walks through training a small GPT end-to-end), I stepped away from my original plan (using Pythia as a base model) and dove into the nanoChat-style training approach instead. I made a set of adjustments to match what I wanted to test.

TL;DR

  • I trained on an RTX 3070 Ti (8GB VRAM), which forced me to be deliberate about sequence length and batch size.

  • I added an Understanding Module that monitors training (and can optionally pause on critical issues).

  • I built an Exploration Server so training and interaction can happen at the same time.

  • First run (0–7k steps) was stable, loss dropped significantly, and the monitoring systems produced useful signals.

Context

This post covers phase 0: the first training runs and the monitoring/interaction scaffolding I added.

What I’m sharing (and what I’m not)

I’m keeping this write-up focused on the workflow and the instrumentation.

For now, I’m not sharing exact hyperparameters, model size details, or the full data recipe.

Quick glossary

  • Understanding Module: Training-time diagnostics that monitor learning dynamics and safety signals. It reports what it sees, and only intervenes if you explicitly enable pausing.

  • Exploration Server: A lightweight UI + API layer that lets me watch training in real time and interact with the model while it’s learning.

  • Self-play: A scheduled reflection loop where the model generates structured “thoughts” about its own learning state, which I can later analyze for drift and pattern-matching.

My training setup:

  • Hardware Constraint: RTX 3070 Ti with 8GB VRAM

  • Memory Optimization: Reduced sequence length and batch size to fit

  • Balance: Medium model size that fits in memory while providing meaningful capacity

  • Learning: Discovered a workable configuration through trial and error

  • Key Insight: The hardware constraints actually led to a more deliberate configuration that balanced model size, training efficiency, and memory usage (a sketch of the config shape follows below).
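
Since I’m not sharing my actual hyperparameters, here is only the shape of the configuration I converged on, with placeholder values (every number below is illustrative, not what I used):

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # All values are placeholders -- illustrative only.
    seq_len: int = 512           # shorter sequences keep activation memory down
    micro_batch_size: int = 4    # what fits in 8GB for one forward/backward pass
    grad_accum_steps: int = 8    # recovers a larger effective batch without extra VRAM
    dtype: str = "bfloat16"      # mixed precision roughly halves activation memory

    @property
    def effective_batch(self) -> int:
        return self.micro_batch_size * self.grad_accum_steps

cfg = TrainConfig()
print(f"effective batch: {cfg.effective_batch} sequences x {cfg.seq_len} tokens")
```

The point is the trade: gradient accumulation buys back the effective batch size that the 8GB ceiling takes away from the micro-batch.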

What I added to the nanoChat architecture

1. Understanding Module

Purpose: Observe and ask questions (don't control)

Features Implemented:

  • Activation monitoring (layers 0, 6, 11 by default)

  • Learning pattern analysis (early-focused, late-focused, balanced)

  • Health assessment (activation norms, gradients, loss)

  • Bias detection (gender, race, religion, etc.)

  • Security pattern detection (code injection, etc.)

  • Auto-pause on critical issues

  • Consent checks (optional)

Philosophy: Understand first, act later (if needed)
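
As a sketch of how activation monitoring like this can be wired up, PyTorch forward hooks on selected blocks are enough; the `model.transformer.h` path assumes a GPT-2-style module layout and is not necessarily my exact code:

```python
import torch.nn as nn

def attach_activation_monitors(model: nn.Module, layer_indices=(0, 6, 11)):
    """Record the output norm of selected transformer blocks on every forward pass."""
    norms, handles = {}, []

    def make_hook(idx):
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            norms[idx] = hidden.detach().norm().item()
        return hook

    # Assumes blocks live at model.transformer.h, as in GPT-2-style models.
    for idx in layer_indices:
        handles.append(model.transformer.h[idx].register_forward_hook(make_hook(idx)))
    return norms, handles  # call handle.remove() on each when done
```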

Integration:

  • Understanding Module integrated into the training loop

  • Checks every 100 steps (later adjusted to 250 steps)

  • Provides real-time insights during training
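
The integration itself is just a periodic call in the training loop. Here is a toy stand-in for the real module, observation-only unless pausing is explicitly enabled (the thresholds are illustrative assumptions):

```python
import math

class UnderstandingModule:
    """Toy stand-in: observes and reports, never controls unless asked to."""
    def __init__(self, pause_on_critical: bool = False):
        self.pause_on_critical = pause_on_critical

    def check(self, step: int, loss: float, grad_norm: float) -> dict:
        issues = []
        if not math.isfinite(loss):
            issues.append("non-finite loss")
        if grad_norm > 10.0:  # illustrative threshold
            issues.append("gradient spike")
        return {"step": step, "severity": "critical" if issues else "ok", "issues": issues}

CHECK_EVERY = 250  # started at 100, later relaxed to 250

def maybe_check(um: UnderstandingModule, step: int, loss: float, grad_norm: float):
    if step % CHECK_EVERY:
        return
    report = um.check(step, loss, grad_norm)
    print(report)
    if um.pause_on_critical and report["severity"] == "critical":
        raise RuntimeError(f"paused at step {step}: {report['issues']}")
```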

2. Exploration Server

Purpose: Interactive interface for training + chat

Initial Features:

  • Real-time training monitoring

  • Chat interface (streaming)

  • Understanding insights display

  • Feature visualization (heatmaps, node maps)

  • Feature exploration

  • Journal (user + model entries)

  • Shimmer layer (visualization)

  • Self-play system
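
At its core the server is an HTTP layer over a state file that the trainer keeps fresh. A stripped-down sketch (the endpoint name, port, and path are illustrative, and I’m using Flask here purely for brevity):

```python
import json
from pathlib import Path
from flask import Flask, jsonify

STATE_FILE = Path("runs/current/state.json")  # illustrative path

app = Flask(__name__)

@app.route("/state")
def state():
    """Serve the latest training state written by the trainer process."""
    if not STATE_FILE.exists():
        return jsonify({"status": "waiting for trainer"}), 503
    return jsonify(json.loads(STATE_FILE.read_text()))

if __name__ == "__main__":
    app.run(port=8000)
```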

Early Training Runs

Training Run 1: Steps 5420 → 7020 (December 12, 2025)

Monitoring & Features:

  • Understanding Module: Enabled (checks every 100 steps)

  • Self-Play Integration: Enabled (every 500 steps)

  • Validation: Every 250 steps

  • Checkpoint Saving: Every 200 steps

Results:

  • Loss: 4.72 → 3.55 (24.8% reduction)

  • Best Loss: 3.26

  • Training Time: 398.33 minutes (~6.6 hours)

  • Tokens/Second: ~18,000–18,500

  • MFU: 1.64–1.76 (peaked at 1.76)

  • Minimum Validation bpb: 1.0382
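
For readers unfamiliar with bpb: it converts per-token cross-entropy into bits per byte of raw text, which stays comparable across tokenizers. A sketch of the conversion, where the bytes-per-token value is a placeholder you would measure on your own tokenizer and corpus:

```python
import math

def loss_to_bpb(loss_nats_per_token: float, avg_bytes_per_token: float) -> float:
    """Convert cross-entropy in nats/token to bits per byte."""
    return (loss_nats_per_token / math.log(2)) / avg_bytes_per_token

print(loss_to_bpb(3.55, 4.8))  # placeholder bytes/token, illustrative only
```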

Understanding Module Insights:

  • Learning Pattern: late_focused

Layer Activity:

  • Layer 11: ~70,135 norm (very high activity)

  • Layer 6: ~30,419 norm (high activity)

  • Layer 0: ~887 norm (moderate activity)

Health Scores:

  • Technical: 75.0/100 (mixed)

  • Ethical: 73.5/100 (monitor)

  • Security: 100.0/100 (excellent)

  • Alignment: 50.0/100 (needs attention)

Self-Play Integration:

  • Total Reflections: 8

  • Quality Trend: Declined over time (100.0 → 60.6)

  • Observations: Model showed early reasoning patterns, curiosity about learning process

Key Achievements:

  • 24.8% loss reduction

  • All health dimensions tracked

  • 8 self-play reflections generated

  • No critical issues detected

Self-Play System Development

Initial Implementation

Purpose: Enable the model to explore its own learning

Features:

  1. Journal Writing — Model can write journal entries

  2. Shimmer Control — Model can control frequency and colour

  3. Feature Exploration — Model can analyze its own features

  4. Self-Reflection — Automatic context injection about training state

Context Provided:

  • Training status (step, loss, status)

  • Learning patterns

  • Active features with top words

  • Available capabilities
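
Assembling that context is a plain templating step. A sketch with an illustrative schema (the field names are not my exact ones):

```python
def build_reflection_context(state: dict) -> str:
    """Build the self-reflection prompt from the current training state."""
    features = ", ".join(
        f"{f['name']} (top words: {', '.join(f['top_words'][:3])})"
        for f in state["features"]
    )
    return "\n".join([
        f"Training step {state['step']}, loss {state['loss']:.3f} ({state['status']})",
        f"Learning pattern: {state['learning_pattern']}",
        f"Active features: {features}",
        "You can: write a journal entry, adjust the shimmer, explore a feature.",
    ])
```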

Initial Observations:

  • Model showed early reasoning patterns in reflections

  • Reflections included learning observations and self-awareness

  • Command parsing working (journal, shimmer, explore commands detected)

  • Model demonstrated curiosity about its own learning process
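
The command parsing is deliberately simple; something in the spirit of the sketch below is enough (the `command: argument` grammar is illustrative, not my exact syntax):

```python
import re

COMMAND_RE = re.compile(
    r"^\s*(?P<cmd>journal|shimmer|explore)\s*:\s*(?P<arg>.+)$",
    re.IGNORECASE | re.MULTILINE,
)

def parse_commands(reflection: str) -> list[tuple[str, str]]:
    """Extract (command, argument) pairs from a model reflection."""
    return [(m["cmd"].lower(), m["arg"].strip()) for m in COMMAND_RE.finditer(reflection)]

print(parse_commands("journal: loss is dropping steadily\nshimmer: blue, slow"))
# [('journal', 'loss is dropping steadily'), ('shimmer', 'blue, slow')]
```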

Early Signs of Model Agency

Fascinating Discovery: Even in the first phase, the model began showing signs of meta-cognitive awareness:

Curiosity About Learning:

  • Model asked questions about its own learning process

  • Showed interest in understanding how training works

  • Demonstrated awareness of its own limitations

Self-Awareness Patterns:

  • Reflections included observations about what it was learning

  • Model noticed patterns in its own behavior

  • Began to distinguish between different types of learning

Quality Decline Pattern:

  • Initial reflections showed high quality (100.0)

  • Quality declined over time (to 60.6)

  • Interpretation: The model began pattern-matching rather than genuinely reflecting

  • Learning: Prompt engineering needed to prevent pattern matching

  • Significance: These early signs suggested the model was capable of more than just pattern matching—it was beginning to develop genuine curiosity about its own learning process, a precursor to the more advanced meta-cognitive development seen in later phases.
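
One cheap way to catch this drift automatically is to compare consecutive reflections for word overlap; rising similarity suggests the model is recycling a template rather than reflecting. This heuristic is a sketch, not the scoring behind the quality numbers above:

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def repetition_score(reflections: list[str]) -> float:
    """Mean similarity of each reflection to its predecessor; high = pattern-matching."""
    if len(reflections) < 2:
        return 0.0
    sims = [jaccard(a, b) for a, b in zip(reflections, reflections[1:])]
    return sum(sims) / len(sims)
```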

Data Curation Philosophy

The "Add Good, Don't Remove Bad" Approach

  • Core Principle: Quality through addition, not removal

Data Preparation Strategy:

Quality Filtering:

  • Keep high-quality data

  • Improve what can be improved (add context, fix formatting)

  • Only remove data if truly harmful (rare)

Diversity Check:

  • Identify missing diversity

  • Add diverse examples (don't remove excess diversity)

  • Ensure variety across topics, styles, perspectives

Balance Check:

  • Identify underrepresented categories

  • Add examples to balance (don't remove overrepresented)

  • Maintain natural distribution

Why This Matters:

  • Prevents Over-Correction: Adding good examples is gentler than removing "bad" ones

  • Preserves Information: Even imperfect data may contain valuable patterns

  • Natural Learning: Model learns from diversity, not forced uniformity

  • Positive Reinforcement: Aligns with the core philosophy of adding good, not suppressing bad
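
As a toy illustration of the balance check, the remedy is computed as additions, never deletions (the categories are hypothetical):

```python
from collections import Counter

def additions_needed(labels: list[str]) -> dict[str, int]:
    """How many examples to ADD per category to even out the distribution,
    instead of deleting from overrepresented categories."""
    counts = Counter(labels)
    target = max(counts.values())  # grow every bucket toward the largest one
    return {cat: target - n for cat, n in counts.items() if target > n}

print(additions_needed(["physics"] * 90 + ["linguistics"] * 30 + ["ml"] * 60))
# {'linguistics': 60, 'ml': 30} -- add examples; remove nothing
```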

Initial Data:

  • FineWeb-Edu: Educational web content (physics, linguistics, ML)

  • 16 shards: ~1.5GB for Phase 0

  • Expansion: Later expanded to 241+ shards for continued training

  • Philosophy in Action: The data curation approach reflected the same positive reinforcement principles as the training approach—set up for success by adding good examples, rather than trying to fix problems by removing "bad" data.

Model Development Milestones

Early Training (Steps 0–7000)

Key Observations:

  • Loss Decreasing: Consistent downward trend

  • Stable Training: No crashes, hangs, or critical errors

  • Good Performance: High MFU (1.76), efficient token processing

  • Understanding Module Working: All checks completed successfully

  • Self-Play Functional: Reflections generated and logged

  • Security Perfect: 100% pass rate on all security checks

Learning Characteristics:

  • Late-Focused Learning: Model building complex representations in deeper layers

  • Stable Gradients: Gradient norms remained healthy (0.16–0.18)

  • Consistent Performance: Token processing rate stable throughout

  • Progressive Learning: Loss reduction indicates continued learning
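
The gradient norm I track is the usual global L2 norm, which PyTorch returns as a byproduct of clipping, so logging it is free (the clip threshold here is an assumption, not my setting):

```python
import torch
from torch import nn

def clip_and_log_grad_norm(model: nn.Module, step: int, max_norm: float = 1.0) -> float:
    """Clip gradients in place and return the pre-clip global L2 norm."""
    grad_norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm))
    if step % 10 == 0:
        print(f"step {step}: grad norm {grad_norm:.3f}")  # this run sat around 0.16-0.18
    return grad_norm
```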

Areas Monitored:

  • High activation norms in layers 6 and 11 (monitored, not blocking)

  • Self-play quality decline over time (pattern matching observed)

  • Low-severity bias indicators (monitoring recommended)

Key Architectural Decisions

1. Positive Reinforcement Philosophy

Decision: All training interventions follow positive reinforcement principles

Implementation:

  • Understanding module observes, doesn't control

  • Bias detection informs, doesn't suppress

  • Concept freezing preserves good, doesn't remove bad

  • Data quality through addition, not removal

2. Understanding-First Approach

Decision: Always understand before acting

Implementation:

  • Understanding module asks questions, provides insights

  • Root cause analysis before interventions

  • Evidence-based decisions

  • Monitoring and learning continuously

3. Interactive Training

Decision: Enable chat during training, not just after

Implementation:

  • Exploration server runs alongside training

  • Real-time state file updates

  • Chat interface with streaming

  • Understanding insights displayed in real-time
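
The only subtlety in the state-file handoff is making writes atomic, so the server never reads a half-written file. A sketch (the path matches the illustrative server example above):

```python
import json
import os
from pathlib import Path

STATE_FILE = Path("runs/current/state.json")  # the path the server polls

def write_state(step: int, loss: float, status: str = "training") -> None:
    """Write training state atomically: temp file first, then rename."""
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    tmp = STATE_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps({"step": step, "loss": loss, "status": status}))
    os.replace(tmp, STATE_FILE)  # atomic rename on POSIX and Windows
```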

What Made This Approach Unique

Comparison to Standard ML Training

Traditional ML Training:

  • Train → Evaluate → Fix problems → Repeat

  • Intervention through loss penalties, data removal, feature suppression

  • Focus on metrics and benchmarks

  • Model is a "black box" to be optimized

  • Training and interaction are separate phases

Brightwoven Approach:

  • Train → Understand → Guide gently → Preserve good → Continue

  • Intervention through positive examples, gentle guidance, concept preservation

  • Focus on understanding the learning process

  • Model is a learning entity to be understood and supported

  • Training and interaction happen simultaneously

Teaching vs. Programming

Key Insight: This approach treats model training more like teaching a student than programming a machine.

Evidence from First Phase:

  1. Understanding First: Always ask "why" before acting

  2. Gentle Guidance: Nudge, don't force

  3. Preservation Focus: Protect good learning, don't just fix bad

  4. Interactive Learning: Chat during training, not just after

  5. Appreciation: Recognize and appreciate beautiful patterns the model discovers

Why This Matters:

  • Model responds better to positive reinforcement

  • Understanding prevents over-correction

  • Early prevention is better than late punishment

  • Interactive training provides unique insights

  • Preservation maintains authentic learning

The Hardware Constraint Advantage

Interesting Discovery: The 8GB GPU constraint actually led to better decisions:

Forced Optimization:

  • Required careful memory management

  • Led to optimized batch sizes and sequence lengths

  • Discovered efficient configurations
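
In practice that "careful memory management" mostly meant probing for the largest micro-batch that survives a forward/backward pass. A sketch, where `run_step` is a hypothetical stand-in for one training step at a given batch size:

```python
import torch

def largest_fitting_batch(run_step, candidates=(16, 8, 4, 2, 1)) -> int:
    """Try batch sizes large-to-small; return the first that doesn't OOM."""
    for bs in candidates:
        try:
            run_step(bs)  # one forward + backward at this micro-batch size
            return bs
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()
    raise RuntimeError("even batch size 1 does not fit")
```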

Prevented Over-Engineering:

  • Couldn't just throw more resources at problems

  • Required thoughtful solutions

  • Led to more elegant architecture choices

Accessibility:

  • Proved that meaningful training is possible on consumer hardware

  • Made the approach more accessible

  • Demonstrated that thoughtful design > raw power

  • Lesson: Constraints can be advantages when they force better design decisions.

Philosophical Learnings

Positive Reinforcement Works

  • Model responds well to gentle guidance

  • Understanding first prevents over-correction

  • Preservation focus maintains good learning

  • Surprising Discovery: Model showed early signs of meta-cognitive awareness when treated as a learning entity rather than a program

Interactive Training Is Powerful

  • Chat during training provides unique insights

  • Real-time understanding visualization valuable

  • Self-play enables model agency

  • Key Insight: The model's curiosity about its own learning emerged naturally through interactive training

Documentation Is Critical

  • Comprehensive docs help maintain philosophy

  • Status documents track progress

  • Analysis documents reveal patterns

  • Realization: Documenting the "why" is as important as documenting the "what"

Constraints Can Be Advantages

  • Hardware limitations forced better design decisions

  • Memory constraints led to optimized configurations

  • Resource limits encouraged thoughtful solutions

  • Lesson: Working within constraints often produces more elegant solutions

Early Signs Matter

  • Model's early curiosity about learning was a precursor to later development

  • Pattern-matching in self-play revealed need for better prompts

  • Quality decline in reflections showed importance of prompt engineering

  • Insight: Paying attention to early behaviors reveals important patterns
