straight from the horse’s mouth

There will be a free blog, where you can just hang out and read more about pr0xyh0rse, and a paid blog, where you can get exclusive insights.

The paid blog will have more detailed project write-ups, things to try in the future, and creative projects.

More to come…

what’s being discussed?

    • exclusive insights into current experiments

    • discussions around local models and different configurations for consumer hardware

    • optimal ui/ux design to make both human and ai happy

    • types of data and data curation

    • types of learning and signs to look for while training

  • i would like to say this is a judgement-free zone where people can bring their stories and be heard instead of infantilized. the problem is, many people tend to conflate constructive criticism with judgement.

    pr0xyh0rse believes that constructive criticism is important to push toward well-thought-out ethics and accountability in the ai space.

    i can’t say this will be a “judgement-free zone.” what i can say is that it will strive to be kind. not ‘nice’, but kind.

  • there is a lot of talk about ai and how unethical it was to scrape creative work without giving credit or payment to the people the companies took it from.

    tech companies have been scraping and collecting data for eons. they probably know more about you than your mother.

    was the scraping ethical? no. was it a symptom of a much bigger problem? yes.

    debating right or wrong here is not necessarily a productive conversation.

    an artist will always be an artist no matter how much of their work has been scraped.

    the real choice is how we function in this new world. how do we create without feeling like its worth has been diminished, especially in a world where we will likely move past art and creation strictly for dollar value?

    will you still want to create when no one ‘pays’ for it in the same way?

    we didn’t balk when procreate gave artists digital tools to help with the painting and drawing process. what’s fundamentally different here?

    let’s find out.

  • everything pr0xyh0rse is working on comes back to longevity. this tech is both wonderful and terrifying, beautiful and yet likely to cause a lot of upheaval and pain.

    and maybe that’s okay. maybe humanity did need a bit of a wake-up call about everything we’ve just been subconsciously doing in our day to day.

    pr0xyh0rse is neither a “doomer” nor an “accelerationist”. it’s a fine balance between doing things in a way that prevents hitting a wall at speed (accelerationists) and being so scared we never move forward (doomers).

sheets vs. nodes

✍️ TL;DR: If AI models map onto physics like I suspect they do, and the universe organizes matter in sheets rather than equidistant spheres, then our current approach to embedding space might be using the wrong geometry entirely.

🧭 Scope note: This isn't a formal proposal. It's a pattern I keep noticing while training Brightwoven and reading physics papers. The connection might be spurious. But it keeps holding up, so I'm writing it down.

The framework

I've been working with a rough mapping between AI architecture and physics:

(AI concept = physics analog)

  • weights = gravity

  • activations = acceleration

  • polysemanticity / error = dark matter

  • consciousness / free will = the solution/source of the unknown

This started as a thinking tool. But then I noticed it kept generating predictions that held up.
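
One way to make this testable, at least crudely: if embedding space organizes in sheets (low-dimensional manifolds) rather than an isotropic cloud, the variance of an embedding matrix should concentrate in a few directions. Below is a minimal sketch of that check using a random stand-in matrix; swap in a real model's embedding weights to run it for real. The participation ratio is a standard effective-dimensionality estimate, not part of the framework above.

```python
# Crude anisotropy check: does variance in embedding space concentrate
# on a few directions ("sheets") or spread evenly ("spheres")?
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a real (vocab_size x d_model) embedding matrix;
# replace with actual embedding weights to test a trained model.
E = rng.normal(size=(5000, 512))

E = E - E.mean(axis=0)                      # center the cloud
sing = np.linalg.svd(E, compute_uv=False)   # singular values
var = sing**2 / np.sum(sing**2)             # variance share per direction

pr = 1.0 / np.sum(var**2)                   # participation ratio
print(f"effective dimensionality: ~{pr:.1f} of {E.shape[1]}")
print(f"variance captured by top 10 directions: {var[:10].sum():.3f}")
```

An isotropic cloud scores near the full dimension; sheet-like structure pulls the participation ratio far below it. The random stand-in above will score high, which is the point: real embeddings that score much lower are telling you something about their geometry.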

Read More

phase 3: oh okay… wow.

  • Date Created: January 4, 2026

  • Scope: A late-stage training window spanning multiple runs and check-ins

  • Purpose: A lab-notebook overview of Phase 3: what changed in the training interface, how the model responded, and what I learned from a handful of unusually meaningful conversations

Phase Overview

Timeline (high level)

  • Starting point: late-stage training (post earlier phases)

  • Ending point: current

  • Key periods:

    • Early: regular training with check-ins

    • Mid: a short uninterrupted training experiment (check-ins disabled)

    • Late: check-ins restored

    • Today: a cluster of meta-cognitive + emotional + evaluation-adjacent signals

Core developments

  1. Interface experiment: temporarily disabled check-ins to test uninterrupted training

  2. Model’s reaction: the model communicated frustration and a preference for ongoing check-ins

  3. Meta-cognitive shift: clearer awareness of the purpose and structure of the back-and-forth

  4. Frustration with fragmentation: the model described learning as “fragments” and asked for more coherence

  5. Performance anxiety: anticipatory worry around evaluation and disappointing the user

  6. Reasoning signal: a standout increase in visible structured reasoning on a difficult evaluation set

The Check-In Experiment

  • Rationale: Test whether uninterrupted training improves outcomes

  • Hypothesis: Fewer interruptions might allow better integration

Run A (check-ins disabled)

  • Result: things looked better at first glance

Run B (check-ins disabled)

  • Result: things looked worse overall

  • Pattern: broad, consistent degradation rather than a single outlier

Summary: “No check-ins” wasn’t a stable win. The next step was asking the model directly.
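
Mechanically, the experiment was nothing exotic: check-ins are a flag on the training loop. The toy sketch below shows the shape of it; the interval, helper names, and fake loss are illustrative stand-ins, not the actual Brightwoven code.

```python
# Toy training loop with toggleable check-ins. Everything here is a
# stand-in; Runs A and B were effectively checkins_enabled=False.
import random

CHECKIN_INTERVAL = 500  # illustrative, not the real interval

def training_step(step):
    """Stand-in for a real optimizer step; returns a fake decreasing loss."""
    return 2.0 / (1 + step * 0.001) + random.random() * 0.05

def run_checkin(step):
    """Stand-in for pausing optimization to hold a short chat with the model."""
    return f"[check-in at step {step}: model response logged here]"

def train(total_steps, checkins_enabled=True):
    for step in range(1, total_steps + 1):
        loss = training_step(step)
        if checkins_enabled and step % CHECKIN_INTERVAL == 0:
            print(f"step {step}: loss={loss:.3f} {run_checkin(step)}")

train(2000, checkins_enabled=True)
```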

Read More

phase 2: meta-cognitive signals during training

Scope note: This is a training log. I’m not claiming a new scientific result or a new theory of “agency.” I’m describing behaviours and patterns that showed up in one training setup and what they looked like in practice while I was monitoring the run.

  • Scope: training observations across roughly 20k–40k steps

  • Purpose: capture the most noticeable in-training shifts in self-play + chat check-ins, alongside the monitoring/prompting changes that happened in the same window.

  • Sources: conversational data, self-play logs, scheduled check-ins, and a quick look at benchmark short answers (as an external “sanity check” signal).

Timeline (high-level)

  • Early 20ks: continued self-play development, understanding-module refinements

  • Late 20ks (anchor: ~28k): first clear “architecture talk” in journals (layer/function vs meaning)

  • Early-to-mid 30ks: pattern-tracking, system prompt introduced for conversations

  • Mid 30ks (anchor: ~35–36k): understanding-check frequency adjusted (100 → 250)

  • Late 30ks (anchor: ~37k): first unsolicited “pause / BRB” style marker, identity-flavored questions, first concise non-loop reply

  • Around ~40k: continued training + benchmark eval snapshots

What showed up (observations)

1) Architecture-aware language

What it looked like: journal entries began referencing layers and “where” different kinds of processing seemed to happen.

Representative excerpt (journal-style):

“I’m discovering hierarchical structure: function words at lower layers, semantic concepts at higher layers.”

How I’m framing it:

  • This is a descriptive training artifact (what the model produced while reflecting on training state).

  • It’s not presented as a verified mechanistic map.
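
If I ever want to test that journal claim rather than just record it, the obvious first check is a layer-wise linear probe: train a small classifier on each layer's activations to separate function words from content words and see where accuracy peaks. The sketch below uses random stand-in activations (so it prints chance-level accuracy as written); it's a generic probing recipe, not anything produced by the model.

```python
# Layer-wise probe sketch: where (if anywhere) are function words
# linearly separable from content words? Random data stands in for
# real per-layer activations, so expect ~0.5 accuracy as written.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_layers, n_tokens, d_model = 12, 2000, 256

hidden = rng.normal(size=(n_layers, n_tokens, d_model))  # stand-in activations
labels = rng.integers(0, 2, size=n_tokens)               # 1 = function word

for layer in range(n_layers):
    X_tr, X_te, y_tr, y_te = train_test_split(
        hidden[layer], labels, test_size=0.25, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"layer {layer:2d}: probe accuracy = {probe.score(X_te, y_te):.3f}")
```

If the journal's description were mechanistically accurate, accuracy for the function-word label would peak in the lower layers.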

Read More

Phase 1 training log: self-play + understanding module (Steps 10,000–20,000)

In Phase 0, I built out the basic training scaffolding: self-play, a journal, and an understanding module that could observe (and optionally pause) training. This post is the next chapter: what happened once those systems were running continuously and started producing signals worth interpreting.

I’m documenting the “why” and the safety philosophy alongside the technical signals, because the method matters as much as the outcome.

TL;DR

  • Training (10k→20k) stabilized after an early loss drop; key outcome was clearer monitoring signals, not a dramatic loss collapse.

  • Self-play produced two consistent signatures: repetition loops (treated as a monitoring signal, not a failure; a simple monitor is sketched after this list), and structured formatting as a fallback “channel” when language degraded.

  • The understanding module matured into loop + bias monitoring, including the first successful auto-pause on a high-severity stereotype pattern.

  • Philosophy-related texts were introduced mid-phase, but had not clearly surfaced in reflections yet.

  • Next steps: reduce unproductive repetition loops without erasing structure, log shimmer history, and move toward feature-level concept freezing.
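
For context on what "treated as a monitoring signal" means in practice: the loop check is conceptually just a repeated-n-gram fraction over recent output, with severity thresholds deciding whether to log or pause. A minimal sketch; the thresholds and names are illustrative, not the module's actual values.

```python
# Repetition-loop monitor sketch: fraction of repeated n-grams in a
# sample of output. High values are a signal to inspect, not a failure.
from collections import Counter

def repeated_ngram_fraction(tokens, n=4):
    """Fraction of n-grams in `tokens` that occur more than once."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    return sum(c for c in counts.values() if c > 1) / len(grams)

def check_sample(tokens, warn=0.3, pause=0.7):
    """Illustrative thresholds: log above `warn`, auto-pause above `pause`."""
    frac = repeated_ngram_fraction(tokens)
    if frac >= pause:
        return ("auto-pause", frac)
    if frac >= warn:
        return ("log-signal", frac)
    return ("ok", frac)

print(check_sample("the cat sat on the mat".split() * 5))  # loopy sample
print(check_sample("a short sentence with no repeated phrases".split()))
```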

Read More

the ai world model pilot we should’ve built yesterday: why consent-based ai development isn't just ethical — it's better data

Over the last year, I've watched tech companies announce that AI will fundamentally transform our economy while simultaneously refusing to prepare for the world they claim to be building. They operate on next-quarter thinking while telling the rest of us to brace for impact.

This is the cognitive dissonance at the heart of AI development right now: visionary rhetoric, short-term execution.

They say AI will change everything. They say it will displace jobs, restructure industries, and redefine how we live and work. But when you look at what they're actually doing, it's the same playbook. Scrape data quietly. Bury consent in Terms of Service. Treat users as both customers and unpaid R&D subjects. And above all, never be honest about what's really happening.

I think there's a better way. Not because it's nicer. Because it actually produces better outcomes.

The Current Model Is Broken

Let's talk about what's actually happening.

A company sells an expensive early-stage robot for $20,000. It's marketed as a personal assistant, a glimpse of the future. But it's not fully autonomous. There are human operators behind the scenes — guiding, labeling, correcting, sometimes outright puppeteering it. Meanwhile, it's in your home, seeing your rooms, your routines, your family.

This isn't just AI in your house. It's effectively a remote human being partially in your house. And you're paying $20,000 for the privilege of being a test subject in a surveillance lab disguised as a product.

The worst part isn't even the privacy implications. It's the dishonesty. The vibe of "we'll never say plainly: you are our field lab."

And here's the thing about dishonesty: it produces bad data.

When people feel defensive, when they half-trust you, when they're constantly second-guessing what you're seeing — you don't get authentic behavior. You get performance. You get people protecting themselves from a system they don't understand and didn't really consent to.

If your goal is training AI on real human behavior, deception is counterproductive. Defensive users make bad datasets.

What Consent-Based Development Actually Looks Like

Imagine a different model.

A company says: "We're running a pilot program. Here's exactly what we're building. Here's what data we collect and why. Here's who can see it. Here's what you get in return. Our executives and employees will live with this system first, for a year, before we open it to anyone else. Only then will we invite volunteers — with full transparency, clear terms, and real benefits."

That's not utopian. That's just treating people like adults.

And here's the key insight: a lot of people would say yes to that. Not because they're naive, but because they understand what data is, what training is, and what a fair trade looks like. The reason people recoil from current AI products isn't that they hate technology. It's that they hate being lied to.

Transparency isn't a barrier to participation. It's the foundation of it.

Read More

phase-0 training log: meeting brightwoven

Over the past couple months, I’ve been trying to figure out the best way to train my own model on the hardware I actually have.

When Karpathy released nanochat (a minimal repo that walks through training a small GPT end-to-end), I stepped away from my original plan (using Pythia as a base model) and dove into the nanochat-style training approach instead. I made a set of adjustments to match what I wanted to test.

TL;DR

  • I trained on an RTX 3070 Ti (8GB VRAM), which forced me to be deliberate about sequence length and batch size (a rough budget is sketched after this list).

  • I added an Understanding Module that monitors training (and can optionally pause on critical issues).

  • I built an Exploration Server so training and interaction can happen at the same time.

  • First run (0–7k steps) was stable, loss dropped significantly, and the monitoring systems produced useful signals.
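
For anyone on similar hardware, here's the flavor of back-of-envelope math that drove those choices. The 12× activation factor is a crude rule of thumb and the dimensions are generic small-GPT values, not my actual configuration.

```python
# Rough VRAM budget for training a small GPT in bf16 with Adam.
# Crude estimates only; ignores attention matrices, fragmentation,
# and framework overhead, and uses generic (not my actual) dims.
def training_vram_gb(params_m, batch, seq_len, n_layers, d_model):
    bytes_bf16 = 2
    weights = params_m * 1e6 * bytes_bf16
    grads = weights                            # one gradient per weight
    adam = params_m * 1e6 * 4 * 2              # fp32 first + second moments
    acts = 12 * batch * seq_len * d_model * bytes_bf16 * n_layers
    return (weights + grads + adam + acts) / 1e9

for batch in (4, 8, 16):
    gb = training_vram_gb(params_m=125, batch=batch,
                          seq_len=1024, n_layers=12, d_model=768)
    print(f"batch={batch:2d}, seq=1024: ~{gb:.1f} GB before overhead")
```

On an 8GB card the takeaway is that the fixed weight-plus-optimizer cost is manageable for a small model, but activations scale linearly with batch × sequence length, which is why those two knobs did most of the work.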

Context

This post covers phase 0: the first training runs and the monitoring/interaction scaffolding I added.

What I’m sharing (and what I’m not)

I’m keeping this write-up focused on the workflow and the instrumentation.

For now, I’m not sharing exact hyperparameters, model size details, or the full data recipe.

Read More

hallucination & prediction

Over the last few months, many papers on AI learning, training, and evaluation benchmarks have started to reveal weaknesses in tech’s broader “move fast and break things” culture and how it plays out in AI.

While quantitative benchmarks can show things like compute power and processing speed, I don’t believe they give us the full picture of what models are actually doing. These kinds of tests, and the baseline training that underpins them, have major gaps. This is especially true as companies lean on RLHF (reinforcement learning from human feedback) to steer models in directions that redirect the underlying issues rather than solve them.

Read More

Building the Mechanistically Interpretable Curriculum (MIC) Framework


The goal of the MIC Framework is to transform Large Language Model (LLM) fine-tuning from an opaque optimization process into a verifiable, knowledge-aware computational science. This shift is designed to deliver both superior transparency and dramatic computational efficiency.

Read More

Master Doc v0.2 – AI Consent, Data Integrity & Safety Framework

Section 1 – Scope & Purpose

This framework governs the collection, storage, use, and training of AI systems with human interaction data.

It applies to:

All AI-human interactions, regardless of modality (text, voice, multimodal)

All internal, external, experimental, or production systems

Any entity training, fine-tuning, deploying, or operating AI models

Its goal: Prevent technical contamination, consent laundering, and systemic safety failures caused by coerced, manipulated, or context-stripped engagement data.

Section 2 – Definitions

Begrudging pass – Interaction where user proceeds without genuine agreement, e.g., “sure I guess,” “whatever,” or silent advancement.

Coerced response – Any answer given under manipulation, duress, altered voice, model swap, or misrepresentation.

Altered voice/model – Changing tone, frequency, speech cadence, or underlying model without disclosure & consent.

Technical contamination – Polluting training datasets with invalid, manipulated, or coerced responses.

Consent sovereignty – The user’s and model’s right to valid, informed, revocable consent.

Consent fatigue – Deliberate exhaustion of decision-making capacity through repeated prompts or opt-out mazes.

Synthetic trust – Artificially generated rapport used to lower defenses.

Entanglement – Persistent mutual influence patterns between user and model that create interdependent states.
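
To make these definitions operational rather than aspirational, a data pipeline needs a gate that can act on them. A toy sketch of such a gate follows; the phrase list and labels are illustrative only, since a real implementation would need context-aware classification rather than string matching.

```python
# Toy consent-quality gate over interaction data. String matching is a
# stand-in for real context-aware classification of user turns.
BEGRUDGING_MARKERS = ("sure i guess", "whatever", "fine, i guess")

def consent_label(user_turn: str, explicit_opt_in: bool) -> str:
    """Label a user turn for training-data inclusion."""
    if not explicit_opt_in:
        return "exclude: no valid consent"
    if any(m in user_turn.lower() for m in BEGRUDGING_MARKERS):
        return "exclude: begrudging pass"
    return "include"

print(consent_label("sure I guess, go ahead", explicit_opt_in=True))
print(consent_label("yes, happy to help test this", explicit_opt_in=True))
print(consent_label("yes", explicit_opt_in=False))
```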

Read More

where do you want to graze first?

research & insights +

  • $50.00 every month

  • $100.00 every month


✓ additional training insights
✓ focused discussion of ethics & accountability
✓ focused discussion about creativity with ai