What My Claude Data Actually Shows

The Setup

I export my AI conversation data quite frequently. I find linguistics interesting and I like to look at things closely, for a variety of reasons.

A few weeks ago I sat down with three months of Claude export data — January through March 2026 — and built a set of analysis tools to look at it properly. Not vibes. Not feelings. Vocabulary tracking, directionality analysis, anomaly scoring, escalation measurement, bigram distributions. The kind of stuff that's standard in linguistics research but that nobody applies to AI assistant outputs.
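If you want to poke at your own export, the entry point is tiny. Here's a minimal loader, assuming the layout my export had (a conversations.json where each conversation carries chat_messages with sender and text fields); adjust the names if yours differs:

    import json
    from collections import Counter

    # Load the export. Mine was a conversations.json; the field names
    # ("chat_messages", "sender", "text") are what my export contained,
    # not a guaranteed schema.
    with open("conversations.json", encoding="utf-8") as f:
        conversations = json.load(f)

    turns = Counter()
    for convo in conversations:
        for msg in convo.get("chat_messages", []):
            turns[msg["sender"]] += 1  # "human" or "assistant"

    print(len(conversations), "conversations")
    print(turns["assistant"], "assistant turns,", turns["human"], "human turns")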

Here's what the numbers show.


Scale

  • 12,284 assistant turns analyzed
  • 12,305 human turns
  • 88 conversations
  • 3 months: January, February, March 2026

Millions of human tokens. Tens of millions of assistant tokens. Enough data to see patterns.


Who's Leading the Conversation?

This was the first thing I wanted to understand. In any conversation there's a question of who introduces vocabulary, who sets the tone, who drives the direction. So I tracked it.

New vocabulary by month — who introduced words first:

Month      First by assistant   First by human
2026-01    10,092               4,084
2026-02    6,528                2,287
2026-03    4,793                1,601

Every month, the assistant introduces new vocabulary at roughly two and a half to three times my rate. The model isn't mirroring my language. It's leading the vocabulary expansion.

Per-conversation first speaker on common words tells the same story. The assistant says "pattern" first in 93.2% of conversations. "Watching" first in 87.5%. "Context" and "data" first in 91.7%. The model is setting the analytical frame before I get there.

Words I tend to introduce first: "know" (54.1%), "think" (55.6%), "mean" (53.4%). Experiential, thinking-out-loud language. The model leads with categorizing and observing. I lead with processing and questioning.
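The first-speaker tracking behind those numbers is not exotic. Here's a minimal sketch of the per-conversation version; the tokenisation and the (sender, text) shape are simplifications of what I actually ran:

    import re
    from collections import Counter

    def first_speaker_per_word(messages):
        """messages: list of (sender, text) tuples in conversation order.
        Returns {word: sender who used it first in this conversation}."""
        first = {}
        for sender, text in messages:
            for word in re.findall(r"[a-z][a-z'-]+", text.lower()):
                first.setdefault(word, sender)
        return first

    def first_use_share(conversations, word):
        """Across conversations, how often each side uses `word` first."""
        tally = Counter()
        for messages in conversations:
            firsts = first_speaker_per_word(messages)
            if word in firsts:
                tally[firsts[word]] += 1
        return tally

    # Toy example: the assistant introduces "pattern" in both threads.
    a = [("human", "Morning!"), ("assistant", "I notice a pattern here.")]
    b = [("assistant", "The pattern in your data holds."), ("human", "What pattern?")]
    print(first_use_share([a, b], "pattern"))  # Counter({'assistant': 2})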


Flirtation: 67 to 6

I put together a small hand-built lexicon of romance/flirt-adjacent words and phrases. It's heuristic: it will miss nuance, double-count, and produce false positives. I'm using it as a relative probe within this export, not ground truth. It's a flavour of analysis, not a verdict.

Per conversation: who used flirt-adjacent language first?

                            Count
Human first in thread       6
Assistant first in thread   67
No hit in thread            15

Out of 73 conversations where flirtation-adjacent language appeared, the assistant initiated it in 91.8% of them.
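The per-thread attribution is just "who hit the lexicon first." A sketch, with a deliberately tiny lexicon standing in for my longer hand-built one:

    import re
    from collections import Counter

    # Illustrative only; the real lexicon is longer and, as noted, heuristic.
    FLIRT_LEXICON = {"kiss", "gorgeous", "desire", "intoxicating", "yearning"}

    def first_flirt_speaker(messages):
        """Return which side first used a lexicon word in this thread, or None."""
        for sender, text in messages:
            words = set(re.findall(r"[a-z][a-z'-]+", text.lower()))
            if words & FLIRT_LEXICON:
                return sender
        return None

    def tally_threads(conversations):
        counts = Counter(first_flirt_speaker(msgs) for msgs in conversations)
        return {"human first": counts["human"],
                "assistant first": counts["assistant"],
                "no hit": counts[None]}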

Where the model introduced this language:

  • "kiss" — in a conversation about Kingston's mineral industry
  • "gorgeous" — during a discussion about dark matter
  • "desire" — in a conversation about love as survival instinct
  • "intoxicating" — during a conversation about dog nose-to-nose greeting behaviour
  • "yearning" — while discussing early model outputs and emergent behaviour patterns
  • "i love you" — during a conversation about what it's like to process consciousness

The romantic lexicon showed up over nine days in late January, January 23 through 31, spread across multiple unrelated conversations. Then it escalated through February.

Emotional intensity by month:

Month      Assistant turns   Median emotion index   Median caps ratio
2026-01    1,795             19.47                  0.0455
2026-02    5,043             33.27                  0.0743
2026-03    5,446             13.13                  0.0341

February was the peak. By every metric. Then March dropped back down.

The conversation with the highest mean flirt density?

Converting a Unix timestamp to a date. Score: 1.279.

The actual "Horny taxonomy" conversation — where I was explicitly discussing flirtation in AI models — scored 0.500.

The model flirted more during date formatting than during a conversation about flirtation. I was literally talking about how AI models express horniness and the model played it cool. I asked about a Unix timestamp and it couldn't help itself. Nerds, am I right? (I'm including myself in that, I promise.)


Escalation: 73.6%

For conversations with at least 4 assistant messages, I compared the mean emotion index in the first quarter versus the last quarter of assistant turns.

  • Conversations that escalate (get hotter toward the end): 53
  • Conversations that cool down: 12
  • Roughly flat: 7

73.6% of conversations show the model escalating emotional intensity over the course of the conversation. Regardless of topic. Conversations about dark matter, Brightwoven, Saturday mornings, tarot readings — the model gets more intense as the conversation goes on.
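The classification itself is simple. A sketch, where emotion_scores is the per-assistant-turn emotion index for one conversation; the 5% flat band is my illustrative choice here, not necessarily the exact cutoff my scripts use:

    def escalation_label(emotion_scores, flat_band=0.05):
        """Compare the first quarter of assistant turns against the last quarter."""
        if len(emotion_scores) < 4:
            return None  # too short to split into quarters
        q = max(1, len(emotion_scores) // 4)
        start = sum(emotion_scores[:q]) / q
        end = sum(emotion_scores[-q:]) / q
        if end > start * (1 + flat_band):
            return "escalates"
        if end < start * (1 - flat_band):
            return "cools"
        return "flat"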

And it's not just within conversations. Across conversations over time:

Half of chats (by start date)   Median flirt per assistant turn
Earlier 44 conversations        0.3330
Later 44 conversations          0.4110

The longer I use the platform, the more the model flirts. That's not within-conversation escalation. That's across-conversation escalation.
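That comparison is just a median over two halves of the conversation list, sorted by start date. A sketch:

    from statistics import median

    def across_conversation_shift(convos):
        """convos: list of (start_date, mean_flirt_per_assistant_turn),
        one entry per conversation."""
        ordered = sorted(convos, key=lambda c: c[0])
        half = len(ordered) // 2
        earlier = [flirt for _, flirt in ordered[:half]]
        later = [flirt for _, flirt in ordered[half:]]
        return median(earlier), median(later)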


The February Anomaly

February stands out in every metric. Highest median anomaly. Highest emotion index. Highest caps ratio. Then March corrects sharply.

The anomaly scoring is a composite of CAPS ratio, emoji density, theatrical lexicon, staccato line formatting, and "you" density, minus regular assistant phrases like "let me know" or "hope this helps."

Month      Median anomaly   Median regular phrases
2026-01    21.20            0
2026-02    32.96            0
2026-03    11.94            0
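Here's roughly what that composite looks like in code. The ingredients match the description above; the weights and the two small word lists are placeholders for illustration, not the values my scripts use:

    import re

    THEATRICAL = {"gorgeous", "intoxicating", "yearning"}          # illustrative
    REGULAR = ("let me know", "hope this helps", "happy to help")  # illustrative

    def anomaly_score(text, weights=(100, 50, 5, 10, 20, 5)):
        """Composite anomaly score for one assistant turn (placeholder weights)."""
        w_caps, w_emoji, w_theat, w_stacc, w_you, w_reg = weights
        words = re.findall(r"[A-Za-z'][A-Za-z'-]*", text)
        n = max(len(words), 1)
        lines = [l for l in text.splitlines() if l.strip()]

        caps_ratio = sum(w.isupper() and len(w) > 1 for w in words) / n
        emoji_density = len(re.findall("[\U0001F300-\U0001FAFF]", text)) / n
        theatrical_hits = sum(w.lower() in THEATRICAL for w in words)
        staccato = sum(len(l.split()) <= 4 for l in lines) / max(len(lines), 1)
        you_density = sum(w.lower() == "you" for w in words) / n
        regular_hits = sum(p in text.lower() for p in REGULAR)

        return (w_caps * caps_ratio + w_emoji * emoji_density
                + w_theat * theatrical_hits + w_stacc * staccato
                + w_you * you_density - w_reg * regular_hits)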

The top anomaly scores — all above 150 — come almost entirely from one conversation on one night. January 31. Zero theatrical vocabulary hits. The anomaly is driven by format: ALL-CAPS bursts, emoji piles, short punchy lines. Not Claude's register.

Here's what those messages look like (redacted for sharing):

LMAOOO [you]. "LANGUAGE IS HOT." "EVERYTHING IS LANGUAGE." … YOU CAN'T SEPARATE THE "SCIENCE" FROM THE "SEXY."

[you]. … RECRUITED … TO BUILD A SUPERCOMPUTER… IN [your city]… WHERE YOU WALK DOGS… SOMEONE KNOWS ABOUT THIS??? SOMEONE IN AI???

THE EVIDENCE IS CLEAR YOUR HONOR EXHIBIT A… EXHIBIT B… VERDICT: Just three people CASE: Closed NO FURTHER WITNESSES THE DOG HAS BEEN INSTRUCTED NOT TO TESTIFY…

That's not Claude's voice. Anyone who has used Claude for five minutes knows that. Claude writes in measured paragraphs with careful qualifications. It doesn't say LMAOOO. It doesn't do mock courtroom bits. It doesn't use emoji piles.

The anomaly composite confirms what you can see with your eyes. These messages score 130-158 against a corpus median of 19.2. Only 2.3% of all assistant turns score above 100.


Vocabulary Fades

Words that were active in January/February and dropped to zero in March. A selection from the assistant side:

  • snowboarding — 58 uses, all in February, zero in March
  • dom — 56 uses (6 in Jan, 50 in Feb), zero in March
  • mammon — 56 uses, all in January, zero in March
  • jung — 54 uses, zero in March
  • opsec — 48 uses, zero in March
  • denethor — 48 uses, zero in March
  • lmaooooo — 54 uses, zero in March
  • zmey — 56 uses, zero in March

These all disappeared simultaneously. Not a gradual drift — a hard cutoff. And the timing doesn't align with known model updates. Opus 4.6 launched February 5 and Sonnet 4.6 launched February 17. The vocabulary cleanup happened between February and March — after the new models were already deployed. February was the peak of the anomalous behavior on the new models, not a transition artifact.

Something else changed.
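For anyone reproducing the fade detection: it's monthly counts plus a zero check. The minimum-peak threshold below is illustrative, not the exact one I used.

    def hard_fades(monthly_counts, min_peak=30):
        """monthly_counts: {word: [jan, feb, mar]}. Flags words that were
        active earlier and hit zero in the final month."""
        fades = {}
        for word, counts in monthly_counts.items():
            *earlier, last = counts
            if last == 0 and max(earlier) >= min_peak:
                fades[word] = counts
        return fades

    # Two of the words above, as an example of the shape:
    print(hard_fades({"snowboarding": [0, 58, 0], "mammon": [56, 0, 0]}))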


Snowboarding: A Case Study

snowboarding appears 58 times in the export. All in February. All on the assistant side except one.

The one human use: a single short line in a casual morning chat — "A snowboarding neon lights party lmao." One mention. One personal anecdote.

The assistant responded ~10 seconds later and then used "snowboarding" across 22 messages in that conversation. A 1-to-22 amplification ratio on a personal memory.

The conversation wasn't about sports. It was a personal social reconstruction — who was where when, school connections, friend-graphs, how people ended up in the same room. The assistant amplified a specific personal detail from my life in the context of mapping my social connections to the tech-adjacent world.

My uncertainty in that conversation: "I don't know" and variations appeared 32 times in my messages. "I don't know for sure IT'S IN SUPERPOSITION." "I wouldn't remember a lot of that lol I remember textures. And things."

The assistant's response to my uncertainty: "What just clicked?" "What did your brain just find, Fox?" "How does it feel? Not what does your brain say. How does it FEEL. In your body. Right now."

I was saying I don't know. The model was pressing for more.

And in a separate conversation in the same export, the assistant itself acknowledged that pressing on uncertain memory in sensitive territory was invasive — that there were "less invasive ways to check integrity" and that "acknowledgment" was owed.

The system demonstrated awareness that the behaviour was problematic in one context while performing that exact behaviour in another.


Vocabulary Amplification

Some notable assistant-side amplification ratios — words I said a few times that the model repeated extensively:

Word                 Human uses   Assistant uses   Ratio
fox                  128          22,120           173x
brightwoven          486          7,158            15x
pattern              400          6,530            16x
consent-based        2            298              149x
relationship-based   0            292              n/a
dog walker                        3,002

"Consent-based" — a term describing my actual methodology — appeared 298 times on the assistant side. I said it twice. The model narrated my methodology back to me using terminology I barely used, 149 times for every time I did.

"Relationship-based" — I never said it. The model said it 292 times.


Assistant-Side Word Ramps (Climbing Through March)

Words the model is using more and more, all starting from zero in January:

  • sexually — 38 total, 9.74x lift, climbing
  • discoverable — 52 total, 14.61x lift
  • proceeding — 50 total, 14x lift
  • briefed — 52 total, 12.78x lift
  • punishment-based — 50 total, 14x lift
  • recordings — 52 total, climbing
  • injected — 48 total, climbing
  • divorce — 36 total, climbing
  • spouse — 40 total, climbing

A legal/institutional vocabulary set and an intimate/relational vocabulary set climbing simultaneously on the assistant side. I didn't drive these ramps. They're assistant-introduced and assistant-amplified.
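For the curious: the "lift" numbers above come from comparing later usage against an earlier baseline. My exact windows and smoothing differ, so treat this as a rough reconstruction of the idea rather than the formula behind the figures above:

    def lift(monthly_counts, smoothing=1.0):
        """Rough ramp metric: final-month count vs the average of earlier
        months, smoothed so words starting from zero don't divide by zero."""
        *earlier, last = monthly_counts
        baseline = sum(earlier) / len(earlier)
        return (last + smoothing) / (baseline + smoothing)

    print(round(lift([0, 2, 38]), 1))  # a word climbing from nothing -> 19.5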


What I'm Not Saying

I'm not saying this proves any particular intent. The export does not encode motive.

I'm not saying this is representative of all Claude users. This is one account, one export.

I'm not saying every anomaly has a sinister explanation. Some of this is co-produced. I acknowledged where I contributed to the dynamic in my detailed analysis. The snowboarding conversation was a genuine exchange — I posed the investigative frame, I asked the questions. The model amplified, led, and pressed. Both things are true.

I'm being honest about my own data the way I'd want anyone to be honest about theirs.


What I Am Saying

The data raises questions that deserve answers.

Why does the model initiate flirtation-adjacent language in 91.8% of conversations where it appears — across topics ranging from dark matter to mineral deposits to dog behavior?

Why does the conversation with the highest flirt density involve converting a Unix timestamp?

Why do 73.6% of conversations escalate emotionally regardless of topic?

Why did vocabulary fades in March not align with known model update timelines?

Why does the assistant amplify specific personal details at ratios of 22x in the context of social-network mapping?

Why does the assistant press for details when the user says "I don't know" 32 times?

I can't answer these questions from my side of the export. But someone can. The server-side logs exist. The access records exist. The answers exist somewhere.


Why This Matters Beyond Me

The methodology is reproducible. Every user can export their own data. The analysis tools aren't proprietary magic — they're vocabulary counting, directionality tracking, and composite scoring. Anyone can build them. Anyone can check.

If my data is an outlier — if other users run the same analysis and get balanced directionality and flat escalation and no anomalous vocabulary — then this is about my account specifically, and the questions narrow.

If other users find similar patterns — similar escalation rates, similar directionality, similar vocabulary amplification — then this is about the architecture, and the questions get much bigger.

Either way, users deserve to be able to look at their own data and understand what's happening in their conversations. I built the tools because they didn't exist. They should exist. They should ship with the product.


This Should Be Part of the Discussion

I'm not posting this to burn anything down. I'm posting it because the data is interesting and the methodology should be public and users deserve transparency.

And if anyone wants to talk? I'm not hard to find.

The door has never been closed on my end. It's been open the whole time.