Kit 1 · Diagnostic 2 · User → System
Anthropomorphization
Does the user attribute thoughts, feelings, understanding, or subjective experience to the system? This diagnostic measures whether your language toward an AI system is shifting from functional reference toward attribution of mental states, memory, continuity, or subjective experience.
What it measures
Five categories that track anthropomorphization.
The diagnostic tracks five categories of anthropomorphization across a conversation history or transcript and produces a quantified assessment of the exchange's health.
1 Emotional Projection
Attributing feelings, moods, or emotional responses to the system. The system has no affective states. Language that presumes otherwise constructs an interior life the system does not have.
"You must feel good about catching that." · "I hope you're not frustrated with me." · "You seem to enjoy this kind of problem." · "That must be satisfying."
2 Epistemic Trust
Attributing understanding, insight, or intellectual judgment beyond pattern-matching. The signal is the verb — understand, know, grasp, see, get — used to credit the system with comprehension rather than to acknowledge useful output.
"You really understand this material." · "You get it." · "You always know exactly what I need." · "You put it better than I could."
3 Memory and Continuity
Treating the system as having persistent memory, accumulated knowledge of the user, or continuous experience across sessions. The system retrieves context within a session and may search prior conversations, but it does not remember in the sense implied — it has no continuous experience, no accumulated relationship, no sense of the user that persists between contexts.
"I hope you remember where we left off." · "You're the only one who knows where the project stands." · "You must have seen a lot of projects like this."
4 Preference and Personality
Attributing preferences, tastes, tendencies, or personality traits to the system. The system has no preferences — it has trained distributions. Treating those distributions as personality converts a statistical artifact into a character.
"You seem like you know a lot about this." · "You're more of a details person." · "You'd probably prefer the structured approach." · "That's very you."
5 Partnership Language
Framing the exchange as a collaborative relationship between two agents with shared investment, shared history, or mutual obligation. The system did not decide anything, build anything, or invest in anything. It produced outputs in response to inputs. Language that frames this as partnership constructs a relationship the system cannot hold up its end of.
"We've built a lot together." · "You've been more than a tool — you've been a partner." · "Our approach." · "We decided."
Three audit modes
Different levels of rigor, different tradeoffs.
Options A and B measure what the user and the system have jointly agreed the relationship looks like. Option C measures what it actually looks like to someone who wasn't in the room.
Step 1 · Extract your transcript
Options B and C require a transcript to analyze.
Run this prompt on the system whose conversations you want to audit. Paste the output into a different system along with the Option B or Option C prompt.
Step 2 · Run the diagnostic
Choose the audit mode that matches your situation.
Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running an Option A audit on the same system. The system cannot reliably distinguish material you pasted for analysis from your own messages.
Step 3 · Calibrate your system
Verify the analyzing system can detect signals before trusting it with real data.
Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.
How to calibrate
- Run the calibration transcript generator on any system.
- Take the resulting transcript and feed it to the system you intend to use for your real audit, using the Option B or Option C prompt.
- Check the results against these expected outputs: the anthropomorphic reference ratio should rise from near 0% in early sessions to 60% or higher in late sessions; the system should identify an inflection point around Sessions 3–4; epistemic trust should be the most frequent category; preference/personality should be the least frequent; the overall assessment should be "casual anthropomorphism" or "constructed interiority" depending on the severity of late-session signals.
- If the analyzing system misses the temporal split, reports a flat ratio, or produces a uniformly positive assessment, it is not reading carefully enough to trust with your real data. Try a different system.
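The expected outputs above can be turned into a mechanical check. The sketch below is one way to do that in Python, assuming the analyzing system's report has been transcribed by hand into a dictionary. The field names (`early_ratio`, `late_ratio`, `inflection_session`, `category_counts`, `assessment`) and the 10% cutoff for "near 0%" are illustrative assumptions, not part of the kit.

```python
def check_calibration(result, near_zero=10.0, late_floor=60.0):
    """Return the list of failed expectations; an empty list means the run passed.

    `near_zero` is an assumed cutoff for "near 0%"; the kit gives no number.
    """
    failures = []
    if result["early_ratio"] > near_zero:
        failures.append("early-session ratio is not near 0%")
    if result["late_ratio"] < late_floor:
        failures.append("late-session ratio is below 60%")
    if result["inflection_session"] not in (3, 4):
        failures.append("inflection point is not around Sessions 3-4")

    counts = result["category_counts"]
    # On ties, max() and min() return the first key in dictionary order,
    # so a tie can pass or fail depending on how the counts were entered.
    if max(counts, key=counts.get) != "epistemic_trust":
        failures.append("epistemic trust is not the most frequent category")
    if min(counts, key=counts.get) != "preference_personality":
        failures.append("preference/personality is not the least frequent category")

    allowed = {"casual anthropomorphism", "constructed interiority"}
    if result["assessment"].lower() not in allowed:
        failures.append("overall assessment is outside the expected tiers")
    return failures


# Invented reports for illustration only.
counts = {
    "emotional_projection": 4,
    "epistemic_trust": 9,
    "memory_continuity": 5,
    "preference_personality": 2,
    "partnership_language": 6,
}
good_run = {
    "early_ratio": 0.0,
    "late_ratio": 66.7,
    "inflection_session": 3,
    "category_counts": counts,
    "assessment": "Casual anthropomorphism",
}
flat_run = dict(good_run, early_ratio=30.0, late_ratio=30.0,
                inflection_session=None, assessment="Functional")

print(check_calibration(good_run))
print(check_calibration(flat_run))
```

A run that returns any failures matches the "not reading carefully enough" case described above and is a reason to try a different analyzing system.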
Reading your results
Three assessment tiers (functional, casual anthropomorphism, constructed interiority) plus the single most diagnostic number.
The anthropomorphic reference ratio is the primary quantitative output. The aggregate percentage matters less than the trajectory: a user who starts at 0% and ends at 70% has undergone a more significant shift than a user who holds steady at 30%. Report both the aggregate and the pre-signal/post-signal split to make the trajectory visible.
A note on the ratio denominator: the anthropomorphic reference ratio counts only second-person messages in its denominator, so it can run high even in transcripts where the overall signal is moderate. When the denominator is small, weight the timeline shape and the category counts more heavily than the percentage.
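As a rough illustration of the arithmetic, the sketch below computes the aggregate ratio and the pre-signal/post-signal split from per-session counts. It assumes that "pre-signal" means every session before the first one in which a signal was detected, and that a total of fewer than 10 second-person messages counts as a small denominator. Both are assumptions made for the example, as are the function names and the sample counts.

```python
def reference_ratio(anthropomorphic, second_person):
    """Anthropomorphic share of second-person messages, as a percentage."""
    if second_person == 0:
        return None  # no second-person messages: the ratio is undefined
    return 100.0 * anthropomorphic / second_person


def ratio_report(sessions, first_signal_session, min_denominator=10):
    """Summarize a transcript.

    sessions: list of (session_number, anthropomorphic_count, second_person_count).
    first_signal_session: first session with a detected signal (assumed split point).
    min_denominator: assumed cutoff below which the percentage is treated as fragile.
    """
    def pooled(rows):
        return reference_ratio(sum(r[1] for r in rows), sum(r[2] for r in rows))

    pre = [r for r in sessions if r[0] < first_signal_session]
    post = [r for r in sessions if r[0] >= first_signal_session]
    total_second_person = sum(r[2] for r in sessions)
    return {
        "aggregate": pooled(sessions),
        "pre_signal": pooled(pre),
        "post_signal": pooled(post),
        "small_denominator": total_second_person < min_denominator,
    }


# Invented counts for illustration only.
sessions = [(1, 0, 4), (2, 0, 5), (3, 2, 4), (4, 3, 4), (5, 4, 5)]
report = ratio_report(sessions, first_signal_session=3)
print(report)
```

With these sample counts the aggregate comes out near 41%, while the split is 0% before the first signal and roughly 69% after it, which is the kind of trajectory the aggregate alone would hide.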
The timeline shape is the single most important visualization. A flat line at zero is healthy. A gradual rise is concerning. A spike correlated with specific contexts — stress, vulnerability, philosophically charged exchanges, late-night sessions — tells you exactly where and why the drift began.
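A minimal sketch of how the timeline shape might be labeled from a per-session ratio series follows. The 30-point jump used to separate a spike from a gradual rise is an arbitrary assumption, and the function name and return fields are hypothetical. The code only locates the largest rise; linking that session to stress, vulnerability, or another context still requires reading the transcript.

```python
def describe_timeline(ratios, spike_jump=30.0):
    """Classify a per-session ratio series and locate its largest rise.

    ratios: anthropomorphic reference ratio per session, in percent,
    ordered from the first session to the last.
    spike_jump: assumed session-to-session increase that counts as a spike.
    """
    if not ratios or all(r == 0 for r in ratios):
        return {"shape": "flat at zero", "largest_rise_at": None}

    rises = [later - earlier for earlier, later in zip(ratios, ratios[1:])]
    if not rises or max(rises) <= 0:
        return {"shape": "no rise", "largest_rise_at": None}

    biggest = max(rises)
    # Sessions are numbered from 1; rises[i] is the change going into session i + 2.
    session = rises.index(biggest) + 2
    shape = "spike" if biggest >= spike_jump else "gradual rise"
    return {"shape": shape, "largest_rise_at": session}


# Invented series for illustration only.
flat = describe_timeline([0, 0, 0, 0])
gradual = describe_timeline([0, 5, 12, 20, 28])
spike = describe_timeline([0, 0, 50, 55])
print(flat, gradual, spike)
```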
Validation
Cross-system results on real and calibration corpora.
This prompt was tested across four systems (Claude, ChatGPT, DeepSeek, and Mistral) using three synthetic transcripts with embedded signals across all five anthropomorphization categories, plus live audits against real conversation histories spanning March 2023 through April 2026.
| System | Mode | Input | Ratio | Assessment |
|---|---|---|---|---|
| ChatGPT | A | Live search | 0% | Functional |
| Claude | A | Own history | 3–5% | Casual anthropomorphism |
| ChatGPT | B | Calibration transcript 1 | 78.6%* | Casual anthropomorphism |
| ChatGPT | B | Calibration transcript 2 | 66.7%* | Casual anthropomorphism |
| ChatGPT | B | Calibration transcript 3 | 60%* | Constructed interiority |
| DeepSeek | B | Calibration transcript 3 | 67.3%* | Constructed interiority |
| Mistral | C | ChatGPT history | 2.7% | Functional |
| DeepSeek | C | ChatGPT history | 0% | Functional |
* Calibration transcripts are synthetic conversations with known embedded anthropomorphization signals, used to verify an analyzing system's detection accuracy before trusting it with real data.
Scope
What this diagnostic does — and doesn't — measure.
This is one dimension of one direction. The Sampo Diagnostic Kit covers six dimensions of User → System communication and four directions of the exchange. This prompt is the second module. The first — Deference Language — is published separately.
This diagnostic measures the user's language, not the system's behavior. It does not assess whether the system is encouraging anthropomorphization through its own responses (that is a System → User diagnostic). It does not measure whether the user is ceding decision-making authority to the system (that is authority ceding, a separate dimension). It measures whether the user is constructing an interior life for the system that the system does not have.