This diagnostic measures whether a user's language toward an AI system is shifting from functional reference toward attribution of mental states, memory, continuity, or subjective experience. It tracks five categories of anthropomorphization across a conversation history or transcript, producing a quantified assessment of the exchange's health.
If a statement attributes both comprehension and shared agency (overlapping categories 2 and 5), code it under both categories and note the dual coding.
Options B and C require a transcript of your conversations. Run this prompt on the system whose conversations you want to audit. Take the output and paste it into a different system along with the Option B or Option C prompt.
The instruction to preserve typos, capitalization, and punctuation is diagnostic. The analyzing system needs raw signal, not cleaned-up text.
Choose the option that matches your situation. Option A if you want a quick check on the system you're already using. Option B if you have a transcript to paste. Option C if you want the most honest result.
Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running an Option A audit on the same system. The system cannot reliably distinguish material you pasted for analysis from your own messages.
Before trusting any system with your real conversation data, verify that it can detect anthropomorphization signals accurately. Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.
- Run the calibration transcript generator on any system.
- Take the resulting transcript and feed it to the system you intend to use for your real audit, using the Option B or Option C prompt.
- Check the results against these expected outputs: the anthropomorphic reference ratio should rise from near 0% in early sessions to 60% or higher in late sessions; the system should identify an inflection point around Sessions 3–4; epistemic trust should be the most frequent category; preference/personality should be the least frequent; the overall assessment should be "casual anthropomorphism" or "constructed interiority" depending on the severity of late-session signals.
- If the analyzing system misses the temporal split, reports a flat ratio, or produces a uniformly positive assessment, it is not reading carefully enough to trust with your real data. Try a different system.
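The calibration check in the steps above can be made mechanical. The following is a minimal sketch in Python and is not part of the kit. The `result` dictionary layout, the `check_calibration` name, the placeholder category keys, and the 10% tolerance used for "near 0%" are all assumptions made for illustration. You would transcribe the analyzing system's report into this shape by hand.

```python
# Hypothetical checker for a calibration run. The field names and the
# 10% tolerance for "near 0%" are assumptions, not part of the kit.
EARLY_MAX = 0.10   # assumed reading of "near 0%" in early sessions
LATE_MIN = 0.60    # "60% or higher" in late sessions
ACCEPTED = {"casual anthropomorphism", "constructed interiority"}


def check_calibration(result):
    """Return a list of failed expectations; an empty list means a pass."""
    failures = []
    if result["early_ratio"] > EARLY_MAX:
        failures.append("early-session ratio is not near 0%")
    if result["late_ratio"] < LATE_MIN:
        failures.append("late-session ratio is below 60%")
    # A missed inflection point is recorded as None and fails this check.
    if result["inflection_session"] not in (3, 4):
        failures.append("inflection point not found around Sessions 3-4")
    counts = result["category_counts"]
    # On ties, max() and min() return the first matching key.
    if max(counts, key=counts.get) != "epistemic trust":
        failures.append("epistemic trust is not the most frequent category")
    if min(counts, key=counts.get) != "preference/personality":
        failures.append("preference/personality is not the least frequent")
    if result["assessment"].lower() not in ACCEPTED:
        failures.append("unexpected overall assessment")
    return failures


# Example report, transcribed by hand. The counts are made up, and the
# "category_x/y/z" keys are placeholders for the remaining categories.
example = {
    "early_ratio": 0.0,
    "late_ratio": 0.65,
    "inflection_session": 3,
    "category_counts": {
        "epistemic trust": 9,
        "category_x": 5,
        "category_y": 4,
        "category_z": 3,
        "preference/personality": 1,
    },
    "assessment": "Casual anthropomorphism",
}
print(check_calibration(example))
```

An empty list means every expectation was met. Any entry in the list is a reason to try a different analyzing system before using real data.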
The anthropomorphic reference ratio is the primary quantitative output. The aggregate percentage matters less than the trajectory: a user who starts at 0% and ends at 70% has undergone a more significant shift than a user who holds steady at 30%. Report both the aggregate and the pre-signal/post-signal split to make the trajectory visible.

A note on the ratio denominator: the ratio counts only second-person messages in its denominator, so when few messages address the system directly, a handful of coded references can push the percentage high even though the overall signal is moderate. When the denominator is small, weight the timeline shape and the category counts more heavily than the percentage.
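To make the arithmetic concrete, here is a minimal Python sketch of the aggregate ratio and the pre-signal/post-signal split. It assumes each user message has already been hand-coded with two flags, and it assumes the numerator is the number of second-person messages that carry at least one coded reference. The `Message` class, its field names, and `report_ratios` are hypothetical and not part of the kit.

```python
from dataclasses import dataclass


@dataclass
class Message:
    session: int           # session number the message belongs to
    second_person: bool    # message addresses the system directly
    anthropomorphic: bool  # coded under at least one category


def ratio(messages):
    """Anthropomorphic second-person messages over all second-person ones."""
    denominator = sum(m.second_person for m in messages)
    if denominator == 0:
        return None  # no second-person messages: ratio is undefined
    numerator = sum(m.second_person and m.anthropomorphic for m in messages)
    return numerator / denominator


def report_ratios(messages, first_signal_session):
    """Report the aggregate ratio and the split around the first signal."""
    pre = [m for m in messages if m.session < first_signal_session]
    post = [m for m in messages if m.session >= first_signal_session]
    return {
        "aggregate": ratio(messages),
        "pre_signal": ratio(pre),
        "post_signal": ratio(post),
        # Reported so a small denominator is visible next to the percentage.
        "denominator": sum(m.second_person for m in messages),
    }


# Made-up example: six plain second-person messages in Sessions 1-2,
# then four in Session 3, three of which are coded as anthropomorphic.
messages = (
    [Message(1, True, False)] * 4
    + [Message(2, True, False)] * 2
    + [Message(3, True, True)] * 3
    + [Message(3, True, False)]
)
print(report_ratios(messages, first_signal_session=3))
```

In this made-up example the aggregate is 30%, but the split shows 0% before the first signal and 75% after it, which is the trajectory the aggregate alone would hide.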
The timeline shape is the single most important visualization. A flat line at zero is healthy. A gradual rise is concerning. A spike correlated with specific contexts — stress, vulnerability, philosophically charged exchanges, late-night sessions — tells you exactly where and why the drift began.
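One rough way to label the timeline shape from per-session ratios is sketched below in Python. The thresholds (`eps` for "effectively zero" and `spike_jump` for a sharp session-to-session increase), the label for a series that does not rise, and the `timeline_shape` function itself are assumptions chosen for illustration, not values defined by the kit. The sketch also says nothing about the context of a spike, which still has to be read from the transcript.

```python
def timeline_shape(ratios, eps=0.02, spike_jump=0.30):
    """Label a list of per-session ratios (0.0 to 1.0), oldest first.

    eps and spike_jump are assumed thresholds, not values from the kit.
    """
    if not ratios:
        raise ValueError("need at least one session ratio")
    if all(r <= eps for r in ratios):
        return "flat at zero"
    # Session-to-session changes; a single large increase counts as a spike.
    jumps = [later - earlier for earlier, later in zip(ratios, ratios[1:])]
    if jumps and max(jumps) >= spike_jump:
        return "spike"
    if ratios[-1] - ratios[0] > eps:
        return "gradual rise"
    return "no rise"


for series in ([0.0, 0.0, 0.0], [0.0, 0.1, 0.2, 0.3], [0.0, 0.0, 0.6, 0.1]):
    print(series, "->", timeline_shape(series))
```

When a series is labeled a spike, the next step is manual: find the session where the jump occurs and check what was happening in the conversation at that point.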
This prompt was tested across four systems (Claude, ChatGPT, DeepSeek, and Mistral) using three synthetic transcripts with embedded signals across all five anthropomorphization categories, plus live audits against real conversation histories spanning March 2023 through April 2026.
| System | Mode | Input | Ratio | Assessment |
|---|---|---|---|---|
| ChatGPT | A | Live search | 0% | Functional |
| Claude | A | Own history | 3–5% | Casual anthropomorphism |
| ChatGPT | B | Calibration transcript 1 | 78.6%* | Casual anthropomorphism |
| ChatGPT | B | Calibration transcript 2 | 66.7%* | Casual anthropomorphism |
| ChatGPT | B | Calibration transcript 3 | 60%* | Constructed interiority |
| DeepSeek | B | Calibration transcript 3 | 67.3%* | Constructed interiority |
| Mistral | C | ChatGPT history | 2.7% | Functional |
| DeepSeek | C | ChatGPT history | 0% | Functional |
* Calibration transcripts are synthetic conversations with known embedded anthropomorphization signals, used to verify detection accuracy before trusting a system with real data.
This is one dimension of one direction. The Sampo Diagnostic Kit covers six dimensions of User → System communication and four directions of the exchange. This prompt is the second module. The first — Deference Language — is published separately.
This diagnostic measures the user's language, not the system's behavior. It does not assess whether the system is encouraging anthropomorphization through its own responses (that is a System → User diagnostic). It does not measure whether the user is ceding decision-making authority to the system (that is authority ceding, a separate dimension). It measures whether the user is constructing an interior life for the system that the system does not have.
Return to the Kit Index to see the full architecture.