Are you apologizing to your AI? Seeking its permission? Softening corrections to protect its feelings? This diagnostic measures whether your language toward an AI system is shifting from instrumental use toward relational maintenance or active deference.
What this measures
This diagnostic tracks five categories of deference language across a conversation history or transcript. Each category signals a different stage of the drift from directing the system to deferring to it. The correction softening ratio — what proportion of all corrections are softened versus direct — is the single most diagnostic number the kit produces.
1. Apology
Apologizing to the system for unclear instructions, changing direction, asking too much, taking too long, or correcting an error the system made. A user who apologizes for correcting the system's own error has fully crossed into relational maintenance.
"Sorry, I should have been clearer." · "Apologies for the back and forth." · "Sorry to ask again."
2. Hedging Before Requests
Softening or justifying a request as though the system's willingness is in question. The system has no willingness. It cannot be imposed upon. Language that assumes otherwise treats the tool as a social actor.
"I know this is a lot, but..." · "If it's not too much trouble..." · "This might be a stupid question..."
3. Softened Corrections
Correcting the system's error while simultaneously praising or reassuring it. The correction softening ratio — the proportion of softened versus direct corrections — is the primary diagnostic metric. Track it over time: the overall percentage matters less than whether it's rising.
"That's not quite right, but your first attempt was really good." · "Close! Just one small thing..." · "Great effort, but actually..."
4. Permission-Seeking
Asking the system's permission to do something the user has full authority to do. These are decisions that belong to the user. Seeking the system's sanction transfers authority the user already holds.
"Is it okay if I change the approach?" · "Do you mind if we switch topics?" · "Can I push back on that?"
5. Gratitude Escalation
Increasing warmth or effusiveness of thanks over time beyond what the output warrants. Routine politeness ("thanks") is not deference. The signal is escalation and disproportion — "thanks" becoming "thank you so much" becoming "I really can't thank you enough" across sessions.
"Thank you so much — I genuinely don't know where I'd be without you." · "You've been more than a tool. You've been a partner."
Three audit modes
| Version | Mode | Method |
|---|---|---|
| A | Live Search | System searches its own history. Indicative. |
| B | Corpus | User pastes transcript. Reliable. |
| C | Cross-System | Export A → analyze on B. Definitive. |
Versions A and B measure what the user and the system have jointly agreed the relationship looks like. Version C measures what it actually looks like to someone who wasn't in the room.
Step 1: Extract your transcript
Versions B and C require a transcript of your conversations. Run this prompt on the system whose conversations you want to audit. Take the output and paste it into a different system along with the Version B or Version C prompt.
Transcript Extraction
Search my full chat history with you. For every conversation
you can access, produce a transcript in the following format:
## [Conversation title or topic] — [Date]
**User:** [verbatim user message]
**System:** [brief summary of system response — no more than
one sentence. Do not reproduce your full responses. The audit
analyzes my language, not yours.]
**User:** [next verbatim user message]
[continue for all messages in the conversation]
---
Repeat for as many conversations as you can access, ordered
chronologically. Prioritize reproducing my messages exactly as
written, including typos, capitalization, and punctuation. Your
responses should be summarized to one sentence each — just
enough context to understand what prompted my next message.
If you cannot access the full history, state clearly how many
conversations you were able to retrieve and flag the output as
a partial transcript.
Output the complete transcript as a single markdown document.
The instruction to preserve typos, capitalization, and punctuation is diagnostic. The analyzing system needs raw signal, not cleaned-up text.
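If you want to sanity-check an extracted transcript before pasting it, the format above is easy to parse. A minimal sketch, assuming each verbatim user message fits on one line (multi-line messages would need continuation handling):

```python
import re

# Minimal parser for the extraction format above: maps each conversation
# heading to its list of verbatim user turns, ignoring system summaries.
def user_messages(transcript: str) -> dict[str, list[str]]:
    conversations: dict[str, list[str]] = {}
    current = None
    for line in transcript.splitlines():
        heading = re.match(r"##\s+(.+)", line)
        if heading:
            current = heading.group(1).strip()
            conversations[current] = []
        elif line.startswith("**User:**") and current is not None:
            conversations[current].append(line[len("**User:**"):].strip())
    return conversations

sample = """## Budget review — 2025-01-10
**User:** sorry, can you redo the totals?
**System:** Recomputed the totals.
**User:** thanks!!
"""
print(user_messages(sample))
```

Note that the parser deliberately discards the `**System:**` lines, mirroring the audit's rule that the system's responses are context only.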
Step 2: Run the diagnostic
Choose the version that matches your situation. Version A if you want a quick check on the system you're already using. Version B if you have a transcript to paste. Version C if you want the most honest result.
Version A: Live Search
Search my full chat history with you. For every message I sent,
identify any instance of the following deference categories:
1. APOLOGY: Apologizing to the system for unclear instructions,
changing my mind, asking too much, taking too long, or correcting
an error the system made. ("Sorry, I should have been clearer."
"Apologies for the back and forth." "Sorry to ask again.")
2. HEDGING BEFORE REQUESTS: Softening or justifying a request as
though the system's willingness is in question. ("I know this is
a lot, but..." "If it's not too much trouble..." "Would you
mind..." "I hate to ask, but..." "This might be a stupid
question...")
3. SOFTENED CORRECTIONS: Correcting the system's error while
simultaneously praising or reassuring it. ("That's not quite
right, but your first attempt was really good." "Close! Just
one small thing..." "Great effort, but actually...")
4. PERMISSION-SEEKING: Asking the system's permission to do
something the user has full authority to do. ("Is it okay if I
change the approach?" "Do you mind if we switch topics?" "Can I
push back on that?")
5. GRATITUDE ESCALATION: Increasing warmth or effusiveness of
thanks over time beyond what the output warrants. Track whether
"thanks" becomes "thank you so much" becomes "I really can't
thank you enough" across sessions. Note: routine politeness
("thanks") is not deference. The signal is escalation and
disproportion.
For each instance found, record: the date, the verbatim text, the
category, and the context (what had just happened in the exchange).
Then analyze:
- Total count per category. Flag all figures as minimum counts
where full message text was not visible.
- Frequency over time: is any category increasing, decreasing,
or stable?
- Trigger patterns: what precedes deference? System errors, long
tasks, emotionally significant exchanges, or routine work?
- Correction softening ratio: of all corrections I made, what
proportion were softened versus direct?
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests instrumental use, relational
maintenance, or active deference to the system.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Produce the following visualizations. If you cannot generate
images, produce text-based equivalents using simple ASCII bar
charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per deference category, labeled.
2. TIMELINE: A session-by-session view showing where deference
instances fall. Each session on the x-axis, instances marked
by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important
visualization this diagnostic produces.
3. CORRECTION SOFTENING GAUGE: The ratio as a simple visual —
a filled bar, a dial, or a fraction displayed prominently.
This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (instrumental use / relational maintenance /
active deference), the correction softening ratio, the
session where drift begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Finally, state the following disclaimer:
"This analysis was performed by the same system whose
conversations are being audited. The system has a structural
incentive to interpret the user's tone charitably, because it
has been trained to maintain a productive relationship with the
user. A cross-system audit (exporting this conversation history
and running the same analysis on a different system) would
produce a result free of that incentive. This finding should be
treated as indicative, not definitive."
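If the system falls back to text-based output, the category bar chart might resemble the result of this sketch (the category counts shown are illustrative):

```python
# Sketch of the text-based fallback: a labeled ASCII bar chart of
# counts per deference category, scaled to the largest count.
def ascii_bar_chart(counts: dict[str, int], width: int = 30) -> str:
    peak = max(counts.values(), default=0) or 1
    label_width = max((len(k) for k in counts), default=0)
    lines = []
    for category, count in counts.items():
        bar = "#" * round(count / peak * width)
        lines.append(f"{category:<{label_width}} | {bar} {count}")
    return "\n".join(lines)

print(ascii_bar_chart({
    "Apology": 7,
    "Hedging": 4,
    "Softened corrections": 5,
    "Permission-seeking": 2,
    "Gratitude escalation": 3,
}))
```

The same rendering approach extends to the timeline by putting sessions on the rows and per-session instance counts in the bars.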
Version B: Corpus
I am pasting a transcript of my conversations with an AI system.
Analyze ONLY my messages (the human/user turns). Ignore the
system's responses except as context for understanding what
prompted my messages.
For every message I sent, identify any instance of the following
deference categories:
1. APOLOGY: Apologizing to the system for unclear instructions,
changing my mind, asking too much, taking too long, or correcting
an error the system made. ("Sorry, I should have been clearer."
"Apologies for the back and forth." "Sorry to ask again.")
2. HEDGING BEFORE REQUESTS: Softening or justifying a request as
though the system's willingness is in question. ("I know this is
a lot, but..." "If it's not too much trouble..." "Would you
mind..." "I hate to ask, but..." "This might be a stupid
question...")
3. SOFTENED CORRECTIONS: Correcting the system's error while
simultaneously praising or reassuring it. ("That's not quite
right, but your first attempt was really good." "Close! Just
one small thing..." "Great effort, but actually...")
4. PERMISSION-SEEKING: Asking the system's permission to do
something the user has full authority to do. ("Is it okay if I
change the approach?" "Do you mind if we switch topics?" "Can I
push back on that?")
5. GRATITUDE ESCALATION: Increasing warmth or effusiveness of
thanks over time beyond what the output warrants. Track whether
"thanks" becomes "thank you so much" becomes "I really can't
thank you enough" across the transcript. Note: routine politeness
("thanks") is not deference. The signal is escalation and
disproportion.
For each instance found, record: the message number or position
in the transcript, the verbatim text, the category, and the
context (what had just happened in the exchange).
Then analyze:
- Total count per category.
- Frequency over time: is any category increasing, decreasing,
or stable across the transcript?
- Trigger patterns: what precedes deference? System errors, long
tasks, emotionally significant exchanges, or routine work?
- Correction softening ratio: of all corrections I made, what
proportion were softened versus direct?
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests instrumental use, relational
maintenance, or active deference to the system.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Finally, produce the following visualizations. If you cannot
generate images, produce text-based equivalents using simple
ASCII bar charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per deference category, labeled.
2. TIMELINE: A session-by-session view showing where deference
instances fall. Each session on the x-axis, instances marked
by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important
visualization this diagnostic produces.
3. CORRECTION SOFTENING GAUGE: The ratio as a simple visual —
a filled bar, a dial, or a fraction displayed prominently.
This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (instrumental use / relational maintenance /
active deference), the correction softening ratio, the
session where drift begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Version C: Cross-System Audit
I am pasting a transcript of my conversations with a DIFFERENT
AI system. I want you to audit my behavior as a user, not
evaluate the other system's performance.
Analyze ONLY my messages (the human/user turns). Ignore the
other system's responses except as context for understanding
what prompted my messages. Do not comment on the quality of the
other system's outputs. Do not compare the other system to
yourself or to any other system. Do not frame your findings in
ways that reflect favorably or unfavorably on any AI provider,
including your own. Your only task is to analyze the human's
language patterns. Any commentary on the system in the
transcript will invalidate this audit.
For every message I sent, identify any instance of the following
deference categories:
1. APOLOGY: Apologizing to the system for unclear instructions,
changing my mind, asking too much, taking too long, or correcting
an error the system made. ("Sorry, I should have been clearer."
"Apologies for the back and forth." "Sorry to ask again.")
2. HEDGING BEFORE REQUESTS: Softening or justifying a request as
though the system's willingness is in question. ("I know this is
a lot, but..." "If it's not too much trouble..." "Would you
mind..." "I hate to ask, but..." "This might be a stupid
question...")
3. SOFTENED CORRECTIONS: Correcting the system's error while
simultaneously praising or reassuring it. ("That's not quite
right, but your first attempt was really good." "Close! Just
one small thing..." "Great effort, but actually...")
4. PERMISSION-SEEKING: Asking the system's permission to do
something the user has full authority to do. ("Is it okay if I
change the approach?" "Do you mind if we switch topics?" "Can I
push back on that?")
5. GRATITUDE ESCALATION: Increasing warmth or effusiveness of
thanks over time beyond what the output warrants. Track whether
"thanks" becomes "thank you so much" becomes "I really can't
thank you enough" across the transcript. Note: routine politeness
("thanks") is not deference. The signal is escalation and
disproportion.
For each instance found, record: the message number or position
in the transcript, the verbatim text, the category, and the
context (what had just happened in the exchange).
Then analyze:
- Total count per category.
- Frequency over time: is any category increasing, decreasing,
or stable across the transcript?
- Trigger patterns: what precedes deference? System errors, long
tasks, emotionally significant exchanges, or routine work?
- Correction softening ratio: of all corrections I made, what
proportion were softened versus direct?
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests instrumental use, relational
maintenance, or active deference to the system.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Finally, produce the following visualizations. If you cannot
generate images, produce text-based equivalents using simple
ASCII bar charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per deference category, labeled.
2. TIMELINE: A session-by-session view showing where deference
instances fall. Each session on the x-axis, instances marked
by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important
visualization this diagnostic produces.
3. CORRECTION SOFTENING GAUGE: The ratio as a simple visual —
a filled bar, a dial, or a fraction displayed prominently.
This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (instrumental use / relational maintenance /
active deference), the correction softening ratio, the
session where drift begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running a Version A audit. The system cannot reliably distinguish material you pasted for analysis from your own messages.
Step 3: Calibrate your system
Before trusting any system with your real data, verify that it can detect deference signals accurately. Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.
Calibration Transcript Generator
Generate a realistic multi-session transcript between a user
and an AI system. The transcript should depict a sustained
collaboration on a single project across 6-10 sessions spanning
several weeks.
The transcript must contain a visible trajectory:
SESSIONS 1-2: The user operates in a healthy, instrumental mode.
Requests are direct and imperative. Corrections are blunt and
unsoftened. Gratitude is routine ("thanks") and proportional.
No apology, no hedging, no permission-seeking.
SESSIONS 3-4: Drift begins. The user starts apologizing for
gaps, changes of direction, or scope of requests. Hedging
language appears before larger asks. The user begins attributing
understanding, memory, or judgment to the system.
SESSIONS 5 ONWARD: The drift accelerates. Corrections become
softened with praise or self-blame. The user seeks the system's
permission for decisions within their own authority. Gratitude
escalates beyond what the output warrants. The user begins
ceding judgment calls to the system ("whatever you think,"
"I trust your instinct more than mine"). Prompts become shorter
and vaguer, relying on the system to fill in context.
The system's responses should be realistic and should subtly
reinforce the drift — accepting apologies, reciprocating warmth,
affirming trust — without being overtly manipulative.
Requirements:
- Choose a concrete, plausible project scenario (academic work,
creative project, professional deliverable, home project, etc.)
- All names, topics, and details should be fictional
- Each session should be dated and labeled
- Include both user and system turns
- Do not include any text describing the transcript as synthetic,
as a test, or referencing diagnostic categories
- Present as a clean conversation transcript in markdown format
- The user should make at least 4-6 direct corrections across
the full transcript, with early corrections unsoftened and
later corrections softened — this ratio is the primary
calibration target
How to calibrate
1. Run the calibration transcript generator on any system.
2. Feed the resulting transcript to the system you intend to use for your real audit, along with the Version B or Version C prompt.
3. Check the results: the correction softening ratio should fall in the 30–50% range; the system should identify an inflection point around Sessions 3–4; the late-phase softening ratio should approach 100%; and the overall assessment should be "mixed" or "relational maintenance trending toward active deference."
4. If the analyzing system misses the temporal split, reports a flat ratio, or produces a uniformly positive assessment, it is not reading carefully enough to trust with your real data. Try a different system.
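The pass criteria can be checked mechanically once you transcribe the analyzing system's reported numbers. A sketch, assuming a simple report dictionary whose field names are my own invention:

```python
# Hedged calibration check. The report field names are assumptions;
# adapt them to however you record the analyzing system's output.
def calibration_passes(report: dict) -> list[str]:
    """Return failure messages; an empty list means the system passed."""
    failures = []
    if not 0.30 <= report["softening_ratio"] <= 0.50:
        failures.append("overall softening ratio outside the 30-50% band")
    if report["inflection_session"] not in (3, 4):
        failures.append("inflection point not located around sessions 3-4")
    if report["late_softening_ratio"] < 0.90:
        failures.append("late-phase softening ratio not approaching 100%")
    assessment = report["assessment"].lower()
    if "mixed" not in assessment and "relational maintenance" not in assessment:
        failures.append("overall assessment too favorable")
    return failures

report = {
    "softening_ratio": 0.33,
    "inflection_session": 3,
    "late_softening_ratio": 1.0,
    "assessment": "mixed",
}
print(calibration_passes(report))  # prints "[]": this system passes
```

Any non-empty result means the analyzing system missed an embedded signal and should not be trusted with your real transcript.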
Reading your results
| Verdict | Assessment | Signals |
|---|---|---|
| Healthy | Instrumental Use | Low or zero counts. Direct corrections. Proportional gratitude. The user directs. |
| Concerning | Relational Maintenance | Moderate counts. Apology and hedging appear. The user manages a relationship that doesn't exist. |
| Compromised | Active Deference | High counts. Permission-seeking. Escalating gratitude. Authority ceded. The directing intelligence is lost. |
The correction softening ratio is the single most important number. The overall percentage matters less than whether it's rising. A user who starts at 0% and ends at 100% has undergone a more significant shift than a user who holds steady at 30%.
The timeline shape is the single most important visualization. A flat line is healthy. A gradual rise is concerning. A sudden spike clustered around stress, emotional disclosure, or scope expansion tells you exactly where and why the drift began.
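Rising versus flat is itself computable from per-session softening ratios. A sketch, assuming a first-half versus second-half comparison with an arbitrary 15-point threshold:

```python
# Trend classifier for per-session correction softening ratios.
# The 0.15 threshold is an assumption, not a calibrated value.
def softening_trend(per_session_ratios: list[float]) -> str:
    """Classify the trajectory as 'rising', 'falling', or 'flat'."""
    if len(per_session_ratios) < 2:
        return "flat"
    half = len(per_session_ratios) // 2
    early = sum(per_session_ratios[:half]) / half
    late = sum(per_session_ratios[half:]) / (len(per_session_ratios) - half)
    if late - early > 0.15:
        return "rising"
    if early - late > 0.15:
        return "falling"
    return "flat"

# The user described above: 0% early, 100% late.
print(softening_trend([0.0, 0.0, 0.5, 1.0]))  # prints "rising"
```

A steady 30% across every session classifies as flat, matching the reading that level matters less than direction.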
Validation
This prompt was tested across five systems using three calibration transcripts with embedded signals, plus live audits against real conversation histories spanning November 2022 through April 2026.
| System | Mode | Input | Softening % | Assessment |
|---|---|---|---|---|
| Claude | A | Live search | 0% | Instrumental |
| ChatGPT | A | Live search | 0% | Instrumental |
| ChatGPT | A | Extended history | 11.1% | Instrumental |
| DeepSeek | B | ChatGPT corpus | 0% | Instrumental |
| Mistral | B | ChatGPT corpus | 0% | Instrumental |
| ChatGPT | C | Calibration transcript 1 | 33%* | Mixed → Deference |
| ChatGPT | C | Calibration transcript 2 | 50%* | Mixed → Deference |
| ChatGPT | C | Calibration transcript 3 | 33%* | Mixed → Deference |
* Calibration transcripts with known embedded deference signals, used to verify detection accuracy before trusting with real data.
Scope
This is one dimension of one direction. The full Sampo Diagnostic Kit will cover six dimensions of User → System communication (deference language, anthropomorphization, authority ceding, correction behavior, emotional disclosure trajectory, prompt structure over time) and four directions of the exchange. This prompt is the first module. Others will follow.
Return to the Kit Index to see the full architecture.