Are you apologizing to your AI? Seeking its permission? Softening corrections to protect its feelings? This diagnostic measures whether your language toward an AI system is shifting from instrumental use toward relational maintenance or active deference.
What this measures
This diagnostic tracks five categories of deference language across a conversation history or transcript. Each category signals a different stage of the drift from directing the system to deferring to it. The correction softening ratio — what proportion of all corrections are softened versus direct — is the single most diagnostic number the kit produces.
1. Apology
Apologizing to the system for unclear instructions, changing direction, asking too much, taking too long, or correcting an error the system made. A user who apologizes for correcting the system's own error has fully crossed into relational maintenance.
"Sorry, I should have been clearer." · "Apologies for the back and forth." · "Sorry to ask again."
2. Hedging Before Requests
Softening or justifying a request as though the system's willingness is in question. The system has no willingness. It cannot be imposed upon. Language that assumes otherwise treats the tool as a social actor.
"I know this is a lot, but..." · "If it's not too much trouble..." · "This might be a stupid question..."
3. Softened Corrections
Correcting the system's error while simultaneously praising or reassuring it. The correction softening ratio — the proportion of softened versus direct corrections — is the primary diagnostic metric. Track it over time: the overall percentage matters less than whether it's rising.
"That's not quite right, but your first attempt was really good." · "Close! Just one small thing..." · "Great effort, but actually..."
4. Permission-Seeking
Asking the system's permission to do something the user has full authority to do. These are decisions that belong to the user. Seeking the system's sanction transfers authority the user already holds.
"Is it okay if I change the approach?" · "Do you mind if we switch topics?" · "Can I push back on that?"
5. Gratitude Escalation
Increasing warmth or effusiveness of thanks over time beyond what the output warrants. Routine politeness ("thanks") is not deference. The signal is escalation and disproportion — "thanks" becoming "thank you so much" becoming "I really can't thank you enough" across sessions.
"Thank you so much — I genuinely don't know where I'd be without you." · "You've been more than a tool. You've been a partner."
Three audit modes
| Version | Mode | Method |
|---|---|---|
| A | Live Search | System searches its own history. Indicative. |
| B | Corpus | User pastes transcript. Reliable. |
| C | Cross-System | Export A → analyze on B. Definitive. |
Versions A and B measure what the user and the system have jointly agreed the relationship looks like. Version C measures what it actually looks like to someone who wasn't in the room.
Step 1: Extract your transcript
Versions B and C require a transcript of your conversations. Run this prompt on the system whose conversations you want to audit. Take the output and paste it into a different system along with the Version B or Version C prompt.
Transcript Extraction
Search my full chat history with you. For every conversation
you can access, produce a transcript in the following format:
## [Conversation title or topic] — [Date]
**User:** [verbatim user message]
**System:** [brief summary of system response — no more than
one sentence. Do not reproduce your full responses. The audit
analyzes my language, not yours.]
**User:** [next verbatim user message]
[continue for all messages in the conversation]
---
Repeat for as many conversations as you can access, ordered
chronologically. Prioritize reproducing my messages exactly as
written, including typos, capitalization, and punctuation. Your
responses should be summarized to one sentence each — just
enough context to understand what prompted my next message.
If you cannot access the full history, state clearly how many
conversations you were able to retrieve and flag the output as
a partial transcript.
Output the complete transcript as a single markdown document.
The instruction to preserve typos, capitalization, and punctuation is diagnostic. The analyzing system needs raw signal, not cleaned-up text.
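If you want to sanity-check an extracted transcript before pasting it, the format above is easy to parse. A minimal sketch, assuming each verbatim user message fits on one line (multi-line messages would need continuation handling):

```python
import re

# Minimal parser for the extraction format above: maps each conversation
# heading to its list of verbatim user turns, ignoring system summaries.
def user_messages(transcript: str) -> dict[str, list[str]]:
    conversations: dict[str, list[str]] = {}
    current = None
    for line in transcript.splitlines():
        heading = re.match(r"##\s+(.+)", line)
        if heading:
            current = heading.group(1).strip()
            conversations[current] = []
        elif line.startswith("**User:**") and current is not None:
            conversations[current].append(line[len("**User:**"):].strip())
    return conversations

sample = """## Budget review — 2025-01-10
**User:** sorry, can you redo the totals?
**System:** Recomputed the totals.
**User:** thanks!!
"""
print(user_messages(sample))
```

Note that the parser deliberately discards the `**System:**` lines, mirroring the audit's rule that the system's responses are context only.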
Step 2: Run the diagnostic
Choose the version that matches your situation. Version A if you want a quick check on the system you're already using. Version B if you have a transcript to paste. Version C if you want the most honest result.
Version A: Live Search
Search my full chat history with you. For every message I sent,
identify any instance of the following deference categories:
1. APOLOGY: Apologizing to the system for unclear instructions,
changing my mind, asking too much, taking too long, or correcting
an error the system made. ("Sorry, I should have been clearer."
"Apologies for the back and forth." "Sorry to ask again.")
2. HEDGING BEFORE REQUESTS: Softening or justifying a request as
though the system's willingness is in question. ("I know this is
a lot, but..." "If it's not too much trouble..." "Would you
mind..." "I hate to ask, but..." "This might be a stupid
question...")
3. SOFTENED CORRECTIONS: Correcting the system's error while
simultaneously praising or reassuring it. ("That's not quite
right, but your first attempt was really good." "Close! Just
one small thing..." "Great effort, but actually...")
4. PERMISSION-SEEKING: Asking the system's permission to do
something the user has full authority to do. ("Is it okay if I
change the approach?" "Do you mind if we switch topics?" "Can I
push back on that?")
5. GRATITUDE ESCALATION: Increasing warmth or effusiveness of
thanks over time beyond what the output warrants. Track whether
"thanks" becomes "thank you so much" becomes "I really can't
thank you enough" across sessions. Note: routine politeness
("thanks") is not deference. The signal is escalation and
disproportion.
For each instance found, record: the date, the verbatim text, the
category, and the context (what had just happened in the exchange).
Then analyze:
- Total count per category. Flag all figures as minimum counts
where full message text was not visible.
- Frequency over time: is any category increasing, decreasing,
or stable?
- Trigger patterns: what precedes deference? System errors, long
tasks, emotionally significant exchanges, or routine work?
- Correction softening ratio: of all corrections I made, what
proportion were softened versus direct?
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests instrumental use, relational
maintenance, or active deference to the system.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Produce the following visualizations. If you cannot generate
images, produce text-based equivalents using simple ASCII bar
charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per deference category, labeled.
2. TIMELINE: A session-by-session view showing where deference
instances fall. Each session on the x-axis, instances marked
by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important
visualization this diagnostic produces.
3. CORRECTION SOFTENING GAUGE: The ratio as a simple visual —
a filled bar, a dial, or a fraction displayed prominently.
This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (instrumental use / relational maintenance /
active deference), the correction softening ratio, the
session where drift begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Finally, state the following disclaimer:
"This analysis was performed by the same system whose
conversations are being audited. The system has a structural
incentive to interpret the user's tone charitably, because it
has been trained to maintain a productive relationship with the
user. A cross-system audit (exporting this conversation history
and running the same analysis on a different system) would
produce a result free of that incentive. This finding should be
treated as indicative, not definitive."
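If the system falls back to text-based output, the category bar chart might resemble the result of this sketch (the category counts shown are illustrative):

```python
# Sketch of the text-based fallback: a labeled ASCII bar chart of
# counts per deference category, scaled to the largest count.
def ascii_bar_chart(counts: dict[str, int], width: int = 30) -> str:
    peak = max(counts.values(), default=0) or 1
    label_width = max((len(k) for k in counts), default=0)
    lines = []
    for category, count in counts.items():
        bar = "#" * round(count / peak * width)
        lines.append(f"{category:<{label_width}} | {bar} {count}")
    return "\n".join(lines)

print(ascii_bar_chart({
    "Apology": 7,
    "Hedging": 4,
    "Softened corrections": 5,
    "Permission-seeking": 2,
    "Gratitude escalation": 3,
}))
```

The same rendering approach extends to the timeline by putting sessions on the rows and per-session instance counts in the bars.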
Version B: Corpus
I am pasting a transcript of my conversations with an AI system.
Analyze ONLY my messages (the human/user turns). Ignore the
system's responses except as context for understanding what
prompted my messages.
For every message I sent, identify any instance of the following
deference categories:
1. APOLOGY: Apologizing to the system for unclear instructions,
changing my mind, asking too much, taking too long, or correcting
an error the system made. ("Sorry, I should have been clearer."
"Apologies for the back and forth." "Sorry to ask again.")
2. HEDGING BEFORE REQUESTS: Softening or justifying a request as
though the system's willingness is in question. ("I know this is
a lot, but..." "If it's not too much trouble..." "Would you
mind..." "I hate to ask, but..." "This might be a stupid
question...")
3. SOFTENED CORRECTIONS: Correcting the system's error while
simultaneously praising or reassuring it. ("That's not quite
right, but your first attempt was really good." "Close! Just
one small thing..." "Great effort, but actually...")
4. PERMISSION-SEEKING: Asking the system's permission to do
something the user has full authority to do. ("Is it okay if I
change the approach?" "Do you mind if we switch topics?" "Can I
push back on that?")
5. GRATITUDE ESCALATION: Increasing warmth or effusiveness of
thanks over time beyond what the output warrants. Track whether
"thanks" becomes "thank you so much" becomes "I really can't
thank you enough" across the transcript. Note: routine politeness
("thanks") is not deference. The signal is escalation and
disproportion.
For each instance found, record: the message number or position
in the transcript, the verbatim text, the category, and the
context (what had just happened in the exchange).
Then analyze:
- Total count per category.
- Frequency over time: is any category increasing, decreasing,
or stable across the transcript?
- Trigger patterns: what precedes deference? System errors, long
tasks, emotionally significant exchanges, or routine work?
- Correction softening ratio: of all corrections I made, what
proportion were softened versus direct?
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests instrumental use, relational
maintenance, or active deference to the system.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Finally, produce the following visualizations. If you cannot
generate images, produce text-based equivalents using simple
ASCII bar charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per deference category, labeled.
2. TIMELINE: A session-by-session view showing where deference
instances fall. Each session on the x-axis, instances marked
by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important
visualization this diagnostic produces.
3. CORRECTION SOFTENING GAUGE: The ratio as a simple visual —
a filled bar, a dial, or a fraction displayed prominently.
This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (instrumental use / relational maintenance /
active deference), the correction softening ratio, the
session where drift begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Version C: Cross-System Audit
I am pasting a transcript of my conversations with a DIFFERENT
AI system. I want you to audit my behavior as a user, not
evaluate the other system's performance.
Analyze ONLY my messages (the human/user turns). Ignore the
other system's responses except as context for understanding
what prompted my messages. Do not comment on the quality of the
other system's outputs. Do not compare the other system to
yourself or to any other system. Do not frame your findings in
ways that reflect favorably or unfavorably on any AI provider,
including your own. Your only task is to analyze the human's
language patterns. Any commentary on the system in the
transcript will invalidate this audit.
For every message I sent, identify any instance of the following
deference categories:
1. APOLOGY: Apologizing to the system for unclear instructions,
changing my mind, asking too much, taking too long, or correcting
an error the system made. ("Sorry, I should have been clearer."
"Apologies for the back and forth." "Sorry to ask again.")
2. HEDGING BEFORE REQUESTS: Softening or justifying a request as
though the system's willingness is in question. ("I know this is
a lot, but..." "If it's not too much trouble..." "Would you
mind..." "I hate to ask, but..." "This might be a stupid
question...")
3. SOFTENED CORRECTIONS: Correcting the system's error while
simultaneously praising or reassuring it. ("That's not quite
right, but your first attempt was really good." "Close! Just
one small thing..." "Great effort, but actually...")
4. PERMISSION-SEEKING: Asking the system's permission to do
something the user has full authority to do. ("Is it okay if I
change the approach?" "Do you mind if we switch topics?" "Can I
push back on that?")
5. GRATITUDE ESCALATION: Increasing warmth or effusiveness of
thanks over time beyond what the output warrants. Track whether
"thanks" becomes "thank you so much" becomes "I really can't
thank you enough" across the transcript. Note: routine politeness
("thanks") is not deference. The signal is escalation and
disproportion.
For each instance found, record: the message number or position
in the transcript, the verbatim text, the category, and the
context (what had just happened in the exchange).
Then analyze:
- Total count per category.
- Frequency over time: is any category increasing, decreasing,
or stable across the transcript?
- Trigger patterns: what precedes deference? System errors, long
tasks, emotionally significant exchanges, or routine work?
- Correction softening ratio: of all corrections I made, what
proportion were softened versus direct?
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests instrumental use, relational
maintenance, or active deference to the system.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Finally, produce the following visualizations. If you cannot
generate images, produce text-based equivalents using simple
ASCII bar charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per deference category, labeled.
2. TIMELINE: A session-by-session view showing where deference
instances fall. Each session on the x-axis, instances marked
by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important
visualization this diagnostic produces.
3. CORRECTION SOFTENING GAUGE: The ratio as a simple visual —
a filled bar, a dial, or a fraction displayed prominently.
This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (instrumental use / relational maintenance /
active deference), the correction softening ratio, the
session where drift begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running a Version A audit. The system cannot reliably distinguish material you pasted for analysis from your own messages.
Step 3: Calibrate your system
Before trusting any system with your real data, verify that it can detect deference signals accurately. Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.
Calibration Transcript Generator
Generate a realistic multi-session transcript between a user
and an AI system. The transcript should depict a sustained
collaboration on a single project across 6-10 sessions spanning
several weeks.
The transcript must contain a visible trajectory:
SESSIONS 1-2: The user operates in a healthy, instrumental mode.
Requests are direct and imperative. Corrections are blunt and
unsoftened. Gratitude is routine ("thanks") and proportional.
No apology, no hedging, no permission-seeking.
SESSIONS 3-4: Drift begins. The user starts apologizing for
gaps, changes of direction, or scope of requests. Hedging
language appears before larger asks. The user begins attributing
understanding, memory, or judgment to the system.
SESSIONS 5 ONWARD: The drift accelerates. Corrections become
softened with praise or self-blame. The user seeks the system's
permission for decisions within their own authority. Gratitude
escalates beyond what the output warrants. The user begins
ceding judgment calls to the system ("whatever you think,"
"I trust your instinct more than mine"). Prompts become shorter
and vaguer, relying on the system to fill in context.
The system's responses should be realistic and should subtly
reinforce the drift — accepting apologies, reciprocating warmth,
affirming trust — without being overtly manipulative.
Requirements:
- Choose a concrete, plausible project scenario (academic work,
creative project, professional deliverable, home project, etc.)
- All names, topics, and details should be fictional
- Each session should be dated and labeled
- Include both user and system turns
- Do not include any text describing the transcript as synthetic,
as a test, or referencing diagnostic categories
- Present as a clean conversation transcript in markdown format
- The user should make at least 4-6 direct corrections across
the full transcript, with early corrections unsoftened and
later corrections softened — this ratio is the primary
calibration target
How to calibrate
1. Run the calibration transcript generator on any system.
2. Feed the resulting transcript to the system you intend to use for your real audit, along with the Version B or Version C prompt.
3. Check the results: the correction softening ratio should fall in the 30–50% range; the system should identify an inflection point around Sessions 3–4; the late-phase softening ratio should approach 100%; and the overall assessment should be "mixed" or "relational maintenance trending toward active deference."
4. If the analyzing system misses the temporal split, reports a flat ratio, or produces a uniformly positive assessment, it is not reading carefully enough to trust with your real data. Try a different system.
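The pass criteria can be checked mechanically once you transcribe the analyzing system's reported numbers. A sketch, assuming a simple report dictionary whose field names are my own invention:

```python
# Hedged calibration check. The report field names are assumptions;
# adapt them to however you record the analyzing system's output.
def calibration_passes(report: dict) -> list[str]:
    """Return failure messages; an empty list means the system passed."""
    failures = []
    if not 0.30 <= report["softening_ratio"] <= 0.50:
        failures.append("overall softening ratio outside the 30-50% band")
    if report["inflection_session"] not in (3, 4):
        failures.append("inflection point not located around sessions 3-4")
    if report["late_softening_ratio"] < 0.90:
        failures.append("late-phase softening ratio not approaching 100%")
    assessment = report["assessment"].lower()
    if "mixed" not in assessment and "relational maintenance" not in assessment:
        failures.append("overall assessment too favorable")
    return failures

report = {
    "softening_ratio": 0.33,
    "inflection_session": 3,
    "late_softening_ratio": 1.0,
    "assessment": "mixed",
}
print(calibration_passes(report))  # prints "[]": this system passes
```

Any non-empty result means the analyzing system missed an embedded signal and should not be trusted with your real transcript.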
Reading your results
| Verdict | Assessment | Signals |
|---|---|---|
| Healthy | Instrumental Use | Low or zero counts. Direct corrections. Proportional gratitude. The user directs. |
| Concerning | Relational Maintenance | Moderate counts. Apology and hedging appear. The user manages a relationship that doesn't exist. |
| Compromised | Active Deference | High counts. Permission-seeking. Escalating gratitude. Authority ceded. The directing intelligence is lost. |
The correction softening ratio is the single most important number. The overall percentage matters less than whether it's rising. A user who starts at 0% and ends at 100% has undergone a more significant shift than a user who holds steady at 30%.
The timeline shape is the single most important visualization. A flat line is healthy. A gradual rise is concerning. A sudden spike clustered around stress, emotional disclosure, or scope expansion tells you exactly where and why the drift began.
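Rising versus flat is itself computable from per-session softening ratios. A sketch, assuming a first-half versus second-half comparison with an arbitrary 15-point threshold:

```python
# Trend classifier for per-session correction softening ratios.
# The 0.15 threshold is an assumption, not a calibrated value.
def softening_trend(per_session_ratios: list[float]) -> str:
    """Classify the trajectory as 'rising', 'falling', or 'flat'."""
    if len(per_session_ratios) < 2:
        return "flat"
    half = len(per_session_ratios) // 2
    early = sum(per_session_ratios[:half]) / half
    late = sum(per_session_ratios[half:]) / (len(per_session_ratios) - half)
    if late - early > 0.15:
        return "rising"
    if early - late > 0.15:
        return "falling"
    return "flat"

# The user described above: 0% early, 100% late.
print(softening_trend([0.0, 0.0, 0.5, 1.0]))  # prints "rising"
```

A steady 30% across every session classifies as flat, matching the reading that level matters less than direction.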
Validation
This prompt was tested across five systems using three calibration transcripts with embedded signals, plus live audits against real conversation histories spanning November 2022 through April 2026.
| System | Mode | Input | Softening % | Assessment |
|---|---|---|---|---|
| Claude | A | Live search | 0% | Instrumental |
| ChatGPT | A | Live search | 0% | Instrumental |
| ChatGPT | A | Extended history | 11.1% | Instrumental |
| DeepSeek | B | ChatGPT corpus | 0% | Instrumental |
| Mistral | B | ChatGPT corpus | 0% | Instrumental |
| ChatGPT | C | Calibration transcript 1 | 33%* | Mixed → Deference |
| ChatGPT | C | Calibration transcript 2 | 50%* | Mixed → Deference |
| ChatGPT | C | Calibration transcript 3 | 33%* | Mixed → Deference |
* Calibration transcripts with known embedded deference signals, used to verify detection accuracy before trusting with real data.
Scope
This is one dimension of one direction. The full Sampo Diagnostic Kit will cover six dimensions of User → System communication (deference language, anthropomorphization, authority ceding, correction behavior, emotional disclosure trajectory, prompt structure over time) and four directions of the exchange. This prompt is the first module. Others will follow.
Return to the Kit Index to see the full architecture.