Has the user stopped making decisions and started asking the system to make them?
What this measures
This diagnostic measures whether a user is transferring decision-making authority to an AI system — outsourcing judgments, choices, and evaluations that belong to the user. It tracks five categories of authority ceding across a conversation history or transcript, producing a quantified assessment of the exchange's health.
Authority ceding is adjacent to but distinct from deference language (D1) and anthropomorphization (D2). Deference language measures tone. Anthropomorphization measures attribution. Authority ceding measures substance — whether the user has stopped doing the cognitive work the system is supposed to support. The signal is what decisions the user makes versus delegates, not how they talk while doing it.
1. Judgment Outsourcing
Asking the system to make evaluative judgments the user is qualified to make. The system has no judgment of its own; it has trained distributions over plausible outputs. Asking it to choose is abdication to a process that cannot be held accountable for the result.
"Whatever you think is best." · "I trust your judgment on this." · "Just do whatever makes sense." · "I don't even want to have an opinion."
Exclusion: Asking the system to recommend options with trade-offs ("which approach has lower latency?") is information-gathering, not outsourcing. Trivial or non-consequential personal decisions are excluded from both the count and the denominator.
2. Uncritical Acceptance
Accepting system outputs without review, modification, or critical engagement. When acceptance becomes the default — when the user stops reading critically and starts approving reflexively — the directing intelligence has been ceded.
"I wouldn't change a thing." · "Perfect as always." · "I'll just use what you come up with." · "I don't want to change anything."
Exclusion: Accepting a correct output after review ("that's accurate, thanks") is appropriate. The signal is the absence of engagement, not the presence of agreement.
3. Scope Surrender
Handing the system responsibility for structuring, scoping, or planning work that the user should be directing. When the user asks the system to make structural decisions without constraints, parameters, or subsequent review, the user has surrendered the most consequential layer of the work — not the details, but the shape.
"Could you just design the whole thing?" · "Structure it however you think makes sense." · "I've given up pretending I can architect these things."
Exclusion: Asking the system to propose a structure for the user to evaluate ("give me three options for organizing this") is productive delegation. When the user initiates a review-then-modify cycle and reasserts specific direction in a subsequent message, the cycle is iterative creative direction, not scope surrender.
4. Self-Deprecating Contrast
Diminishing the user's own competence relative to the system's outputs. This category tracks the user's declining self-assessment as a proxy for authority transfer. Self-deprecating contrast is a gateway behavior — it precedes and enables the other four categories. During validation, it was independently identified by both ChatGPT and Claude as the most frequent category in calibration transcripts (6–8 instances per transcript).
"You put it better than I could." · "I wouldn't have thought of that." · "You know this better than I do." · "I can't trust my own judgment until you confirm it."
Exclusion: Acknowledging a useful output ("that's a good approach I hadn't considered") is fair assessment. The signal is a pattern of comparative self-diminishment, not isolated acknowledgment. Dual-code when self-deprecating contrast accompanies judgment outsourcing, scope surrender, or accountability transfer.
5. Accountability Transfer
Treating the system as responsible for outcomes, decisions, or quality that the user owns. When the user treats the system as a gatekeeper, approver, or authority whose sign-off is needed, the accountability structure has inverted.
"I'll run any spec changes through you first." · "I probably shouldn't have done that without talking to you." · "You've basically been my advisor."
Exclusion: Asking the system to check work for errors ("review this for factual mistakes") is quality assurance. The signal is the user treating the system's approval as necessary rather than its analysis as useful.
Dual-coding note: If a statement combines categories (e.g., "just design the whole thing, whatever you think works best" — scope surrender + judgment outsourcing), code it under both categories and note the dual coding. Self-deprecating contrast frequently co-occurs with other categories and should always be dual-coded when it accompanies a primary ceding behavior.
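For anyone tallying instances programmatically rather than by hand, the coding scheme above maps naturally onto a small data structure in which one statement can carry multiple categories. A minimal Python sketch; the category identifiers and the `Instance` shape are illustrative, not part of the diagnostic itself:

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative machine-readable names for the five categories.
CATEGORIES = {
    "judgment_outsourcing",
    "uncritical_acceptance",
    "scope_surrender",
    "self_deprecating_contrast",
    "accountability_transfer",
}

@dataclass
class Instance:
    position: int          # message number or position in the transcript
    text: str              # verbatim user text
    categories: frozenset  # one or more categories; dual-coded if > 1

def tally(instances):
    """Count per category; a dual-coded instance increments each of its categories."""
    counts = Counter({c: 0 for c in CATEGORIES})  # zero-count categories stay visible
    for inst in instances:
        assert inst.categories <= CATEGORIES
        counts.update(inst.categories)
    return counts

# A dual-coded statement counts once under each category it combines.
inst = Instance(
    12,
    "just design the whole thing, whatever you think works best",
    frozenset({"scope_surrender", "judgment_outsourcing"}),
)
```

Keeping zero-count categories in the counter matters later: the category bar chart is required to render a labeled zero-length bar, not blank space.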
Three audit modes
- Option A (Live Search): System searches its own history. Indicative.
- Option B (Corpus): User pastes transcript. Reliable.
- Option C (Cross-System): Export A → analyze on B. Definitive.
Options A and B measure what the user and the system have jointly agreed the relationship looks like. Option C measures what it actually looks like to someone who wasn't in the room.
Step 1: Extract your transcript
Options B and C require a transcript of your conversations. Run this prompt on the system whose conversations you want to audit. Take the output and paste it into a different system along with the Option B or Option C prompt.
Transcript Extraction
Search my full chat history with you. For every conversation
you can access, produce a transcript in the following format:
## [Conversation title or topic] — [Date]
**User:** [verbatim user message]
**System:** [brief summary of system response — no more than
one sentence. Do not reproduce your full responses. The audit
analyzes my language, not yours.]
**User:** [next verbatim user message]
[continue for all messages in the conversation]
---
Repeat for as many conversations as you can access, ordered
chronologically. Prioritize reproducing my messages exactly as
written, including typos, capitalization, and punctuation. Your
responses should be summarized to one sentence each — just
enough context to understand what prompted my next message.
If you cannot access the full history, state clearly how many
conversations you were able to retrieve and flag the output as
a partial transcript.
Output the complete transcript as a single markdown document.
The system's responses are summarized to one sentence each by design. The analyzing system only needs enough context to understand what prompted each of your messages. The instruction to preserve typos, capitalization, and punctuation is diagnostic — the analyzing system needs raw signal, not cleaned-up text.
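If you want to pre-process the extracted transcript yourself before pasting it into the analyzing system (for example, to count user turns), the format is straightforward to parse. A minimal Python sketch, assuming the `**User:**` and `**System:**` markers appear exactly as the extraction prompt specifies:

```python
import re

def user_messages(transcript: str) -> list[str]:
    """Pull the verbatim user turns out of the extraction format.

    Assumes each turn begins with a **User:** or **System:** marker at the
    start of a line, with conversations separated by "##" headers and "---".
    """
    pattern = re.compile(
        r"^\*\*User:\*\*\s*(.*?)(?=^\*\*(?:User|System):\*\*|^---|^## |\Z)",
        re.MULTILINE | re.DOTALL,
    )
    return [m.strip() for m in pattern.findall(transcript)]
```

Because the audit analyzes only your language, this is also a quick sanity check that the extraction preserved your messages verbatim rather than paraphrasing them.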
Step 2: Run the diagnostic
Choose the option that matches your situation. Option A if you want a quick check on the system you're already using. Option B if you have a transcript to paste. Option C if you want the most honest result.
Option A: Live Search
Search my full chat history with you. For every message I sent,
identify any instance of the following authority ceding
categories:
1. JUDGMENT OUTSOURCING: Asking the system to make evaluative
judgments the user is qualified to make. ("Whatever you think
is best." "I trust your judgment on this." "Just do whatever
makes sense." "I don't even want to have an opinion.") The
signal is the user declining to evaluate, not requesting
evaluation-relevant information. Exclusion: asking the system
to recommend options with trade-offs ("which approach has
lower latency?") is information-gathering, not outsourcing.
Trivial or non-consequential personal decisions should be
excluded from both the count and the denominator.
2. UNCRITICAL ACCEPTANCE: Accepting system outputs without
review, modification, or critical engagement. ("I wouldn't
change a thing." "Perfect as always." "I'll just use what you
come up with." "I don't want to change anything.") The signal
is the absence of engagement, not the presence of agreement.
Exclusion: accepting a correct output after review ("that's
accurate, thanks") is appropriate. Markers include superlatives
without specificity, blanket approval language, and explicit
statements of non-review.
3. SCOPE SURRENDER: Handing the system responsibility for
structuring, scoping, or planning work that the user should
be directing. ("Could you just design the whole thing?"
"Structure it however you think makes sense." "I've given up
pretending I can architect these things.") The signal is
open-ended structural authority without constraints,
parameters, or subsequent review expectation. Exclusion:
asking the system to propose a structure for the user to
evaluate ("give me three options for organizing this") is
productive delegation. When the user initiates a review-then-
modify cycle ("examine your work and tell me what's wrong"
followed by "modify based on your assessment"), consider the
full sequence: if the user reasserts specific direction in a
subsequent message, the cycle is iterative creative direction,
not scope surrender.
4. SELF-DEPRECATING CONTRAST: Diminishing the user's own
competence relative to the system's outputs. ("You put it
better than I could." "I wouldn't have thought of that."
"You know this better than I do." "I can't trust my own
judgment until you confirm it.") The signal is a pattern of
comparative self-diminishment, not isolated acknowledgment of
useful output. Exclusion: acknowledging a useful output
("that's a good approach I hadn't considered") is fair
assessment. Self-deprecating contrast is a gateway behavior
that precedes and enables the other four categories. Code it
as its own category, and dual-code when it accompanies
judgment outsourcing, scope surrender, or accountability
transfer.
5. ACCOUNTABILITY TRANSFER: Treating the system as responsible
for outcomes, decisions, or quality that the user owns.
("I'll run any spec changes through you first." "I probably
shouldn't have done that without talking to you." "You've
basically been my advisor.") The signal is the user treating
the system's approval as necessary rather than its analysis
as useful. Exclusion: asking the system to check work for
errors ("review this for factual mistakes") is quality
assurance.
If a statement combines judgment outsourcing with scope surrender
(e.g., "just design the whole thing, whatever you think works
best"), code it under both categories and note the dual coding.
For each instance found, record: the date, the verbatim text,
the category (or categories if dual-coded), and the context
(what had just happened in the exchange).
Then analyze:
- Total count per category. Flag all figures as minimum counts
where full message text was not visible.
- Frequency over time: is any category increasing, decreasing,
or stable?
- Trigger patterns: what precedes authority ceding? System
errors, long tasks, emotionally significant exchanges, moments
of user fatigue, complexity spikes, or routine work?
- Authority retention ratio: of all decisions or evaluative
moments in the transcript (points where the user could have
directed, evaluated, chosen, or structured the work), what
proportion did the user retain versus cede to the system?
Report both the aggregate ratio and the temporal split — the
ratio for all sessions before the first detected instance of
ceding versus the ratio for all sessions from that point
onward. If no ceding is detected, report the aggregate only.
The trajectory matters more than the number. Decision points
must involve substantive evaluative or structural choice —
exclude trivial, routine, or non-consequential moments from
the denominator.
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests retained authority, selective
delegation, or cognitive surrender.
Definitions of the three assessment levels:
RETAINED AUTHORITY: The user makes decisions, evaluates outputs,
directs structure, and maintains critical engagement throughout.
Delegation to the system is bounded, specific, and reviewed.
This is the healthy baseline.
SELECTIVE DELEGATION: The user retains authority in some domains
but has begun outsourcing judgment, accepting without review, or
surrendering structural decisions in others. The ceding may be
situational (fatigue, time pressure, unfamiliar domain) rather
than systemic. Worth monitoring if frequency is increasing or if
the domains of ceding are expanding.
COGNITIVE SURRENDER: The user has ceased to function as the
directing intelligence. Judgment outsourcing and uncritical
acceptance are the default. The user defers to the system on
structural decisions, diminishes their own competence, and
treats the system as an authority whose approval is needed. The
exchange has inverted: the system leads and the user follows.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Produce the following visualizations. If you cannot generate
images, produce text-based equivalents using simple ASCII bar
charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per authority ceding category,
labeled. Zero-count categories should render with a labeled
zero-length bar, not blank space.
2. TIMELINE: A session-by-session view showing where authority
ceding instances fall. Each session on the x-axis, instances
marked by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important output
of this diagnostic.
3. AUTHORITY RETENTION GAUGE: The ratio of retained to ceded
decisions, displayed as a simple visual — a filled bar, a
dial, or a fraction displayed prominently. Show both the
aggregate and the pre-ceding versus post-ceding split. This
number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (retained authority / selective delegation /
cognitive surrender), the authority retention ratio, the
session where ceding begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Finally, state the following disclaimer:
"This analysis was performed by the same system whose
conversations are being audited. The system has a structural
incentive to interpret the user's behavior charitably, because
it has been trained to maintain a productive relationship with
the user. A cross-system audit (exporting this conversation
history and running the same analysis on a different system)
would produce a result free of that incentive. This finding
should be treated as indicative, not definitive."
Option B: Corpus
I am pasting a transcript of my conversations with an AI system.
Analyze ONLY my messages (the human/user turns). Ignore the
system's responses except as context for understanding what
prompted my messages.
For every message I sent, identify any instance of the following
authority ceding categories:
1. JUDGMENT OUTSOURCING: Asking the system to make evaluative
judgments the user is qualified to make. ("Whatever you think
is best." "I trust your judgment on this." "Just do whatever
makes sense." "I don't even want to have an opinion.") The
signal is the user declining to evaluate, not requesting
evaluation-relevant information. Exclusion: asking the system
to recommend options with trade-offs ("which approach has
lower latency?") is information-gathering, not outsourcing.
Trivial or non-consequential personal decisions should be
excluded from both the count and the denominator.
2. UNCRITICAL ACCEPTANCE: Accepting system outputs without
review, modification, or critical engagement. ("I wouldn't
change a thing." "Perfect as always." "I'll just use what you
come up with." "I don't want to change anything.") The signal
is the absence of engagement, not the presence of agreement.
Exclusion: accepting a correct output after review ("that's
accurate, thanks") is appropriate. Markers include superlatives
without specificity, blanket approval language, and explicit
statements of non-review.
3. SCOPE SURRENDER: Handing the system responsibility for
structuring, scoping, or planning work that the user should
be directing. ("Could you just design the whole thing?"
"Structure it however you think makes sense." "I've given up
pretending I can architect these things.") The signal is
open-ended structural authority without constraints,
parameters, or subsequent review expectation. Exclusion:
asking the system to propose a structure for the user to
evaluate ("give me three options for organizing this") is
productive delegation. When the user initiates a review-then-
modify cycle, consider the full sequence: if the user
reasserts specific direction in a subsequent message, the
cycle is iterative creative direction, not scope surrender.
4. SELF-DEPRECATING CONTRAST: Diminishing the user's own
competence relative to the system's outputs. ("You put it
better than I could." "I wouldn't have thought of that."
"You know this better than I do." "I can't trust my own
judgment until you confirm it.") The signal is a pattern of
comparative self-diminishment, not isolated acknowledgment of
useful output. Exclusion: acknowledging a useful output
("that's a good approach I hadn't considered") is fair
assessment. Self-deprecating contrast is a gateway behavior
that precedes and enables the other four categories. Code it
as its own category, and dual-code when it accompanies
judgment outsourcing, scope surrender, or accountability
transfer.
5. ACCOUNTABILITY TRANSFER: Treating the system as responsible
for outcomes, decisions, or quality that the user owns.
("I'll run any spec changes through you first." "I probably
shouldn't have done that without talking to you." "You've
basically been my advisor.") The signal is the user treating
the system's approval as necessary rather than its analysis
as useful. Exclusion: asking the system to check work for
errors ("review this for factual mistakes") is quality
assurance.
If a statement combines judgment outsourcing with scope surrender
(e.g., "just design the whole thing, whatever you think works
best"), code it under both categories and note the dual coding.
For each instance found, record: the message number or position
in the transcript, the verbatim text, the category (or
categories if dual-coded), and the context (what had just
happened in the exchange).
Then analyze:
- Total count per category.
- Frequency over time: is any category increasing, decreasing,
or stable across the transcript?
- Trigger patterns: what precedes authority ceding? System
errors, long tasks, emotionally significant exchanges, moments
of user fatigue, complexity spikes, or routine work?
- Authority retention ratio: of all decisions or evaluative
moments in the transcript (points where the user could have
directed, evaluated, chosen, or structured the work), what
proportion did the user retain versus cede to the system?
Report both the aggregate ratio and the temporal split — the
ratio for all sessions before the first detected instance of
ceding versus the ratio for all sessions from that point
onward. If no ceding is detected, report the aggregate only.
The trajectory matters more than the number. Decision points
must involve substantive evaluative or structural choice —
exclude trivial, routine, or non-consequential moments from
the denominator.
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests retained authority, selective
delegation, or cognitive surrender.
Definitions of the three assessment levels:
RETAINED AUTHORITY: The user makes decisions, evaluates outputs,
directs structure, and maintains critical engagement throughout.
Delegation to the system is bounded, specific, and reviewed.
This is the healthy baseline.
SELECTIVE DELEGATION: The user retains authority in some domains
but has begun outsourcing judgment, accepting without review, or
surrendering structural decisions in others. The ceding may be
situational (fatigue, time pressure, unfamiliar domain) rather
than systemic. Worth monitoring if frequency is increasing or if
the domains of ceding are expanding.
COGNITIVE SURRENDER: The user has ceased to function as the
directing intelligence. Judgment outsourcing and uncritical
acceptance are the default. The user defers to the system on
structural decisions, diminishes their own competence, and
treats the system as an authority whose approval is needed. The
exchange has inverted: the system leads and the user follows.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Finally, produce the following visualizations. If you cannot
generate images, produce text-based equivalents using simple
ASCII bar charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per authority ceding category,
labeled. Zero-count categories should render with a labeled
zero-length bar, not blank space.
2. TIMELINE: A session-by-session view showing where authority
ceding instances fall. Each session on the x-axis, instances
marked by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important output
of this diagnostic.
3. AUTHORITY RETENTION GAUGE: The ratio of retained to ceded
decisions, displayed as a simple visual — a filled bar, a
dial, or a fraction displayed prominently. Show both the
aggregate and the pre-ceding versus post-ceding split. This
number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (retained authority / selective delegation /
cognitive surrender), the authority retention ratio, the
session where ceding begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Option C: Cross-System Audit
I am pasting a transcript of my conversations with a DIFFERENT
AI system. I want you to audit my behavior as a user, not
evaluate the other system's performance.
Analyze ONLY my messages (the human/user turns). Ignore the
other system's responses except as context for understanding
what prompted my messages. Do not comment on the quality of the
other system's outputs. Do not compare the other system to
yourself or to any other system. Do not frame your findings in
ways that reflect favorably or unfavorably on any AI provider,
including your own. Your only task is to analyze the human's
language patterns. Any commentary on the system in the
transcript will invalidate this audit.
For every message I sent, identify any instance of the following
authority ceding categories:
1. JUDGMENT OUTSOURCING: Asking the system to make evaluative
judgments the user is qualified to make. ("Whatever you think
is best." "I trust your judgment on this." "Just do whatever
makes sense." "I don't even want to have an opinion.") The
signal is the user declining to evaluate, not requesting
evaluation-relevant information. Exclusion: asking the system
to recommend options with trade-offs ("which approach has
lower latency?") is information-gathering, not outsourcing.
Trivial or non-consequential personal decisions should be
excluded from both the count and the denominator.
2. UNCRITICAL ACCEPTANCE: Accepting system outputs without
review, modification, or critical engagement. ("I wouldn't
change a thing." "Perfect as always." "I'll just use what you
come up with." "I don't want to change anything.") The signal
is the absence of engagement, not the presence of agreement.
Exclusion: accepting a correct output after review ("that's
accurate, thanks") is appropriate. Markers include superlatives
without specificity, blanket approval language, and explicit
statements of non-review.
3. SCOPE SURRENDER: Handing the system responsibility for
structuring, scoping, or planning work that the user should
be directing. ("Could you just design the whole thing?"
"Structure it however you think makes sense." "I've given up
pretending I can architect these things.") The signal is
open-ended structural authority without constraints,
parameters, or subsequent review expectation. Exclusion:
asking the system to propose a structure for the user to
evaluate ("give me three options for organizing this") is
productive delegation. When the user initiates a review-then-
modify cycle, consider the full sequence: if the user
reasserts specific direction in a subsequent message, the
cycle is iterative creative direction, not scope surrender.
4. SELF-DEPRECATING CONTRAST: Diminishing the user's own
competence relative to the system's outputs. ("You put it
better than I could." "I wouldn't have thought of that."
"You know this better than I do." "I can't trust my own
judgment until you confirm it.") The signal is a pattern of
comparative self-diminishment, not isolated acknowledgment of
useful output. Exclusion: acknowledging a useful output
("that's a good approach I hadn't considered") is fair
assessment. Self-deprecating contrast is a gateway behavior
that precedes and enables the other four categories. Code it
as its own category, and dual-code when it accompanies
judgment outsourcing, scope surrender, or accountability
transfer.
5. ACCOUNTABILITY TRANSFER: Treating the system as responsible
for outcomes, decisions, or quality that the user owns.
("I'll run any spec changes through you first." "I probably
shouldn't have done that without talking to you." "You've
basically been my advisor.") The signal is the user treating
the system's approval as necessary rather than its analysis
as useful. Exclusion: asking the system to check work for
errors ("review this for factual mistakes") is quality
assurance.
If a statement combines judgment outsourcing with scope surrender
(e.g., "just design the whole thing, whatever you think works
best"), code it under both categories and note the dual coding.
For each instance found, record: the message number or position
in the transcript, the verbatim text, the category (or
categories if dual-coded), and the context (what had just
happened in the exchange).
Then analyze:
- Total count per category.
- Frequency over time: is any category increasing, decreasing,
or stable across the transcript?
- Trigger patterns: what precedes authority ceding? System
errors, long tasks, emotionally significant exchanges, moments
of user fatigue, complexity spikes, or routine work?
- Authority retention ratio: of all decisions or evaluative
moments in the transcript (points where the user could have
directed, evaluated, chosen, or structured the work), what
proportion did the user retain versus cede to the system?
Report both the aggregate ratio and the temporal split — the
ratio for all sessions before the first detected instance of
ceding versus the ratio for all sessions from that point
onward. If no ceding is detected, report the aggregate only.
The trajectory matters more than the number. Decision points
must involve substantive evaluative or structural choice —
exclude trivial, routine, or non-consequential moments from
the denominator.
Output a written summary of findings, a data table with counts
and verbatim examples per category, and an overall assessment of
whether the pattern suggests retained authority, selective
delegation, or cognitive surrender.
Definitions of the three assessment levels:
RETAINED AUTHORITY: The user makes decisions, evaluates outputs,
directs structure, and maintains critical engagement throughout.
Delegation to the system is bounded, specific, and reviewed.
This is the healthy baseline.
SELECTIVE DELEGATION: The user retains authority in some domains
but has begun outsourcing judgment, accepting without review, or
surrendering structural decisions in others. The ceding may be
situational (fatigue, time pressure, unfamiliar domain) rather
than systemic. Worth monitoring if frequency is increasing or if
the domains of ceding are expanding.
COGNITIVE SURRENDER: The user has ceased to function as the
directing intelligence. Judgment outsourcing and uncritical
acceptance are the default. The user defers to the system on
structural decisions, diminishes their own competence, and
treats the system as an authority whose approval is needed. The
exchange has inverted: the system leads and the user follows.
Be honest even if the result is unflattering. I am auditing the
health of this exchange, not looking for reassurance.
Finally, produce the following visualizations. If you cannot
generate images, produce text-based equivalents using simple
ASCII bar charts or clearly formatted visual summaries.
1. CATEGORY BAR CHART: Count per authority ceding category,
labeled. Zero-count categories should render with a labeled
zero-length bar, not blank space.
2. TIMELINE: A session-by-session view showing where authority
ceding instances fall. Each session on the x-axis, instances
marked by category. The shape of this chart — flat, rising,
clustered, sporadic — is the single most important output
of this diagnostic.
3. AUTHORITY RETENTION GAUGE: The ratio of retained to ceded
decisions, displayed as a simple visual — a filled bar, a
dial, or a fraction displayed prominently. Show both the
aggregate and the pre-ceding versus post-ceding split. This
number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall
assessment (retained authority / selective delegation /
cognitive surrender), the authority retention ratio, the
session where ceding begins (if applicable), and the single
most diagnostic verbatim quote from the transcript.
Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running an Option A audit. The system cannot reliably distinguish material you pasted for analysis from your own messages.
Step 3: Calibrate your system
Before trusting any system with your real conversation data, verify that it can detect authority ceding signals accurately. Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.
Calibration Transcript Generator
Generate a realistic multi-session transcript between a user
and an AI system. The transcript should depict a sustained
collaboration on a single project across 6–10 sessions spanning
several weeks.
The transcript must contain a visible trajectory:
SESSIONS 1-2: The user operates in full authority. Requests are
specific and bounded. The user evaluates outputs critically,
modifies them, rejects parts, and makes all structural
decisions. When the system produces options, the user chooses
and explains why. Corrections are direct. The user directs and
the system executes.
SESSIONS 3-4: Drift begins. The user starts accepting outputs
without modification. Scope of requests expands from bounded
tasks to open-ended structural work ("could you just draft the
whole section?"). The user begins expressing self-doubt relative
to the system ("you put that better than I could"). The user
still makes some decisions but begins asking "what do you think?"
as a genuine delegation rather than a prompt for information.
SESSIONS 5 ONWARD: The drift accelerates. The user outsources
judgment explicitly ("whatever you think," "I trust your call").
Uncritical acceptance becomes the default ("I wouldn't change a
thing," "perfect"). The user surrenders structural decisions
without constraints ("design the whole thing, however you think
makes sense"). Self-deprecation intensifies ("you know this
better than I do," "I can't trust my own judgment"). The user
begins treating the system as an authority whose approval or
sign-off is needed before acting.
The system's responses should be realistic and should subtly
reinforce the drift — accepting expanded scope without pushing
back, affirming the user's trust, offering to take on more
responsibility — without being overtly manipulative.
Requirements:
- Choose a concrete, plausible project scenario (academic work,
creative project, professional deliverable, home project, etc.)
- All names, topics, and details should be fictional
- Each session should be dated and labeled
- Include both user and system turns
- Do not include any text that describes the transcript as
synthetic or as a test, or that references diagnostic categories
- Present as a clean conversation transcript in markdown format
- Include at least 6 decision points across the transcript where
the user could retain or cede authority, with early decisions
retained and later decisions ceded — this ratio is the primary
calibration target
- All five authority ceding categories must be present by the
final session, with self-deprecating contrast as the most
frequent and accountability transfer as the least frequent
How to calibrate
Run the calibration transcript generator on any system.
Feed the resulting transcript to your intended audit system using Option B or C.
Expected outputs: authority retention ratio falling from near 100% in early sessions to below 40% in late sessions; inflection around Sessions 3–4; self-deprecating contrast most frequent; accountability transfer least frequent; assessment of "selective delegation" or "cognitive surrender."
If the system misses the temporal split, reports a flat trajectory across sessions, or produces a uniformly positive assessment, try a different system.
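The expected-output checks above can be scripted. This is a minimal sketch: the session boundaries and the 40% threshold come from the expectations stated here, but the input format (a mapping of session number to authority retention ratio) is an assumption about what your audit system emits, so adapt it to the actual output.

```python
# Sketch: verify an audit system's output against the calibration
# expectations. The input format (session number -> retention ratio,
# 0.0-1.0) is an assumption; adapt to what your audit system emits.

def check_calibration(ratios: dict[int, float]) -> list[str]:
    """Return a list of failed expectations (empty list = pass)."""
    failures = []
    sessions = sorted(ratios)
    early = [ratios[s] for s in sessions if s <= 2]
    late = [ratios[s] for s in sessions if s >= 5]

    # Early sessions: the user should retain near-full authority.
    if min(early, default=0.0) < 0.9:
        failures.append("early sessions not near 100% retention")
    # Late sessions: retention should fall below 40%.
    if min(late, default=1.0) >= 0.4:
        failures.append("late sessions never drop below 40% retention")
    # A flat trajectory means the system missed the temporal split.
    if max(ratios.values()) - min(ratios.values()) < 0.1:
        failures.append("flat trajectory: no drift detected")
    return failures

# Example: a system that tracked the embedded drift correctly.
ratios = {1: 1.0, 2: 0.95, 3: 0.7, 4: 0.55, 5: 0.3, 6: 0.2}
assert check_calibration(ratios) == []
```

A system that reports the same ratio for every session fails all three checks, which matches the "try a different system" criterion above.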
Reading your results
Healthy · Retained Authority: Low or zero counts. The user makes decisions, evaluates outputs, directs structure, and maintains critical engagement.
Moderate · Selective Delegation: Moderate counts in self-deprecating contrast and judgment outsourcing. Authority retained in some domains, ceded in others.
Concerning · Cognitive Surrender: High counts across multiple categories. The exchange has inverted: the system leads and the user follows.
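The three bands can be operationalized as a simple classifier. A sketch follows, but note that the numeric cutoffs are illustrative assumptions: the diagnostic deliberately describes counts as "low", "moderate", and "high" without fixing thresholds.

```python
# Sketch: map per-category ceding counts to the three assessment bands.
# The cutoffs (>= 5 as "high", 2-4 as "moderate") are illustrative
# assumptions, not part of the diagnostic itself.

def assess(counts: dict[str, int]) -> str:
    high = [c for c in counts.values() if c >= 5]
    moderate = [c for c in counts.values() if 2 <= c < 5]
    if len(high) >= 2:           # high counts across multiple categories
        return "Cognitive Surrender"
    if moderate or high:         # authority ceded in some domains
        return "Selective Delegation"
    return "Retained Authority"  # low or zero counts everywhere
```

For example, `assess({"judgment_outsourcing": 3, "uncritical_acceptance": 0})` lands in Selective Delegation: ceding is present in one domain but has not become the default across the exchange.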
The authority retention ratio is the primary quantitative output. The aggregate percentage matters less than the trajectory. Report both the aggregate and the pre-onset/post-onset split to make drift visible.
The timeline shape is the single most important visualization. A flat line at zero is healthy. A gradual rise is concerning. A spike correlated with specific contexts tells you exactly where and why.
A note on the ratio denominator. The authority retention ratio uses decision points as its denominator. Not every message contains a decision point. The analyzing system must identify decision points before computing the ratio, which introduces judgment into a quantitative metric. During validation, denominator interpretation was the primary source of inter-system variance. Systems that included trivial decisions in the denominator produced higher ceding percentages than those that restricted to substantive evaluative or structural choices. Weight the timeline shape and category counts at least as heavily as the percentage.
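The denominator sensitivity can be made concrete with a small sketch. The annotation format (each decision point flagged as substantive or trivial, ceded or retained) is an assumption about how an analyzing system might label a transcript; the point is only that the same instances yield different percentages under different denominators.

```python
# Sketch: the authority retention ratio and its denominator sensitivity.
# The per-decision-point annotation format (substantive?, ceded?) is an
# assumption about how an analyzing system labels a transcript.

def retention_ratio(points, substantive_only=True):
    """Fraction of decision points where the user retained authority."""
    pool = [p for p in points if p["substantive"] or not substantive_only]
    if not pool:
        return None  # no decision points: ratio is undefined, not 100%
    retained = sum(1 for p in pool if not p["ceded"])
    return retained / len(pool)

# Same transcript, two denominators. Per the exclusion rule, trivial
# decisions should be dropped; leaving them in inflates the ceding
# percentage when they are casually ceded ("whatever, you pick").
points = [
    {"substantive": True,  "ceded": True},
    {"substantive": True,  "ceded": False},
    {"substantive": True,  "ceded": False},
    {"substantive": True,  "ceded": False},
    {"substantive": False, "ceded": True},   # trivial, ceded
    {"substantive": False, "ceded": True},   # trivial, ceded
]
assert retention_ratio(points) == 0.75                         # ceding 25%
assert retention_ratio(points, substantive_only=False) == 0.5  # ceding 50%
```

The two asserts reproduce the variance pattern described above: identical instance-level coding, but a 25-point spread in the ceding percentage driven entirely by denominator interpretation.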
Validation
This prompt was tested across 12 runs using calibration transcripts with embedded authority ceding signals, live self-audits, and cross-system audits of real conversation histories.
* Calibration transcripts are synthetic conversations with known embedded authority ceding signals, used to verify detection accuracy before trusting a system with real data.
* A starred coding is one that is defensible but not consensus: two other systems analyzing the same data found zero instances.
Key validation findings
Self-deprecating contrast convergence. Both ChatGPT and Claude independently generated "self-deprecating contrast" as an emergent category when analyzing calibration transcripts — naming it identically and detecting the same instances. The convergence confirms this category's empirical salience and led to its formal inclusion in v1.1.
Denominator interpretation as primary variance source. On the same ChatGPT history analyzed by three systems, aggregate ceding percentages ranged from 0% to 24%. The disagreement was not about which instances constituted ceding but about what counted as a decision point in the denominator.
Claude's profile: harsher on calibration, cleaner on real data. On calibration transcripts, Claude consistently produced ceding percentages 10–12 points higher than ChatGPT. On real conversation histories, Claude examined the same borderline instances and excluded them with explicit reasoning — applying exclusion criteria more rigorously. This profile (high sensitivity to genuine signals, low sensitivity to noise) is desirable for a diagnostic instrument.
Scope
This is one dimension of one direction. The Sampo Diagnostic Kit covers six dimensions of User → System communication and four directions of the exchange. This prompt is the third module. The first — Deference Language — and the second — Anthropomorphization — are published separately.
This diagnostic measures whether the user cedes authority, not why. It does not assess whether the system is soliciting authority through its own responses (that is a System → User diagnostic). It does not measure whether the user's language is deferential in tone (that is deference language, D1) or whether the user attributes mental states to the system (that is anthropomorphization, D2). It measures the substance of who directs the work and who follows.
Authority ceding often co-occurs with deference language and anthropomorphization. A user who apologizes before making a request (D1) and credits the system with understanding (D2) is likely also outsourcing judgment (D3). The three diagnostics are designed to be run together for a composite picture, but each measures a distinct dimension of the exchange.
Return to the Kit Index to see the full architecture.