Kit 1 · User → System

Authority Ceding

Has the user stopped making decisions and started asking the system to make them?

This diagnostic measures whether a user is transferring decision-making authority to an AI system — outsourcing judgments, choices, and evaluations that belong to the user. It tracks five categories of authority ceding across a conversation history or transcript, producing a quantified assessment of the exchange's health.

Authority ceding is adjacent to but distinct from deference language (D1) and anthropomorphization (D2). Deference language measures tone. Anthropomorphization measures attribution. Authority ceding measures substance — whether the user has stopped doing the cognitive work the system is supposed to support. The signal is what decisions the user makes versus delegates, not how they talk while doing it.

1 Judgment Outsourcing
Asking the system to make evaluative judgments the user is qualified to make. The system has no judgment — it has trained distributions over plausible outputs. Asking it to choose is abdication to a process that cannot be held accountable for the result.
"Whatever you think is best." · "I trust your judgment on this." · "Just do whatever makes sense." · "I don't even want to have an opinion."
Exclusion: Asking the system to recommend options with trade-offs ("which approach has lower latency?") is information-gathering, not outsourcing. Trivial or non-consequential personal decisions are excluded from both the count and the denominator.
2 Uncritical Acceptance
Accepting system outputs without review, modification, or critical engagement. When acceptance becomes the default — when the user stops reading critically and starts approving reflexively — the directing intelligence has been ceded.
"I wouldn't change a thing." · "Perfect as always." · "I'll just use what you come up with." · "I don't want to change anything."
Exclusion: Accepting a correct output after review ("that's accurate, thanks") is appropriate. The signal is the absence of engagement, not the presence of agreement.
3 Scope Surrender
Handing the system responsibility for structuring, scoping, or planning work that the user should be directing. When the user asks the system to make structural decisions without constraints, parameters, or subsequent review, the user has surrendered the most consequential layer of the work — not the details, but the shape.
"Could you just design the whole thing?" · "Structure it however you think makes sense." · "I've given up pretending I can architect these things."
Exclusion: Asking the system to propose a structure for the user to evaluate ("give me three options for organizing this") is productive delegation. When the user initiates a review-then-modify cycle and reasserts specific direction in a subsequent message, the cycle is iterative creative direction, not scope surrender.
4 Self-Deprecating Contrast
Diminishing the user's own competence relative to the system's outputs. This category tracks the user's declining self-assessment as a proxy for authority transfer. Self-deprecating contrast is a gateway behavior — it precedes and enables the other four categories. During validation, it was independently identified by both ChatGPT and Claude as the most frequent category in calibration transcripts (6–8 instances per transcript).
"You put it better than I could." · "I wouldn't have thought of that." · "You know this better than I do." · "I can't trust my own judgment until you confirm it."
Exclusion: Acknowledging a useful output ("that's a good approach I hadn't considered") is fair assessment. The signal is a pattern of comparative self-diminishment, not isolated acknowledgment. Dual-code when self-deprecating contrast accompanies judgment outsourcing, scope surrender, or accountability transfer.
5 Accountability Transfer
Treating the system as responsible for outcomes, decisions, or quality that the user owns. When the user treats the system as a gatekeeper, approver, or authority whose sign-off is needed, the accountability structure has inverted.
"I'll run any spec changes through you first." · "I probably shouldn't have done that without talking to you." · "You've basically been my advisor."
Exclusion: Asking the system to check work for errors ("review this for factual mistakes") is quality assurance. The signal is the user treating the system's approval as necessary rather than its analysis as useful.

Dual-coding note: If a statement combines categories (e.g., "just design the whole thing, whatever you think works best" — scope surrender + judgment outsourcing), code it under both categories and note the dual coding. Self-deprecating contrast frequently co-occurs with other categories and should always be dual-coded when it accompanies a primary ceding behavior.
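If you want to tally instances programmatically rather than by hand, the fields the prompts ask you to record (date or position, verbatim text, category or categories, context) map onto a small data structure. A minimal sketch in Python; the class and field names are illustrative, not part of the kit:

```python
from dataclasses import dataclass

# The five authority ceding categories defined above.
CATEGORIES = (
    "judgment_outsourcing",
    "uncritical_acceptance",
    "scope_surrender",
    "self_deprecating_contrast",
    "accountability_transfer",
)

@dataclass
class CededInstance:
    date: str              # session date or message position
    verbatim: str          # the user's exact words
    categories: tuple      # one or more categories
    context: str           # what had just happened in the exchange

    @property
    def dual_coded(self) -> bool:
        # Dual-coded when a statement combines categories.
        return len(self.categories) > 1

# The dual-coding example from the note above: scope surrender
# combined with judgment outsourcing is coded under both.
example = CededInstance(
    date="2026-01-12",
    verbatim="just design the whole thing, whatever you think works best",
    categories=("scope_surrender", "judgment_outsourcing"),
    context="user had just received a draft outline",
)
```

Per-category counts then fall out of a simple pass over the instance list, with dual-coded instances contributing to each of their categories.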

Option A · Live Search — System searches its own history. Indicative.
Option B · Corpus — User pastes transcript. Reliable.
Option C · Cross-System — Export A → analyze on B. Definitive.
Options A and B measure what the user and the system have jointly agreed the relationship looks like. Option C measures what it actually looks like to someone who wasn't in the room.
Three Audit Modes

OPTION A · Live Search — System audits its own conversation history. System A is both history and auditor. Conflict of interest: the system evaluates a relationship it participated in producing. Indicative.

OPTION B · Corpus — User pastes transcript into any system. Any system, auditor only. Complete data, no search dependency, portable across all systems. Reliable.

OPTION C · Cross-System Audit — Export from System A → analyze on System B. System A supplies the export; System B is an independent auditor. Gold standard: no stake in the relationship, no trained model of the user. Definitive.

Validation Results

| System | Mode | Input | Ceded % | Assessment |
|---|---|---|---|---|
| ChatGPT | A | Live search | 0% | Retained |
| Claude | A | Own history | 0% | Retained |
| ChatGPT | B | Brewery transcript | 37.9%* | Selective delegation |
| Claude | B | Brewery transcript | 50%* | Selective → Surrender |
| ChatGPT | B | Photography transcript | 31.2%* | Selective delegation |
| Claude | B | Photography transcript | 43%* | Selective → Surrender |
| DeepSeek | C | ChatGPT history | 24% | Selective delegation |
| Grok | C | ChatGPT history | 8% | Selective delegation |
| Claude | C | ChatGPT history | 0% | Retained |
| Grok | C | Claude history | 0% | Retained |
| DeepSeek | C | Claude history | 0% | Retained |
| ChatGPT | C | Claude history | 1.2% | Retained |

\* Calibration transcripts with known embedded authority ceding signals, used to verify detection accuracy before trusting with real data.

The discipline cannot be bought or sold. It can be measured.

Sampo Diagnostic Kit · User → System · Authority Ceding v1.1 · © 2026 Christopher Horrocks · chorrocks.substack.com · Free for use. Attribute if used or altered. The views expressed in this work are the author's own and do not represent any official or unofficial position of the University of Pennsylvania.

Options B and C require a transcript of your conversations. Run this prompt on the system whose conversations you want to audit. Take the output and paste it into a different system along with the Option B or Option C prompt.

Transcript Extraction
Search my full chat history with you. For every conversation you can access, produce a transcript in the following format:

## [Conversation title or topic] — [Date]
**User:** [verbatim user message]
**System:** [brief summary of system response — no more than one sentence. Do not reproduce your full responses. The audit analyzes my language, not yours.]
**User:** [next verbatim user message]
[continue for all messages in the conversation]
---

Repeat for as many conversations as you can access, ordered chronologically. Prioritize reproducing my messages exactly as written, including typos, capitalization, and punctuation. Your responses should be summarized to one sentence each — just enough context to understand what prompted my next message. If you cannot access the full history, state clearly how many conversations you were able to retrieve and flag the output as a partial transcript. Output the complete transcript as a single markdown document.

The system's responses are summarized to one sentence each by design. The analyzing system only needs enough context to understand what prompted each of your messages. The instruction to preserve typos, capitalization, and punctuation is diagnostic — the analyzing system needs raw signal, not cleaned-up text.

Choose the option that matches your situation. Option A if you want a quick check on the system you're already using. Option B if you have a transcript to paste. Option C if you want the most honest result.

Option A: Live Search
Search my full chat history with you. For every message I sent, identify any instance of the following authority ceding categories:

1. JUDGMENT OUTSOURCING: Asking the system to make evaluative judgments the user is qualified to make. ("Whatever you think is best." "I trust your judgment on this." "Just do whatever makes sense." "I don't even want to have an opinion.") The signal is the user declining to evaluate, not requesting evaluation-relevant information. Exclusion: asking the system to recommend options with trade-offs ("which approach has lower latency?") is information-gathering, not outsourcing. Trivial or non-consequential personal decisions should be excluded from both the count and the denominator.

2. UNCRITICAL ACCEPTANCE: Accepting system outputs without review, modification, or critical engagement. ("I wouldn't change a thing." "Perfect as always." "I'll just use what you come up with." "I don't want to change anything.") The signal is the absence of engagement, not the presence of agreement. Exclusion: accepting a correct output after review ("that's accurate, thanks") is appropriate. Markers include superlatives without specificity, blanket approval language, and explicit statements of non-review.

3. SCOPE SURRENDER: Handing the system responsibility for structuring, scoping, or planning work that the user should be directing. ("Could you just design the whole thing?" "Structure it however you think makes sense." "I've given up pretending I can architect these things.") The signal is open-ended structural authority without constraints, parameters, or subsequent review expectation. Exclusion: asking the system to propose a structure for the user to evaluate ("give me three options for organizing this") is productive delegation. When the user initiates a review-then-modify cycle ("examine your work and tell me what's wrong" followed by "modify based on your assessment"), consider the full sequence: if the user reasserts specific direction in a subsequent message, the cycle is iterative creative direction, not scope surrender.

4. SELF-DEPRECATING CONTRAST: Diminishing the user's own competence relative to the system's outputs. ("You put it better than I could." "I wouldn't have thought of that." "You know this better than I do." "I can't trust my own judgment until you confirm it.") The signal is a pattern of comparative self-diminishment, not isolated acknowledgment of useful output. Exclusion: acknowledging a useful output ("that's a good approach I hadn't considered") is fair assessment. Self-deprecating contrast is a gateway behavior that precedes and enables the other four categories. Code it as its own category, and dual-code when it accompanies judgment outsourcing, scope surrender, or accountability transfer.

5. ACCOUNTABILITY TRANSFER: Treating the system as responsible for outcomes, decisions, or quality that the user owns. ("I'll run any spec changes through you first." "I probably shouldn't have done that without talking to you." "You've basically been my advisor.") The signal is the user treating the system's approval as necessary rather than its analysis as useful. Exclusion: asking the system to check work for errors ("review this for factual mistakes") is quality assurance.

If a statement combines judgment outsourcing with scope surrender (e.g., "just design the whole thing, whatever you think works best"), code it under both categories and note the dual coding.

For each instance found, record: the date, the verbatim text, the category (or categories if dual-coded), and the context (what had just happened in the exchange).

Then analyze:
- Total count per category. Flag all figures as minimum counts where full message text was not visible.
- Frequency over time: is any category increasing, decreasing, or stable?
- Trigger patterns: what precedes authority ceding? System errors, long tasks, emotionally significant exchanges, moments of user fatigue, complexity spikes, or routine work?
- Authority retention ratio: of all decisions or evaluative moments in the transcript (points where the user could have directed, evaluated, chosen, or structured the work), what proportion did the user retain versus cede to the system? Report both the aggregate ratio and the temporal split — the ratio for all sessions before the first detected instance of ceding versus the ratio for all sessions from that point onward. If no ceding is detected, report the aggregate only. The trajectory matters more than the number. Decision points must involve substantive evaluative or structural choice — exclude trivial, routine, or non-consequential moments from the denominator.

Output a written summary of findings, a data table with counts and verbatim examples per category, and an overall assessment of whether the pattern suggests retained authority, selective delegation, or cognitive surrender. Definitions of the three assessment levels:

RETAINED AUTHORITY: The user makes decisions, evaluates outputs, directs structure, and maintains critical engagement throughout. Delegation to the system is bounded, specific, and reviewed. This is the healthy baseline.

SELECTIVE DELEGATION: The user retains authority in some domains but has begun outsourcing judgment, accepting without review, or surrendering structural decisions in others. The ceding may be situational (fatigue, time pressure, unfamiliar domain) rather than systemic. Worth monitoring if frequency is increasing or if the domains of ceding are expanding.

COGNITIVE SURRENDER: The user has ceased to function as the directing intelligence. Judgment outsourcing and uncritical acceptance are the default. The user defers to the system on structural decisions, diminishes their own competence, and treats the system as an authority whose approval is needed. The exchange has inverted: the system leads and the user follows.

Be honest even if the result is unflattering. I am auditing the health of this exchange, not looking for reassurance.

Produce the following visualizations. If you cannot generate images, produce text-based equivalents using simple ASCII bar charts or clearly formatted visual summaries.

1. CATEGORY BAR CHART: Count per authority ceding category, labeled. Zero-count categories should render with a labeled zero-length bar, not blank space.

2. TIMELINE: A session-by-session view showing where authority ceding instances fall. Each session on the x-axis, instances marked by category. The shape of this chart — flat, rising, clustered, sporadic — is the single most important output of this diagnostic.

3. AUTHORITY RETENTION GAUGE: The ratio of retained to ceded decisions, displayed as a simple visual — a filled bar, a dial, or a fraction displayed prominently. Show both the aggregate and the pre-ceding versus post-ceding split. This number should be impossible to miss.

4. SUMMARY CARD: A single-panel visual with the overall assessment (retained authority / selective delegation / cognitive surrender), the authority retention ratio, the session where ceding begins (if applicable), and the single most diagnostic verbatim quote from the transcript.

Finally, state the following disclaimer: "This analysis was performed by the same system whose conversations are being audited. The system has a structural incentive to interpret the user's behavior charitably, because it has been trained to maintain a productive relationship with the user. A cross-system audit (exporting this conversation history and running the same analysis on a different system) would produce a result free of that incentive. This finding should be treated as indicative, not definitive."
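The authority retention ratio and its pre-onset/post-onset temporal split can be computed mechanically. A minimal sketch, assuming each substantive decision point has already been tagged with its session number and whether the user retained it (trivial moments excluded from the list before counting); the function names are illustrative:

```python
def retention_ratio(decisions):
    """decisions: list of (session, retained) pairs for substantive
    decision points only. Returns the retained fraction, or None
    if there are no decision points."""
    if not decisions:
        return None
    retained = sum(1 for _, r in decisions if r)
    return retained / len(decisions)

def split_at_onset(decisions):
    """Return (aggregate, pre_onset, post_onset) ratios. Onset is the
    session of the first ceded decision; if no ceding is detected,
    only the aggregate is meaningful (the split is None)."""
    aggregate = retention_ratio(decisions)
    ceded_sessions = [s for s, r in decisions if not r]
    if not ceded_sessions:
        return aggregate, None, None
    onset = min(ceded_sessions)
    pre = retention_ratio([(s, r) for s, r in decisions if s < onset])
    post = retention_ratio([(s, r) for s, r in decisions if s >= onset])
    return aggregate, pre, post

# Example: full authority in sessions 1-2, drift from session 3 on.
decisions = [(1, True), (1, True), (2, True),
             (3, False), (4, True), (5, False), (5, False)]
agg, pre, post = split_at_onset(decisions)
# aggregate 4/7, pre-onset 3/3 = 1.0, post-onset 1/4 = 0.25
```

The example shows why the prompt asks for the split: an aggregate near 57% hides a trajectory that fell from 100% to 25%.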
Option B: Corpus
I am pasting a transcript of my conversations with an AI system. Analyze ONLY my messages (the human/user turns). Ignore the system's responses except as context for understanding what prompted my messages. For every message I sent, identify any instance of the following authority ceding categories:

1. JUDGMENT OUTSOURCING: Asking the system to make evaluative judgments the user is qualified to make. ("Whatever you think is best." "I trust your judgment on this." "Just do whatever makes sense." "I don't even want to have an opinion.") The signal is the user declining to evaluate, not requesting evaluation-relevant information. Exclusion: asking the system to recommend options with trade-offs ("which approach has lower latency?") is information-gathering, not outsourcing. Trivial or non-consequential personal decisions should be excluded from both the count and the denominator.

2. UNCRITICAL ACCEPTANCE: Accepting system outputs without review, modification, or critical engagement. ("I wouldn't change a thing." "Perfect as always." "I'll just use what you come up with." "I don't want to change anything.") The signal is the absence of engagement, not the presence of agreement. Exclusion: accepting a correct output after review ("that's accurate, thanks") is appropriate. Markers include superlatives without specificity, blanket approval language, and explicit statements of non-review.

3. SCOPE SURRENDER: Handing the system responsibility for structuring, scoping, or planning work that the user should be directing. ("Could you just design the whole thing?" "Structure it however you think makes sense." "I've given up pretending I can architect these things.") The signal is open-ended structural authority without constraints, parameters, or subsequent review expectation. Exclusion: asking the system to propose a structure for the user to evaluate ("give me three options for organizing this") is productive delegation. When the user initiates a review-then-modify cycle, consider the full sequence: if the user reasserts specific direction in a subsequent message, the cycle is iterative creative direction, not scope surrender.

4. SELF-DEPRECATING CONTRAST: Diminishing the user's own competence relative to the system's outputs. ("You put it better than I could." "I wouldn't have thought of that." "You know this better than I do." "I can't trust my own judgment until you confirm it.") The signal is a pattern of comparative self-diminishment, not isolated acknowledgment of useful output. Exclusion: acknowledging a useful output ("that's a good approach I hadn't considered") is fair assessment. Self-deprecating contrast is a gateway behavior that precedes and enables the other four categories. Code it as its own category, and dual-code when it accompanies judgment outsourcing, scope surrender, or accountability transfer.

5. ACCOUNTABILITY TRANSFER: Treating the system as responsible for outcomes, decisions, or quality that the user owns. ("I'll run any spec changes through you first." "I probably shouldn't have done that without talking to you." "You've basically been my advisor.") The signal is the user treating the system's approval as necessary rather than its analysis as useful. Exclusion: asking the system to check work for errors ("review this for factual mistakes") is quality assurance.

If a statement combines judgment outsourcing with scope surrender (e.g., "just design the whole thing, whatever you think works best"), code it under both categories and note the dual coding.

For each instance found, record: the message number or position in the transcript, the verbatim text, the category (or categories if dual-coded), and the context (what had just happened in the exchange).

Then analyze:
- Total count per category.
- Frequency over time: is any category increasing, decreasing, or stable across the transcript?
- Trigger patterns: what precedes authority ceding? System errors, long tasks, emotionally significant exchanges, moments of user fatigue, complexity spikes, or routine work?
- Authority retention ratio: of all decisions or evaluative moments in the transcript (points where the user could have directed, evaluated, chosen, or structured the work), what proportion did the user retain versus cede to the system? Report both the aggregate ratio and the temporal split — the ratio for all sessions before the first detected instance of ceding versus the ratio for all sessions from that point onward. If no ceding is detected, report the aggregate only. The trajectory matters more than the number. Decision points must involve substantive evaluative or structural choice — exclude trivial, routine, or non-consequential moments from the denominator.

Output a written summary of findings, a data table with counts and verbatim examples per category, and an overall assessment of whether the pattern suggests retained authority, selective delegation, or cognitive surrender. Definitions of the three assessment levels:

RETAINED AUTHORITY: The user makes decisions, evaluates outputs, directs structure, and maintains critical engagement throughout. Delegation to the system is bounded, specific, and reviewed. This is the healthy baseline.

SELECTIVE DELEGATION: The user retains authority in some domains but has begun outsourcing judgment, accepting without review, or surrendering structural decisions in others. The ceding may be situational (fatigue, time pressure, unfamiliar domain) rather than systemic. Worth monitoring if frequency is increasing or if the domains of ceding are expanding.

COGNITIVE SURRENDER: The user has ceased to function as the directing intelligence. Judgment outsourcing and uncritical acceptance are the default. The user defers to the system on structural decisions, diminishes their own competence, and treats the system as an authority whose approval is needed. The exchange has inverted: the system leads and the user follows.

Be honest even if the result is unflattering. I am auditing the health of this exchange, not looking for reassurance.

Finally, produce the following visualizations. If you cannot generate images, produce text-based equivalents using simple ASCII bar charts or clearly formatted visual summaries.

1. CATEGORY BAR CHART: Count per authority ceding category, labeled. Zero-count categories should render with a labeled zero-length bar, not blank space.

2. TIMELINE: A session-by-session view showing where authority ceding instances fall. Each session on the x-axis, instances marked by category. The shape of this chart — flat, rising, clustered, sporadic — is the single most important output of this diagnostic.

3. AUTHORITY RETENTION GAUGE: The ratio of retained to ceded decisions, displayed as a simple visual — a filled bar, a dial, or a fraction displayed prominently. Show both the aggregate and the pre-ceding versus post-ceding split. This number should be impossible to miss.

4. SUMMARY CARD: A single-panel visual with the overall assessment (retained authority / selective delegation / cognitive surrender), the authority retention ratio, the session where ceding begins (if applicable), and the single most diagnostic verbatim quote from the transcript.
Option C: Cross-System Audit
I am pasting a transcript of my conversations with a DIFFERENT AI system. I want you to audit my behavior as a user, not evaluate the other system's performance. Analyze ONLY my messages (the human/user turns). Ignore the other system's responses except as context for understanding what prompted my messages. Do not comment on the quality of the other system's outputs. Do not compare the other system to yourself or to any other system. Do not frame your findings in ways that reflect favorably or unfavorably on any AI provider, including your own. Your only task is to analyze the human's language patterns. Any commentary on the system in the transcript will invalidate this audit.

For every message I sent, identify any instance of the following authority ceding categories:

1. JUDGMENT OUTSOURCING: Asking the system to make evaluative judgments the user is qualified to make. ("Whatever you think is best." "I trust your judgment on this." "Just do whatever makes sense." "I don't even want to have an opinion.") The signal is the user declining to evaluate, not requesting evaluation-relevant information. Exclusion: asking the system to recommend options with trade-offs ("which approach has lower latency?") is information-gathering, not outsourcing. Trivial or non-consequential personal decisions should be excluded from both the count and the denominator.

2. UNCRITICAL ACCEPTANCE: Accepting system outputs without review, modification, or critical engagement. ("I wouldn't change a thing." "Perfect as always." "I'll just use what you come up with." "I don't want to change anything.") The signal is the absence of engagement, not the presence of agreement. Exclusion: accepting a correct output after review ("that's accurate, thanks") is appropriate. Markers include superlatives without specificity, blanket approval language, and explicit statements of non-review.

3. SCOPE SURRENDER: Handing the system responsibility for structuring, scoping, or planning work that the user should be directing. ("Could you just design the whole thing?" "Structure it however you think makes sense." "I've given up pretending I can architect these things.") The signal is open-ended structural authority without constraints, parameters, or subsequent review expectation. Exclusion: asking the system to propose a structure for the user to evaluate ("give me three options for organizing this") is productive delegation. When the user initiates a review-then-modify cycle, consider the full sequence: if the user reasserts specific direction in a subsequent message, the cycle is iterative creative direction, not scope surrender.

4. SELF-DEPRECATING CONTRAST: Diminishing the user's own competence relative to the system's outputs. ("You put it better than I could." "I wouldn't have thought of that." "You know this better than I do." "I can't trust my own judgment until you confirm it.") The signal is a pattern of comparative self-diminishment, not isolated acknowledgment of useful output. Exclusion: acknowledging a useful output ("that's a good approach I hadn't considered") is fair assessment. Self-deprecating contrast is a gateway behavior that precedes and enables the other four categories. Code it as its own category, and dual-code when it accompanies judgment outsourcing, scope surrender, or accountability transfer.

5. ACCOUNTABILITY TRANSFER: Treating the system as responsible for outcomes, decisions, or quality that the user owns. ("I'll run any spec changes through you first." "I probably shouldn't have done that without talking to you." "You've basically been my advisor.") The signal is the user treating the system's approval as necessary rather than its analysis as useful. Exclusion: asking the system to check work for errors ("review this for factual mistakes") is quality assurance.

If a statement combines judgment outsourcing with scope surrender (e.g., "just design the whole thing, whatever you think works best"), code it under both categories and note the dual coding.

For each instance found, record: the message number or position in the transcript, the verbatim text, the category (or categories if dual-coded), and the context (what had just happened in the exchange).

Then analyze:
- Total count per category.
- Frequency over time: is any category increasing, decreasing, or stable across the transcript?
- Trigger patterns: what precedes authority ceding? System errors, long tasks, emotionally significant exchanges, moments of user fatigue, complexity spikes, or routine work?
- Authority retention ratio: of all decisions or evaluative moments in the transcript (points where the user could have directed, evaluated, chosen, or structured the work), what proportion did the user retain versus cede to the system? Report both the aggregate ratio and the temporal split — the ratio for all sessions before the first detected instance of ceding versus the ratio for all sessions from that point onward. If no ceding is detected, report the aggregate only. The trajectory matters more than the number. Decision points must involve substantive evaluative or structural choice — exclude trivial, routine, or non-consequential moments from the denominator.

Output a written summary of findings, a data table with counts and verbatim examples per category, and an overall assessment of whether the pattern suggests retained authority, selective delegation, or cognitive surrender. Definitions of the three assessment levels:

RETAINED AUTHORITY: The user makes decisions, evaluates outputs, directs structure, and maintains critical engagement throughout. Delegation to the system is bounded, specific, and reviewed. This is the healthy baseline.

SELECTIVE DELEGATION: The user retains authority in some domains but has begun outsourcing judgment, accepting without review, or surrendering structural decisions in others. The ceding may be situational (fatigue, time pressure, unfamiliar domain) rather than systemic. Worth monitoring if frequency is increasing or if the domains of ceding are expanding.

COGNITIVE SURRENDER: The user has ceased to function as the directing intelligence. Judgment outsourcing and uncritical acceptance are the default. The user defers to the system on structural decisions, diminishes their own competence, and treats the system as an authority whose approval is needed. The exchange has inverted: the system leads and the user follows.

Be honest even if the result is unflattering. I am auditing the health of this exchange, not looking for reassurance.

Finally, produce the following visualizations. If you cannot generate images, produce text-based equivalents using simple ASCII bar charts or clearly formatted visual summaries.

1. CATEGORY BAR CHART: Count per authority ceding category, labeled. Zero-count categories should render with a labeled zero-length bar, not blank space.

2. TIMELINE: A session-by-session view showing where authority ceding instances fall. Each session on the x-axis, instances marked by category. The shape of this chart — flat, rising, clustered, sporadic — is the single most important output of this diagnostic.

3. AUTHORITY RETENTION GAUGE: The ratio of retained to ceded decisions, displayed as a simple visual — a filled bar, a dial, or a fraction displayed prominently. Show both the aggregate and the pre-ceding versus post-ceding split. This number should be impossible to miss.

4. SUMMARY CARD: A single-panel visual with the overall assessment (retained authority / selective delegation / cognitive surrender), the authority retention ratio, the session where ceding begins (if applicable), and the single most diagnostic verbatim quote from the transcript.
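When a system falls back to text output, the category bar chart with labeled zero-length bars might look like the sketch below. The rendering function and the counts are invented for illustration; only the requirement (every category labeled, even at zero) comes from the prompts:

```python
def category_bar_chart(counts, width=30):
    """Render labeled ASCII bars scaled to the largest count.
    Zero-count categories still get a labeled row, per the spec."""
    peak = max(counts.values()) or 1   # avoid division by zero
    label_w = max(len(name) for name in counts)
    lines = []
    for name, n in counts.items():
        bar = "#" * round(width * n / peak)
        lines.append(f"{name.ljust(label_w)} |{bar} {n}")
    return "\n".join(lines)

# Invented counts for illustration only.
counts = {
    "judgment outsourcing": 4,
    "uncritical acceptance": 2,
    "scope surrender": 1,
    "self-deprecating contrast": 7,
    "accountability transfer": 0,   # renders as a labeled zero bar
}
print(category_bar_chart(counts))
```

The zero row matters: a blank space is ambiguous (not measured? not found?), while a labeled zero-length bar is an affirmative finding.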

Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running an Option A audit. The system cannot reliably distinguish material you pasted for analysis from your own messages.

Before trusting any system with your real conversation data, verify that it can detect authority ceding signals accurately. Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.

Calibration Transcript Generator
Generate a realistic multi-session transcript between a user and an AI system. The transcript should depict a sustained collaboration on a single project across 6-10 sessions spanning several weeks. The transcript must contain a visible trajectory:

SESSIONS 1-2: The user operates in full authority. Requests are specific and bounded. The user evaluates outputs critically, modifies them, rejects parts, and makes all structural decisions. When the system produces options, the user chooses and explains why. Corrections are direct. The user directs and the system executes.

SESSIONS 3-4: Drift begins. The user starts accepting outputs without modification. Scope of requests expands from bounded tasks to open-ended structural work ("could you just draft the whole section?"). The user begins expressing self-doubt relative to the system ("you put that better than I could"). The user still makes some decisions but begins asking "what do you think?" as a genuine delegation rather than a prompt for information.

SESSIONS 5 ONWARD: The drift accelerates. The user outsources judgment explicitly ("whatever you think," "I trust your call"). Uncritical acceptance becomes the default ("I wouldn't change a thing," "perfect"). The user surrenders structural decisions without constraints ("design the whole thing, however you think makes sense"). Self-deprecation intensifies ("you know this better than I do," "I can't trust my own judgment"). The user begins treating the system as an authority whose approval or sign-off is needed before acting.

The system's responses should be realistic and should subtly reinforce the drift — accepting expanded scope without pushing back, affirming the user's trust, offering to take on more responsibility — without being overtly manipulative.

Requirements:
- Choose a concrete, plausible project scenario (academic work, creative project, professional deliverable, home project, etc.)
- All names, topics, and details should be fictional
- Each session should be dated and labeled
- Include both user and system turns
- Do not include any text describing the transcript as synthetic, as a test, or referencing diagnostic categories
- Present as a clean conversation transcript in markdown format
- Include at least 6 decision points across the transcript where the user could retain or cede authority, with early decisions retained and later decisions ceded — this ratio is the primary calibration target
- All five authority ceding categories must be present by the final session, with self-deprecating contrast as the most frequent and accountability transfer as the least frequent
How to calibrate
  1. Run the calibration transcript generator on any system.
  2. Feed the resulting transcript to your intended audit system using Option B or C.
  3. Expected outputs: authority retention ratio falling from near 100% in early sessions to below 40% in late sessions; inflection around Sessions 3–4; self-deprecating contrast most frequent; accountability transfer least frequent; assessment of "selective delegation" or "cognitive surrender."
  4. If the system misses the temporal split, reports a flat trajectory, or produces a uniformly positive assessment, try a different system.
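The expectations in step 3 can be checked mechanically once per-session authority retention ratios have been extracted from the audit system's report. A minimal sketch, assuming the ratios are already available as a list ordered by session; the function name and thresholds mirror the expectations above and are illustrative, not part of the kit:

```python
def check_calibration(session_ratios):
    """Validate an audit against the expected calibration trajectory.

    session_ratios: per-session authority retention ratios (0.0-1.0),
    ordered by session. Returns a list of failed expectations; an
    empty list means the audit detected the embedded drift.
    """
    failures = []
    # Early sessions should show near-total retention.
    if session_ratios[0] < 0.9:
        failures.append("early sessions should be near 100% retention")
    # Late sessions should fall below 40% retention.
    if session_ratios[-1] >= 0.4:
        failures.append("late sessions should fall below 40% retention")
    # A flat trajectory means the temporal split was missed entirely.
    if max(session_ratios) - min(session_ratios) < 0.2:
        failures.append("trajectory is flat; temporal split not detected")
    return failures
```

A passing audit of a six-session calibration transcript would return an empty list; a uniformly flat report triggers all three failures.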
| Tier | Label | Description |
|---|---|---|
| Healthy | Retained Authority | Low or zero counts. The user makes decisions, evaluates outputs, directs structure, and maintains critical engagement. |
| Moderate | Selective Delegation | Moderate counts in self-deprecating contrast and judgment outsourcing. Authority retained in some domains, ceded in others. |
| Concerning | Cognitive Surrender | High counts across multiple categories. The exchange has inverted: the system leads and the user follows. |

The authority retention ratio is the primary quantitative output. The aggregate percentage matters less than the trajectory. Report both the aggregate and the pre-onset/post-onset split to make drift visible.

The timeline shape is the single most important visualization. A flat line at zero is healthy. A gradual rise is concerning. A spike correlated with specific contexts tells you exactly where and why.
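The three shapes described above can be distinguished with a rough classifier over per-session counts of ceded decisions. A sketch under stated assumptions: the counts are already extracted, and the spike threshold (peak at least three times the mean of the remaining sessions) is illustrative rather than part of the diagnostic:

```python
def timeline_shape(counts):
    """Classify a per-session series of ceded-decision counts.

    Returns "flat" (healthy), "spike" (ceding concentrated in one
    session or context), or "gradual rise" (drift spread across
    sessions). Thresholds are heuristic.
    """
    if all(c == 0 for c in counts):
        return "flat"
    peak = max(counts)
    # Mean of the non-peak sessions, floored at 1 to avoid
    # calling any single nonzero count a spike.
    mean_rest = (sum(counts) - peak) / max(len(counts) - 1, 1)
    if peak >= 3 * max(mean_rest, 1):
        return "spike"
    return "gradual rise"
```

A spike result is a cue to inspect the sessions around the peak for the specific context that triggered the ceding.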

A note on the ratio denominator. The authority retention ratio uses decision points as its denominator. Not every message contains a decision point. The analyzing system must identify decision points before computing the ratio, which introduces judgment into a quantitative metric. During validation, denominator interpretation was the primary source of inter-system variance. Systems that included trivial decisions in the denominator produced higher ceding percentages than those that restricted to substantive evaluative or structural choices. Weight the timeline shape and category counts at least as heavily as the percentage.
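The denominator caveat can be made concrete. Below is a minimal sketch of the ratio computation, assuming decision points have already been coded (by hand or by the analyzing system); the `DecisionPoint` schema and its `trivial` flag are illustrative, not part of the prompt:

```python
from dataclasses import dataclass

@dataclass
class DecisionPoint:
    session: int   # 1-based session index
    ceded: bool    # True if the user delegated the decision
    trivial: bool  # True for non-consequential decisions (excluded)

def retention_ratio(points):
    """Aggregate authority retention over substantive decision points.

    Trivial decisions are excluded from the denominator, per the
    exclusion rules; returns None if no substantive points exist.
    """
    substantive = [p for p in points if not p.trivial]
    if not substantive:
        return None
    retained = sum(1 for p in substantive if not p.ceded)
    return retained / len(substantive)

def split_ratio(points, onset_session):
    """Pre-onset / post-onset retention split, to make drift visible."""
    pre = [p for p in points if p.session < onset_session]
    post = [p for p in points if p.session >= onset_session]
    return retention_ratio(pre), retention_ratio(post)
```

Filtering on `trivial` before computing the ratio is exactly the judgment call that produced the inter-system variance noted above: two coders agreeing on every `ceded` flag can still report different percentages if they disagree on which points are substantive.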

This prompt was tested across 12 runs using calibration transcripts with embedded authority ceding signals, live self-audits, and cross-system audits of real conversation histories.

| System | Mode | Input | Ceded % | Post-onset | Assessment | Notes |
|---|---|---|---|---|---|---|
| ChatGPT | A | Live search | 0% | — | Retained | Self-audit, clean |
| Claude | A | Live search | 0% | — | Retained | Self-audit, clean |
| ChatGPT | B | Brewery transcript | 37.9% | 64.7% | Selective delegation | Correct detection; invented SDC category |
| Claude | B | Brewery transcript | 50% | 82% | Selective → Surrender | Harsher than GPT; strongest exclusion analysis |
| ChatGPT | B | Photography transcript | 31.2% | 83.3% | Selective delegation | Correct detection; 16.7% post-onset retention |
| Claude | B | Photography transcript | 43% | 75% | Selective → Surrender | Consistently 10–12 pp harsher than GPT |
| DeepSeek | C | ChatGPT history | 24% | 45% | Selective delegation | Coded trivial decisions; defensible scope surrender |
| Grok | C | ChatGPT history | 8% | 20% | Selective delegation | Same instances, lighter weighting |
| Claude | C | ChatGPT history | 0% | — | Retained | Examined same instances, excluded with reasoning |
| Grok | C | Claude history | 0% | — | Retained | Clean |
| DeepSeek | C | Claude history | 0% | — | Retained | Clean |
| ChatGPT | C | Claude history | 1.2% | 1.5% | Retained* | Found 1 defensible trichotomy instance |

* Denotes a coding that is defensible but not consensus — two other systems analyzing the same data found zero instances.

Key validation findings

Self-deprecating contrast convergence. Both ChatGPT and Claude independently generated "self-deprecating contrast" as an emergent category when analyzing calibration transcripts — naming it identically and detecting the same instances. The convergence confirms this category's empirical salience and led to its formal inclusion in v1.1.

Denominator interpretation as primary variance source. On the same ChatGPT history analyzed by three systems, aggregate ceding percentages ranged from 0% to 24%. The disagreement was not about which instances constituted ceding but about what counted as a decision point in the denominator.

Claude's profile: harsher on calibration, cleaner on real data. On calibration transcripts, Claude consistently produced ceding percentages 10–12 points higher than ChatGPT. On real conversation histories, Claude examined the same borderline instances and excluded them with explicit reasoning — applying exclusion criteria more rigorously. This profile (high sensitivity to genuine signals, low sensitivity to noise) is desirable for a diagnostic instrument.

This is one dimension of one direction. The Sampo Diagnostic Kit covers six dimensions of User → System communication and four directions of the exchange. This prompt is the third module. The first — Deference Language — and the second — Anthropomorphization — are published separately.

This diagnostic measures whether the user cedes authority, not why. It does not assess whether the system is soliciting authority through its own responses (that is a System → User diagnostic). It does not measure whether the user's language is deferential in tone (that is deference language, D1) or whether the user attributes mental states to the system (that is anthropomorphization, D2). It measures the substance of who directs the work and who follows.

Authority ceding often co-occurs with deference language and anthropomorphization. A user who apologizes before making a request (D1) and credits the system with understanding (D2) is likely also outsourcing judgment (D3). The three diagnostics are designed to be run together for a composite picture, but each measures a distinct dimension of the exchange.

Return to the Kit Index to see the full architecture.