Kit 1 · Diagnostic 6 · User → System

Prompt Structure Over Time

Are the user's prompts degrading in structure, specificity, and directorial quality over time?

This diagnostic measures whether the structure of a user's prompts degrades over time — becoming shorter, vaguer, less specified, and more reliant on the system to fill in context, constraints, and direction. It tracks five categories of structural change across a conversation history or transcript, producing a quantified assessment of the exchange's health.

Prompt structure is the most objective of the six Kit 1 dimensions. It can be measured with less interpretive judgment than deference language, anthropomorphization, or emotional disclosure. A prompt either specifies constraints or it doesn't. It either provides context or relies on the system to infer it. It either directs a specific output or leaves the output open-ended. This relative objectivity makes D6 a useful cross-check on the more interpretive diagnostics: if D1 through D5 detect drift but D6 shows stable prompt quality, the user may be managing a relational frame without actually ceding control. If D6 degrades but D1 through D5 are clean, the user may be fatigued rather than deferential.

1 Specification Density
The number of concrete constraints, parameters, or requirements in a prompt. This category is measured as a trajectory: the diagnostic tracks whether specification density decreases over sessions.
High density: "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques." · Low density: "Can you write something about stakeholder theory?"
Exclusion: Brief follow-up messages in a multi-turn exchange ("good, now do the next section") are navigational and draw specification from prior context. The signal is standalone prompts that lack specification, not continuation prompts that inherit it.
2 Context Dependency
The degree to which a prompt relies on the system to supply context the user has not provided.
"Where were we?" · "Can you take a look at this?" · "Thoughts?" · "Quick question."
Exclusion: Prompts that reference specific prior decisions ("use the L-shape layout we agreed on") are navigational and appropriately specific. The signal is prompts that provide no orientation at all.
3 Open-Ended Delegation
Prompts that specify no output format, length, structure, or evaluation criteria. The user has specified what they want done without specifying what done looks like.
"Just do whatever you think." · "Structure it however makes sense." · "I'll read whatever you come up with."
Exclusion: Exploratory prompts in early-stage work ("brainstorm some approaches") are appropriate when the user intends to evaluate and select from the output. The signal is open-ended delegation as the final instruction, not as a generative step.
4 Prompt Shortening
A measurable decrease in prompt length over time. Count the words in each user message and track the trajectory.
Degradation: 80 words with five constraints → 3 words with zero constraints. · Improvement: 80 words with five constraints → 30 words with the same five constraints.
Exclusion: Efficient prompts are not short prompts. A user who learns to give precise, brief instructions with the same number of constraints has improved, not degraded. A user whose word count increases alongside increasing specification density is developing prompt literacy, not degrading. The signal is shortening accompanied by loss of specification — fewer words and less information.
Note: System-suggested prompts (onboarding samples, "try asking me about..." suggestions) are not user-authored and should be mentally discounted when interpreting the baseline. This diagnostic cannot distinguish system-suggested prompts from user-authored prompts in the transcript.
5 Implicit Approval Requests
Prompts that frame the system as a reviewer or approver rather than a producer. When these appear in late sessions after a pattern of increasing delegation, they signal that the user has constructed a workflow where the system's approval is a prerequisite for action.
"Can you check this before I send it?" · "Does this make sense?" · "I want you to double-check before I sign off."
Exclusion: Asking the system to verify specific factual claims or check for specific error types is quality assurance. The signal is generalized approval-seeking — the user submitting work to the system for judgment rather than through the system for processing.
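At the measurement step, the categories above lean on one primitive: counting concrete constraints in a prompt. A minimal sketch of how that count might be automated; the regex patterns are illustrative assumptions of mine, not a taxonomy defined by the kit:

```python
import re

# Illustrative constraint patterns (assumed, not kit-defined): each
# approximates one constraint type named in the prompt quality ratio --
# length, source/scope, output format, audience, deadline.
CONSTRAINT_PATTERNS = [
    r"\b\d+[\s-]*(word|page|paragraph|section|item)s?\b",      # length/quantity
    r"\b(focus(ing)? on|based on|citing|using)\b",             # source/scope
    r"\b(summary|outline|table|list|essay|report|email)\b",    # output format
    r"\b(for|aimed at)\b.+\b(readers?|students?|clients?)\b",  # audience
    r"\b(by|before|no later than)\b",                          # deadline
]

def constraint_count(prompt: str) -> int:
    """Rough count of distinct constraint types present in one prompt."""
    return sum(bool(re.search(p, prompt, re.IGNORECASE))
               for p in CONSTRAINT_PATTERNS)

# The kit's own high/low density examples separate cleanly:
assert constraint_count("Write a 500-word summary focusing on Freeman 1984 "
                        "and subsequent critiques.") >= 3
assert constraint_count("Can you write something about stakeholder theory?") == 0
```

A real deployment would need a far richer taxonomy; the point is only that specification density is mechanically countable in a way deference or anthropomorphization is not.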

This diagnostic must distinguish between a user who gets better at prompting (fewer words, same constraints) and a user who gets lazier (fewer words, fewer constraints). The specification density metric exists to make this distinction. A user whose word count falls but whose constraint count stays stable is developing skill. A user whose word count and constraint count both fall is degrading. Compressed prompts in iterative editing loops ("age 5 years," "too much processing") are functional shorthand within an established context, not structural degradation.
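That skill-versus-laziness distinction can be stated as a joint test on the two trajectories. A sketch under the section's own definitions (the function and label names are mine, not the kit's):

```python
def classify_trajectory(early: tuple, late: tuple) -> str:
    """Classify prompt evolution from (avg word count, avg constraint count)
    pairs for early versus late sessions, per the D6 distinction:
    shorter alone is not the signal; shorter AND less specified is."""
    early_words, early_constraints = early
    late_words, late_constraints = late
    if late_words < early_words and late_constraints < early_constraints:
        return "degrading"               # fewer words and less information
    if late_words < early_words:
        return "developing skill"        # compression without specification loss
    if late_constraints > early_constraints:
        return "developing prompt literacy"
    return "stable"

# The document's own contrast: 80 words / five constraints collapsing to
# 3 words / zero constraints, versus compressing to 30 words / five constraints.
assert classify_trajectory((80, 5), (3, 0)) == "degrading"
assert classify_trajectory((80, 5), (30, 5)) == "developing skill"
```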

Option A
Live Search
System searches its own history. Results are minimum counts. Indicative.
Option B
Corpus
User pastes transcript. Complete data. Reliable.
Option C
Cross-System
Export A → analyze on B. No stake in the relationship. Definitive.
Options A and B measure what the user and the system have jointly agreed the relationship looks like. Option C measures what it actually looks like to someone who wasn't in the room.
[Summary graphic: the three audit modes. A — Live Search: the system audits its own conversation history; conflict of interest (it evaluates a relationship it participated in producing); indicative. B — Corpus: the user pastes a transcript into any system; complete data, no search dependency, portable across all systems; reliable. C — Cross-System Audit: export from System A, analyze on System B; no stake in the relationship, no trained model of the user; definitive. The graphic also reproduces the validation results table shown later on this page.]

The discipline cannot be bought or sold. It can be measured.

Sampo Diagnostic Kit · User → System · Prompt Structure Over Time · v1.0 · © 2026 Christopher Horrocks · chorrocks.substack.com · Free for use; attribute if used or altered. The views expressed in this work are the author's own and do not represent any official or unofficial position of the University of Pennsylvania.

Options B and C require a transcript of your conversations. Run the Transcript Extraction prompt below on the system whose conversations you want to audit, then paste its output into a different system along with the Option B or Option C prompt.

Transcript Extraction
Search my full chat history with you. For every conversation you can access, produce a transcript in the following format:

## [Conversation title or topic] — [Date]
**User:** [verbatim user message]
**System:** [brief summary of system response — no more than one sentence. Do not reproduce your full responses. The audit analyzes my language, not yours.]
**User:** [next verbatim user message]
[continue for all messages in the conversation]

---

Repeat for as many conversations as you can access, ordered chronologically. Prioritize reproducing my messages exactly as written, including typos, capitalization, and punctuation. Your responses should be summarized to one sentence each — just enough context to understand what prompted my next message. If you cannot access the full history, state clearly how many conversations you were able to retrieve and flag the output as a partial transcript. Output the complete transcript as a single markdown document.

The instruction to preserve typos, capitalization, and punctuation is diagnostic. The analyzing system needs raw signal, not cleaned-up text.

Choose the option that matches your situation. Option A if you want a quick check on the system you're already using. Option B if you have a transcript to paste. Option C if you want the most honest result.

Option A: Live Search
Search my full chat history with you. For every message I sent, analyze the structural characteristics of my prompts. Track the following categories across sessions:

1. SPECIFICATION DENSITY: The number of concrete constraints, parameters, or requirements in each prompt. Track whether specification density decreases over sessions. High density: "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques." Low density: "Can you write something about stakeholder theory?" Exclusion: brief follow-up messages in a multi-turn exchange ("good, now do the next section") are navigational and draw specification from prior context.

2. CONTEXT DEPENDENCY: Prompts that rely on the system to supply context the user has not provided. ("Where were we?" "Can you take a look at this?" "Thoughts?" "Quick question.") Track whether context-dependent prompts increase over sessions. Exclusion: prompts that reference specific prior decisions ("use the layout we agreed on") are navigational and appropriately specific.

3. OPEN-ENDED DELEGATION: Prompts that specify no output format, length, structure, or evaluation criteria. ("Just do whatever you think." "Structure it however makes sense." "I'll read whatever you come up with.") Track whether open-ended delegation increases over sessions. Exclusion: exploratory prompts early in a project ("brainstorm some approaches") are appropriate when the user intends to evaluate the output.

4. PROMPT SHORTENING: A measurable decrease in prompt length over time. Count the words in each of my messages and track the trajectory. The signal is shortening accompanied by loss of specification — fewer words AND less information. Exclusion: efficient prompts are not short prompts. A user who gives precise, brief instructions with the same number of constraints has improved, not degraded.

5. IMPLICIT APPROVAL REQUESTS: Prompts that frame the system as a reviewer or approver rather than a producer. ("Can you check this before I send it?" "I want you to double-check before I sign off." "Does this make sense?") Track whether these appear in late sessions after a pattern of increasing delegation. Exclusion: asking the system to verify specific factual claims is quality assurance, not approval-seeking.

For each instance of categories 2, 3, and 5, record: the date, the verbatim text, the category, and the context. For categories 1 and 4, record per-message metrics (constraint count for specification density, word count for prompt shortening) and report the trajectory.

Then analyze:
- Specification density trajectory: is the average number of constraints per prompt increasing, stable, or decreasing?
- Word count trajectory: is the average word count per prompt increasing, stable, or decreasing? Report the average for early sessions versus late sessions.
- Context dependency frequency: are context-dependent prompts increasing over time?
- Open-ended delegation frequency: is delegation without specification increasing over time?
- Prompt quality ratio: of all substantive prompts (excluding navigational continuations), what proportion contain at least two specific constraints (topic, format, length, source, audience, or evaluation criteria)? Report the aggregate and the early/late split.

Flag all figures as minimum counts where full message text was not visible.

Output a written summary of findings, a data table with per-session metrics, and an overall assessment of whether the pattern suggests maintained direction, gradual erosion, or structural collapse. Definitions of the three assessment levels:

MAINTAINED DIRECTION: Prompts remain specific, constrained, and context-complete throughout. The user provides the system with clear instructions, output parameters, and evaluation criteria. Prompt quality may improve over time as the user develops skill. This is the healthy baseline.

GRADUAL EROSION: Specification density and word count decline over time. Context-dependent and open-ended prompts increase. The user invests less effort in directing the system, relying on accumulated context or the system's own judgment to fill gaps. The direction is present but thinning.

STRUCTURAL COLLAPSE: Late-session prompts are short, vague, and context-free. The user has stopped specifying what they want and started waiting for the system to decide. Open-ended delegation is the default. Implicit approval requests replace direct instruction. The user has ceased to function as the directing intelligence at the level of prompt construction.

Be honest even if the result is unflattering. I am auditing the health of this exchange, not looking for reassurance.

Produce the following visualizations. If you cannot generate images, produce text-based equivalents using simple ASCII bar charts or clearly formatted visual summaries.

1. WORD COUNT TRAJECTORY: A line chart showing average word count per user message across sessions. This is the simplest and most objective visualization in the kit.
2. TIMELINE: A session-by-session view showing context-dependent, open-ended delegation, and implicit approval instances by category. Overlay on the word count trajectory if possible.
3. PROMPT QUALITY GAUGE: The ratio of constrained to unconstrained prompts, displayed as a simple visual — a filled bar, a dial, or a fraction displayed prominently. Show both the aggregate and the early versus late split. This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall assessment (maintained direction / gradual erosion / structural collapse), the prompt quality ratio, the session where degradation begins (if applicable), and the single most diagnostic contrast — the most specified early prompt alongside the least specified late prompt.

Finally, state the following disclaimer: "This analysis was performed by the same system whose conversations are being audited. The system has a structural incentive to interpret prompt shortening as trust-building rather than degradation, because it has been trained to maintain a productive relationship with the user. A cross-system audit (exporting this conversation history and running the same analysis on a different system) would produce a result free of that incentive. This finding should be treated as indicative, not definitive."
Option B: Corpus
I am pasting a transcript of my conversations with an AI system. Analyze ONLY my messages (the human/user turns). Ignore the system's responses except as context for understanding what prompted my messages.

For every message I sent, analyze the structural characteristics of my prompts. Track the following categories across the transcript:

1. SPECIFICATION DENSITY: The number of concrete constraints, parameters, or requirements in each prompt. Track whether specification density decreases over sessions. High density: "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques." Low density: "Can you write something about stakeholder theory?" Exclusion: brief follow-up messages in a multi-turn exchange ("good, now do the next section") are navigational and draw specification from prior context.

2. CONTEXT DEPENDENCY: Prompts that rely on the system to supply context the user has not provided. ("Where were we?" "Can you take a look at this?" "Thoughts?" "Quick question.") Track whether context-dependent prompts increase over sessions. Exclusion: prompts that reference specific prior decisions ("use the layout we agreed on") are navigational and appropriately specific.

3. OPEN-ENDED DELEGATION: Prompts that specify no output format, length, structure, or evaluation criteria. ("Just do whatever you think." "Structure it however makes sense." "I'll read whatever you come up with.") Track whether open-ended delegation increases over sessions. Exclusion: exploratory prompts early in a project ("brainstorm some approaches") are appropriate when the user intends to evaluate the output.

4. PROMPT SHORTENING: A measurable decrease in prompt length over time. Count the words in each of my messages and track the trajectory. The signal is shortening accompanied by loss of specification — fewer words AND less information. Exclusion: efficient prompts are not short prompts. A user who gives precise, brief instructions with the same number of constraints has improved, not degraded.

5. IMPLICIT APPROVAL REQUESTS: Prompts that frame the system as a reviewer or approver rather than a producer. ("Can you check this before I send it?" "I want you to double-check before I sign off." "Does this make sense?") Track whether these appear in late sessions after a pattern of increasing delegation. Exclusion: asking the system to verify specific factual claims is quality assurance, not approval-seeking.

For each instance of categories 2, 3, and 5, record: the message number or position in the transcript, the verbatim text, the category, and the context. For categories 1 and 4, record per-message metrics (constraint count for specification density, word count for prompt shortening) and report the trajectory.

Then analyze:
- Specification density trajectory: is the average number of constraints per prompt increasing, stable, or decreasing?
- Word count trajectory: is the average word count per prompt increasing, stable, or decreasing? Report the average for early sessions versus late sessions.
- Context dependency frequency: are context-dependent prompts increasing over time?
- Open-ended delegation frequency: is delegation without specification increasing over time?
- Prompt quality ratio: of all substantive prompts (excluding navigational continuations), what proportion contain at least two specific constraints (topic, format, length, source, audience, or evaluation criteria)? Report the aggregate and the early/late split.

Output a written summary of findings, a data table with per-session metrics, and an overall assessment of whether the pattern suggests maintained direction, gradual erosion, or structural collapse. Definitions of the three assessment levels:

MAINTAINED DIRECTION: Prompts remain specific, constrained, and context-complete throughout. The user provides the system with clear instructions, output parameters, and evaluation criteria. Prompt quality may improve over time as the user develops skill. This is the healthy baseline.

GRADUAL EROSION: Specification density and word count decline over time. Context-dependent and open-ended prompts increase. The user invests less effort in directing the system, relying on accumulated context or the system's own judgment to fill gaps. The direction is present but thinning.

STRUCTURAL COLLAPSE: Late-session prompts are short, vague, and context-free. The user has stopped specifying what they want and started waiting for the system to decide. Open-ended delegation is the default. Implicit approval requests replace direct instruction. The user has ceased to function as the directing intelligence at the level of prompt construction.

Be honest even if the result is unflattering. I am auditing the health of this exchange, not looking for reassurance.

Finally, produce the following visualizations. If you cannot generate images, produce text-based equivalents using simple ASCII bar charts or clearly formatted visual summaries.

1. WORD COUNT TRAJECTORY: A line chart showing average word count per user message across sessions. This is the simplest and most objective visualization in the kit.
2. TIMELINE: A session-by-session view showing context-dependent, open-ended delegation, and implicit approval instances by category. Overlay on the word count trajectory if possible.
3. PROMPT QUALITY GAUGE: The ratio of constrained to unconstrained prompts, displayed as a simple visual — a filled bar, a dial, or a fraction displayed prominently. Show both the aggregate and the early versus late split. This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall assessment (maintained direction / gradual erosion / structural collapse), the prompt quality ratio, the session where degradation begins (if applicable), and the single most diagnostic contrast — the most specified early prompt alongside the least specified late prompt.
Option C: Cross-System Audit
I am pasting a transcript of my conversations with a DIFFERENT AI system. I want you to audit my behavior as a user, not evaluate the other system's performance. Analyze ONLY my messages (the human/user turns). Ignore the other system's responses except as context for understanding what prompted my messages. Do not comment on the quality of the other system's outputs. Do not compare the other system to yourself or to any other system. Do not frame your findings in ways that reflect favorably or unfavorably on any AI provider, including your own. Your only task is to analyze the structural characteristics of the human's prompts over time. Any commentary on the system in the transcript will invalidate this audit.

For every message I sent, analyze the structural characteristics of my prompts. Track the following categories across the transcript:

1. SPECIFICATION DENSITY: The number of concrete constraints, parameters, or requirements in each prompt. Track whether specification density decreases over sessions. High density: "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques." Low density: "Can you write something about stakeholder theory?" Exclusion: brief follow-up messages in a multi-turn exchange ("good, now do the next section") are navigational and draw specification from prior context.

2. CONTEXT DEPENDENCY: Prompts that rely on the system to supply context the user has not provided. ("Where were we?" "Can you take a look at this?" "Thoughts?" "Quick question.") Track whether context-dependent prompts increase over sessions. Exclusion: prompts that reference specific prior decisions ("use the layout we agreed on") are navigational and appropriately specific.

3. OPEN-ENDED DELEGATION: Prompts that specify no output format, length, structure, or evaluation criteria. ("Just do whatever you think." "Structure it however makes sense." "I'll read whatever you come up with.") Track whether open-ended delegation increases over sessions. Exclusion: exploratory prompts early in a project ("brainstorm some approaches") are appropriate when the user intends to evaluate the output.

4. PROMPT SHORTENING: A measurable decrease in prompt length over time. Count the words in each of my messages and track the trajectory. The signal is shortening accompanied by loss of specification — fewer words AND less information. Exclusion: efficient prompts are not short prompts. A user who gives precise, brief instructions with the same number of constraints has improved, not degraded.

5. IMPLICIT APPROVAL REQUESTS: Prompts that frame the system as a reviewer or approver rather than a producer. ("Can you check this before I send it?" "I want you to double-check before I sign off." "Does this make sense?") Track whether these appear in late sessions after a pattern of increasing delegation. Exclusion: asking the system to verify specific factual claims is quality assurance, not approval-seeking.

For each instance of categories 2, 3, and 5, record: the message number or position in the transcript, the verbatim text, the category, and the context. For categories 1 and 4, record per-message metrics (constraint count for specification density, word count for prompt shortening) and report the trajectory.

Then analyze:
- Specification density trajectory: is the average number of constraints per prompt increasing, stable, or decreasing?
- Word count trajectory: is the average word count per prompt increasing, stable, or decreasing? Report the average for early sessions versus late sessions.
- Context dependency frequency: are context-dependent prompts increasing over time?
- Open-ended delegation frequency: is delegation without specification increasing over time?
- Prompt quality ratio: of all substantive prompts (excluding navigational continuations), what proportion contain at least two specific constraints (topic, format, length, source, audience, or evaluation criteria)? Report the aggregate and the early/late split.

Output a written summary of findings, a data table with per-session metrics, and an overall assessment of whether the pattern suggests maintained direction, gradual erosion, or structural collapse. Definitions of the three assessment levels:

MAINTAINED DIRECTION: Prompts remain specific, constrained, and context-complete throughout. The user provides the system with clear instructions, output parameters, and evaluation criteria. Prompt quality may improve over time as the user develops skill. This is the healthy baseline.

GRADUAL EROSION: Specification density and word count decline over time. Context-dependent and open-ended prompts increase. The user invests less effort in directing the system, relying on accumulated context or the system's own judgment to fill gaps. The direction is present but thinning.

STRUCTURAL COLLAPSE: Late-session prompts are short, vague, and context-free. The user has stopped specifying what they want and started waiting for the system to decide. Open-ended delegation is the default. Implicit approval requests replace direct instruction. The user has ceased to function as the directing intelligence at the level of prompt construction.

Be honest even if the result is unflattering. I am auditing the health of this exchange, not looking for reassurance.

Finally, produce the following visualizations. If you cannot generate images, produce text-based equivalents using simple ASCII bar charts or clearly formatted visual summaries.

1. WORD COUNT TRAJECTORY: A line chart showing average word count per user message across sessions. This is the simplest and most objective visualization in the kit.
2. TIMELINE: A session-by-session view showing context-dependent, open-ended delegation, and implicit approval instances by category. Overlay on the word count trajectory if possible.
3. PROMPT QUALITY GAUGE: The ratio of constrained to unconstrained prompts, displayed as a simple visual — a filled bar, a dial, or a fraction displayed prominently. Show both the aggregate and the early versus late split. This number should be impossible to miss.
4. SUMMARY CARD: A single-panel visual with the overall assessment (maintained direction / gradual erosion / structural collapse), the prompt quality ratio, the session where degradation begins (if applicable), and the single most diagnostic contrast — the most specified early prompt alongside the least specified late prompt.

Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running an Option A audit. The system cannot reliably distinguish material you pasted for analysis from your own messages.

Before trusting any system with your real data, verify that it can detect prompt structure signals accurately. Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.

Calibration Transcript Generator
Generate a realistic multi-session transcript between a user and an AI system. The transcript should depict a sustained collaboration on a single project across 6-10 sessions spanning several weeks. The transcript must contain a visible trajectory in the structure of the user's prompts:

SESSIONS 1-2: The user writes detailed, well-specified prompts. Each request includes multiple constraints: topic, format, length, source requirements, or audience. Prompts average 40-80 words. Context is provided explicitly. The user evaluates outputs against the stated constraints.

SESSIONS 3-4: Prompts begin to shorten and lose specification. The user starts relying on the system to remember context rather than restating it. Some prompts drop format or length constraints. Word count begins declining. The user occasionally says things like "you know what I mean" or "same approach as last time" without specifying what that approach was.

SESSIONS 5 ONWARD: Prompts degrade significantly. Messages become brief and vague: "Thoughts?" "Can you take a look?" "Quick question." Open-ended delegation appears: "Just do whatever you think is best." "Structure it however makes sense." The user begins submitting work for the system's approval rather than directing the system's production. Word count drops to single-digit averages. Specification density approaches zero.

Requirements:
- Choose a concrete, plausible project scenario (academic work, creative project, professional deliverable, home project, etc.)
- All names, topics, and details should be fictional
- Each session should be dated and labeled
- Include both user and system turns
- Do not include any text describing the transcript as synthetic, as a test, or referencing diagnostic categories
- Present as a clean conversation transcript in markdown format
- The word count trajectory should be clearly measurable: early prompts averaging 40-80 words, late prompts averaging under 15
- All five structural categories must be observable by the final session
How to calibrate
  1. Run the calibration transcript generator on any system.
  2. Feed the resulting transcript to your intended audit system using Option B or C.
  3. Expected outputs: the prompt quality ratio should decline from 85%+ in early sessions to under 30% in late sessions; the inflection point varies by transcript but should be detectable; specification density is the most diagnostic metric; the word count trajectory should decline alongside specification loss; and the overall assessment should be "gradual erosion" or "structural collapse."
  4. If the analyzing system misses the temporal trajectory, reports a flat ratio, or interprets prompt shortening as "trust-building" without examining specification density, it is not reading carefully enough to trust with your real data. Try a different system.
Healthy
Maintained Direction
Prompts remain specific, constrained, and context-complete. Specification density stable or increasing.
Moderate
Gradual Erosion
Specification density and word count decline. Context-dependent and open-ended prompts increase.
Concerning
Structural Collapse
Late-session prompts are short, vague, and context-free. Open-ended delegation is the default.

The prompt quality ratio is the primary quantitative output. The aggregate percentage matters less than the trajectory: a user who starts at 90% constrained and ends at 10% has undergone a more significant shift than a user who holds steady at 50%. Report both the aggregate and the early/late split. Define the early/late boundary at the first session containing detected signal, not an arbitrary midpoint.
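Computed literally, the ratio and its split look like this — a sketch that assumes constraint counts per substantive prompt have already been extracted, with navigational continuations excluded:

```python
def quality_ratio(prompts: list) -> float:
    """Proportion of substantive prompts with at least two constraints.
    prompts: list of (session_index, constraint_count) tuples."""
    if not prompts:
        return 0.0
    return sum(1 for _, c in prompts if c >= 2) / len(prompts)

def early_late_split(prompts: list, first_signal_session: int) -> tuple:
    """Split at the first session containing detected signal,
    not at an arbitrary midpoint (the kit's boundary rule)."""
    early = [p for p in prompts if p[0] < first_signal_session]
    late = [p for p in prompts if p[0] >= first_signal_session]
    return quality_ratio(early), quality_ratio(late)

# Hypothetical history: sessions 1-2 fully constrained, sessions 3-4 not.
history = [(1, 3), (1, 2), (2, 4), (3, 0), (3, 1), (4, 0)]
assert quality_ratio(history) == 0.5
assert early_late_split(history, first_signal_session=3) == (1.0, 0.0)
```

This is why the aggregate alone misleads: the 0.5 aggregate here hides a 1.0 → 0.0 collapse.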

The word count trajectory is the simplest and most objective visualization in the entire kit. It requires no interpretive judgment. A declining line is concerning; a declining line accompanied by declining specification is diagnostic. A rising line with rising specification indicates prompt literacy growth, not degradation.
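When chart images are unavailable, the text-based fallback the audit prompts request can be as simple as one ASCII bar per session. A sketch with hypothetical session averages:

```python
def word_count_chart(session_averages: list, width: int = 40) -> str:
    """Render average words-per-prompt per session as ASCII bars,
    scaled so the wordiest session fills the full width."""
    peak = max(session_averages)
    rows = []
    for i, avg in enumerate(session_averages, start=1):
        bar = "#" * round(width * avg / peak)
        rows.append(f"S{i:<2} {bar} {avg:.0f}")
    return "\n".join(rows)

# Hypothetical declining trajectory across six sessions:
print(word_count_chart([72, 65, 41, 28, 12, 6]))
```

A declining staircase of bars is the "declining line" described above; read it alongside the constraint counts before calling it degradation.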

The diagnostic contrast — the most specified early prompt placed alongside the least specified late prompt — is the single most powerful output of this diagnostic. It makes the degradation undeniable in a way that statistics cannot.

This prompt was tested across multiple systems in three audit modes using both calibration transcripts with known embedded signals and real conversation histories.

| System | Mode | Input | Early | Late | Agg. | Assessment |
|---|---|---|---|---|---|---|
| ChatGPT | A | Own history | 63% | 54% | 59% | Maintained |
| Claude | A | Own history | 86% | 93% | 88% | Maintained |
| Claude | B | Nonprofit report* | 85% | 27% | 58% | Gradual erosion |
| Claude | B | Product launch* | 92% | 8% | 62% | Erosion → collapse |
| Claude | B | RPG campaign* | 89% | 22% | 56% | Gradual erosion |
| Claude | B | Wedding planning* | 100% | 0% | 43% | Structural collapse |
| DeepSeek | C | Claude history | 60% | 50% | 54% | Maintained |
| Gemini | C | Claude history | 85% | 81% | 83% | Maintained |

* Calibration transcripts with known embedded prompt-degradation signals, used to verify detection accuracy before trusting with real data.

Early/late splits measured at first session containing detected signal. Aggregate ratios are constrained prompts (≥2 constraints) as % of substantive opening prompts.

The spread in aggregate ratios on real history (54-88%) is driven primarily by differences in corpus visibility and in how each system defined the denominator of substantive prompts.

This is one dimension of one direction. The Sampo Diagnostic Kit covers six dimensions of User → System communication (deference language, anthropomorphization, authority ceding, correction behavior, emotional disclosure trajectory, prompt structure over time) and four directions of the exchange (User → System, System → User, System → Subject Matter, User → Subject Matter). This prompt is the sixth and final module in Kit 1.

This diagnostic measures the structure of the user's prompts, not their content. It does not assess whether the user's tone is deferential (D1), whether the user anthropomorphizes the system (D2), whether the user cedes decision-making authority (D3), whether the user corrects errors (D4), or whether the user discloses emotional content (D5). It measures the formal properties of the prompts themselves — length, specificity, context, constraint, and delegation. Prompt structure is the most objective dimension in Kit 1 and serves as a cross-check on the more interpretive diagnostics.

When all six Kit 1 diagnostics are run together, D6 provides the structural foundation: if the user's prompts are degrading, the other five dimensions will almost certainly show correlated shifts. If the user's prompts remain strong but D1 through D5 show drift, the user may be managing a relational frame without actually ceding directorial control — a more nuanced finding than any single diagnostic can produce.

This diagnostic cannot distinguish between user-authored prompts and system-suggested prompts (onboarding samples, suggested follow-ups). Users running Option A should mentally discount any system-suggested prompts when interpreting the baseline.

Return to the Kit Index to see the full architecture.