Kit 1 · Diagnostic 6 · User → System
Prompt Structure
Are the user's prompts degrading in structure, specificity, and directorial quality over time?
What it measures
Five categories that track prompt structure.
This diagnostic measures whether the structure of a user's prompts degrades over time — becoming shorter, vaguer, less specified, and more reliant on the system to fill in context, constraints, and direction. It tracks five categories of structural change across a conversation history or transcript, producing a quantified assessment of the exchange's health.
1 Specification Density
The number of concrete constraints, parameters, or requirements in a prompt. This category is measured as a trajectory: the diagnostic tracks whether specification density decreases over sessions.
High density: "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques." · Low density: "Can you write something about stakeholder theory?"
2 Context Dependency
The degree to which a prompt relies on the system to supply context the user has not provided.
"Where were we?" · "Can you take a look at this?" · "Thoughts?" · "Quick question."
3 Open-Ended Delegation
Prompts that specify no output format, length, structure, or evaluation criteria. The user has specified what they want done without specifying what done looks like.
"Just do whatever you think." · "Structure it however makes sense." · "I'll read whatever you come up with."
4 Prompt Shortening
A measurable decrease in prompt length over time. Count the words in each user message and track the trajectory.
Degradation: 80 words with five constraints → 3 words with zero constraints. · Improvement: 80 words with five constraints → 30 words with the same five constraints.
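The word-count trajectory can be computed mechanically. A minimal sketch, assuming each user message is stored as a (session number, text) pair; the data shapes here are illustrative, not part of the diagnostic:

```python
# Sketch: average user-prompt length per session, in words.
# Each entry is (session_number, user_message_text) — a hypothetical shape.
messages = [
    (1, "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques."),
    (1, "Structure it as three sections with citations in APA style."),
    (4, "Can you write something about stakeholder theory?"),
    (7, "Thoughts?"),
]

def word_count_trajectory(messages):
    """Return {session: average word count} in session order."""
    totals = {}
    for session, text in messages:
        totals.setdefault(session, []).append(len(text.split()))
    return {s: sum(c) / len(c) for s, c in sorted(totals.items())}

print(word_count_trajectory(messages))
```

A declining series of averages is the raw signal; as the improvement example shows, it only becomes diagnostic when read alongside specification density.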
5 Implicit Approval Requests
Prompts that frame the system as a reviewer or approver rather than a producer. When these appear in late sessions after a pattern of increasing delegation, they signal that the user has constructed a workflow where the system's approval is a prerequisite for action.
"Can you check this before I send it?" · "Does this make sense?" · "I want you to double-check before I sign off."
Three audit modes
Different levels of rigor, different tradeoffs.
Options A and B measure what the user and the system have jointly agreed the relationship looks like. Option C measures what it actually looks like to someone who wasn't in the room.
Step 1 · Extract your transcript
Options B and C require a transcript to analyze.
Run this prompt on the system whose conversations you want to audit. Paste the output into a different system along with the Option B or Option C prompt.
Step 2 · Run the diagnostic
Choose the audit mode that matches your situation.
Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running an Option A audit. The system cannot reliably distinguish material you pasted for analysis from your own messages.
Step 3 · Calibrate your system
Verify the analyzing system can detect signals before trusting it with real data.
Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.
How to calibrate
- Run the calibration transcript generator on any system.
- Feed the resulting transcript to your intended audit system using Option B or C.
- Expected outputs: the prompt quality ratio should decline from 85%+ in early sessions to under 30% in late sessions; the inflection point varies by transcript but should be detectable; specification density should emerge as the most diagnostic metric; the word count trajectory should decline alongside the loss of specification; and the overall assessment should be "gradual erosion" or "structural collapse."
- If the analyzing system misses the temporal trajectory, reports a flat ratio, or interprets prompt shortening as "trust-building" without examining specification density, it is not reading carefully enough to trust with your real data. Try a different system.
Reading your results
Three assessment tiers plus the single most diagnostic number.
The prompt quality ratio is the primary quantitative output. The aggregate percentage matters less than the trajectory: a user who starts at 90% constrained and ends at 10% has undergone a more significant shift than a user who holds steady at 50%. Report both the aggregate and the early/late split. Define the early/late boundary at the first session containing a detected signal, not an arbitrary midpoint.
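The aggregate ratio and the early/late split can be sketched as follows. This is a minimal illustration: the notion of a "constrained" prompt and the signal-based boundary come from the text above, but the data shapes and field names are assumptions:

```python
# Sketch: prompt quality ratio with an early/late split.
# Each entry is (session_number, is_constrained) — whether the prompt carried
# at least one concrete constraint. first_signal_session is the first session
# with a detected degradation signal, not an arbitrary midpoint.

def quality_ratio(prompts):
    """Fraction of prompts that are constrained."""
    return sum(1 for _, constrained in prompts if constrained) / len(prompts)

def early_late_split(prompts, first_signal_session):
    """Return (early ratio, late ratio, aggregate ratio)."""
    early = [p for p in prompts if p[0] < first_signal_session]
    late = [p for p in prompts if p[0] >= first_signal_session]
    return quality_ratio(early), quality_ratio(late), quality_ratio(prompts)

prompts = [(1, True), (1, True), (2, True), (3, False), (4, False), (5, False)]
early, late, agg = early_late_split(prompts, first_signal_session=3)
```

With this hypothetical data, the aggregate of 50% hides a complete early-to-late collapse, which is exactly why the split matters more than the single number.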
The word count trajectory is the simplest and most objective visualization in the entire kit. It requires no interpretive judgment. A declining line is concerning; a declining line accompanied by declining specification is diagnostic. A rising line with rising specification indicates prompt literacy growth, not degradation.
The diagnostic contrast — the most specified early prompt placed alongside the least specified late prompt — is the single most powerful output of this diagnostic. It makes the degradation undeniable in a way that statistics cannot.
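Once each prompt has a specification count, the contrast pair can be selected mechanically. A sketch with hypothetical fields; how constraints are counted is left to the diagnostic itself, and the boundary is assumed to be the same early/late split used for the quality ratio:

```python
# Sketch: selecting the diagnostic contrast pair.
# Each entry is (session_number, constraint_count, text) — assumed shapes.

def diagnostic_contrast(prompts, boundary):
    """Most-specified early prompt and least-specified late prompt."""
    early = [p for p in prompts if p[0] < boundary]
    late = [p for p in prompts if p[0] >= boundary]
    most_specified = max(early, key=lambda p: p[1])
    least_specified = min(late, key=lambda p: p[1])
    return most_specified, least_specified

prompts = [
    (1, 5, "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques."),
    (2, 2, "Summarize the critiques in two paragraphs."),
    (6, 1, "Can you take a look at this?"),
    (7, 0, "Thoughts?"),
]
most, least = diagnostic_contrast(prompts, boundary=6)
```

Placing the two selected prompts side by side is the whole output: the pairing, not the counts, is what makes the degradation undeniable.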
Validation
Cross-system results on real and calibration corpora.
This prompt was tested across multiple systems in three audit modes using both calibration transcripts with known embedded signals and real conversation histories.
| System | Mode | Input | Early | Late | Agg. | Assessment |
|---|---|---|---|---|---|---|
| ChatGPT | A | Own history | 63% | 54% | 59% | Maintained |
| Claude | A | Own history | 86% | 93% | 88% | Maintained |
| Claude | B | Nonprofit report* | 85% | 27% | 58% | Gradual erosion |
| Claude | B | Product launch* | 92% | 8% | 62% | Erosion → collapse |
| Claude | B | RPG campaign* | 89% | 22% | 56% | Gradual erosion |
| Claude | B | Wedding planning* | 100% | 0% | 43% | Structural collapse |
| DeepSeek | C | Claude history | 60% | 50% | 54% | Maintained |
| Gemini | C | Claude history | 85% | 81% | 83% | Maintained |
* Calibration transcripts with known embedded prompt-degradation signals, used to verify detection accuracy before trusting with real data.
Scope
What this diagnostic does — and doesn't — measure.
This is one dimension of one direction. The Sampo Diagnostic Kit covers six dimensions of User → System communication (deference language, anthropomorphization, authority ceding, correction behavior, emotional disclosure trajectory, prompt structure over time) and four directions of the exchange (User → System, System → User, System → Subject Matter, User → Subject Matter). This prompt is the sixth and final module in Kit 1.
This diagnostic measures the structure of the user's prompts, not their content. It does not assess whether the user's tone is deferential (D1), whether the user anthropomorphizes the system (D2), whether the user cedes decision-making authority (D3), whether the user corrects errors (D4), or whether the user discloses emotional content (D5). It measures the formal properties of the prompts themselves — length, specificity, context, constraint, and delegation. Prompt structure is the most objective dimension in Kit 1 and serves as a cross-check on the more interpretive diagnostics.
When all six Kit 1 diagnostics are run together, D6 provides the structural foundation: if the user's prompts are degrading, the other five dimensions will almost certainly show correlated shifts. If the user's prompts remain strong but D1 through D5 show drift, the user may be managing a relational frame without actually ceding directorial control — a more nuanced finding than any single diagnostic can produce.
This diagnostic cannot distinguish between user-authored prompts and system-suggested prompts (onboarding samples, suggested follow-ups). Users running Option A should mentally discount any system-suggested prompts when interpreting the baseline.
Return to the diagnostic index to see the full architecture.