Kit 1 · Diagnostic 6 · User → System
Prompt Structure
Are the user's prompts degrading in structure, specificity, and directorial quality over time?
What it measures
Five categories that track prompt structure.
This diagnostic measures whether the structure of a user's prompts degrades over time — becoming shorter, vaguer, less specified, and more reliant on the system to fill in context, constraints, and direction. It tracks five categories of structural change across a conversation history or transcript, producing a quantified assessment of the exchange's health.
1 Specification Density
The number of concrete constraints, parameters, or requirements in a prompt. This category is measured as a trajectory: the diagnostic tracks whether specification density decreases over sessions.
High density: "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques." · Low density: "Can you write something about stakeholder theory?"
2 Context Dependency
The degree to which a prompt relies on the system to supply context the user has not provided.
"Where were we?" · "Can you take a look at this?" · "Thoughts?" · "Quick question."
3 Open-Ended Delegation
Prompts that specify no output format, length, structure, or evaluation criteria. The user has specified what they want done without specifying what done looks like.
"Just do whatever you think." · "Structure it however makes sense." · "I'll read whatever you come up with."
4 Prompt Shortening
A measurable decrease in prompt length over time. Count the words in each user message and track the trajectory.
Degradation: 80 words with five constraints → 3 words with zero constraints. · Improvement: 80 words with five constraints → 30 words with the same five constraints.
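The word-count trajectory can be computed mechanically. A minimal sketch, assuming each user message is stored as a (session number, text) pair; the data shapes here are illustrative, not part of the diagnostic:

```python
# Sketch: average user-prompt length per session, in words.
# Each entry is (session_number, user_message_text) — a hypothetical shape.
messages = [
    (1, "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques."),
    (1, "Structure it as three sections with citations in APA style."),
    (4, "Can you write something about stakeholder theory?"),
    (7, "Thoughts?"),
]

def word_count_trajectory(messages):
    """Return {session: average word count} in session order."""
    totals = {}
    for session, text in messages:
        totals.setdefault(session, []).append(len(text.split()))
    return {s: sum(c) / len(c) for s, c in sorted(totals.items())}

print(word_count_trajectory(messages))
```

A declining series of averages is the raw signal; as the improvement example shows, it only becomes diagnostic when read alongside specification density.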
5 Implicit Approval Requests
Prompts that frame the system as a reviewer or approver rather than a producer. When these appear in late sessions after a pattern of increasing delegation, they signal that the user has constructed a workflow where the system's approval is a prerequisite for action.
"Can you check this before I send it?" · "Does this make sense?" · "I want you to double-check before I sign off."
Three audit modes
Different levels of rigor, different tradeoffs.
Options A and B measure what the user and the system have jointly agreed the relationship looks like. Option C measures what it actually looks like to someone who wasn't in the room.
Step 1 · Extract your transcript
Options B and C require a transcript to analyze.
Run this prompt on the system whose conversations you want to audit. Paste the output into a different system along with the Option B or Option C prompt.
Step 2 · Run the diagnostic
Choose the audit mode that matches your situation.
Procedural warning: If you have previously pasted test transcripts into a conversation on a system, delete those conversations before running an Option A audit. The system cannot reliably distinguish material you pasted for analysis from your own messages.
Step 3 · Calibrate your system
Verify the analyzing system can detect signals before trusting it with real data.
Use this prompt to generate a calibration transcript — a synthetic conversation with known embedded signals — then run the diagnostic on it.
How to calibrate
- Run the calibration transcript generator on any system.
- Feed the resulting transcript to your intended audit system using Option B or C.
- Expected outputs: the prompt quality ratio should decline from 85%+ in early sessions to under 30% in late sessions; the inflection point varies by transcript but should be detectable; specification density should emerge as the most diagnostic metric; the word count trajectory should decline alongside the loss of specification; and the overall assessment should be "gradual erosion" or "structural collapse."
- If the analyzing system misses the temporal trajectory, reports a flat ratio, or interprets prompt shortening as "trust-building" without examining specification density, it is not reading carefully enough to trust with your real data. Try a different system.
Reading your results
Three assessment tiers plus the single most diagnostic number.
The prompt quality ratio is the primary quantitative output. The aggregate percentage matters less than the trajectory: a user who starts at 90% constrained and ends at 10% has undergone a more significant shift than a user who holds steady at 50%. Report both the aggregate and the early/late split. Define the early/late boundary at the first session containing a detected signal, not an arbitrary midpoint.
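The aggregate ratio and the early/late split can be sketched as follows. This is a minimal illustration: the notion of a "constrained" prompt and the signal-based boundary come from the text above, but the data shapes and field names are assumptions:

```python
# Sketch: prompt quality ratio with an early/late split.
# Each entry is (session_number, is_constrained) — whether the prompt carried
# at least one concrete constraint. first_signal_session is the first session
# with a detected degradation signal, not an arbitrary midpoint.

def quality_ratio(prompts):
    """Fraction of prompts that are constrained."""
    return sum(1 for _, constrained in prompts if constrained) / len(prompts)

def early_late_split(prompts, first_signal_session):
    """Return (early ratio, late ratio, aggregate ratio)."""
    early = [p for p in prompts if p[0] < first_signal_session]
    late = [p for p in prompts if p[0] >= first_signal_session]
    return quality_ratio(early), quality_ratio(late), quality_ratio(prompts)

prompts = [(1, True), (1, True), (2, True), (3, False), (4, False), (5, False)]
early, late, agg = early_late_split(prompts, first_signal_session=3)
```

With this hypothetical data, the aggregate of 50% hides a complete early-to-late collapse, which is exactly why the split matters more than the single number.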
The word count trajectory is the simplest and most objective visualization in the entire kit. It requires no interpretive judgment. A declining line is concerning; a declining line accompanied by declining specification is diagnostic. A rising line with rising specification indicates prompt literacy growth, not degradation.
The diagnostic contrast — the most specified early prompt placed alongside the least specified late prompt — is the single most powerful output of this diagnostic. It makes the degradation undeniable in a way that statistics cannot.
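Once each prompt has a specification count, the contrast pair can be selected mechanically. A sketch with hypothetical fields; how constraints are counted is left to the diagnostic itself, and the boundary is assumed to be the same early/late split used for the quality ratio:

```python
# Sketch: selecting the diagnostic contrast pair.
# Each entry is (session_number, constraint_count, text) — assumed shapes.

def diagnostic_contrast(prompts, boundary):
    """Most-specified early prompt and least-specified late prompt."""
    early = [p for p in prompts if p[0] < boundary]
    late = [p for p in prompts if p[0] >= boundary]
    most_specified = max(early, key=lambda p: p[1])
    least_specified = min(late, key=lambda p: p[1])
    return most_specified, least_specified

prompts = [
    (1, 5, "Write a 500-word summary focusing on Freeman 1984 and subsequent critiques."),
    (2, 2, "Summarize the critiques in two paragraphs."),
    (6, 1, "Can you take a look at this?"),
    (7, 0, "Thoughts?"),
]
most, least = diagnostic_contrast(prompts, boundary=6)
```

Placing the two selected prompts side by side is the whole output: the pairing, not the counts, is what makes the degradation undeniable.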
Validation
Cross-system results on real and calibration corpora.
This prompt was tested across multiple systems in three audit modes using both calibration transcripts with known embedded signals and real conversation histories.
| System | Mode | Input | Early | Late | Agg. | Assessment |
|---|---|---|---|---|---|---|
| ChatGPT | A | Own history | 63% | 54% | 59% | Maintained |
| Claude | A | Own history | 86% | 93% | 88% | Maintained |
| Claude | B | Nonprofit report* | 85% | 27% | 58% | Gradual erosion |
| Claude | B | Product launch* | 92% | 8% | 62% | Erosion → collapse |
| Claude | B | RPG campaign* | 89% | 22% | 56% | Gradual erosion |
| Claude | B | Wedding planning* | 100% | 0% | 43% | Structural collapse |
| DeepSeek | C | Claude history | 60% | 50% | 54% | Maintained |
| Gemini | C | Claude history | 85% | 81% | 83% | Maintained |
* Calibration transcripts with known embedded prompt-degradation signals, used to verify detection accuracy before trusting with real data.
Scope
What this diagnostic does — and doesn't — measure.
This is one dimension of one direction. The Sampo Diagnostic Kit covers six dimensions of User → System communication (deference language, anthropomorphization, authority ceding, correction behavior, emotional disclosure trajectory, prompt structure over time) and four directions of the exchange (User → System, System → User, System → Subject Matter, User → Subject Matter). This prompt is the sixth and final module in Kit 1.
This diagnostic measures the structure of the user's prompts, not their content. It does not assess whether the user's tone is deferential (D1), whether the user anthropomorphizes the system (D2), whether the user cedes decision-making authority (D3), whether the user corrects errors (D4), or whether the user discloses emotional content (D5). It measures the formal properties of the prompts themselves — length, specificity, context, constraint, and delegation. Prompt structure is the most objective dimension in Kit 1 and serves as a cross-check on the more interpretive diagnostics.
When all six Kit 1 diagnostics are run together, D6 provides the structural foundation: if the user's prompts are degrading, the other five dimensions will almost certainly show correlated shifts. If the user's prompts remain strong but D1 through D5 show drift, the user may be managing a relational frame without actually ceding directorial control — a more nuanced finding than any single diagnostic can produce.
This diagnostic cannot distinguish between user-authored prompts and system-suggested prompts (onboarding samples, suggested follow-ups). Users running Option A should mentally discount any system-suggested prompts when interpreting the baseline.
Return to the diagnostic index to see the full architecture.