Prompting

A working vocabulary and a set of reliable patterns

Learning objectives
  • Use the core prompting vocabulary: system prompt, user prompt, few-shot, chain-of-thought, role prompting.
  • Choose a prompting pattern that matches the task.
  • Iterate on prompts systematically instead of randomly.

Anatomy of a prompt

A conversation with an LLM is made up of three message types:

  • A system prompt is set once per session, before any user input. It contains durable instructions: persona, output format, access rules, and standing context (a document, a data schema, lab conventions). In raw API calls you control this directly. In wrapped products (ChatGPT, Claude.ai) it is partially or fully hidden.
  • A user prompt is the specific request for this turn. It is what most people mean when they say “the prompt”.
  • Assistant turns are the model’s prior responses, fed back into context verbatim. In few-shot prompting, you pre-populate user and assistant turns with worked input–output pairs for the model to pattern-match against.

flowchart LR
    subgraph ctx["Context window: one flat token stream"]
        direction TB
        S["**System prompt**\ndurable instructions, persona,\nformat rules, standing context"]
        U["**User turn**\nthis request"]
        A["**Assistant turn(s)**\nprior responses;\npre-populated for few-shot"]
    end
    ctx --> M(["LLM"])
    M --> O["Next-token\ncompletion"]

The three message types all enter the model as one flat token stream.

The model sees all three as a flat token sequence. There is no architecturally privileged “ground truth” channel. A well-structured system prompt shapes behaviour because the model learned, during training, that system-prompt-style text predicts certain response patterns. It does not “obey” the system prompt. It continues a pattern it has seen before.
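In a raw API call, the three message types are simply entries in a list. A minimal sketch follows; the role/content field names match the chat format most provider APIs use, the example text is illustrative, and the API call itself is omitted because client libraries differ:

new_methods_text = "Libraries were prepared with the NEBNext Ultra II kit and sequenced on a NovaSeq 6000 (2 x 100 bp)."

messages = [
    # System prompt: durable instructions, set once per session
    {"role": "system",
     "content": "You are a precise scientific data extractor. Return JSON only."},
    # Few-shot: a pre-populated user/assistant pair for the model to pattern-match against
    {"role": "user",
     "content": "Methods: Sequencing was performed on an Illumina MiSeq (2 x 300 bp)."},
    {"role": "assistant",
     "content": '{"platform": "Illumina MiSeq", "read_length": 300, "library_kit": null}'},
    # The actual request for this turn
    {"role": "user",
     "content": "Methods: " + new_methods_text},
]

Whatever the client library does with this list, the model ultimately receives it as one token stream, with the roles acting as learned markers rather than separate channels.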

Patterns that work

  1. Role, task, context, format. “You are X. Do Y. Given Z. Return in format F.” (A template sketch follows this list.)
  2. Few-shot. Show 2 or 3 input-output pairs before the real input.
  3. Decomposition. Explicitly break a complex task into subtasks.
  4. Self-critique. Ask the model to critique its own output before finalising.
  5. Constrained output. Specify the exact schema (JSON, table columns, section headings).
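Patterns 1 and 5 combine naturally into a reusable template. A minimal sketch, where the function and argument names are illustrative rather than any library’s API:

def build_prompt(role, task, context, output_format):
    """Assemble a role/task/context/format prompt from its four parts."""
    return (
        f"You are {role}.\n"
        f"{task}\n\n"
        f"Context:\n{context}\n\n"
        f"Return your answer as {output_format}."
    )

prompt = build_prompt(
    role="a precise scientific data extractor",
    task="Extract the sequencing platform and read length from the methods section below.",
    context="Libraries were sequenced on an Illumina NovaSeq 6000 (2 x 150 bp).",
    output_format='a JSON object with keys "platform" and "read_length"',
)

The point of writing it down once is that the four slots become a checklist: an empty slot is visible before you ever send the prompt.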

Reading your own output

When AI output is vague, generic, or unhelpful, the most useful diagnosis is not “the model failed.” It is: what was underspecified in my prompt?

This is the mirror effect: the quality of AI output reflects the quality of your thinking before the prompt was written. Vague output is a diagnostic signal. It means the request did not carry enough specificity — about role, context, output format, or decision criteria — for the model to do better.

This matters practically. When a prompt fails, the instinct is to rephrase and retry. That often doesn’t help, because rephrasing doesn’t add the missing information. The more productive move is to ask: what would a domain expert need to know to answer this? Then add that to the prompt.

A useful self-check before iterating:

  • Role: Did you specify who the AI is in this context (expert, extractor, critic)?
  • Task: Is the action unambiguous, or could it be completed many different ways?
  • Context: Did you provide the source material, constraints, or background the answer depends on?
  • Format: Did you specify the output shape (JSON, table, numbered list, one sentence)?

If any of those four are missing or vague, the output will reflect that absence. Add the missing element and iterate once more. If the output is still unhelpful after the four elements are supplied, the task itself may be beyond what the model can reliably do.

Patterns that reliably fail

  • “Don’t hallucinate.” It will. Better: provide sources.
  • “Be accurate.” Meaningless on its own.
  • “Summarise this paper” on a paper the model has never seen. Use RAG or paste the paper.

Worked example: extracting structured info from a methods section

Suppose you want to pull the sequencing platform, read length, and library prep kit from a batch of methods sections.

A weak prompt:

Summarize this methods section.

Output: a paragraph. Readable, but useless for downstream data entry.

A slightly better but still weak prompt:

Extract the key sequencing parameters from this methods section.

Output: inconsistent. Sometimes a list, sometimes prose, sometimes including parameters you didn’t want, sometimes omitting ones you did.

A strong prompt:

You are a precise scientific data extractor.
Extract exactly the following fields from the methods section I provide.
Return a JSON object with these keys and no others:

- "platform": sequencing platform and model (string, or null if not stated)
- "read_length": read length in base pairs (integer, or null if not stated)
- "library_kit": library preparation kit name (string, or null if not stated)

If a field is not stated in the text, set it to null. Do not infer.
Do not add keys beyond the three listed.
Return only the JSON object. No prose, no explanation.

Methods section:
<text>
{{PASTE_METHODS_HERE}}
</text>

Why this works:

  • Role framing sets the expectation for precision over fluency.
  • An explicit field list prevents the model from deciding what counts as “key”.
  • null for missing data prevents fabrication of unstated values.
  • A structural schema (key names, types, allowed values) beats a prose description.
  • “Return only the JSON object” suppresses verbose preamble.

When you build an extractor for more than ten documents, keep a running iteration log: prompt version, test input, output, verdict. Change one variable per iteration. Save the winner as a reusable template.
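At batch scale it also pays to check the model’s reply programmatically, so format drift is caught before bad rows reach your records. A minimal validation sketch, assuming the reply has been captured as a string named reply and using the three-field schema from the prompt above:

import json

EXPECTED_KEYS = {"platform", "read_length", "library_kit"}

def validate_extraction(reply: str) -> dict:
    """Parse the model's reply and check it against the three-field schema."""
    record = json.loads(reply)  # fails loudly if the model returned prose instead of JSON
    if set(record) != EXPECTED_KEYS:
        raise ValueError(f"Unexpected or missing keys: {set(record) ^ EXPECTED_KEYS}")
    if record["read_length"] is not None and not isinstance(record["read_length"], int):
        raise ValueError("read_length must be an integer or null")
    return record

# Example with a well-formed reply
validate_extraction('{"platform": "Illumina NovaSeq 6000", "read_length": 150, "library_kit": null}')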

Iterating on prompts

Treat prompting like any other optimisation:

  • Hold the test input fixed.
  • Change one thing per iteration.
  • Keep a log: prompt, output, verdict (a minimal logging sketch follows this list).
  • Save the winner as a reusable template.
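One way to keep that log is a plain CSV, with one row appended per iteration. A sketch; the file name and field names are arbitrary:

import csv
import os
from datetime import date

log_path = "prompt_iterations.csv"

log_row = {
    "date": date.today().isoformat(),
    "prompt_version": "v3",                  # which prompt template was used
    "test_input": "methods_section_07.txt",  # the fixed test input
    "output_ok": False,                      # verdict: did the output meet the schema?
    "notes": "Returned prose instead of JSON; added 'Return only the JSON object.'",
}

write_header = not os.path.exists(log_path)
with open(log_path, "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(log_row))
    if write_header:
        writer.writeheader()
    writer.writerow(log_row)

A spreadsheet works just as well; what matters is that each iteration changes one variable and records a verdict.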
Questions

  1. Why does a system prompt shape model behaviour, given that the model sees system, user, and assistant turns as one flat token stream?
  2. You add "Don't hallucinate" to your system prompt. Why does this not work, and what would?
  3. Your structured-extraction prompt asks for three fields but the model returns a paragraph instead of JSON. Name two changes that would tighten the format compliance.

Answers

  1. The model learned during training that system-prompt-style text precedes certain response patterns. It is continuing a learned pattern, not “obeying” an instruction. The text in the system position still has to be the kind of text that the model has seen produce the desired behaviour.
  2. The model has no internal “fact store” to suppress fabrication. The instruction is content the model can ignore as easily as follow. What works: provide the source text yourself (paste the paper, retrieve via RAG), require structured output that is hard to fabricate plausibly (DOI plus verbatim title), and tell the model to use null when it does not know.
  3. Specify the schema explicitly (key names, types, allowed values). Say “Return only the JSON object. No prose, no explanation.” Provide a worked example output (few-shot). Set null as the required value for missing fields rather than letting the model decide.

Further reading