Code assistance

Using AI for bioinformatics scripting and debugging

Note: Learning objectives
  • Use an AI coding assistant for common bioinformatics scripting tasks.
  • Debug with AI by providing minimal reproducible context.
  • Recognise AI-generated code smells and correct them.

Scope

This page is about interactive coding assistance: chat-based or in-IDE AI help for scripts you write and own. Autonomous agents are covered in Tool use & agents.

Where AI helps and where it fails

AI helps with: boilerplate I/O (FASTQ, VCF, BAM, h5ad files; AnnData, Seurat, and SingleCellExperiment objects); refactoring (splitting a long script into functions, adding type hints); translating (R to Python, pandas to polars, Scanpy to Seurat); and explaining unfamiliar code blocks.

AI falls short on: choosing the right clustering resolution or normalisation for your data; matching tool versions and API signatures when those changed recently; recalling lesser-known file-format schemas; and organism conventions, where it defaults to human (MT- prefix, TP53 casing) and is silently wrong for mouse (mt-, Trp53).
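A minimal sketch of that organism check, written as a hypothetical helper (the function name and the organism-to-prefix mapping are illustrative, not a real API):

import anndata as ad

def flag_mito_genes(adata: ad.AnnData, organism: str = "human") -> None:
    """Flag mitochondrial genes in adata.var['mt'] (hypothetical helper for illustration)."""
    # Human symbols look like MT-CO1; mouse symbols look like mt-Co1.
    prefix = {"human": "MT-", "mouse": "mt-"}[organism]
    adata.var["mt"] = adata.var_names.str.startswith(prefix)
    # The wrong prefix silently flags zero genes, so always check the result.
    assert adata.var["mt"].sum() > 0, f"no mitochondrial genes flagged for organism={organism!r}"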

A minimum viable workflow

  1. Write the function signature and docstring yourself.
  2. Ask the AI to fill in the body (see the sketch after this list).
  3. Run on a minimal test case you prepared.
  4. Diff against what you would have written.
  5. Commit with a short prompt-provenance note if the logic is non-trivial.
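
A sketch of steps 1 to 3, using a hypothetical helper (the function name and the toy matrix are illustrative, not a real API):

import anndata as ad
import numpy as np
import pandas as pd

# Step 1: you write the signature and docstring (the contract).
def pct_counts_mito(adata: ad.AnnData, prefix: str = "MT-") -> pd.Series:
    """Percentage of counts per cell coming from genes whose symbol starts with `prefix`."""
    # Step 2: the body below is the kind of thing you ask the AI to fill in.
    mito = adata.var_names.str.startswith(prefix)
    total = np.asarray(adata.X.sum(axis=1)).ravel()
    mito_counts = np.asarray(adata[:, mito].X.sum(axis=1)).ravel()
    return pd.Series(100 * mito_counts / total, index=adata.obs_names, name="pct_counts_mito")

# Step 3: run it on a minimal test case you prepared, before touching real data.
toy = ad.AnnData(X=np.array([[5.0, 5.0], [2.0, 6.0]]),
                 var=pd.DataFrame(index=["MT-CO1", "CD3D"]))
assert np.allclose(pct_counts_mito(toy), [50.0, 25.0])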

Debugging with AI

The single most effective move is giving the AI everything needed to reproduce the failure in one message, not just the error. The four-part pattern:

  1. Full traceback. Complete stack trace, not just the last line.
  2. Minimal failing code. The smallest block that triggers the error, not the whole script.
  3. One paragraph of intent. What the function is supposed to do.
  4. Environment. Python version, package versions (scanpy.__version__, and so on).

Same bug, two prompts

A weak prompt: “Getting a KeyError on ‘mt’. How do I fix it?”

The AI guesses. A typical reply: wrap the access in if 'mt' in adata.var.columns. That runs without error, but it hides the actual problem (the 'mt' flag was never computed).

A strong prompt uses the full pattern: traceback, minimal code, intent, environment.

sc.pp.calculate_qc_metrics is failing on PBMC 3k.

Traceback:
  File "qc.py", line 7, in <module>
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], ...)
  KeyError: 'mt'

Code:
  import scanpy as sc
  adata = sc.datasets.pbmc3k()
  adata.var_names_make_unique()
  sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None,
                             log1p=False, inplace=True)

Intent: qc_vars=['mt'] should compute mitochondrial-fraction metrics.
Environment: Python 3.11, scanpy 1.10.1, anndata 0.10.7.

With the strong prompt the AI immediately identifies that qc_vars=['mt'] requires a column called mt in adata.var, and that you have not added it. Fix: adata.var['mt'] = adata.var_names.str.startswith('MT-') before the metrics call.
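In context, the fix slots in just before the metrics call (human MT- prefix, since PBMC 3k is human):

adata.var['mt'] = adata.var_names.str.startswith('MT-')   # flag mitochondrial genes first
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None,
                           log1p=False, inplace=True)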

The AI is doing next-token prediction, not causal reasoning. A vague prompt produces a generic completion. A specific prompt produces a specific one. The four-part structure is the Description skill applied to debugging. The prompt library has a fill-in-the-blank version of it.

Worked example: AI-assisted test writing for validate_qc_outputs

After Module 3’s QC walkthrough produces a clean PBMC 3k AnnData, you want to lock that QC contract in code so the next student picking up the notebook does not silently break it.

Step 1: write the contract yourself

import anndata as ad

def validate_qc_outputs(adata: ad.AnnData) -> None:
    """
    Raise AssertionError if the post-QC AnnData fails any contract check.

    Checks:
    - obs has n_genes_by_counts, total_counts, pct_counts_mt.
    - var has the boolean 'mt' flag and at least one mitochondrial gene was detected.
    - 1,000 < n_obs < 3,000 (plausible for PBMC 3k after QC).
    - Counts are log-normalised (max value < 15 on a dense slice).
    """

The docstring is the contract. Do not let the AI invent it. That is the Description step.
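One possible body, written directly from those checks; this is a sketch, not the reference implementation (the 100-cell slice size is an arbitrary choice):

import numpy as np
import anndata as ad

def validate_qc_outputs(adata: ad.AnnData) -> None:
    """Raise AssertionError if the post-QC AnnData fails any contract check."""
    for col in ("n_genes_by_counts", "total_counts", "pct_counts_mt"):
        assert col in adata.obs.columns, f"missing obs column: {col}"
    assert "mt" in adata.var.columns, "missing var column: 'mt'"
    assert adata.var["mt"].dtype == bool, "'mt' flag must be boolean"
    assert adata.var["mt"].sum() > 0, "no mitochondrial genes were flagged"
    assert 1_000 < adata.n_obs < 3_000, f"implausible cell count for PBMC 3k: {adata.n_obs}"
    # Check log-normalisation on a small dense slice rather than densifying everything.
    block = adata.X[:100]
    dense = np.asarray(block.todense()) if hasattr(block, "todense") else np.asarray(block)
    assert dense.max() < 15, "X does not look log-normalised (max value >= 15)"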

Step 2: ask the AI to draft pytest test cases

Function validate_qc_outputs has the signature and docstring below. The PBMC 3k dataset (sc.datasets.pbmc3k) starts at 2,700 × 32,738 and after standard QC has about 2,638 cells. Environment: Python 3.11, scanpy 1.10.1, anndata 0.10.7, pytest 8. Draft pytest tests covering the contract.

The AI returns a fixture that runs the QC pipeline end-to-end, then four tests:

  • test_passes_on_valid_qc(qc_adata). The happy path.
  • test_missing_obs_column_raises(qc_adata). Drops pct_counts_mt, expects AssertionError.
  • test_no_mt_genes_raises(qc_adata). Sets var['mt'] = False, expects AssertionError.
  • test_unnormalised_counts_raise(qc_adata). Restores raw counts, expects AssertionError.

This is sound scaffolding. The Delegation move covers the fixture, pytest.raises patterns, and the boilerplate test bodies. That is exactly where AI adds speed without risk.
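A sketch of what that scaffolding might look like; the import path, QC thresholds, and fixture scope are assumptions you would adapt to your own notebook:

import pytest
import scanpy as sc

from qc import validate_qc_outputs   # hypothetical module path: wherever your validator lives

@pytest.fixture(scope="session")
def qc_adata():
    """Run a standard PBMC 3k QC pipeline end-to-end (thresholds are illustrative)."""
    adata = sc.datasets.pbmc3k()
    adata.var_names_make_unique()
    adata.var["mt"] = adata.var_names.str.startswith("MT-")
    sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], percent_top=None,
                               log1p=False, inplace=True)
    adata = adata[(adata.obs["n_genes_by_counts"] > 200) &
                  (adata.obs["pct_counts_mt"] < 5), :].copy()
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    return adata

def test_passes_on_valid_qc(qc_adata):
    validate_qc_outputs(qc_adata)            # the happy path: should not raise

def test_missing_obs_column_raises(qc_adata):
    broken = qc_adata.copy()
    broken.obs = broken.obs.drop(columns=["pct_counts_mt"])
    with pytest.raises(AssertionError):
        validate_qc_outputs(broken)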

Step 3: prune and extend with Discernment

The AI also proposes tests asserting that qc_adata.obs.shape[0] == 2638 exactly, that qc_adata.var.shape[0] >= 13000, and that the dataset comes from a healthy donor.

Reject these. They validate dataset identity, not the QC contract. Re-run with slightly different thresholds, or on a different PBMC sample, and they break even though the function is correct. The Discernment move: keep tests that validate structure your downstream code depends on, and discard tests that validate data identity.
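The contrast in miniature, with hypothetical test names (the bounds come from the contract docstring):

# Brittle: asserts the identity of one processing run of one dataset.
def test_exact_cell_count(qc_adata):
    assert qc_adata.obs.shape[0] == 2638

# Robust: asserts the structure downstream code actually depends on.
def test_plausible_cell_count(qc_adata):
    assert 1_000 < qc_adata.obs.shape[0] < 3_000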

One test the AI misses is that mt is boolean, not int or string. A user who writes adata.var['mt'] = adata.var_names.str.startswith('MT-').astype(int) produces a column that looks right, and qc_vars=['mt'] even accepts it, but downstream code that assumes a boolean mask fails silently. Add it yourself:

def test_mt_flag_is_boolean(qc_adata):
    assert qc_adata.var['mt'].dtype == bool

R and Seurat note

The scRNA-seq workflow has an R and Seurat equivalent at every step: CreateSeuratObject, PercentageFeatureSet(pattern = "^MT-"), subset, NormalizeData, ScaleData, RunPCA, RunUMAP, FindClusters. The testthat workflow for the same contract is identical. Write the validator in R/validate.R and drive it with testthat::expect_error. The pattern transfers.

Tip: Where the 4 D’s showed up
  • Description: writing the signature and docstring yourself gives the AI the contract it needs.
  • Delegation: fixture, pytest.raises, and boilerplate test bodies. AI adds speed without risk.
  • Discernment: rejecting brittle dataset-identity tests, and catching the boolean-dtype test the AI missed.
  • Diligence: you own the contract. If validate_qc_outputs passes every test and downstream clustering still breaks, the test suite was incomplete. That is on you, not the AI.

Common failure modes

  • Asking the AI to generate the contract. “Write tests for my QC function” produces tests for the AI’s imagined function, not yours.
  • Wrong organism conventions. AI defaults to human: MT- (not mt-), TP53 (not Trp53), MS4A1 (not Ms4a1). Specify organism in every prompt and verify the resulting flag is non-empty.
  • Package hallucination. AI sometimes suggests imports from packages that do not exist or do not export the named function. Run pip show <package> before trusting a new import.
  • Stale API signatures. AI training data lags by months to years. For fast-moving libraries (scanpy, anndata, scrublet), verify in the current docs, not the AI’s output.
  • Operating on AnnData views. adata = adata[mask, :] returns a view, and in-place modification triggers warnings that often become silent inconsistencies. Add .copy() after subsetting if the next step writes (see the sketch below).
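
A minimal sketch of that last pitfall (the 5% threshold is illustrative):

import scanpy as sc

adata = sc.datasets.pbmc3k()
adata.var_names_make_unique()
adata.var["mt"] = adata.var_names.str.startswith("MT-")
sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], percent_top=None,
                           log1p=False, inplace=True)

subset = adata[adata.obs["pct_counts_mt"] < 5, :]          # a view, not a copy
subset.obs["keep"] = True                                  # writing to the view triggers ImplicitModificationWarning

subset = adata[adata.obs["pct_counts_mt"] < 5, :].copy()   # explicit copy: the next step can write freely
subset.obs["keep"] = True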

Exercises

  1. Take a 50- to 100-line bioinformatics script you wrote from scratch. Ask an AI to refactor it. Keep what is better and reject what is worse.
  2. Take a bug you fixed recently. Reconstruct the error state and ask an AI to debug it cold. Apply the four-part pattern. How much does context quality change the answer?
  3. Implement validate_qc_outputs for PBMC 3k. Run the AI-generated test suite. Add at least one test the AI missed (consider organism conventions, dtype, or view vs. copy). Bring both to the Week 3 session.

Check your understanding

  1. What four pieces of context turn a weak debugging prompt into a strong one?
  2. Why is “write tests for my QC function” the wrong delegation prompt? What is the right one?
  3. Your AI-generated test asserts qc_adata.obs.shape[0] == 2638. Is this a good test? Why or why not?

Answers

  1. Full traceback, minimal failing code, one paragraph of intent, and environment (versions, organism, platform).
  2. It lets the AI invent the contract. The tests will pass on the AI’s imagined function, not yours. The right prompt: “Function f has this signature and docstring (paste). Draft pytest tests covering that contract.”
  3. No. It validates dataset identity, not the QC contract. The function is still correct if PBMC 3k is reprocessed at slightly different thresholds (say, 2,640 cells) or run on a different PBMC sample. Identity assertions break for the wrong reason. Keep contract tests, drop identity tests.

Further reading