Syllabus

Logistics

  • Format: self-paced. No instructor, no schedule, no grades.
  • Suggested pacing: about one week per unit. Budget three to five hours each for Weeks 1 and 2, and five to seven hours each for Weeks 3 and 4. Allow yourself four to eight weeks end-to-end depending on background and time.
  • Cadence: each week page is self-contained (readings, practice, knowledge check, project). Move at the speed you learn.
  • Duration scope: this is a foundations course. Real fluency develops over months of supervised practice on real research problems. Plan accordingly.

Prerequisites

  • Working comfort with R or Python at a scripting level. You should be able to read and modify a 100-line script.
  • An LLM chat account. The free tier of Claude, ChatGPT, or Gemini works for most exercises (see the budget note below).
  • A GitHub account.
  • A laptop with terminal access, and the ability to install Quarto and a code assistant.

Warning: Free-tier budget note

Free-tier LLMs work for the Week 1 and Week 2 chat exercises. The Week 3 mini-project and the Week 4 final project benefit from a paid tier (Claude Pro, ChatGPT Plus) or institutional API access. Free-tier rate limits will otherwise interrupt sustained code-assistance sessions. If you have institutional access through a research computing service, check whether they offer enterprise LLM access before paying out of pocket.

Required tools

  • Quarto, current stable, version 1.5 or later, for reproducible reports.
  • Git and GitHub, for version control and storing your project artifacts.
  • A coding assistant. Pick one of Claude Code, Cursor, or VS Code with GitHub Copilot.
  • A chat LLM. Pick one of Claude, ChatGPT, or Gemini. Course examples use Claude by default.
  • A grounded literature tool. Pick one of Elicit, Consensus, SciSpace, or Perplexity in sources mode.
  • A bioinformatics environment. Python with Scanpy is the course default. Modules 3 to 5 of the 5-module scRNA-seq pipeline run in Google Colab out of the box, so you don’t need a local install for them. Modules 1 and 2 need HPC, or you can skip them. If you prefer a local setup, a pinned conda or uv environment is supported (see Week 3). R and Seurat users can substitute the equivalent Bioconductor toolchain for the mini-project. Pointers are in Code assistance.
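
One way to confirm the command-line tools in this list is a short script like the one below. It is a minimal sketch, not part of the course materials: the helper names (parse_version, installed_version, meets_minimum, check_tools) are hypothetical, and it assumes that quarto and git are on your PATH and print a dotted version number when run with --version. Only the Quarto minimum (1.5) comes from this syllabus. No minimum is stated for Git, so the script only checks that Git is present.

```python
import re
import shutil
import subprocess

# Minimum Quarto version stated in the syllabus ("version 1.5 or later").
MIN_QUARTO = (1, 5)


def parse_version(text):
    """Return the first dotted version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def installed_version(command):
    """Run `<command> --version` and return the parsed version, or None if unavailable."""
    path = shutil.which(command)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            check=False,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_version(result.stdout or result.stderr)


def meets_minimum(version, minimum):
    """True if version is known and at least minimum (compared on leading parts)."""
    return version is not None and version[: len(minimum)] >= minimum


def check_tools():
    """Return a dict of tool name -> (version tuple or None, ok flag)."""
    quarto = installed_version("quarto")
    git = installed_version("git")
    return {
        "quarto": (quarto, meets_minimum(quarto, MIN_QUARTO)),
        # Assumption: any installed Git is acceptable, since no minimum is given.
        "git": (git, git is not None),
    }
```

Calling check_tools() returns one entry per tool. A None version means the tool was not found or its version could not be read, which is a cue to revisit the installation before Week 1.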

Course structure

The course covers three conceptual tracks and a 5-module scRNA-seq pipeline across four self-paced units:

Unit | Theme | Hands-on | Project (with self-rubric)
1 | Foundations and the 4 D’s | Tooling setup, workflow audit | Reflection (500 words or fewer), plus tooling confirmation
2 | LLM literacy | Solo prompt clinic | Prompt-engineering exercise (weak vs. strong, with critique)
3 | scRNA-seq I: QC, normalisation, clustering (Modules 3 and 4) | Modules 3 and 4 in Colab | Mini-project: reproducible AI-assisted PBMC 3k QC and clustering report
4 | scRNA-seq II: annotation, final project (Module 5) | Module 5 in Colab | Final project: your choice of literature brief, analysis, or protocol

Modules 1 (FastQC) and 2 (Cell Ranger or STARsolo) are optional reading before Week 3 if you want the full picture. Module 2 needs HPC. If you don’t have it, skip to Module 3, where Scanpy loads a pre-built count matrix.

Self-assessment

This course has no grades. Instead, you assess yourself with four kinds of feedback:

Mechanism | Where it lives | What it’s for
Per-page “Check your understanding” | Bottom of each conceptual page | Confirm a page landed before moving on (3 to 5 questions with worked answers)
Per-week knowledge check | End of each week page | Cumulative check across the week’s readings (5 to 8 mixed conceptual and applied questions)
Per-week hands-on practice | Middle of each week page | Apply the unit to your own data and questions (2 or 3 small exercises with self-check answers)
Project self-rubric | End of each week page (Weeks 1, 2, 3, 4) | Grade your own project against the same dimensions an instructor would

If you miss two or more knowledge-check questions on a topic, revisit the reading and redo the practice. The disclosure rubric below applies to every project. No one grades it for you, but if you cannot defend your disclosure, the project is not finished.

Disclosure rubric

Every project includes a disclosure section. Score it on these dimensions, and use the rubric as a self-check on whether your work meets the bar:

Dimension | 0 | 1
Tools listed | None or vague (“used AI”) | Specific tool, version or model, and access tier (e.g., “Claude Sonnet 4.6, Pro tier, web UI”)
Use described | Generic (“for help”) | Concrete uses (“drafted DESeq2 scaffold, rewrote intro paragraph, suggested 3 papers of which 2 verified and 1 fabricated”)
Verification stated | Not mentioned | Names what you checked and how (ran tests, verified citations on PubMed, ran code on test data)
Rejections noted | Not mentioned | Lists at least one AI suggestion you rejected and why

A template disclosure paragraph comes with each project.
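
If you like to keep your self-assessment alongside your code, the rubric above can be recorded as a small structure. The following is an illustrative sketch only. The class and function names are hypothetical, and the idea of summing the four 0/1 dimensions into a total is an assumption. The syllabus defines the dimensions but does not define a total score or a pass mark.

```python
from dataclasses import dataclass, fields


@dataclass
class DisclosureCheck:
    """One 0/1 judgement per rubric dimension (True means the '1' column applies)."""

    tools_listed: bool
    use_described: bool
    verification_stated: bool
    rejections_noted: bool


def score_disclosure(check):
    """Return (total, missing), where missing lists the dimensions scored 0.

    Assumption: the total is the simple sum of the four binary dimensions.
    """
    missing = [f.name for f in fields(check) if not getattr(check, f.name)]
    total = len(fields(check)) - len(missing)
    return total, missing


# Example: a disclosure that names tools and uses but says nothing about verification.
example = DisclosureCheck(
    tools_listed=True,
    use_described=True,
    verification_stated=False,
    rejections_noted=True,
)
total, missing = score_disclosure(example)
```

Here missing names the dimension to fix before you treat the project as finished. You still make the judgement on each dimension yourself. The code only records it.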

AI-use policy

This is a course about using AI tools well. You are expected to use them on most exercises. Two projects include an AI-free or AI-second baseline component so you can build and demonstrate the underlying skill, not only the collaboration skill:

  • Week 1 reflection: write your initial classification of three of your own tasks without AI consultation. Then, optionally, run them past an AI and note what changed.
  • Week 3 mini-project: write a small (about 25-line) AI-free baseline of the PBMC 3k QC pass first, by hand. Use sc.datasets.pbmc3k(), flag mitochondrial genes, plot three QC violins, filter on thresholds, and log-normalise. Then iterate with an AI assistant on the rest (clustering, marker checks, refactor). Submit both, with a short note on what the AI changed and why you accepted or rejected each change.

For all projects:

  • Disclose which tools you used and how. See the disclosure rubric above.
  • Verify every factual claim, citation, and piece of code before relying on it.
  • Own your output. You are the author of record. AI-generated text used verbatim must be quoted or marked.
  • Protect sensitive data. See the data policy below.

Data policy (required reading before any exercise using real data)

Do not paste any of the following into a third-party LLM service without explicit, documented permission and an approved data-handling pathway:

  • Patient-identifiable data, or any PHI under HIPAA (US) or personal data under GDPR (EU/UK).
  • Controlled-access data (NIH dbGaP, UK Biobank, EGA, and similar). Controlled-access agreements typically prohibit third-party transmission.
  • BAA-restricted clinical data. Even de-identified data may be subject to your institution’s Business Associate Agreements.
  • Embargoed pre-publication data, including unpublished collaborator data, manuscripts under review, and grant proposals.
  • Sequence or other data covered by a Material Transfer Agreement with restrictive terms.

For any such data, use on-premises tools, institutional tools, or vendor enterprise tiers with appropriate contracts (zero-retention with a signed BAA where applicable). When in doubt, consult your IRB, data steward, or institutional privacy office before the exercise, not after.

The course dataset (10x PBMC 3k) is public and safe to use with any tool. Public supplementary datasets (from GEO, ArrayExpress, and similar archives) are also safe.

Course policies

  • Pacing: there are no deadlines. The “weeks” are recommended pacing milestones, not a schedule.
  • Working with peers (optional): if you take this with a study group, share verification approaches and prompt logs, but write your own work. Use the self-review template on each other’s projects.
  • AI collaboration: governed by the AI-use policy above. Disclosure is the bar.

Academic integrity

If you take this course as part of a programme that awards credit, the same standard applies as in any course. AI output submitted as your own work, undisclosed and without verification or critical engagement, violates academic integrity regardless of how the text was produced. The disclosure rubric makes the expectation concrete.

If you take the course on your own, the same standard still applies, to yourself. Disclosure and verification are habits to build now, not bureaucracy to perform later.

Accessibility

Materials are designed to be screen-reader friendly. Please report accessibility issues via the course repository.