{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Beeswarm Plots with plotnine-extra\n", "\n", "This vignette shows how to create\n", "[beeswarm plots](https://github.com/eclarke/ggbeeswarm) using\n", "plotnine-extra. A beeswarm plot displays the distribution of a\n", "continuous variable within each level of a categorical axis.\n", "Unlike a simple jitter plot, which offsets points at random and\n", "can leave them overlapping, a beeswarm shifts points sideways so\n", "that they **do not overlap**. This gives a faithful view of the\n", "distribution while still showing every individual observation.\n", "\n", "plotnine-extra provides two geoms ported from the R package\n", "**ggbeeswarm**:\n", "\n", "* `geom_beeswarm()` – the classic beeswarm layout.\n", "* `geom_quasirandom()` – density-aware quasi-random jitter, which\n", "  reduces overlap but does not strictly prevent it." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Libraries & Dataset\n", "\n", "We use the classic **Iris** dataset, which contains sepal and petal\n", "measurements for 150 iris flowers, 50 from each of three species." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from plotnine_extra import (\n", " ggplot,\n", " aes,\n", " geom_beeswarm,\n", " geom_quasirandom,\n", " geom_boxplot,\n", " geom_violin,\n", " labs,\n", " theme_minimal,\n", " scale_color_brewer,\n", " coord_flip,\n", ")\n", "from plotnine_extra.data import iris\n", "\n", "iris.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic beeswarm plot\n", "\n", "The simplest beeswarm plot maps a categorical variable to `x`\n", "and a continuous variable to `y`. Points are shifted sideways\n", "just enough to avoid overlap." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\"))\n", " + geom_beeswarm()\n", " + labs(\n", " title=\"Basic Beeswarm Plot\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Coloring by group\n", "\n", "Map the `color` aesthetic to the grouping variable to\n", "distinguish species." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title=\"Beeswarm with Color by Species\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The `cex` parameter\n", "\n", "The `cex` parameter controls the spacing between points.\n", "Higher values spread points further apart; lower values\n", "pack them more tightly." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(cex=3, size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title=\"Beeswarm with cex=3 (wider spacing)\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Beeswarm methods\n", "\n", "`geom_beeswarm()` supports five layout methods:\n", "\n", "| Method | Description |\n", "|---|---|\n", "| `\"swarm\"` | Default – shifts points sideways to avoid overlap |\n", "| `\"compactswarm\"` | Tighter-packing variant of `\"swarm\"` |\n", "| `\"center\"` | Square grid, arranged symmetrically about the group centre |\n", "| `\"hex\"` | Hexagonal grid |\n", "| `\"square\"` | Regular square grid |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(method=\"swarm\", size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title='method=\"swarm\" (default)',\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(method=\"center\", size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title='method=\"center\"',\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(method=\"hex\", size=2)\n", " + scale_color_brewer(type=\"qual\", 
palette=\"Set2\")\n", " + labs(\n", " title='method=\"hex\"',\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(method=\"square\", size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title='method=\"square\"',\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Side control\n", "\n", "The `side` parameter determines whether points are spread\n", "to both sides of the centre line or to one side only:\n", "\n", "* `side=0` – both sides (default)\n", "* `side=1` – right / up only\n", "* `side=-1` – left / down only" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(side=1, size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title=\"Beeswarm with side=1 (right only)\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(side=-1, size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title=\"Beeswarm with side=-1 (left only)\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quasi-random jitter\n", "\n", "`geom_quasirandom()` uses a density-aware quasi-random\n", "sequence to jitter points. 
 The result looks like a violin\n", "plot but shows every individual observation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_quasirandom(size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title=\"Quasi-random Jitter\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pseudorandom method\n", "\n", "Set `method=\"pseudorandom\"` to draw the offsets from a standard\n", "pseudorandom number generator instead of the quasi-random van der\n", "Corput sequence. The offsets are still scaled by the local density,\n", "so the violin-like outline is kept, but the spacing is less even." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_quasirandom(method=\"pseudorandom\", size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title='Quasi-random with method=\"pseudorandom\"',\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Controlling the jitter width\n", "\n", "The `width` parameter sets the maximum spread of points around each\n", "category centre, in units of the categorical axis (adjacent\n", "categories are one unit apart)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_quasirandom(width=0.1, size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title=\"Quasi-random with width=0.1 (narrow)\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Combining with other geoms\n", "\n", "Beeswarm points work well layered on top of box plots\n", "or violin plots to show both the summary statistics and\n", "the raw data." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Beeswarm + Box plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\"))\n", " # Hide the box plot's outlier markers: the beeswarm already draws every point\n", " + geom_boxplot(outlier_shape=\"\", fill=\"#e0e0e0\", alpha=0.6)\n", " + geom_beeswarm(aes(color=\"species\"), size=1.5, alpha=0.7)\n", " + scale_color_brewer(type=\"qual\", palette=\"Dark2\")\n", " + labs(\n", " title=\"Beeswarm over Box Plot\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Quasi-random + Violin plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\"))\n", " + geom_violin(fill=\"#e0e0e0\", alpha=0.4)\n", " + geom_quasirandom(aes(color=\"species\"), size=1.5, alpha=0.7)\n", " + scale_color_brewer(type=\"qual\", palette=\"Dark2\")\n", " + labs(\n", " title=\"Quasi-random over Violin Plot\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Horizontal beeswarm\n", "\n", "Use `coord_flip()` to produce a horizontal beeswarm plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + coord_flip()\n", " + labs(\n", " title=\"Horizontal Beeswarm\",\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Priority ordering\n", "\n", "The `priority` parameter controls the order in which points\n", "are placed in the swarm. 
 This affects the final shape:\n", "\n", "| Priority | Description |\n", "|---|---|\n", "| `\"ascending\"` | Points placed from smallest to largest (default) |\n", "| `\"descending\"` | Points placed from largest to smallest |\n", "| `\"density\"` | Dense regions placed first |\n", "| `\"random\"` | Random placement order |\n", "| `\"none\"` | Data order preserved |" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(priority=\"density\", size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title='Beeswarm with priority=\"density\"',\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Corral (handling runaway points)\n", "\n", "When a group contains many similar values, the swarm can grow wide\n", "enough that some points run far from the group centre, even into\n", "neighbouring categories. The `corral` parameter offers several\n", "strategies to rein them in, and `corral_width` sets the width of\n", "the region (the corral) that points are kept within:\n", "\n", "| Corral | Description |\n", "|---|---|\n", "| `\"none\"` | No correction (default) |\n", "| `\"gutter\"` | Collect runaway points along the corral boundary |\n", "| `\"wrap\"` | Wrap runaway points around to the opposite side (periodic boundary) |\n", "| `\"random\"` | Place runaway points at random positions within the corral |\n", "| `\"omit\"` | Remove runaway points |\n", "\n", "Every option except `\"none\"` has a cost: `\"gutter\"`, `\"wrap\"`,\n", "and `\"random\"` can make points overlap again, and `\"omit\"` drops\n", "data from the plot." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(\n", " ggplot(iris, aes(x=\"species\", y=\"sepal_length\", color=\"species\"))\n", " + geom_beeswarm(corral=\"gutter\", corral_width=0.4, size=2)\n", " + scale_color_brewer(type=\"qual\", palette=\"Set2\")\n", " + labs(\n", " title='Beeswarm with corral=\"gutter\"',\n", " x=\"Species\",\n", " y=\"Sepal Length\",\n", " )\n", " + theme_minimal()\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "**Key takeaways:**\n", "\n", "* `geom_beeswarm()` arranges points to avoid overlap\n", " while faithfully 
 showing the data distribution.\n", "* `geom_quasirandom()` produces a violin-like point\n", " cloud using density-aware quasi-random jitter.\n", "* Both geoms accept all standard `geom_point()` aesthetics\n", " (`color`, `size`, `alpha`, `shape`, etc.).\n", "* Combine with `geom_boxplot()` or `geom_violin()` to show\n", " summary statistics and raw data together.\n", "* Use `method`, `cex`, `side`, `priority`, and `corral` to fine-tune\n", " the `geom_beeswarm()` layout, and `method` and `width` to tune\n", " `geom_quasirandom()`." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10" } }, "nbformat": 4, "nbformat_minor": 4 }