Summarize

The .summarize() method (also available as .summarise()) lets you compute summary statistics — a single value per group (or for the entire dataset if no groups are specified). The result has as many rows as there are unique groups.

import tidypolars_extra as tp

mtcars = tp.tibble(tp.read_data(fn="tidypolars_extra/data/mtcars.csv", sep=",", silently=True))

Summarize over everything

When used without grouping, summarize returns a single row.

mtcars.summarize(avg_mpg=tp.col("mpg").mean())
shape: (1, 1)
┌───────────┐
│ avg_mpg   │
╞═══════════╡
│ 20.090625 │
└───────────┘

Summarizing per group

Use the by parameter to compute summaries within groups. For example, there are 3 values of cylinders (cyl) — 4, 6, and 8 — so the result will have 3 rows:

mtcars.summarize(avg_mpg=tp.col("mpg").mean(), by="cyl")
shape: (3, 2)
┌─────┬───────────┐
│ cyl ┆ avg_mpg   │
╞═════╪═══════════╡
│ 8   ┆ 15.1      │
│ 6   ┆ 19.742857 │
│ 4   ┆ 26.663636 │
└─────┴───────────┘

Multiple summary statistics

You can compute multiple summary statistics in a single call:

mtcars.summarize(
    avg_mpg=tp.col("mpg").mean(),
    max_hp=tp.col("hp").max(),
    min_wt=tp.col("wt").min(),
    by="cyl",
)
shape: (3, 4)
┌─────┬───────────┬────────┬────────┐
│ cyl ┆ avg_mpg   ┆ max_hp ┆ min_wt │
╞═════╪═══════════╪════════╪════════╡
│ 8   ┆ 15.1      ┆ 335    ┆ 3.17   │
│ 6   ┆ 19.742857 ┆ 175    ┆ 2.62   │
│ 4   ┆ 26.663636 ┆ 113    ┆ 1.513  │
└─────┴───────────┴────────┴────────┘

Summarize with a literal value

You can also include a literal (scalar) value in the summary:

mtcars.summarize(
    measure=tp.lit("mean miles per gallon"),
    value=tp.col("mpg").mean(),
    by="cyl",
)
shape: (3, 3)
┌─────┬──────────────────────┬───────────┐
│ cyl ┆ measure              ┆ value     │
╞═════╪══════════════════════╪═══════════╡
│ 8   ┆ mean miles per gallon┆ 15.1      │
│ 6   ┆ mean miles per gallon┆ 19.742857 │
│ 4   ┆ mean miles per gallon┆ 26.663636 │
└─────┴──────────────────────┴───────────┘

Using count for quick frequency tables

The .count() method is a convenient shortcut for counting rows per group:

mtcars.count("cyl")
shape: (3, 2)
┌─────┬─────┐
│ cyl ┆ n   │
╞═════╪═════╡
│ 4   ┆ 11  │
│ 8   ┆ 14  │
│ 6   ┆ 7   │
└─────┴─────┘