Mutate
The .mutate() method adds new columns or modifies existing columns in your
data. New columns are defined using keyword arguments, where the key becomes
the column name and the value is a Polars expression.
import tidypolars_extra as tp
mtcars = tp.tibble(tp.read_data(fn="tidypolars_extra/data/mtcars.csv", sep=",", silently=True))
small_cars = mtcars.select("name", "cyl", "mpg", "hp")
Assign new columns
Create a new column cyl2 that doubles the cyl value:
small_cars.mutate(cyl2=tp.col("cyl") * 2)
shape: (32, 5)
┌───────────────────┬─────┬──────┬─────┬──────┐
│ name              ┆ cyl ┆ mpg  ┆ hp  ┆ cyl2 │
╞═══════════════════╪═════╪══════╪═════╪══════╡
│ Mazda RX4         ┆ 6   ┆ 21.0 ┆ 110 ┆ 12   │
│ Mazda RX4 Wag     ┆ 6   ┆ 21.0 ┆ 110 ┆ 12   │
│ Datsun 710        ┆ 4   ┆ 22.8 ┆ 93  ┆ 8    │
│ …                 ┆ …   ┆ …    ┆ …   ┆ …    │
└───────────────────┴─────┴──────┴─────┴──────┘
You can also add a column that holds the same scalar value in every row by wrapping the value in tp.lit():
small_cars.mutate(label=tp.lit("car"))
Combining expressions
You can define multiple columns in a single mutate call:
small_cars.mutate(
    hp_per_cyl=tp.col("hp") / tp.col("cyl"),
    double_mpg=tp.col("mpg") * 2,
)
Used with grouped operations
The by parameter computes group-level statistics and attaches them
back to each row. For example, the following computes the mean hp within
each cyl group and then subtracts it from each car's hp (demeaning):
(
    small_cars
    .mutate(
        hp_mean=tp.col("hp").mean(),
        demeaned_hp=tp.col("hp") - tp.col("hp").mean(),
        by="cyl",
    )
)
shape: (32, 6)
┌────────────────┬─────┬──────┬─────┬────────────┬─────────────┐
│ name           ┆ cyl ┆ mpg  ┆ hp  ┆ hp_mean    ┆ demeaned_hp │
╞════════════════╪═════╪══════╪═════╪════════════╪═════════════╡
│ Mazda RX4      ┆ 6   ┆ 21.0 ┆ 110 ┆ 122.285714 ┆ -12.285714  │
│ Mazda RX4 Wag  ┆ 6   ┆ 21.0 ┆ 110 ┆ 122.285714 ┆ -12.285714  │
│ …              ┆ …   ┆ …    ┆ …   ┆ …          ┆ …           │
└────────────────┴─────┴──────┴─────┴────────────┴─────────────┘
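To make the group-wise semantics concrete, here is a minimal plain-Python sketch of what a grouped mean followed by demeaning computes. It does not call tidypolars_extra: the rows list and the group_demean helper are hypothetical, and the data is only the four cars visible in the tables on this page, so the group means differ from the full 32-row output above.

```python
from collections import defaultdict

# Four rows taken from the tables on this page (not the full mtcars data).
rows = [
    {"name": "Mazda RX4", "cyl": 6, "hp": 110},
    {"name": "Mazda RX4 Wag", "cyl": 6, "hp": 110},
    {"name": "Datsun 710", "cyl": 4, "hp": 93},
    {"name": "Hornet Sportabout", "cyl": 8, "hp": 175},
]


def group_demean(rows, value, by):
    """Hypothetical helper: attach the group mean of `value` and the
    deviation from that mean to every row, keeping the row order."""
    # Collect the values of each group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    # One mean per group key.
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    # Broadcast the group mean back onto each original row.
    return [
        {
            **row,
            f"{value}_mean": means[row[by]],
            f"demeaned_{value}": row[value] - means[row[by]],
        }
        for row in rows
    ]


result = group_demean(rows, value="hp", by="cyl")
for row in result:
    print(row)
```

The point to notice is that the row count does not change: each row keeps its place and gains the statistic of its own group, which is why the grouped output above still has 32 rows.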
With if_else and case_when
Using if_else
tp.if_else() works like a ternary operator. It evaluates a condition and
returns one value when the condition is true and another when it is false:
small_cars.mutate(
    hp_category=tp.if_else(tp.col("hp") > 150, tp.lit("high"), tp.lit("low"))
)
shape: (32, 5)
┌───────────────────┬─────┬──────┬─────┬─────────────┐
│ name              ┆ cyl ┆ mpg  ┆ hp  ┆ hp_category │
╞═══════════════════╪═════╪══════╪═════╪═════════════╡
│ Mazda RX4         ┆ 6   ┆ 21.0 ┆ 110 ┆ low         │
│ Hornet Sportabout ┆ 8   ┆ 18.7 ┆ 175 ┆ high        │
│ …                 ┆ …   ┆ …    ┆ …   ┆ …           │
└───────────────────┴─────┴──────┴─────┴─────────────┘
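Row by row, the expression above behaves like Python's conditional expression. A minimal plain-Python sketch using the two cars shown in the output; the hp_category function is hypothetical and only illustrates the logic, not how the library evaluates the expression:

```python
def hp_category(hp, threshold=150):
    # Same shape as if_else(condition, value_if_true, value_if_false).
    return "high" if hp > threshold else "low"


# hp values for the two cars shown in the output table.
cars = {"Mazda RX4": 110, "Hornet Sportabout": 175}
categories = {name: hp_category(hp) for name, hp in cars.items()}
print(categories)
```

Because the comparison is a strict greater-than, a car with exactly 150 hp would fall into the "low" branch.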
Using case_when
tp.case_when() handles multiple conditions. Each condition is followed by
the value to use when it holds, passed as alternating arguments, with an
optional _default for rows that match no condition:
small_cars.mutate(
    size=tp.case_when(
        tp.col("cyl") == 4, "small",
        tp.col("cyl") == 6, "medium",
        _default="large",
    )
)
shape: (32, 5)
┌───────────────────┬─────┬──────┬─────┬────────┐
│ name              ┆ cyl ┆ mpg  ┆ hp  ┆ size   │
╞═══════════════════╪═════╪══════╪═════╪════════╡
│ Mazda RX4         ┆ 6   ┆ 21.0 ┆ 110 ┆ medium │
│ Datsun 710        ┆ 4   ┆ 22.8 ┆ 93  ┆ small  │
│ Hornet Sportabout ┆ 8   ┆ 18.7 ┆ 175 ┆ large  │
│ …                 ┆ …   ┆ …    ┆ …   ┆ …      │
└───────────────────┴─────┴──────┴─────┴────────┘
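The logic can be sketched in plain Python as a chain of checks that falls through to a default. This sketch assumes the conditions are tested in the order given and that the first match wins, which is how such constructs usually behave but is not confirmed by this page. The size_for helper is hypothetical.

```python
def size_for(cyl, default="large"):
    # (condition, value) cases, checked in order; the first match decides.
    cases = [
        (lambda c: c == 4, "small"),
        (lambda c: c == 6, "medium"),
    ]
    for condition, value in cases:
        if condition(cyl):
            return value
    # No condition matched: this plays the role of _default.
    return default


# cyl values of the three cars shown in the output table.
sizes = [size_for(cyl) for cyl in (6, 4, 8)]
print(sizes)
```

In this example the conditions are mutually exclusive, so the order does not change the result; it would matter only if two conditions could hold for the same row.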