tidypolars_extra.forcats
Functions
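| fct_collapse(x, **kwargs) | Collapse multiple factor levels into one |
| fct_infreq(df, col_name) | Reorder factor levels by frequency (most common first) |
| fct_lump(x, n=None, prop=None, other_level='Other') | Collapse least frequent factor levels into 'Other' |
| fct_recode(x, **kwargs) | Manually recode factor levels |
| fct_rev(df, col_name) | Reverse factor level order |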
Module Contents
- tidypolars_extra.forcats.fct_collapse(x, **kwargs)
Collapse multiple factor levels into one
- Parameters:
x (Expr, str) – Factor/categorical column
**kwargs – Mapping of new_level=['old1', 'old2', …]
- Returns:
Expression with collapsed levels.
- Return type:
Expr
Examples
>>> df.mutate(x_collapsed = tp.fct_collapse('x', ab=['a', 'b'], cd=['c', 'd']))
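The new_level=[old levels] mapping can be illustrated without the library. The following is a minimal pure-Python sketch, not the library's implementation: collapse_levels is a hypothetical helper, and leaving unmapped values unchanged is an assumption rather than documented behavior.

```python
def collapse_levels(values, **kwargs):
    """Hypothetical helper: replace each old level with its new level."""
    # Invert new_level=[old1, old2, ...] into an old -> new lookup table.
    lookup = {old: new for new, olds in kwargs.items() for old in olds}
    # Assumption: values without a mapping pass through unchanged.
    return [lookup.get(value, value) for value in values]


collapsed = collapse_levels(
    ["a", "b", "c", "d", "e"], ab=["a", "b"], cd=["c", "d"]
)
print(collapsed)  # ['ab', 'ab', 'cd', 'cd', 'e']
```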
- tidypolars_extra.forcats.fct_infreq(df, col_name)
Reorder factor levels by frequency (most common first)
- Parameters:
df (tibble) – The DataFrame containing the column
col_name (str) – Name of the column to reorder
- Returns:
DataFrame with column cast to Enum with levels ordered by frequency.
- Return type:
Examples
>>> df = tp.tibble(x=['a', 'b', 'a', 'a', 'b', 'c'])
>>> df = tp.fct_infreq(df, 'x')
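To make "ordered by frequency" concrete, here is a standard-library sketch of the level order such a reordering would produce for the data above. levels_by_frequency is a hypothetical helper, and breaking ties by first appearance is an assumption. The library's own tie handling is not documented here.

```python
from collections import Counter


def levels_by_frequency(values):
    """Hypothetical helper: distinct values, most frequent first."""
    # most_common() sorts by descending count; equal counts keep
    # the order in which the values were first seen.
    return [level for level, _ in Counter(values).most_common()]


levels = levels_by_frequency(["a", "b", "a", "a", "b", "c"])
print(levels)  # ['a', 'b', 'c']
```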
- tidypolars_extra.forcats.fct_lump(x, n=None, prop=None, other_level='Other')
Collapse least frequent factor levels into 'Other'
Uses a ranking approach: for each value, computes its frequency rank and replaces values outside the top n with other_level.
- Parameters:
x (Expr, str) – Factor/categorical column
n (int, optional) – Number of most frequent levels to keep
prop (float, optional) – Minimum proportion to keep a level (0 to 1)
other_level (str) – Label for collapsed levels (default: 'Other')
- Returns:
Expression with infrequent levels replaced.
- Return type:
Expr
Examples
>>> df.mutate(x_lumped = tp.fct_lump('x', n=3))
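The ranking approach described above can be sketched in plain Python for the n case. This only illustrates the idea and is not the library's code: lump_top_n is a hypothetical helper, the prop option is not covered, and ties are broken by first appearance as an assumption.

```python
from collections import Counter


def lump_top_n(values, n, other_level="Other"):
    """Hypothetical helper: keep the n most frequent values."""
    # Rank distinct values by descending count (ties: first seen wins).
    ranked = [level for level, _ in Counter(values).most_common()]
    keep = set(ranked[:n])
    # Anything ranked outside the top n is replaced by other_level.
    return [v if v in keep else other_level for v in values]


lumped = lump_top_n(["a", "b", "a", "a", "b", "c", "d"], n=2)
print(lumped)  # ['a', 'b', 'a', 'a', 'b', 'Other', 'Other']
```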
- tidypolars_extra.forcats.fct_recode(x, **kwargs)
Manually recode factor levels
- Parameters:
x (Expr, str) – Factor/categorical column
**kwargs – Mapping of new_level='old_level' or new_level=['old1', 'old2']
- Returns:
Expression with recoded levels.
- Return type:
Expr
Examples
>>> df.mutate(x_recoded = tp.fct_recode('x', good='a', bad='b'))
- tidypolars_extra.forcats.fct_rev(df, col_name)
Reverse factor level order
- Parameters:
df (tibble) – The DataFrame containing the column
col_name (str) – Name of the column to reverse
- Returns:
DataFrame with column cast to Enum with reversed level order.
- Return type:
Examples
>>> df = tp.tibble(x=['a', 'b', 'c'])
>>> df = tp.fct_rev(df, 'x')