tidypolars_extra.forcats

Functions

fct_collapse(x, **kwargs)

Collapse multiple factor levels into one

fct_infreq(df, col_name)

Reorder factor levels by frequency (most common first)

fct_lump(x[, n, prop, other_level])

Collapse least frequent factor levels into 'Other'

fct_recode(x, **kwargs)

Manually recode factor levels

fct_rev(df, col_name)

Reverse factor level order

Module Contents

tidypolars_extra.forcats.fct_collapse(x, **kwargs)[source]

Collapse multiple factor levels into one

Parameters:
  • x (Expr, str) – Factor/categorical column

  • **kwargs – Mapping of new_level = ['old1', 'old2', ...]

Returns:

Expression with collapsed levels.

Return type:

Expr

Examples

>>> df.mutate(x_collapsed = tp.fct_collapse('x', ab=['a', 'b'], cd=['c', 'd']))

tidypolars_extra.forcats.fct_infreq(df, col_name)[source]

Reorder factor levels by frequency (most common first)

Parameters:
  • df (tibble) – The DataFrame containing the column

  • col_name (str) – Name of the column to reorder

Returns:

DataFrame with column cast to Enum with levels ordered by frequency.

Return type:

tibble
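
To illustrate what "levels ordered by frequency" means, here is a minimal plain-Python sketch. It is not the library's implementation: the helper name levels_by_frequency is hypothetical, and breaking ties by first appearance is an assumption, so the library may order tied levels differently.

```python
from collections import Counter

def levels_by_frequency(values):
    """Return the distinct values, most frequent first (hypothetical helper)."""
    counts = Counter(values)
    # most_common() sorts by count, descending; values with equal counts
    # keep their first-appearance order (an assumption about tie-breaking).
    return [level for level, _ in counts.most_common()]

levels = levels_by_frequency(['a', 'b', 'a', 'a', 'b', 'c'])
print(levels)  # ['a', 'b', 'c']
```

With the data from the example in this entry, 'a' occurs three times, 'b' twice, and 'c' once, so the resulting level order would be a, b, c.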

Examples

>>> df = tp.tibble(x=['a', 'b', 'a', 'a', 'b', 'c'])
>>> df = tp.fct_infreq(df, 'x')

tidypolars_extra.forcats.fct_lump(x, n=None, prop=None, other_level='Other')[source]

Collapse least frequent factor levels into 'Other'

Uses a ranking approach: for each value, computes its frequency rank and replaces values outside the top n with other_level.
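
The ranking approach described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper name lump_top_n is hypothetical, it covers only the n argument (not prop), and how the library breaks ties at the cutoff is not specified here.

```python
from collections import Counter

def lump_top_n(values, n, other_level='Other'):
    """Keep the n most frequent values; replace the rest (hypothetical helper)."""
    counts = Counter(values)
    # Rank the distinct values by frequency and keep the top n.
    kept = {level for level, _ in counts.most_common(n)}
    return [v if v in kept else other_level for v in values]

data = ['a', 'a', 'a', 'b', 'b', 'c', 'd']
print(lump_top_n(data, n=2))
# ['a', 'a', 'a', 'b', 'b', 'Other', 'Other']
```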

Parameters:
  • x (Expr, str) – Factor/categorical column

  • n (int, optional) – Number of most frequent levels to keep

  • prop (float, optional) – Minimum proportion to keep a level (0 to 1)

  • other_level (str) – Label for collapsed levels (default: 'Other')

Returns:

Expression with infrequent levels replaced.

Return type:

Expr

Examples

>>> df.mutate(x_lumped = tp.fct_lump('x', n=3))

tidypolars_extra.forcats.fct_recode(x, **kwargs)[source]

Manually recode factor levels

Parameters:
  • x (Expr, str) – Factor/categorical column

  • **kwargs – Mapping of new_level = 'old_level' or new_level = ['old1', 'old2']
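
The two accepted keyword shapes (a single old level or a list of old levels) can be read as one old-to-new lookup. The sketch below shows that reading in plain Python. The helper name build_recode_lookup is hypothetical, and leaving unlisted levels unchanged is an assumption rather than documented behavior.

```python
def build_recode_lookup(**kwargs):
    """Invert new_level=old_level(s) into {old_level: new_level} (hypothetical helper)."""
    lookup = {}
    for new_level, old in kwargs.items():
        # Accept either a single old level or a list of them.
        old_levels = [old] if isinstance(old, str) else list(old)
        for old_level in old_levels:
            lookup[old_level] = new_level
    return lookup

lookup = build_recode_lookup(good='a', bad=['b', 'c'])
values = ['a', 'b', 'c', 'd']
# Assumption: levels not mentioned in the mapping are left unchanged.
recoded = [lookup.get(v, v) for v in values]
print(recoded)  # ['good', 'bad', 'bad', 'd']
```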

Returns:

Expression with recoded levels.

Return type:

Expr

Examples

>>> df.mutate(x_recoded = tp.fct_recode('x', good='a', bad='b'))

tidypolars_extra.forcats.fct_rev(df, col_name)[source]

Reverse factor level order

Parameters:
  • df (tibble) – The DataFrame containing the column

  • col_name (str) – Name of the column to reverse

Returns:

DataFrame with column cast to Enum with reversed level order.

Return type:

tibble

Examples

>>> df = tp.tibble(x=['a', 'b', 'c'])
>>> df = tp.fct_rev(df, 'x')
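
As a rough picture of what reversing the level order does, the plain-Python sketch below reverses a list of levels and shows the effect on sorting. The helper name reversed_levels is hypothetical, and taking the starting order from first appearance is an assumption; the library may derive the original level order differently.

```python
def reversed_levels(values):
    """Return distinct values in reverse of first-appearance order (hypothetical helper)."""
    # dict.fromkeys keeps the first occurrence of each value, in order.
    levels = list(dict.fromkeys(values))
    return levels[::-1]

levels = reversed_levels(['a', 'b', 'c'])
print(levels)  # ['c', 'b', 'a']

# Sorting by level position then follows the reversed order.
print(sorted(['a', 'c', 'b', 'a'], key=levels.index))  # ['c', 'b', 'a', 'a']
```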