tidypolars_extra.stats

Functions

abs(x)

Absolute value

cor(x, y[, method])

Find the correlation of two columns

count(x)

Number of observations in each group

cov(x, y)

Find the covariance of two columns

cume_dist(x)

Compute cumulative distribution (proportion of values <= current value)

cummax(x)

Cumulative maximum

cummin(x)

Cumulative minimum

cumprod(x)

Cumulative product

cumsum(x)

Cumulative sum

first(x)

Get first value

floor(x)

Round numbers down to the lower integer

iqr(x)

Compute the interquartile range (Q3 - Q1)

last(x)

Get last value

length(x)

Number of observations in each group.

log(x)

Compute the natural logarithm of a column

log10(x)

Compute the base 10 logarithm of a column

mad(x)

Compute the median absolute deviation

max(x)

Get column max

mean(x)

Get column mean

median(x)

Get column median

min(x)

Get column minimum

mode(x)

Compute the statistical mode (most frequent value)

n()

Number of observations in each group

ntile(x, n)

Divide values into n roughly equal groups

percent_rank(x)

Compute percent rank (values between 0 and 1)

quantile(x[, quantile])

Get the requested quantile of a column

rank(x[, method])

Assign a rank to each value, handling ties according to method

scale(x)

Standardize the input by scaling it to a mean of 0 and a standard deviation of 1.

sd(x)

Get column standard deviation

sqrt(x)

Get column square root

sum(x)

Get column sum

var(x)

Get column variance

weighted_mean(x, w)

Compute weighted mean

zscore(x)

Standardize to z-scores (alias for scale)

Module Contents

tidypolars_extra.stats.abs(x)

Absolute value

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(abs_x = tp.abs('x'))
>>> df.mutate(abs_x = tp.abs(col('x')))

tidypolars_extra.stats.cor(x, y, method='pearson')

Find the correlation of two columns

Parameters:
  • x (Expr) – A column

  • y (Expr) – A column

  • method (str) – Type of correlation to find. Either ‘pearson’ or ‘spearman’.

Examples

>>> df.summarize(cor = tp.cor(col('x'), col('y')))
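
The two method values measure different things: Pearson works on the raw values, Spearman on their ranks. The following is a plain-Python sketch of the standard definitions, not the library's implementation. The helper names pearson, average_ranks, and spearman are illustrative, and giving tied values their average rank is an assumption.

```python
from math import sqrt

def pearson(xs, ys):
    # Pearson correlation: co-deviation divided by the product of spreads.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    # 1-based ranks; tied values share the average of their positions (assumed).
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(xs, ys):
    # Spearman correlation is Pearson correlation computed on the ranks.
    return pearson(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # y = x squared: monotonic but not linear

r_pearson = pearson(x, y)    # close to, but below, 1
r_spearman = spearman(x, y)  # 1.0, because the rank order is identical
```

Because y increases whenever x does, the rank-based measure is exactly 1 while the linear measure is not.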

tidypolars_extra.stats.count(x)

Number of observations in each group

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(count = tp.count(col('x')))

tidypolars_extra.stats.cov(x, y)

Find the covariance of two columns

Parameters:
  • x (Expr) – A column

  • y (Expr) – A column

Examples

>>> df.summarize(cov = tp.cov(col('x'), col('y')))
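
For reference, covariance can be sketched in plain Python as below. This assumes the sample form with an n - 1 denominator; the page does not state which denominator the library uses, and the helper name sample_cov is illustrative.

```python
def sample_cov(xs, ys):
    # Average product of deviations from the means, with n - 1 in the
    # denominator (assumption: sample covariance, not population).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4]
y = [2, 4, 6, 9]
cov_xy = sample_cov(x, y)  # 11.5 / 3
```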

tidypolars_extra.stats.cume_dist(x)

Compute cumulative distribution (proportion of values <= current value)

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(cd = tp.cume_dist('x'))
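
The definition above (proportion of values <= current value) can be written out in plain Python. This is only a sketch of the formula, not the library's code; the local function name mirrors the documented one for readability.

```python
def cume_dist(values):
    # For each value, the share of all values that are <= it.
    n = len(values)
    return [sum(1 for v in values if v <= x) / n for x in values]

x = [10, 20, 20, 40]
cd = cume_dist(x)  # [0.25, 0.75, 0.75, 1.0]
```

Tied values receive the same result, and the largest value always maps to 1.0.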

tidypolars_extra.stats.cummax(x)

Cumulative maximum

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(cmax = tp.cummax('x'))

tidypolars_extra.stats.cummin(x)

Cumulative minimum

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(cmin = tp.cummin('x'))

tidypolars_extra.stats.cumprod(x)

Cumulative product

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(cprod = tp.cumprod('x'))

tidypolars_extra.stats.cumsum(x)

Cumulative sum

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(csum = tp.cumsum('x'))
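
The four cumulative functions (cummax, cummin, cumprod, cumsum) all return one value per row: the running result over every row up to and including the current one. A plain-Python illustration of the expected values, using itertools.accumulate rather than the library itself:

```python
from itertools import accumulate
from operator import mul

x = [3, 1, 4, 1, 5]

cummax = list(accumulate(x, max))   # [3, 3, 4, 4, 5]
cummin = list(accumulate(x, min))   # [3, 1, 1, 1, 1]
cumprod = list(accumulate(x, mul))  # [3, 3, 12, 12, 60]
cumsum = list(accumulate(x))        # [3, 4, 8, 9, 14]
```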

tidypolars_extra.stats.first(x)

Get first value

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(first_x = tp.first('x'))
>>> df.summarize(first_x = tp.first(col('x')))

tidypolars_extra.stats.floor(x)

Round numbers down to the lower integer

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(floor_x = tp.floor(col('x')))

tidypolars_extra.stats.iqr(x)

Compute the interquartile range (Q3 - Q1)

Use in summarize() context only. Not suitable for mutate().

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(iqr_val = tp.iqr('x'))
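
The result is a single number per group, which is why it belongs in summarize(). A stdlib sketch of Q3 - Q1 follows. It assumes linear interpolation between data points (the "inclusive" method of statistics.quantiles); the library may compute quartiles differently, so small samples can give slightly different values.

```python
from statistics import quantiles

def iqr(values):
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
iqr_val = iqr(x)  # Q3 = 7.0, Q1 = 3.0, so 4.0
```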

tidypolars_extra.stats.last(x)

Get last value

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(last_x = tp.last('x'))
>>> df.summarize(last_x = tp.last(col('x')))

tidypolars_extra.stats.length(x)

Number of observations in each group.

Alias for count().

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(length = tp.length(col('x')))

tidypolars_extra.stats.log(x)

Compute the natural logarithm of a column

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.mutate(log = tp.log('x'))

tidypolars_extra.stats.log10(x)

Compute the base 10 logarithm of a column

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.mutate(log = tp.log10('x'))

tidypolars_extra.stats.mad(x)

Compute the median absolute deviation

Use in summarize() context only. Not suitable for mutate().

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(mad_val = tp.mad('x'))
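
The median absolute deviation is the median of the absolute distances from the median. The sketch below uses the unscaled form; whether the library applies a consistency constant (R's mad() multiplies by 1.4826 by default) is not stated here, so treat that as an open assumption.

```python
from statistics import median

def mad(values):
    # Median of |x - median(x)|, with no scaling constant (assumed).
    center = median(values)
    return median(abs(v - center) for v in values)

x = [1, 2, 3, 4, 100]
mad_val = mad(x)  # median is 3; deviations are [2, 1, 0, 1, 97]; result 1
```

The outlier 100 barely affects the result, which is the usual reason to prefer this measure over the standard deviation.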

tidypolars_extra.stats.max(x)

Get column max

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(max_x = tp.max('x'))
>>> df.summarize(max_x = tp.max(col('x')))

tidypolars_extra.stats.mean(x)

Get column mean

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(mean_x = tp.mean('x'))
>>> df.summarize(mean_x = tp.mean(col('x')))

tidypolars_extra.stats.median(x)

Get column median

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(median_x = tp.median('x'))
>>> df.summarize(median_x = tp.median(col('x')))

tidypolars_extra.stats.min(x)

Get column minimum

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(min_x = tp.min('x'))
>>> df.summarize(min_x = tp.min(col('x')))

tidypolars_extra.stats.mode(x)

Compute the statistical mode (most frequent value)

If several values tie for most frequent, only one of them is returned, and which one is not guaranteed. Use in summarize() context.

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(m = tp.mode('x'))
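
To see why ties matter, the sketch below lists every value that reaches the highest count. It is plain Python for illustration only; the helper name all_modes is hypothetical and is not part of the library, which returns a single value.

```python
from collections import Counter

def all_modes(values):
    # Every value whose frequency equals the maximum frequency.
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

x = [1, 2, 2, 3, 3]
modes = all_modes(x)  # [2, 3]: both occur twice
```

With data like this, a single-valued mode could legitimately be either 2 or 3, so results on tied data should not be relied on.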

tidypolars_extra.stats.n()

Number of observations in each group

Examples

>>> df.summarize(count = tp.n())

tidypolars_extra.stats.ntile(x, n)

Divide values into n roughly equal groups

Parameters:
  • x (Expr, Series) – Column to operate on

  • n (int) – Number of groups

Examples

>>> df.mutate(quartile = tp.ntile('x', 4))
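
One common way to form n roughly equal groups is to sort the values and cut the sorted positions into n consecutive runs. The sketch below does that in plain Python. How the library distributes the remainder when the row count is not divisible by n is not documented here, so the bucket rule used is an assumption.

```python
def ntile(values, n):
    # Indices of the values in ascending order (stable for ties).
    order = sorted(range(len(values)), key=lambda i: values[i])
    buckets = [0] * len(values)
    for pos, i in enumerate(order):
        # Map sorted position 0..len-1 onto bucket 1..n (assumed rule).
        buckets[i] = pos * n // len(values) + 1
    return buckets

x = [50, 10, 40, 20, 30, 60, 80, 70]
quartile = ntile(x, 4)  # [3, 1, 2, 1, 2, 3, 4, 4]
```

With eight rows and four groups, each group receives exactly two rows.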

tidypolars_extra.stats.percent_rank(x)

Compute percent rank (values between 0 and 1)

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(prank = tp.percent_rank('x'))
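
Percent rank is conventionally defined as (min rank - 1) / (n - 1), which places the smallest value at 0 and the largest at 1. The plain-Python sketch below follows that convention; that the library uses exactly this formula is an assumption.

```python
def percent_rank(values):
    # Count of strictly smaller values equals (min rank - 1).
    # Requires at least two values, since the denominator is n - 1.
    n = len(values)
    return [sum(1 for v in values if v < x) / (n - 1) for x in values]

x = [10, 20, 20, 40]
prank = percent_rank(x)  # [0.0, 1/3, 1/3, 1.0]
```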

tidypolars_extra.stats.quantile(x, quantile=0.5)

Get the requested quantile of a column

Parameters:
  • x (Expr, Series) – Column to operate on

  • quantile (float) – Quantile to return

Examples

>>> df.summarize(quantile_x = tp.quantile('x', .25))
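
A quantile is the value below which a given fraction of the sorted data falls. The sketch below uses linear interpolation between neighbouring data points. That interpolation rule is an assumption: the library may use a different one (for example, nearest value), in which case results on small samples can differ.

```python
from math import floor

def quantile(values, q=0.5):
    ordered = sorted(values)
    # Fractional position of the quantile within the sorted data.
    pos = q * (len(ordered) - 1)
    lo = floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    # Linear interpolation between the two neighbouring values (assumed).
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

q25 = quantile([1, 2, 3, 4, 5], 0.25)  # 2.0
q50 = quantile([1, 2, 3, 4])           # 2.5, the median
```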

tidypolars_extra.stats.rank(x, method='dense')

Assign a rank to each value in the column. Ties are handled according to method. With the default, dense, tied values share the same rank and the next distinct value receives the next consecutive rank, so no ranks are skipped.

Parameters:
  • x (str) – Column to operate on

  • method (str) – How ties are ranked. One of:

      • dense (default): Assigns ranks in a consecutive manner, without gaps, even for ties.

      • average: Assigns the average rank to tied values.

      • min: Assigns the minimum rank to tied values.

      • max: Assigns the maximum rank to tied values.

      • ordinal: Assigns a distinct rank to each value based on its order of appearance.

Returns:

A list of ranks corresponding to the elements of x.

Return type:

list of int

Examples

>>> rank([10, 20, 20, 30])
[1, 2, 2, 3]
>>> rank([3, 1, 2])
[3, 1, 2]  # since sorted order is 1,2,3 => ranks are assigned as per their order
>>> rank(["b", "a", "a", "c"])
[2, 1, 1, 3]
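
The five method values differ only in how ties are treated. The plain-Python sketch below implements each rule as described in the parameter list so the outputs can be compared side by side. It is an illustration of the rules, not the library's implementation, and it works on lists rather than columns.

```python
def rank(values, method="dense"):
    ordered = sorted(values)
    if method == "dense":
        # Consecutive ranks over the distinct values, no gaps.
        distinct = sorted(set(values))
        return [distinct.index(v) + 1 for v in values]
    if method == "min":
        # Tied values take the lowest position in the tied run.
        return [ordered.index(v) + 1 for v in values]
    if method == "max":
        # Tied values take the highest position in the tied run.
        return [ordered.index(v) + ordered.count(v) for v in values]
    if method == "average":
        # Tied values take the mean of their positions.
        return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]
    if method == "ordinal":
        # Every value gets a distinct rank; ties break by order of appearance.
        order = sorted(range(len(values)), key=lambda i: values[i])
        ranks = [0] * len(values)
        for r, i in enumerate(order, start=1):
            ranks[i] = r
        return ranks
    raise ValueError(f"unknown method: {method}")

x = [10, 20, 20, 30]
by_method = {m: rank(x, m) for m in ("dense", "min", "max", "average", "ordinal")}
# dense:   [1, 2, 2, 3]
# min:     [1, 2, 2, 4]
# max:     [1, 3, 3, 4]
# average: [1.0, 2.5, 2.5, 4.0]
# ordinal: [1, 2, 3, 4]
```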

tidypolars_extra.stats.scale(x)

Standardize the input by scaling it to a mean of 0 and a standard deviation of 1.

Parameters:

x (Expr) – Column to operate on

Returns:

The standardized version of the input data.

Return type:

array-like
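
Standardizing means subtracting the mean and dividing by the standard deviation. The stdlib sketch below assumes the sample standard deviation (n - 1 denominator); which denominator the library uses is not stated on this page. The helper name standardize is illustrative.

```python
from statistics import mean, stdev

def standardize(values):
    # z = (x - mean) / sd, using the sample standard deviation (assumed).
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

x = [2, 4, 6]
z = standardize(x)  # mean 4, sd 2.0, so [-1.0, 0.0, 1.0]
```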


tidypolars_extra.stats.sd(x)

Get column standard deviation

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(sd_x = tp.sd('x'))
>>> df.summarize(sd_x = tp.sd(col('x')))

tidypolars_extra.stats.sqrt(x)

Get column square root

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(sqrt_x = tp.sqrt('x'))

tidypolars_extra.stats.sum(x)

Get column sum

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.summarize(sum_x = tp.sum('x'))
>>> df.summarize(sum_x = tp.sum(col('x')))

tidypolars_extra.stats.var(x)

Get column variance

Parameters:

x (Expr) – Column to operate on

Examples

>>> df.summarize(var_x = tp.var('x'))
>>> df.summarize(var_x = tp.var(col('x')))

tidypolars_extra.stats.weighted_mean(x, w)

Compute weighted mean

Parameters:
  • x (Expr, Series) – Column of values

  • w (Expr, Series) – Column of weights

Examples

>>> df.summarize(wm = tp.weighted_mean('x', 'w'))
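
The weighted mean is the sum of value times weight divided by the sum of the weights. A plain-Python sketch of that formula, for checking results by hand (not the library's code):

```python
def weighted_mean(xs, ws):
    # sum(x * w) / sum(w)
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

x = [1, 2, 3]
w = [1, 1, 2]
wm = weighted_mean(x, w)  # (1 + 2 + 6) / 4 = 2.25
```

With equal weights the result reduces to the ordinary mean.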

tidypolars_extra.stats.zscore(x)

Standardize to z-scores (alias for scale)

Parameters:

x (Expr, Series) – Column to operate on

Examples

>>> df.mutate(z = tp.zscore('x'))