tidypolars_extra.stats
Functions
| Function | Description |
|---|---|
| abs(x) | Absolute value |
| cor(x, y, method='pearson') | Find the correlation of two columns |
| count(x) | Number of observations in each group |
| cov(x, y) | Find the covariance of two columns |
| cume_dist(x) | Compute cumulative distribution (proportion of values <= current value) |
| cummax(x) | Cumulative maximum |
| cummin(x) | Cumulative minimum |
| cumprod(x) | Cumulative product |
| cumsum(x) | Cumulative sum |
| first(x) | Get first value |
| floor(x) | Round values down to the nearest integer |
| iqr(x) | Compute the interquartile range (Q3 - Q1) |
| last(x) | Get last value |
| length(x) | Number of observations in each group (alias for count()) |
| log(x) | Compute the natural logarithm of a column |
| log10(x) | Compute the base 10 logarithm of a column |
| mad(x) | Compute the median absolute deviation |
| max(x) | Get column max |
| mean(x) | Get column mean |
| median(x) | Get column median |
| min(x) | Get column minimum |
| mode(x) | Compute the statistical mode (most frequent value) |
| n() | Number of observations in each group |
| ntile(x, n) | Divide values into n roughly equal groups |
| percent_rank(x) | Compute percent rank (values between 0 and 1) |
| quantile(x, quantile=0.5) | Get the value of a column at a given quantile |
| rank(x, method='dense') | Rank values, with the tie-handling rule chosen by method |
| scale(x) | Standardize the input to a mean of 0 and a standard deviation of 1 |
| sd(x) | Get column standard deviation |
| sqrt(x) | Compute the square root of each value in a column |
| sum(x) | Get column sum |
| var(x) | Get column variance |
| | Compute weighted mean |
| | Standardize to z-scores (alias for scale) |
Module Contents
- tidypolars_extra.stats.abs(x)
Absolute value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(abs_x = tp.abs('x'))
>>> df.mutate(abs_x = tp.abs(col('x')))
- tidypolars_extra.stats.cor(x, y, method='pearson')
Find the correlation of two columns
- Parameters:
x (Expr) – A column
y (Expr) – A column
method (str) – Type of correlation to find. Either ‘pearson’ or ‘spearman’.
Examples
>>> df.summarize(cor = tp.cor(col('x'), col('y')))
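The two method options measure different things. As a plain-Python sketch of the standard definitions (not this library's implementation), Pearson correlation works on the raw values, while Spearman correlation is the Pearson correlation of the ranks. Giving tied values the average of their ranks is the usual convention, but it is an assumption about what cor() does; pearson, spearman and average_ranks are hypothetical helper names.

```python
import math

def pearson(x, y):
    """Pearson correlation: co-deviation of x and y over the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(v):
    """1-based ranks; tied values share the mean of the positions they occupy (assumed tie rule)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the last sorted position holding the same value as position i.
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # increases with x, but not linearly
print(round(pearson(x, y), 4))  # 0.9811
print(spearman(x, y))           # 1.0
```

Because y rises monotonically but not linearly with x, the rank-based Spearman value is exactly 1 while Pearson falls slightly short of it.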
- tidypolars_extra.stats.count(x)
Number of observations in each group
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(count = tp.count(col('x')))
- tidypolars_extra.stats.cov(x, y)
Find the covariance of two columns
- Parameters:
x (Expr) – A column
y (Expr) – A column
Examples
>>> df.summarize(cov = tp.cov(col('x'), col('y')))
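Covariance can be computed with an n - 1 (sample) or an n (population) denominator, and this page does not say which one cov() uses. The plain-Python sketch below assumes the sample version, which is what R's cov() returns; sample_cov is a hypothetical helper, not part of tidypolars_extra.

```python
def sample_cov(x, y):
    """Sample covariance of paired values (assumes the n - 1 denominator)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    # Sum of products of deviations from each mean, divided by n - 1.
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

print(sample_cov([1, 2, 3, 4], [2, 4, 6, 8]))  # 10/3, about 3.333
```

With the population denominator the same data would give 10/4 = 2.5, so the choice is visible even on small inputs.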
- tidypolars_extra.stats.cume_dist(x)
Compute cumulative distribution (proportion of values <= current value)
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(cd = tp.cume_dist('x'))
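The description translates directly into code. This plain-Python sketch (a hypothetical stand-alone function, quadratic-time for readability rather than the library's implementation) counts, for each value, how many values are less than or equal to it and divides by the number of values, so tied values share a result and the largest value always gets 1.

```python
def cume_dist(values):
    """For each value, the share of all values that are <= it."""
    n = len(values)
    return [sum(1 for w in values if w <= v) / n for v in values]

print(cume_dist([1, 2, 2, 3]))  # [0.25, 0.75, 0.75, 1.0]
```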
- tidypolars_extra.stats.cummax(x)
Cumulative maximum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(cmax = tp.cummax('x'))
- tidypolars_extra.stats.cummin(x)
Cumulative minimum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(cmin = tp.cummin('x'))
- tidypolars_extra.stats.cumprod(x)
Cumulative product
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(cprod = tp.cumprod('x'))
- tidypolars_extra.stats.cumsum(x)
Cumulative sum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(csum = tp.cumsum('x'))
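cummax, cummin, cumprod and cumsum follow one pattern: each output row combines the current value with the running result from the rows before it. The standard library's itertools.accumulate shows that pattern on a plain list; this is an illustration of the semantics, not how tidypolars_extra computes them.

```python
from itertools import accumulate
import operator

x = [3, 1, 4, 1, 5]

csum = list(accumulate(x))                 # running sum
cprod = list(accumulate(x, operator.mul))  # running product
cmax = list(accumulate(x, max))            # running maximum
cmin = list(accumulate(x, min))            # running minimum

print(csum)   # [3, 4, 8, 9, 14]
print(cprod)  # [3, 3, 12, 12, 60]
print(cmax)   # [3, 3, 4, 4, 5]
print(cmin)   # [3, 1, 1, 1, 1]
```

Each result has the same length as its input, which is why these functions appear in mutate() examples rather than summarize().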
- tidypolars_extra.stats.first(x)
Get first value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(first_x = tp.first('x'))
>>> df.summarize(first_x = tp.first(col('x')))
- tidypolars_extra.stats.floor(x)
Round values down to the nearest integer
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(floor_x = tp.floor(col('x')))
- tidypolars_extra.stats.iqr(x)
Compute the interquartile range (Q3 - Q1)
Use in summarize() context only. Not suitable for mutate().
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(iqr_val = tp.iqr('x'))
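Quartiles can be defined in several ways, so small samples can yield slightly different IQRs depending on the rule. The stdlib sketch below uses statistics.quantiles with method='inclusive', which interpolates linearly between sorted values; which interpolation rule iqr() uses is not stated on this page, so treat the exact numbers as an assumption.

```python
import statistics

def iqr(values):
    """Q3 - Q1, with quartiles from linear interpolation (assumed rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

print(iqr([1, 2, 3, 4, 5, 6, 7, 8]))  # 3.5  (Q1 = 2.75, Q3 = 6.25)
```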
- tidypolars_extra.stats.last(x)
Get last value
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(last_x = tp.last('x'))
>>> df.summarize(last_x = tp.last(col('x')))
- tidypolars_extra.stats.length(x)
Number of observations in each group. Alias for count().
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(length = tp.length(col('x')))
- tidypolars_extra.stats.log(x)
Compute the natural logarithm of a column
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(log = tp.log('x'))
- tidypolars_extra.stats.log10(x)
Compute the base 10 logarithm of a column
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.mutate(log10_x = tp.log10('x'))
- tidypolars_extra.stats.mad(x)
Compute the median absolute deviation
Use in summarize() context only. Not suitable for mutate().
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(mad_val = tp.mad('x'))
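The median absolute deviation is the median of the absolute distances from the median. R's mad() multiplies that value by 1.4826 by default, so that it estimates the standard deviation for normally distributed data; this page does not say whether tidypolars_extra applies such a constant, so the plain-Python sketch below exposes it as a hypothetical constant parameter that defaults to 1.

```python
import statistics

def mad(values, constant=1.0):
    """Median of |v - median(values)|, multiplied by `constant`."""
    # `constant` is a hypothetical parameter: 1.0 gives the raw MAD,
    # 1.4826 gives the normal-consistent version R's mad() returns by default.
    center = statistics.median(values)
    return constant * statistics.median(abs(v - center) for v in values)

data = [1, 1, 2, 2, 4, 6, 9]
print(mad(data))          # 1.0
print(mad(data, 1.4826))  # 1.4826
```

Unlike the standard deviation, the large value 9 barely moves the result, which is the usual reason to prefer the MAD for skewed data.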
- tidypolars_extra.stats.max(x)
Get column max
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(max_x = tp.max('x'))
>>> df.summarize(max_x = tp.max(col('x')))
- tidypolars_extra.stats.mean(x)
Get column mean
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(mean_x = tp.mean('x'))
>>> df.summarize(mean_x = tp.mean(col('x')))
- tidypolars_extra.stats.median(x)
Get column median
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(median_x = tp.median('x'))
>>> df.summarize(median_x = tp.median(col('x')))
- tidypolars_extra.stats.min(x)
Get column minimum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(min_x = tp.min('x'))
>>> df.summarize(min_x = tp.min(col('x')))
- tidypolars_extra.stats.mode(x)
Compute the statistical mode (most frequent value)
If several values tie for most frequent, only one of them is returned, and which one is not guaranteed. Use in summarize() context.
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(m = tp.mode('x'))
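A plain-Python sketch of the mode using collections.Counter. Here ties go to the value seen first, because Counter.most_common keeps first-encountered order among equal counts; that tie rule belongs to this sketch only and is not a promise about tidypolars_extra.

```python
from collections import Counter

def mode(values):
    """Most frequent value; ties resolved by first appearance (sketch-only rule)."""
    return Counter(values).most_common(1)[0][0]

print(mode(["a", "b", "b", "c"]))  # b
print(mode([3, 3, 1, 1, 2]))       # 3 (tied with 1, but seen first)
```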
- tidypolars_extra.stats.n()
Number of observations in each group
Examples
>>> df.summarize(count = tp.n())
- tidypolars_extra.stats.ntile(x, n)
Divide values into n roughly equal groups
- Parameters:
x (Expr, Series) – Column to operate on
n (int) – Number of groups
Examples
>>> df.mutate(quartile = tp.ntile('x', 4))
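"Roughly equal" leaves open how rows are split when the number of values is not a multiple of n. One common definition, sketched below in plain Python (a hypothetical stand-alone function, not the library's code), sorts the values and puts the value at 0-based sorted position p into group floor(p * n / len) + 1, so group sizes differ by at most one. Tied values are split by row order here, which may not match tidypolars_extra.

```python
def ntile(values, n):
    """Group number 1..n for each value, based on its position in sorted order."""
    size = len(values)
    order = sorted(range(size), key=lambda i: values[i])  # stable: ties keep row order
    groups = [0] * size
    for position, i in enumerate(order):  # position is the 0-based sorted position
        groups[i] = position * n // size + 1
    return groups

print(ntile([5, 1, 4, 2, 3], 2))                   # [2, 1, 2, 1, 1]
print(ntile([50, 10, 40, 20, 30, 60, 80, 70], 4))  # [3, 1, 2, 1, 2, 3, 4, 4]
```

In the first call five values split into groups of three and two, the "roughly equal" case.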
- tidypolars_extra.stats.percent_rank(x)
Compute percent rank (values between 0 and 1)
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(prank = tp.percent_rank('x'))
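Percent rank is commonly defined as (min_rank - 1) / (n - 1), where min_rank gives tied values their lowest shared rank; the smallest value then maps to 0 and the largest to 1. The plain-Python sketch below assumes that definition (the one dplyr's percent_rank() uses); this page does not confirm it is the formula tidypolars_extra applies.

```python
def percent_rank(values):
    """(min_rank - 1) / (n - 1); min_rank - 1 equals the count of strictly smaller values."""
    n = len(values)
    if n < 2:
        return [float("nan")] * n  # undefined: n - 1 would be zero
    return [sum(1 for w in values if w < v) / (n - 1) for v in values]

print(percent_rank([10, 20, 20, 30]))  # [0.0, 0.3333333333333333, 0.3333333333333333, 1.0]
```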
- tidypolars_extra.stats.quantile(x, quantile=0.5)
Get the value of a column at a given quantile
- Parameters:
x (Expr, Series) – Column to operate on
quantile (float) – Quantile to compute, between 0 and 1 (default 0.5, the median)
Examples
>>> df.summarize(quantile_x = tp.quantile('x', .25))
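To make the quantile parameter concrete, here is a plain-Python sketch that interpolates linearly between neighbouring sorted values. Linear interpolation is only one of several quantile rules (polars, for example, lets callers choose among options such as 'nearest' and 'linear'), and this page does not say which rule tidypolars_extra uses; quantile_linear is a hypothetical helper.

```python
import math

def quantile_linear(values, q=0.5):
    """Value at quantile q (0 <= q <= 1), interpolating linearly between sorted values."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional 0-based position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)    # stay in bounds when q == 1
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(quantile_linear(data, 0.25))  # 2.75
print(quantile_linear(data))        # 4.5, the median
print(quantile_linear(data, 1.0))   # 8.0
```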
- tidypolars_extra.stats.rank(x, method='dense')
Rank the values of x from smallest to largest. The method argument controls how tied values are ranked and how the ranks after a tie are numbered.
- Parameters:
x (str) – Column to operate on
method (str) – How to rank ties. dense (default): tied values share a rank and ranks stay consecutive, with no gaps after ties. average: tied values get the average of the ranks they would occupy. min: tied values get the lowest of those ranks. max: tied values get the highest of those ranks. ordinal: every value gets a distinct rank, with ties broken by order of appearance.
- Returns:
A list of ranks corresponding to the elements of x.
- Return type:
list of int (float when method='average', since tied values can receive fractional ranks)
Examples
>>> rank([10, 20, 20, 30])
[1, 2, 2, 3]
>>> rank([3, 1, 2])  # sorted order is 1, 2, 3, so each value's rank is its position in that order
[3, 1, 2]
>>> rank(["b", "a", "a", "c"])
[2, 1, 1, 3]
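The five method options differ only in how ties are numbered. The plain-Python sketch below (a hypothetical stand-alone function working on lists, not the library's source) implements each rule, reproduces the three documented examples with the default method='dense', and shows what the other methods give for the same tied input.

```python
def rank(values, method="dense"):
    """1-based ranks of values, with ties handled according to `method`."""
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable sort
    ranks = [0] * len(values)
    dense = 0
    i = 0
    while i < len(order):
        # Sorted positions i..j hold one run of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        dense += 1
        for k in range(i, j + 1):
            if method == "dense":
                r = dense                 # consecutive ranks, no gaps
            elif method == "min":
                r = i + 1                 # lowest position in the run
            elif method == "max":
                r = j + 1                 # highest position in the run
            elif method == "average":
                r = (i + j) / 2 + 1       # mean position in the run
            elif method == "ordinal":
                r = k + 1                 # ties broken by row order
            else:
                raise ValueError(f"unknown method: {method!r}")
            ranks[order[k]] = r
        i = j + 1
    return ranks

print(rank([10, 20, 20, 30]))             # [1, 2, 2, 3]
print(rank([10, 20, 20, 30], "min"))      # [1, 2, 2, 4]
print(rank([10, 20, 20, 30], "max"))      # [1, 3, 3, 4]
print(rank([10, 20, 20, 30], "average"))  # [1.0, 2.5, 2.5, 4.0]
print(rank([10, 20, 20, 30], "ordinal"))  # [1, 2, 3, 4]
```

Only dense produces [1, 2, 2, 3] for the first documented example; min would leave a gap and give [1, 2, 2, 4].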
- tidypolars_extra.stats.scale(x)
Standardize the input by scaling it to a mean of 0 and a standard deviation of 1.
- Parameters:
x (Expr) – Column to operate on
- Returns:
The standardized version of the input data.
- Return type:
array-like
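scale() is documented without an example. A plain-Python sketch of the standardization it describes: subtract the mean, then divide by the standard deviation. It assumes the sample (n - 1) standard deviation, which this page does not specify; the function here is a stand-alone illustration, not the library's implementation.

```python
import statistics

def scale(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(v - mean) / sd for v in values]

print(scale([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```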
- tidypolars_extra.stats.sd(x)
Get column standard deviation
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(sd_x = tp.sd('x'))
>>> df.summarize(sd_x = tp.sd(col('x')))
- tidypolars_extra.stats.sqrt(x)
Compute the square root of each value in a column
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.mutate(sqrt_x = tp.sqrt('x'))
- tidypolars_extra.stats.sum(x)
Get column sum
- Parameters:
x (Expr, Series) – Column to operate on
Examples
>>> df.summarize(sum_x = tp.sum('x'))
>>> df.summarize(sum_x = tp.sum(col('x')))
- tidypolars_extra.stats.var(x)
Get column variance
- Parameters:
x (Expr) – Column to operate on
Examples
>>> df.summarize(var_x = tp.var('x'))
>>> df.summarize(var_x = tp.var(col('x')))
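sd() and var() do not state their denominator either. Assuming the sample (n - 1) form, the plain-Python sketch below shows how the two relate (sd is the square root of var); var and sd here are stand-alone illustrations, not the library's functions. On this data the population (n) form would give a variance of exactly 4, so the choice visibly matters.

```python
import math

def var(values):
    """Sample variance: squared deviations from the mean over n - 1 (assumed denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(var(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(var(data))  # 32/7, about 4.571
print(sd(data))   # about 2.138
```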