Jackstraw significance testing for RaJIVE joint loadings

Applies a permutation-based jackstraw test (Chung and Storey, 2015) to determine which features in each data block have significantly nonzero joint loadings in a Rajive decomposition.

Usage

jackstraw_rajive(
  ajive_output,
  blocks,
  alpha = 0.05,
  n_null = 10,
  correction = c("bonferroni", "BH", "none")
)

Arguments

ajive_output: List returned by Rajive.
blocks: List of data matrices (same list passed to Rajive).
alpha: Numeric scalar; desired significance level. Default 0.05.
n_null: Positive integer; number of null F-statistics generated per feature per joint component. Larger values give more stable p-values at the cost of computation time. Default 10; recommended 50–100 for publication-quality results.
correction: Character string controlling multiple-testing correction. One of "bonferroni" (default), "BH" (Benjamini–Hochberg FDR), or "none". "bonferroni" divides alpha by \(d_k \times \text{joint\_rank}\) for each block \(k\), where \(d_k\) is the number of features in block \(k\); this matches the original Python implementation.
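As a rough illustration of how the two non-trivial corrections differ, the sketch below applies them to six made-up p-values for a single block, assuming a joint rank of 2. It uses only base R (`p.adjust`) and is not the package's internal code. Whether `jackstraw_rajive` compares p-values with `<` or `<=` is an assumption here.

```r
# Made-up empirical p-values for one block with d_k = 6 features (illustration only)
p          <- c(0.001, 0.004, 0.020, 0.030, 0.200, 0.700)
alpha      <- 0.05
d_k        <- length(p)   # number of features in the block
joint_rank <- 2           # assumed joint rank

# "bonferroni" as documented: compare raw p-values with alpha / (d_k * joint_rank)
bonf_threshold <- alpha / (d_k * joint_rank)
sig_bonf <- p <= bonf_threshold

# "BH": Benjamini-Hochberg adjusted p-values compared with alpha
p_bh   <- p.adjust(p, method = "BH")
sig_bh <- p_bh <= alpha
```

Here the Bonferroni rule keeps only the two smallest p-values, while BH keeps four: the usual trade-off between family-wise error control and false discovery rate control.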

Value

An object of class "jackstraw_rajive": a named list with one element per block (block1, block2, ...). Each element is itself a list with one element per joint component (comp1, comp2, ...) containing:

f_obs: length-\(d_k\) numeric vector of observed F-statistics.
f_null: \(d_k \times\) n_null matrix of null F-statistics.
p_values: Empirical p-values (length \(d_k\)).
p_adj: Multiple-testing adjusted p-values (length \(d_k\)).
significant: Named logical vector (length \(d_k\)) indicating significance.
significant_vars: Integer indices (or column names when available) of significant features.

The object also carries attributes alpha, correction, joint_rank, and n_blocks.
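Because the result is a nested list, per-block and per-component fields can be reached with ordinary indexing (for example `js$block1$comp1$significant_vars`). The sketch below uses a hypothetical helper, `tabulate_jackstraw()`, on a small hand-built toy object that only mimics the documented shape. It is not real output, and `summary()` already reports similar counts.

```r
# Hypothetical helper: flatten a "jackstraw_rajive"-shaped list into a data frame.
# It relies only on the documented layout: block -> component -> significant.
tabulate_jackstraw <- function(js) {
  rows <- list()
  for (b in names(js)) {
    for (cmp in names(js[[b]])) {
      sig <- js[[b]][[cmp]]$significant
      rows[[length(rows) + 1]] <- data.frame(
        block = b, component = cmp,
        n_features = length(sig), n_significant = sum(sig),
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, rows)
}

# Toy object mimicking the documented shape and attributes (not real output)
toy <- structure(
  list(block1 = list(comp1 = list(significant = c(a = TRUE, b = FALSE, c = TRUE))),
       block2 = list(comp1 = list(significant = c(x = FALSE, y = TRUE)))),
  class = "jackstraw_rajive", alpha = 0.05, correction = "bonferroni",
  joint_rank = 1L, n_blocks = 2L)

tab <- tabulate_jackstraw(toy)
attr(toy, "correction")   # settings are stored as attributes
```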

Details

For each data block \(k\) and each joint component \(j\), every feature is regressed on the \(j\)-th joint score with an intercept (feature ~ joint_score_j + 1), and the observed F-statistic is compared with a null distribution obtained by permuting the values of randomly sampled features, which breaks their association with the joint scores. Empirical p-values are then computed and, unless correction = "none", adjusted for multiple testing.
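The procedure above can be sketched in base R for one block and one joint component on simulated data. This is a simplified illustration, not the package's implementation: it holds the joint score fixed (Chung and Storey's full jackstraw recomputes the latent scores after permuting), and the number of features permuted per round, `s`, is an assumed value.

```r
# Simplified jackstraw-style test for ONE block and ONE joint component.
# Assumption: the joint score is held fixed rather than re-estimated.
set.seed(1)
n   <- 50                                   # samples
d_k <- 40                                   # features in the block
score <- rnorm(n)                           # stand-in for joint_score_j
X <- matrix(rnorm(n * d_k), nrow = n, ncol = d_k)
X[, 1:10] <- X[, 1:10] + 2 * score          # first 10 features carry signal

# F-statistic of lm(feature ~ score + 1); with one predictor and an intercept
# it equals (n - 2) * R^2 / (1 - R^2)
f_stat <- function(y, x) {
  r2 <- cor(y, x)^2
  (length(y) - 2) * r2 / (1 - r2)
}

f_obs <- apply(X, 2, f_stat, x = score)

# Null statistics: permute s randomly chosen features per round (s is assumed)
n_null <- 50
s      <- 5
f_null <- unlist(replicate(n_null, {
  idx <- sample(d_k, s)
  vapply(idx, function(i) f_stat(sample(X[, i]), score), numeric(1))
}, simplify = FALSE))

# Empirical p-value: share of pooled null F-statistics >= the observed one
p_values    <- vapply(f_obs, function(f) mean(f_null >= f), numeric(1))
significant <- p_values <= 0.05 / d_k       # Bonferroni with joint_rank = 1
```

Pooling null statistics across features, as done here, is one common jackstraw convention; the documented d_k × n_null shape of f_null suggests the package may keep per-feature nulls instead.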

References

Yang X, Hoadley KA, Hannig J, Marron JS (2021). Statistical inference for data integration. arXiv:2109.12272.

Chung NC, Storey JD (2015). Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4):545–554.

Examples

# \donttest{
set.seed(42)
n   <- 50
pks <- c(100, 80)
Y   <- ajive.data.sim(K = 2, rankJ = 2, rankA = c(5, 4), n = n,
                      pks = pks, dist.type = 1)
data.ajive           <- Y$sim_data
initial_signal_ranks <- c(5, 4)
ajive_result <- Rajive(data.ajive, initial_signal_ranks)
js <- jackstraw_rajive(ajive_result, data.ajive, alpha = 0.05, n_null = 10)
print(js)
#> JIVE Jackstraw Significance Test
#>   Joint rank: 1   Alpha: 0.05   Correction: bonferroni
#> 
#>   Block      Component    N features     N significant 
#>   ----------------------------------------------------
#>   block1     comp1        100            60            
#>   block2     comp1        80             51            
summary(js)
#>   block component n_features n_significant alpha correction
#>  block1     comp1        100            60  0.05 bonferroni
#>  block2     comp1         80            51  0.05 bonferroni
# }