# Usage guide

This page introduces the cycombinepy mental model and walks through the
recommended workflow. For fully runnable end-to-end examples, see the
two tutorial notebooks:

- Main cyCombine workflow (Python port): the main batch-correction pipeline
- Detecting batch effects (Python port): diagnostic plots
## Data conventions

cycombinepy operates on an AnnData object:

| Slot | Meaning |
|---|---|
| `adata.X` | cells × markers expression (post-asinh) |
| `adata.obs["batch"]` | batch assignment (required) |
| `adata.obs["sample"]` | sample / patient id (optional) |
| `adata.obs["condition"]` | biological condition (optional covariate) |
| `adata.obs["reference_id"]` | reference sample carried across batches |
| `adata.obs["cycombine_som"]` | SOM cluster labels (set by `create_som`) |
| `adata.layers["cycombine_corrected"]` | corrected expression (set by `correct_data` / `batch_correct`) |
All functions read from adata.X by default; pass layer=... to read or
write a named layer instead. Columns other than batch are optional but
enable additional features (covariate-aware correction, anchor-based
alignment, diagnostic MDS plots, etc.).
## The pipeline in one call

The most common entry point is cycombinepy.batch_correct, which runs the
full pipeline in one call:

```python
import cycombinepy as pc
from cycombinepy.correct import CORRECTED_LAYER

pc.batch_correct(
    adata,
    batch_key="batch",
    xdim=8, ydim=8,        # 64-node SOM
    rlen=10,
    norm_method="scale",   # batch-wise z-score before clustering
    covar="condition",     # optional: preserve a biological covariate
    seed=473,
)
# The corrected matrix is now in adata.layers["cycombine_corrected"]
# (constant CORRECTED_LAYER). adata.X is unchanged.
```
Under the hood this is equivalent to running three steps by hand:

```python
pc.normalize(adata, method="scale", batch_key="batch")
pc.create_som(adata, xdim=8, ydim=8, rlen=10, seed=473,
              label_key="cycombine_som")
# The correction step reads the unnormalized expression; the scaled
# view is only used for clustering.
pc.correct_data(adata, label_key="cycombine_som",
                batch_key="batch", covar="condition")
```

Use the modular form when you want to swap components (e.g. try
`norm_method="rank"`), reuse an existing clustering, or inspect the
intermediate state.
## Loading data

### From a directory of FCS files

```python
from cycombinepy import io as pcio

adata = pcio.read_fcs_dir(
    "data/",
    metadata="metadata.csv",
    filename_col="Filename",
    batch_key="Batch",
    sample_key="Patient",
    condition_key="condition",
    cofactor=5,   # 5 CyTOF, 150 flow, 6000 spectral
    derand=True,
)
```

read_fcs_dir wraps pytometry / readfcs and applies the arcsinh transform
in one step. Requires `pip install "cycombinepy[io]"`.
### From FCS files with no directory metadata

If you only have the FCS files and want to assign batches manually (this
is the path used in the Main cyCombine workflow (Python port) tutorial):

```python
import anndata as ad
import readfcs

import cycombinepy as pc

b1 = readfcs.read("batch1.fcs")
b1.obs["batch"] = "batch1"
b2 = readfcs.read("batch2.fcs")
b2.obs["batch"] = "batch2"

adata = ad.concat([b1, b2], join="outer", index_unique="-")
pc.transform_asinh(adata, cofactor=5)
```
### From an existing AnnData

Nothing to do: just set adata.obs["batch"].
## Preprocessing

```python
pc.transform_asinh(adata, cofactor=5, derand=True)
```

- `cofactor`: marker-type-appropriate scale factor. Typical values:
  5 (mass cytometry), 150 (conventional flow), 6000 (full-spectrum flow).
- `derand=True` applies the cyCombine-style derandomization trick
  (`ceil(x) - Uniform(0, 0.9999)`) to break ties at low intensities.
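The transform can be sketched in plain NumPy. This is a sketch of the behavior described above, not the library's internals; the ordering (derandomize the raw values first, then arcsinh) is an assumption following the cyCombine convention:

```python
import numpy as np

def asinh_transform(x, cofactor=5, derand=True, seed=0):
    """Sketch: arcsinh transform with optional derandomization."""
    x = np.asarray(x, dtype=float)
    if derand:
        # break ties at low intensities: ceil(x) - Uniform(0, 0.9999)
        rng = np.random.default_rng(seed)
        x = np.ceil(x) - rng.uniform(0.0, 0.9999, size=x.shape)
    return np.arcsinh(x / cofactor)

# two identical raw values come out slightly different after derandomization
print(asinh_transform(np.array([0.0, 1.0, 1.0, 10.0, 100.0])))
```

Without derandomization, tied low-intensity values stay tied, which produces artificial spikes in the marker distributions.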
## Normalization

```python
pc.normalize(adata, method="scale", batch_key="batch")
```

Available methods:
| Method | Notes |
|---|---|
| `scale` | Z-score per batch (default; matches cyCombine's `scale`) |
| `rank` | Percentile rank per batch (matches cyCombine's `rank`) |
|  | Centered log-ratio, centered on the arithmetic mean |
|  | Seurat-flavor CLR |
|  | CLR centered on the median |
|  | PCHIP monotone-spline quantile normalization |
Normalization is only used to make the SOM clustering resilient to batch shifts; the correction step itself runs on the unnormalized expression.
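Batch-wise z-scoring (the `scale` method) amounts to standardizing each marker within each batch. A minimal pandas sketch of that idea, on a toy single-marker frame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# two batches with a deliberate +3 shift in marker "CD3" in batch b2
df = pd.DataFrame({
    "batch": ["b1"] * 100 + ["b2"] * 100,
    "CD3": np.r_[rng.normal(0, 1, 100), rng.normal(3, 1, 100)],
})

# z-score per batch: subtract the batch mean, divide by the batch std
df["CD3_scaled"] = df.groupby("batch")["CD3"].transform(
    lambda x: (x - x.mean()) / x.std()
)

print(df.groupby("batch")["CD3_scaled"].agg(["mean", "std"]).round(6))
```

After scaling, both batches have mean 0 and std 1, so the SOM sees comparable distributions; the raw shift is untouched and is handled later by the correction step, which runs on the unnormalized values.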
## Clustering

```python
pc.create_som(adata, xdim=8, ydim=8, rlen=10, seed=473)
```

Trains a FlowSOM on the normalized marker view and writes 1-indexed
integer cluster ids to adata.obs["cycombine_som"]. Pass
n_clusters=20 to metacluster the SOM nodes via consensus clustering.
## Per-cluster ComBat correction

```python
pc.correct_data(
    adata,
    label_key="cycombine_som",
    batch_key="batch",
    covar="condition",       # optional
    anchor="reference_id",   # optional
    ref_batch=None,
    parametric=True,
)
```

Runs ComBat inside each SOM cluster in isolation, capping the corrected
values to the per-cluster min/max of the input (matching R cyCombine
lines 524–531). If a cluster contains cells from only one batch, or if
the covariate design is confounded with the batch variable, the cluster
is left unchanged.
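A quick way to see which clusters would be skipped is a cluster × batch cross-tabulation. This is a sketch on a toy obs frame (column names as in the examples above); with a real object you would pass `adata.obs["cycombine_som"]` and `adata.obs["batch"]`:

```python
import pandas as pd

# toy cluster/batch assignment; cluster 3 only appears in batch1
obs = pd.DataFrame({
    "cycombine_som": [1, 1, 2, 2, 2, 3, 3, 1, 2],
    "batch": ["batch1", "batch2", "batch1", "batch2", "batch2",
              "batch1", "batch1", "batch2", "batch1"],
})

counts = pd.crosstab(obs["cycombine_som"], obs["batch"])
# clusters present in fewer than two batches cannot be corrected
single_batch = counts.index[(counts > 0).sum(axis=1) < 2]
print(counts)
print("clusters left unchanged:", list(single_batch))
```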
## Evaluating a correction

```python
from cycombinepy.correct import CORRECTED_LAYER

emd_before = pc.compute_emd(adata, cell_key="cycombine_som", layer=None)
emd_after = pc.compute_emd(adata, cell_key="cycombine_som",
                           layer=CORRECTED_LAYER)

report = pc.evaluate_emd(emd_before, emd_after)
report.groupby("marker")["reduction_pct"].mean()
compute_emd uses
scipy.stats.wasserstein_distance
to compute per-(cluster, marker, batch-pair) 1-D Earth Mover’s Distance;
evaluate_emd merges uncorrected vs. corrected tables and reports
absolute and percent reduction. compute_mad / evaluate_mad provide
the same interface for Median Absolute Deviation.
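The per-pair distance itself is just scipy's 1-D Wasserstein distance on one marker's values in two batches. A minimal sketch of the before/after comparison, using synthetic data and an idealized correction (simply removing a known +2 shift):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)
# the same marker in two batches; batch2 is shifted by +2
b1 = rng.normal(0, 1, 5000)
b2 = rng.normal(2, 1, 5000)

emd_before = wasserstein_distance(b1, b2)
# idealized correction: subtract the known shift
emd_after = wasserstein_distance(b1, b2 - 2)

reduction_pct = 100 * (emd_before - emd_after) / emd_before
print(f"EMD before={emd_before:.3f}, after={emd_after:.3f}, "
      f"reduction={reduction_pct:.0f}%")
```

`evaluate_emd` aggregates exactly this kind of reduction across all (cluster, marker, batch-pair) combinations.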
## Detecting batch effects

Before deciding to correct, run the diagnostic plots to confirm there is
a batch effect worth correcting:

```python
figs = pc.detect_batch_effect_express(
    adata, batch_key="batch", sample_key="sample",
    downsample=10000, out_dir="before/",
)

# Or the full report (adds UMAP + per-marker MAD):
figs = pc.detect_batch_effect(
    adata, batch_key="batch", sample_key="sample",
    downsample=10000, out_dir="before/", seed=434,
)
```
Both functions return a dict of matplotlib figures and optionally save
them as PNGs under out_dir. See the Detecting batch effects (Python port)
notebook for a walkthrough.
## Plotting helpers

```python
from cycombinepy import plotting as pcpl
from cycombinepy.correct import CORRECTED_LAYER

pcpl.plot_dimred(adata, kind="umap", color="batch",
                 layer=CORRECTED_LAYER)
pcpl.plot_density(adata, batch_key="batch", layer=CORRECTED_LAYER)
pcpl.plot_emd_heatmap(emd_df)
```
These are thin, opinionated wrappers around scanpy.pl.umap and
seaborn.kdeplot that handle the before/after layer logic. For fine
control, call scanpy/seaborn directly.