spamosaic.framework
- class spamosaic.framework.SpaMosaic(modBatch_dict={}, input_key='dimred_bc', mnn_rep_key=None, batch_key='batch', radius_cutoff=2000, intra_knns=10, inter_knn_base=10, smooth_input=False, smooth_L=1, inter_auto_knn=False, inter_auto_thr=0.8, rmv_outlier=False, contamination='auto', w_g=0.8, log_dir=None, seed=1234, num_workers=6, device='cuda:0')
Bases: object
SpaMosaic: A modular framework for multi-modal spatial omics integration.
This class manages data preprocessing, intra- and inter-batch graph construction, model initialization and training, feature alignment, and imputation across multiple omics modalities (e.g., RNA, ADT, ATAC) in spatial omics data.
- Parameters:
modBatch_dict (dict) – Dictionary mapping modality name (e.g., ‘rna’, ‘adt’) to a list of AnnData objects (batches).
input_key (str) – Key in obsm where input features are stored (e.g., ‘dimred_bc’).
mnn_rep_key (str, optional) – Key for representation used in MNN search. If None, defaults to input_key.
batch_key (str) – Column name in obs denoting batch identity.
radius_cutoff (int) – Radius threshold used to construct the spatial neighbor graph.
intra_knns (int or list of int) – Number of neighbors in the intra-batch graph; either a single value or one value per batch.
inter_knn_base (int) – Base number of neighbors (k) for the inter-batch MNN search.
smooth_input (bool) – Whether to apply GCN-based input feature smoothing.
smooth_L (int) – Number of GCN layers used in smoothing.
inter_auto_knn (bool) – Whether to adapt the KNN size based on the ratio of batch sizes.
inter_auto_thr (float) – Batch-size ratio threshold that triggers adaptive KNN sizing.
rmv_outlier (bool) – Whether to remove outlier MNN pairs.
contamination (str or float) – Contamination level for outlier detection (used in IsolationForest).
w_g (float) – Weight for inter-batch expression edges in the merged graph.
log_dir (str, optional) – Directory for saving logs or results.
seed (int) – Random seed.
num_workers (int) – Number of workers used for computation.
device (str) – Device to use, e.g., ‘cuda:0’ or ‘cpu’.
- apply_smoothing(modBatch_dict)
Apply feature smoothing using LGCN on intra-batch graphs.
- Parameters:
modBatch_dict (dict) – Dictionary mapping modality to list of AnnData objects.
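The smoothing step can be pictured as repeated propagation of features over a normalized graph. The sketch below is not SpaMosaic's code: it assumes LightGCN-style propagation without learnable weights, with self-loops and symmetric normalization, applied `L` times (mirroring `smooth_L`), and it returns only the last layer. The function name `lgcn_smooth` is hypothetical.

```python
import numpy as np

def lgcn_smooth(X, A, L=1):
    """Smooth features X (n_cells x n_features) over an adjacency matrix A (n x n).

    Hypothetical sketch: assumes X <- D^-1/2 (A + I) D^-1/2 X, repeated L times.
    """
    A_hat = A + np.eye(A.shape[0])            # self-loops keep each cell's own signal
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    for _ in range(L):
        X = A_norm @ X                        # one propagation step per layer
    return X

# Toy path graph 0-1-2 with a signal only on cell 0
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [0.0]])
out = lgcn_smooth(X, A, L=1)
```

With one layer the signal reaches only direct neighbors; each extra layer spreads it one more hop.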
- check_integrity()
Check whether all batches are linked through shared modalities, i.e., whether they form a connected integration graph.
- Raises:
RuntimeError – If the graph of shared modalities across batches is not fully connected.
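One way to picture this check: treat batches as nodes, link two batches whenever some modality is measured in both, and require the resulting graph to be connected. The sketch below assumes that a missing modality is represented by `None` in the per-modality list, uses plain strings as stand-ins for AnnData objects, and defines a hypothetical helper `check_batches_connected`; it illustrates the idea rather than the library's implementation.

```python
from collections import deque

def check_batches_connected(mod_batch_dict):
    """Raise RuntimeError unless all batches are linked through shared modalities.

    Assumes every per-modality list has one entry per batch, with None marking
    a batch in which that modality was not measured (an assumption).
    """
    n_batches = len(next(iter(mod_batch_dict.values())))
    # Two batches are adjacent if at least one modality is present in both.
    adj = {b: set() for b in range(n_batches)}
    for batches in mod_batch_dict.values():
        present = [b for b, data in enumerate(batches) if data is not None]
        for i in present:
            for j in present:
                if i != j:
                    adj[i].add(j)
    # Breadth-first search from batch 0 must reach every batch.
    seen, queue = {0}, deque([0])
    while queue:
        b = queue.popleft()
        for nb in adj[b] - seen:
            seen.add(nb)
            queue.append(nb)
    if len(seen) != n_batches:
        missing = sorted(set(range(n_batches)) - seen)
        raise RuntimeError(f"Batches {missing} share no modality path with batch 0.")

# Batch 1 measures both RNA and ADT, so it bridges batches 0 and 2.
check_batches_connected({"rna": ["b0", "b1", None], "adt": [None, "b1", "b2"]})
```

Without a bridge batch such as batch 1, the RNA-only and ADT-only batches would have no shared feature space to align in, which is why the check fails in that case.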
- impute(modBatch_dict, emb_key='emb', layer_key='counts', imp_knn=10)
Impute missing modalities via k-nearest-neighbor (KNN) search in the aligned embedding space.
- Parameters:
modBatch_dict (dict) – Input dictionary of modalities and batches.
emb_key (str) – Key where latent embeddings are stored.
layer_key (str) – Layer to impute (e.g., ‘counts’).
imp_knn (int) – Number of neighbors to use in KNN-based imputation.
- Returns:
Imputed data dictionary of the form {modality: list of arrays (or None)}.
- Return type:
dict
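A minimal sketch of KNN-based imputation in a shared embedding space, assuming a uniform average over the `imp_knn` nearest reference cells (a distance-weighted average would be an equally reasonable design). The helper `knn_impute` and its arguments are hypothetical; brute-force distances are used for clarity and would be replaced by an index structure for large datasets.

```python
import numpy as np

def knn_impute(query_emb, ref_emb, ref_values, k=10):
    """Impute a missing modality for query cells from their k nearest reference cells.

    query_emb:  (n_query, d) aligned embeddings of cells lacking the modality
    ref_emb:    (n_ref, d) aligned embeddings of cells that measured it
    ref_values: (n_ref, n_features) measured values, e.g. a 'counts' layer
    """
    # Pairwise squared Euclidean distances, shape (n_query, n_ref)
    d2 = ((query_emb[:, None, :] - ref_emb[None, :, :]) ** 2).sum(axis=-1)
    k = min(k, ref_emb.shape[0])
    nn_idx = np.argsort(d2, axis=1)[:, :k]   # indices of the k closest reference cells
    return ref_values[nn_idx].mean(axis=1)   # uniform average, (n_query, n_features)

ref_emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
ref_values = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 9.0]])
query_emb = np.array([[0.05, 0.0], [5.0, 4.9]])
imputed = knn_impute(query_emb, ref_emb, ref_values, k=2)
```

The quality of such imputation depends entirely on how well the embedding aligns batches: a query cell only borrows values from reference cells that landed near it.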
- infer_emb(modBatch_dict, emb_key='emb', final_latent_key='merged_emb', cat=False)
Infer latent embeddings for each cell and return a list of AnnData objects with merged embeddings.
- Parameters:
modBatch_dict (dict) – Original input dictionary of modalities and batches.
emb_key (str) – Key under which intermediate embeddings are stored.
final_latent_key (str) – Key to store final merged embedding in returned AnnData.
cat (bool) – Whether to concatenate (True) or average (False) the per-modality embeddings.
- Returns:
Reconstructed AnnData objects with merged embeddings.
- Return type:
list of AnnData
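The effect of `cat` can be illustrated with a hypothetical helper that combines the per-modality embeddings of the same cells. Plain concatenation and an unweighted element-wise mean are assumptions about how the two modes work.

```python
import numpy as np

def merge_modality_embeddings(embs, cat=False):
    """Combine per-modality embeddings of the same cells (hypothetical helper).

    embs: list of (n_cells, d) arrays, one per modality measured in a batch.
    cat=True concatenates along features; cat=False takes the element-wise mean.
    """
    if cat:
        return np.concatenate(embs, axis=1)           # (n_cells, d * n_modalities)
    return np.mean(np.stack(embs, axis=0), axis=0)    # (n_cells, d)

rna_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
adt_emb = np.array([[0.0, 0.0], [0.0, 3.0]])
merged_avg = merge_modality_embeddings([rna_emb, adt_emb])
merged_cat = merge_modality_embeddings([rna_emb, adt_emb], cat=True)
```

In this sketch, averaging keeps a fixed width regardless of how many modalities a batch has, whereas concatenation yields `d * n_modalities` columns, so batches measuring different numbers of modalities would end up with different widths.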
- prepare_inputs(modBatch_dict)
Merge AnnData objects and construct the final PyTorch graph inputs.
This includes concatenating features and adjacency matrices, adding intra- and inter-batch edges, and computing edge types (same-batch or cross-batch).
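The merging step can be sketched as building one block-diagonal graph over all cells and then adding cross-batch edges. The sketch assumes intra-batch edges carry weight 1 and inter-batch edges carry weight `w_g`, and encodes edge type as 0 (same-batch) or 1 (cross-batch). The helper `merge_graphs` is hypothetical and works on dense NumPy arrays rather than the PyTorch tensors the method produces.

```python
import numpy as np

def merge_graphs(intra_adjs, inter_pairs, w_g=0.8):
    """Merge per-batch adjacency matrices and cross-batch pairs into one weighted graph.

    intra_adjs:  list of (n_b, n_b) 0/1 adjacency matrices, one per batch
    inter_pairs: list of (global_i, global_j) cross-batch pairs, e.g. MNN pairs
    Returns the (N, N) weight matrix, a (2, E) edge index, and per-edge types.
    """
    sizes = [a.shape[0] for a in intra_adjs]
    offsets = np.cumsum([0] + sizes[:-1])                 # first global index of each batch
    n = sum(sizes)
    W = np.zeros((n, n))
    for off, a in zip(offsets, intra_adjs):
        W[off:off + a.shape[0], off:off + a.shape[0]] = a  # block-diagonal intra edges
    for i, j in inter_pairs:
        W[i, j] = W[j, i] = w_g                            # symmetric cross-batch edges
    batch_of = np.repeat(np.arange(len(sizes)), sizes)     # batch label of every cell
    src, dst = np.nonzero(W)
    edge_type = (batch_of[src] != batch_of[dst]).astype(int)  # 1 = cross-batch
    return W, np.stack([src, dst]), edge_type

pair_graph = np.array([[0.0, 1.0], [1.0, 0.0]])
W, edges, edge_type = merge_graphs([pair_graph, pair_graph], inter_pairs=[(0, 2)])
```

Keeping the edge type separate from the weight lets a model treat spatial and expression-based neighbors differently even when their weights happen to coincide.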
- prepare_inter_graphs(modBatch_dict)
Build mutual nearest neighbor (MNN) graphs between batches for each modality.
This includes identifying bridge and non-bridge batches, computing MNN pairs within each modality, and optionally filtering outlier pairs.
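The core of an MNN search is shown below on dense arrays: a pair (i, j) is kept only when each cell is among the other's k nearest neighbors in the opposite batch. The hypothetical `mnn_pairs` helper omits the bridge/non-bridge logic, the adaptive k controlled by `inter_auto_knn`, and the IsolationForest outlier filtering; `k` loosely plays the role of `inter_knn_base`.

```python
import numpy as np

def mnn_pairs(X, Y, k=10):
    """Return mutual nearest neighbor pairs (i, j) between batches X and Y.

    X: (n_x, d) and Y: (n_y, d) feature matrices in a shared representation.
    """
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # (n_x, n_y) squared distances
    kx = min(k, Y.shape[0])
    ky = min(k, X.shape[0])
    x_to_y = np.argsort(d2, axis=1)[:, :kx]     # for each X cell, its k closest Y cells
    y_to_x = np.argsort(d2, axis=0)[:ky, :].T   # for each Y cell, its k closest X cells
    pairs = []
    for i in range(X.shape[0]):
        for j in x_to_y[i]:
            if i in y_to_x[j]:                  # keep only mutual matches
                pairs.append((i, int(j)))
    return pairs

X = np.array([[0.0, 0.0], [10.0, 10.0]])
Y = np.array([[0.1, 0.0], [10.0, 10.2], [50.0, 50.0]])
pairs = mnn_pairs(X, Y, k=1)
```

The mutuality requirement is what discards cell Y[2]: its nearest X cell prefers a different partner, so no pair is formed for a population present in only one batch.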
- prepare_intra_graphs(modBatch_dict)
Build an intra-batch spatial neighbor graph for every batch of each modality.
- Parameters:
modBatch_dict (dict) – Dictionary mapping modality to list of AnnData objects.
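A minimal sketch of a per-batch spatial graph, assuming each spot is linked to at most `k` nearest other spots lying within `radius_cutoff`. How SpaMosaic actually combines `radius_cutoff` and `intra_knns` is an assumption here; coordinates are in whatever units the spatial data uses, and `spatial_graph` is a hypothetical helper.

```python
import numpy as np

def spatial_graph(coords, radius_cutoff=2000.0, k=10):
    """Build a symmetric 0/1 spatial neighbor graph for one batch.

    Each spot is linked to at most k nearest other spots whose distance is
    <= radius_cutoff (in the units of coords).
    """
    n = coords.shape[0]
    dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)              # exclude self-loops
    A = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])[:k]         # k closest candidate spots
        keep = order[dist[i, order] <= radius_cutoff]
        A[i, keep] = 1.0
    return np.maximum(A, A.T)                   # symmetrize the k-capped graph

coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [100.0, 0.0]])
A = spatial_graph(coords, radius_cutoff=1.5, k=10)
```

The radius guards against linking distant spots when a region is sparse, while the k cap bounds the degree in dense regions; spot 3 in the example stays isolated because nothing lies within the radius.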
- prepare_net(net)
Instantiate the architecture for each modality based on its configuration.
- Parameters:
net (str) – Name of model architecture (must match a config YAML file).
- Returns:
Mapping from modality name to initialized PyTorch model.
- Return type:
dict
- train(net, lr, use_mini_thr=8000, mini_batch_size=1024, loss_type='adapted', T=0.01, bias=0, n_epochs=100, w_rec_g=0.0)
Train the SpaMosaic model using contrastive and reconstruction losses.
- Parameters:
net (str) – Architecture name.
lr (float) – Learning rate.
use_mini_thr (int) – Threshold above which mini-batch training is used.
mini_batch_size (int) – Mini-batch size, used only when mini-batch training is enabled.
loss_type (str) – Type of loss (‘adapted’ or ‘ce’).
T (float) – Temperature of the contrastive loss.
bias (float) – Bias term in the adapted contrastive loss.
n_epochs (int) – Number of training epochs.
w_rec_g (float) – Weight for reconstruction loss.
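For the `'ce'` option, a common form of contrastive loss is the symmetric InfoNCE (cross-entropy) objective sketched below: the two modality embeddings of the same cell form the positive pair, all other cells in the batch act as negatives, and similarities are scaled by the temperature `T`. Treating this as SpaMosaic's exact loss is an assumption, the `'adapted'` variant with its `bias` term is not reproduced, and `info_nce` is a hypothetical name.

```python
import numpy as np

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def info_nce(z_a, z_b, T=0.01):
    """Symmetric cross-entropy contrastive loss between paired embeddings.

    Row i of z_a and row i of z_b embed the same cell in two modalities
    (the positive pair); every other row serves as a negative.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / T                                 # (n, n) scaled similarities
    loss_ab = -np.diag(log_softmax(logits, axis=1)).mean()   # match each a to its b
    loss_ba = -np.diag(log_softmax(logits, axis=0)).mean()   # match each b to its a
    return 0.5 * (loss_ab + loss_ba)

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z, T=0.1)          # positives are identical vectors
shuffled = info_nce(z, z[::-1], T=0.1)   # positives are unrelated random vectors
```

A lower temperature sharpens the softmax over similarities, so small differences between the positive and the closest negatives have a larger effect on the loss.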