spamosaic.framework

class spamosaic.framework.SpaMosaic(modBatch_dict={}, input_key='dimred_bc', mnn_rep_key=None, batch_key='batch', radius_cutoff=2000, intra_knns=10, inter_knn_base=10, smooth_input=False, smooth_L=1, inter_auto_knn=False, inter_auto_thr=0.8, rmv_outlier=False, contamination='auto', w_g=0.8, log_dir=None, seed=1234, num_workers=6, device='cuda:0')

Bases: object

SpaMosaic: A modular framework for multi-modal spatial omics integration.

This class manages data pre-processing, intra- and inter-batch graph construction, model initialization and training, feature alignment, and imputation across multiple omics modalities (e.g., RNA, ADT, ATAC) in spatial transcriptomics.

Parameters:
  • modBatch_dict (dict) – Dictionary mapping modality name (e.g., ‘rna’, ‘adt’) to a list of AnnData objects (batches).

  • input_key (str) – Key in obsm where input features are stored (e.g., ‘dimred_bc’).

  • mnn_rep_key (str, optional) – Key for representation used in MNN search. If None, defaults to input_key.

  • batch_key (str) – Column name in obs denoting batch identity.

  • radius_cutoff (int) – Radius threshold to construct spatial neighbor graph.

  • intra_knns (int or list of int) – Number of neighbors in intra-batch graph (can be int or per-batch list).

  • inter_knn_base (int) – Base KNN size for inter-batch MNN search.

  • smooth_input (bool) – Whether to apply GCN-based input feature smoothing.

  • smooth_L (int) – Number of GCN layers used in smoothing.

  • inter_auto_knn (bool) – Whether to adapt KNN size based on batch size ratio.

  • inter_auto_thr (float) – Size ratio threshold to apply adaptive KNN.

  • rmv_outlier (bool) – Whether to remove outlier MNN pairs.

  • contamination (str or float) – Contamination level for outlier detection (used in IsolationForest).

  • w_g (float) – Weight for inter-batch expression edges in the merged graph.

  • log_dir (str, optional) – Directory for saving logs or results.

  • seed (int) – Random seed.

  • num_workers (int) – Number of workers used for computation.

  • device (str) – Device to use, e.g., ‘cuda:0’ or ‘cpu’.
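
A minimal usage sketch of the constructor followed by training and embedding inference, using only the signatures documented on this page. It assumes the package is installed and that every modality's list holds one slot per batch in the same order, with None marking a batch that lacks that modality (the None convention is an assumption, not stated above). count_batches and run_spamosaic are hypothetical helpers; net and lr are left to the caller because valid architecture names depend on the config files shipped with the package.

```python
def count_batches(mod_batch_dict):
    """Return the number of batches, checking that all modality lists align.

    Assumption: each modality maps to a list with one slot per batch,
    and a modality missing from a batch is represented by None.
    """
    lengths = {len(batches) for batches in mod_batch_dict.values()}
    if len(lengths) != 1:
        raise ValueError(f"modality lists differ in length: {sorted(lengths)}")
    return lengths.pop()


def run_spamosaic(mod_batch_dict, net, lr, device="cpu"):
    """Hypothetical wrapper: construct, train, and infer merged embeddings."""
    # Imported lazily so this sketch can be loaded without the package.
    from spamosaic.framework import SpaMosaic

    count_batches(mod_batch_dict)  # fail early on a misaligned layout
    model = SpaMosaic(
        modBatch_dict=mod_batch_dict,
        input_key="dimred_bc",
        batch_key="batch",
        device=device,
    )
    model.train(net, lr)
    return model.infer_emb(mod_batch_dict, emb_key="emb",
                           final_latent_key="merged_emb")
```

The remaining constructor arguments are left at their documented defaults here; whether those defaults suit a given dataset is not something this sketch can judge.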

apply_smoothing(modBatch_dict)

Apply feature smoothing using LGCN on intra-batch graphs.

Parameters:

modBatch_dict (dict) – Dictionary mapping modality to list of AnnData objects.
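
To illustrate what graph-based feature smoothing does in general, the sketch below averages each node's features with those of its graph neighbors and repeats this once per layer. It assumes a parameter-free, row-normalized propagation with self-loops; the LGCN used by SpaMosaic may normalize or weight neighbors differently. smooth_features is a hypothetical name.

```python
def smooth_features(features, edges, n_layers=1):
    """Average each node's features with its neighbors', n_layers times.

    features: list of equal-length float lists, one per node.
    edges: iterable of undirected (i, j) index pairs.
    """
    n = len(features)
    # Neighbor sets include the node itself (self-loop), so each node
    # keeps part of its own signal.
    neighbors = [{i} for i in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)

    current = [list(row) for row in features]
    for _ in range(n_layers):
        nxt = []
        for i in range(n):
            group = [current[j] for j in neighbors[i]]
            # Row-normalized propagation: mean over the node and its neighbors.
            nxt.append([sum(col) / len(group) for col in zip(*group)])
        current = nxt
    return current
```

In this sketch, n_layers plays the role that smooth_L plays in the constructor: more layers pull information from more distant neighbors.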

check_integrity()

Check whether the batches, linked through the modalities they share, form a connected integration graph.

Raises:

RuntimeError – If the graph of shared modalities across batches is not fully connected.
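
The connectivity condition can be sketched with a union-find over batch indices: two batches are linked whenever both measure the same modality, and integration is possible only if every batch ends up in one component. This is an illustration of the idea rather than the library's code. It assumes aligned per-batch lists with None for a missing modality; batches_connected is a hypothetical name.

```python
def batches_connected(mod_batch_dict):
    """Return True if batches form one connected component via shared modalities."""
    n_batches = len(next(iter(mod_batch_dict.values()), []))
    parent = list(range(n_batches))

    def find(x):
        # Follow parent links to the component root, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for batches in mod_batch_dict.values():
        present = [i for i, b in enumerate(batches) if b is not None]
        # All batches that measure this modality are mutually linked.
        for i in present[1:]:
            parent[find(i)] = find(present[0])

    return len({find(i) for i in range(n_batches)}) <= 1
```

For example, a batch with only RNA and a batch with only ADT share no modality, so they cannot be integrated unless a third batch measures both.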

impute(modBatch_dict, emb_key='emb', layer_key='counts', imp_knn=10)

Impute missing modalities using aligned embedding space and KNN.

Parameters:
  • modBatch_dict (dict) – Input dictionary of modalities and batches.

  • emb_key (str) – Key where latent embeddings are stored.

  • layer_key (str) – Which layer to impute (e.g., ‘counts’).

  • imp_knn (int) – Number of neighbors to use in KNN-based imputation.

Returns:

Imputed data dictionary ({modality -> list of arrays (or None)}).

Return type:

dict
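
KNN-based imputation in a shared embedding space can be sketched as follows: for each cell that lacks a modality, find its nearest cells among those that have it, and average their measured values. The sketch assumes Euclidean distance and an unweighted mean, neither of which is confirmed by this page; knn_impute is a hypothetical name.

```python
import math


def knn_impute(query_emb, ref_emb, ref_values, k=10):
    """For each query embedding, average the values of its k nearest references.

    query_emb: embeddings of cells missing the modality.
    ref_emb: embeddings of cells that have the modality.
    ref_values: measured feature vectors, aligned with ref_emb.
    """
    imputed = []
    for q in query_emb:
        # Rank reference cells by Euclidean distance in the embedding space.
        order = sorted(range(len(ref_emb)), key=lambda i: math.dist(q, ref_emb[i]))
        rows = [ref_values[i] for i in order[:k]]
        imputed.append([sum(col) / len(rows) for col in zip(*rows)])
    return imputed
```

Here k corresponds to imp_knn, and ref_values stands in for whatever is stored under layer_key.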

infer_emb(modBatch_dict, emb_key='emb', final_latent_key='merged_emb', cat=False)

Infer latent embeddings for each cell and return merged AnnData list.

Parameters:
  • modBatch_dict (dict) – Original input dictionary of modalities and batches.

  • emb_key (str) – Key to store intermediate embeddings.

  • final_latent_key (str) – Key to store final merged embedding in returned AnnData.

  • cat (bool) – Whether to concatenate (True) or average (False) modalities.

Returns:

Reconstructed AnnData objects with merged embeddings.

Return type:

list of AnnData
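
The effect of the cat flag on one cell's embeddings can be sketched as below. How cells that lack a modality are handled is not specified on this page, so the sketch simply merges whatever embeddings are passed in; merge_embeddings is a hypothetical name.

```python
def merge_embeddings(per_modality, cat=False):
    """Merge one cell's per-modality embeddings into a single vector.

    per_modality: list of equal-length float lists, one per available modality.
    """
    if cat:
        # Concatenate: output length is the sum of the input lengths.
        return [x for emb in per_modality for x in emb]
    # Average element-wise: output length equals each input's length.
    return [sum(col) / len(per_modality) for col in zip(*per_modality)]
```

Averaging keeps the merged dimension fixed regardless of how many modalities a cell has, whereas concatenation grows with the number of modalities.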

prepare_inputs(modBatch_dict)

Merge AnnData objects and construct final PyTorch graph inputs.

Includes:
  • Concatenating features and adjacency matrices.

  • Adding intra- and inter-batch edges.

  • Computing edge types (same-batch or cross-batch).
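
The merging steps can be sketched with plain edge lists: batch-local indices are shifted by each batch's offset in the concatenated ordering, cross-batch pairs are appended, and every edge receives a type. The 0/1 encoding and the input layout are assumptions made for illustration; merge_graphs is a hypothetical name, and edge weighting (such as w_g) is left out.

```python
from itertools import accumulate


def merge_graphs(batch_sizes, intra_edges, inter_edges):
    """Build a global edge list with a type per edge.

    batch_sizes: number of cells in each batch.
    intra_edges: per-batch lists of (i, j) pairs in batch-local indices.
    inter_edges: list of ((batch_a, i), (batch_b, j)) cross-batch pairs.
    Returns (edges, types) with type 0 = same-batch, 1 = cross-batch.
    """
    # Offset of each batch's first cell in the concatenated ordering.
    offsets = [0] + list(accumulate(batch_sizes))[:-1]
    edges, types = [], []
    for b, local in enumerate(intra_edges):
        for i, j in local:
            edges.append((offsets[b] + i, offsets[b] + j))
            types.append(0)
    for (ba, i), (bb, j) in inter_edges:
        edges.append((offsets[ba] + i, offsets[bb] + j))
        types.append(1)
    return edges, types
```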

prepare_inter_graphs(modBatch_dict)

Build mutual nearest neighbor (MNN) graphs between batches for each modality.

This includes:
  • Identifying bridge and non-bridge batches.

  • Computing MNN pairs within each modality.

  • Optionally filtering outliers.
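
The core MNN idea can be sketched for two batches as below: a pair (i, j) is kept only if each point is among the other's k nearest neighbors. This brute-force version assumes Euclidean distance and omits the bridge-batch logic, adaptive k, and outlier filtering described above; mutual_nearest_neighbors is a hypothetical name.

```python
import math


def mutual_nearest_neighbors(a, b, k=10):
    """Return sorted (i, j) pairs where a[i] and b[j] are in each other's k nearest."""
    def knn(src, dst):
        # For each point in src, the indices of its k closest points in dst.
        return [
            set(sorted(range(len(dst)), key=lambda j: math.dist(p, dst[j]))[:k])
            for p in src
        ]

    a_to_b = knn(a, b)
    b_to_a = knn(b, a)
    return sorted((i, j) for i, nbrs in enumerate(a_to_b)
                  for j in nbrs if i in b_to_a[j])
```

In this sketch, k corresponds to inter_knn_base, and the points would be the representation stored under mnn_rep_key.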

prepare_intra_graphs(modBatch_dict)

Build spatial neighbor graphs for each modality across batches.

Parameters:

modBatch_dict (dict) – Dictionary mapping modality to list of AnnData objects.
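
One plausible way to combine a neighbor count with a radius threshold is to keep, for each spot, its k nearest spots that also lie within the cutoff. The sketch below illustrates that reading; how SpaMosaic actually combines intra_knns and radius_cutoff is an assumption here, and spatial_neighbors is a hypothetical name.

```python
import math


def spatial_neighbors(coords, k=10, radius_cutoff=2000):
    """Return directed (i, j) edges: j is among i's k nearest spots within the radius."""
    edges = []
    for i, p in enumerate(coords):
        # Candidate neighbors: every other spot no farther than radius_cutoff.
        within = [(math.dist(p, q), j) for j, q in enumerate(coords)
                  if j != i and math.dist(p, q) <= radius_cutoff]
        # Keep the k closest candidates.
        edges.extend((i, j) for _, j in sorted(within)[:k])
    return edges
```

Under this reading, an isolated spot with no neighbor inside the radius gets no edges, regardless of k.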

prepare_net(net)

Instantiate the architecture for each modality based on config.

Parameters:

net (str) – Name of model architecture (must match a config YAML file).

Returns:

Mapping from modality name to initialized PyTorch model.

Return type:

dict

train(net, lr, use_mini_thr=8000, mini_batch_size=1024, loss_type='adapted', T=0.01, bias=0, n_epochs=100, w_rec_g=0.0)

Train the SpaMosaic model using contrastive and reconstruction losses.

Parameters:
  • net (str) – Architecture name.

  • lr (float) – Learning rate.

  • use_mini_thr (int) – Threshold above which mini-batch training is used.

  • mini_batch_size (int) – Size of mini-batches if needed.

  • loss_type (str) – Type of loss (‘adapted’ or ‘ce’).

  • T (float) – Temperature for contrastive loss.

  • bias (float) – Bias term in adapted contrastive loss.

  • n_epochs (int) – Number of training epochs.

  • w_rec_g (float) – Weight for reconstruction loss.
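
To show how a temperature such as T enters a contrastive objective, the sketch below computes a generic softmax cross-entropy for one anchor over similarity scores divided by T. It is a textbook InfoNCE-style loss, offered as an assumption about what a 'ce' loss looks like; the 'adapted' variant and its bias term are not specified on this page and are not reproduced. contrastive_ce is a hypothetical name.

```python
import math


def contrastive_ce(sim_row, positive_idx, T=0.01):
    """Cross-entropy of one anchor over similarity scores scaled by 1/T.

    sim_row: similarities between the anchor and every candidate.
    positive_idx: index of the candidate that is the anchor's positive pair.
    """
    logits = [s / T for s in sim_row]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[positive_idx]
```

A smaller T sharpens the softmax, so small gaps in similarity between the positive and the negatives translate into larger differences in the loss.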

Classes

spamosaic.framework.SpaMosaic

SpaMosaic: A modular framework for multi-modal spatial omics integration.