spamosaic.utils

Utility functions for SpaMosaic.

Includes a configuration loader, batching helpers, nearest-neighbor wrappers, clustering and UMAP utilities, and small AnnData helpers.

class spamosaic.utils.Config(dictionary)[source]

Bases: object

A wrapper that recursively converts a nested dictionary to an object with attribute-style access.

Parameters:

dictionary (dict) – Input configuration dictionary.

__dict__

Internal storage for nested configuration items, enabling attribute access.

Type:

dict
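The recursive conversion described above can be sketched as follows. This is a minimal illustration, not the library's source: the class name ConfigSketch and the example keys (model, lr, layers, seed) are hypothetical, and the sketch assumes that nested dictionaries are wrapped while all other values are stored unchanged.

```python
class ConfigSketch:
    """Hypothetical stand-in for spamosaic.utils.Config (illustration only)."""

    def __init__(self, dictionary):
        for key, value in dictionary.items():
            # Nested dictionaries become nested ConfigSketch objects.
            if isinstance(value, dict):
                value = ConfigSketch(value)
            # setattr stores the item in self.__dict__, which is what
            # makes attribute-style access work.
            setattr(self, key, value)


cfg = ConfigSketch({"model": {"lr": 0.01, "layers": [64, 32]}, "seed": 0})
print(cfg.model.lr)       # nested attribute access
print(sorted(vars(cfg)))  # top-level keys held in __dict__
```

Lists and scalars are left as they are in this sketch; whether the real class also descends into lists of dictionaries is not stated here.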

spamosaic.utils.check_batch_empty(modBatch_dict, verbose=True)[source]

Check that each batch contains at least one measured modality.

Parameters:
  • modBatch_dict (dict) – Mapping {modality_name -> list[AnnData or None]}.

  • verbose (bool, default=True) – Whether to print batch composition.

Returns:

For each batch index, a list of modality indices present in that batch.

Return type:

list of list of int

Raises:

ValueError – If any batch is completely empty.
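To make the input layout and return value concrete, here is a small sketch of the check. It is not the library's implementation: the function name check_batch_empty_sketch is hypothetical, strings stand in for AnnData objects, and the sketch assumes that modality indices follow the key order of the dictionary and that every modality list has one entry per batch.

```python
def check_batch_empty_sketch(mod_batch_dict, verbose=True):
    """Hypothetical re-implementation for illustration only."""
    mod_names = list(mod_batch_dict.keys())
    n_batches = len(mod_batch_dict[mod_names[0]])
    present_per_batch = []
    for batch_idx in range(n_batches):
        # A modality is "measured" in a batch when its entry is not None.
        present = [
            mod_idx
            for mod_idx, name in enumerate(mod_names)
            if mod_batch_dict[name][batch_idx] is not None
        ]
        if not present:
            raise ValueError(f"batch {batch_idx} has no measured modality")
        if verbose:
            print(f"batch {batch_idx}: {[mod_names[i] for i in present]}")
        present_per_batch.append(present)
    return present_per_batch


# Strings stand in for AnnData objects; None marks a missing modality.
layout = {
    "rna": ["rna_b0", "rna_b1", None],
    "adt": ["adt_b0", None, "adt_b2"],
}
present = check_batch_empty_sketch(layout, verbose=False)
print(present)
```

Setting the same position to None in every modality list would make that batch completely empty and trigger the ValueError.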

spamosaic.utils.clustering(adata, n_cluster, used_obsm, algo='kmeans', key='tmp_clust')[source]

Cluster cells using k-means or Mclust and store labels in .obs.

Parameters:
  • adata (AnnData) – Input data with an embedding in .obsm[used_obsm].

  • n_cluster (int) – Number of clusters.

  • used_obsm (str) – Key in .obsm to cluster on.

  • algo ({'kmeans', 'mclust'}, default='kmeans') – Clustering algorithm to use.

  • key (str, default='tmp_clust') – Column name in .obs to store cluster labels.

Returns:

Annotated object with cluster assignments in .obs[key].

Return type:

AnnData

spamosaic.utils.dict_map(_dict, _list)[source]

Map a list of keys using a dictionary.

Parameters:
  • _dict (dict) – Mapping dictionary.

  • _list (list) – List of keys to map.

Returns:

List of mapped values.

Return type:

list
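A likely equivalent is a plain ordered lookup, sketched below. The name dict_map_sketch and the example labels are hypothetical, and the behavior for keys missing from the dictionary is an assumption (this sketch raises KeyError).

```python
def dict_map_sketch(mapping, keys):
    """Hypothetical equivalent: look up each key, preserving order."""
    return [mapping[k] for k in keys]


label_names = {0: "B cell", 1: "T cell", 2: "Myeloid"}
mapped = dict_map_sketch(label_names, [2, 0, 0, 1])
print(mapped)  # ['Myeloid', 'B cell', 'B cell', 'T cell']
```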

spamosaic.utils.flip_axis(ads, axis=0)[source]

Flip the spatial coordinates of AnnData objects along a specified axis.

Parameters:
  • ads (list of AnnData) – Data objects to modify (in-place).

  • axis ({0, 1}, default=0) – Axis to flip (0 for x, 1 for y).

Return type:

None

spamosaic.utils.get_barc2batch(modBatch_dict)[source]

Create a mapping from cell barcodes to their batch indices.

Parameters:

modBatch_dict (dict) – Mapping {modality_name -> list[AnnData or None]}.

Returns:

Dictionary {barcode -> batch_index}.

Return type:

dict
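The mapping can be pictured with the sketch below. It is an illustration under stated assumptions rather than the library's code: get_barc2batch_sketch and fake_adata are hypothetical names, SimpleNamespace objects with an obs_names attribute stand in for AnnData, and barcodes are assumed to be unique across batches.

```python
from types import SimpleNamespace


def get_barc2batch_sketch(mod_batch_dict):
    """Hypothetical re-implementation for illustration only."""
    barc2batch = {}
    for ads in mod_batch_dict.values():
        for batch_idx, ad in enumerate(ads):
            if ad is None:  # modality not measured in this batch
                continue
            for barcode in ad.obs_names:
                barc2batch[barcode] = batch_idx
    return barc2batch


def fake_adata(names):
    # Stand-in exposing only the attribute this sketch reads.
    return SimpleNamespace(obs_names=names)


layout = {
    "rna": [fake_adata(["s0-c0", "s0-c1"]), fake_adata(["s1-c0"])],
    "adt": [fake_adata(["s0-c0", "s0-c1"]), None],
}
mapping = get_barc2batch_sketch(layout)
print(mapping)
```

Barcodes shared by several modalities of the same batch map to the same index, so visiting them more than once is harmless.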

spamosaic.utils.get_umap(ad, use_reps=[])[source]

Compute UMAP embeddings for specified representations and store them in .obsm.

Parameters:
  • ad (AnnData) – Input object.

  • use_reps (list of str, default=[]) – Keys in .obsm to compute UMAP for (e.g., ['X_pca']).

Returns:

The same object with additional .obsm[f'{rep}_umap'] for each rep.

Return type:

AnnData

spamosaic.utils.load_config(filepath)[source]

Load a YAML configuration file into a Config object.

Parameters:

filepath (str) – Path to the YAML configuration file.

Returns:

Parsed configuration object.

Return type:

Config

spamosaic.utils.mclust_R(adata, num_cluster, modelNames='EEE', used_obsm='emb', random_seed=2020)[source]

Run R’s Mclust (via rpy2) on an embedding to obtain soft clustering.

Parameters:
  • adata (AnnData) – AnnData with embedding stored in .obsm.

  • num_cluster (int) – Desired number of clusters.

  • modelNames (str, default='EEE') – Covariance structure model in Mclust.

  • used_obsm (str, default='emb') – Key in .obsm to use for clustering.

  • random_seed (int, default=2020) – Random seed for both NumPy and R.

Returns:

Annotated object with a categorical column obs['mclust'].

Return type:

AnnData

spamosaic.utils.nn_approx(ds1, ds2, norm=True, knn=10, metric='manhattan', n_trees=10, include_distances=False)[source]

Perform approximate nearest-neighbor search using Annoy.

Parameters:
  • ds1 (np.ndarray) – Query data of shape (N1, D).

  • ds2 (np.ndarray) – Reference data of shape (N2, D).

  • norm (bool, default=True) – Whether to L2-normalize ds1 and ds2 before indexing/search.

  • knn (int, default=10) – Number of nearest neighbors to retrieve per query.

  • metric (str, default='manhattan') – Distance metric for Annoy (e.g., 'manhattan', 'euclidean').

  • n_trees (int, default=10) – Number of trees in the Annoy index (trade-off between speed/accuracy).

  • include_distances (bool, default=False) – If True, also return distances.

Returns:

If include_distances is False, returns an index array of shape (N1, knn) whose entries are row indices into ds2. Otherwise returns a tuple (indices, distances), where both arrays have shape (N1, knn).

Return type:

np.ndarray or tuple of (np.ndarray, np.ndarray)
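The sketch below illustrates the input and output shapes using an exact brute-force search in NumPy. It deliberately swaps Annoy for exhaustive Manhattan-distance search, so it does not reproduce the approximate index or the n_trees parameter; the function name nn_exact_sketch and the random test data are hypothetical.

```python
import numpy as np


def nn_exact_sketch(ds1, ds2, norm=True, knn=10, include_distances=False):
    """Exact brute-force stand-in for illustration (not Annoy)."""
    if norm:
        # L2-normalize each row of the query and reference data.
        ds1 = ds1 / np.linalg.norm(ds1, axis=1, keepdims=True)
        ds2 = ds2 / np.linalg.norm(ds2, axis=1, keepdims=True)
    # Pairwise Manhattan distances, shape (N1, N2).
    dist = np.abs(ds1[:, None, :] - ds2[None, :, :]).sum(axis=2)
    # For every query row, indices of the knn closest reference rows.
    ind = np.argsort(dist, axis=1)[:, :knn]
    if include_distances:
        return ind, np.take_along_axis(dist, ind, axis=1)
    return ind


rng = np.random.default_rng(0)
query = rng.normal(size=(5, 8))       # N1 = 5, D = 8
reference = rng.normal(size=(20, 8))  # N2 = 20, D = 8
ind, dist = nn_exact_sketch(query, reference, knn=3, include_distances=True)
print(ind.shape, dist.shape)  # (5, 3) (5, 3)
```

Brute force builds an N1 x N2 distance matrix, which is only practical for small inputs; an approximate index avoids that cost at the price of occasionally missing a true neighbor.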

spamosaic.utils.plot_basis(ad, basis, color, **kwargs)[source]

Wrapper around scanpy.pl.embedding with warning suppression.

Parameters:
  • ad (AnnData) – Annotated data object.

  • basis (str) – Name of the embedding basis (e.g., 'umap' or 'spatial').

  • color (str) – Column in .obs to color by.

  • **kwargs – Additional keyword arguments passed to scanpy.pl.embedding.

Return type:

None

spamosaic.utils.reorder(ad1, ad2)[source]

Align and reorder two AnnData objects to their shared barcodes.

Parameters:
  • ad1 (AnnData) – First object.

  • ad2 (AnnData) – Second object.

Returns:

Views of ad1 and ad2 containing only shared barcodes, with matching order.

Return type:

tuple of (AnnData, AnnData)
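The alignment step can be sketched on plain barcode lists. The helper shared_order_sketch is hypothetical, and the choice to keep the order of the first object is an assumption; the library may order shared barcodes differently.

```python
def shared_order_sketch(barcodes1, barcodes2):
    """Hypothetical helper: shared barcodes plus their positions in each input."""
    in_second = set(barcodes2)
    # Keep the order in which shared barcodes appear in the first list.
    shared = [b for b in barcodes1 if b in in_second]
    pos1 = {b: i for i, b in enumerate(barcodes1)}
    pos2 = {b: i for i, b in enumerate(barcodes2)}
    return shared, [pos1[b] for b in shared], [pos2[b] for b in shared]


shared, idx1, idx2 = shared_order_sketch(["c3", "c1", "c2"], ["c2", "c4", "c3"])
print(shared, idx1, idx2)
```

With real AnnData objects, indexing by a list of observation names, such as ad1[shared] and ad2[shared], yields views whose rows follow that list.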

spamosaic.utils.split_adata_ob(ads, ad_ref, ob='obs', key='emb')[source]

Split a merged AnnData object’s observations/embeddings back to per-batch objects.

Parameters:
  • ads (list of AnnData) – Target AnnData objects to receive splits.

  • ad_ref (AnnData) – Source AnnData containing concatenated .obs or .obsm.

  • ob ({'obs', 'obsm'}, default='obs') – Which attribute to split.

  • key (str, default='emb') – Key in .obs or .obsm to split and assign.

Return type:

None
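The splitting step can be illustrated on a plain NumPy array. This sketch assumes that the rows of ad_ref are the concatenation of the objects in ads, in list order, so that the row counts alone determine the cut points; the helper name split_rows_sketch and the example sizes are hypothetical.

```python
import numpy as np


def split_rows_sketch(merged, batch_sizes):
    """Hypothetical helper: cut a stacked array back into per-batch blocks."""
    # Cut points are the cumulative sizes of all batches except the last.
    return np.split(merged, np.cumsum(batch_sizes[:-1]))


merged_emb = np.arange(12, dtype=float).reshape(6, 2)  # 6 cells, 2 dimensions
parts = split_rows_sketch(merged_emb, [2, 3, 1])
print([p.shape for p in parts])  # [(2, 2), (3, 2), (1, 2)]
```

Each block would then be assigned to the matching target object, for example as .obsm[key] when ob='obsm'.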

Functions

  • spamosaic.utils.check_batch_empty – Check that each batch contains at least one measured modality.

  • spamosaic.utils.clustering – Cluster cells using k-means or Mclust and store labels in .obs.

  • spamosaic.utils.dict_map – Map a list of keys using a dictionary.

  • spamosaic.utils.flip_axis – Flip the spatial coordinates of AnnData objects along a specified axis.

  • spamosaic.utils.get_barc2batch – Create a mapping from cell barcodes to their batch indices.

  • spamosaic.utils.get_umap – Compute UMAP embeddings for specified representations and store them in .obsm.

  • spamosaic.utils.load_config – Load a YAML configuration file into a Config object.

  • spamosaic.utils.mclust_R – Run R's Mclust (via rpy2) on an embedding to obtain soft clustering.

  • spamosaic.utils.nn_approx – Perform approximate nearest-neighbor search using Annoy.

  • spamosaic.utils.plot_basis – Wrapper around scanpy.pl.embedding with warning suppression.

  • spamosaic.utils.reorder – Align and reorder two AnnData objects to their shared barcodes.

  • spamosaic.utils.split_adata_ob – Split a merged AnnData object's observations/embeddings back to per-batch objects.

Classes

  • spamosaic.utils.Config – A wrapper that recursively converts a nested dictionary to an object with attribute-style access.