spamosaic.preprocessing.lsiTransformer

class spamosaic.preprocessing.lsiTransformer(n_components: int = 20, drop_first=True, use_highly_variable=None, log=True, norm=True, z_score=True, tfidf=True, svd=True, use_counts=False, pcaAlgo='arpack')

Latent Semantic Indexing (LSI) pipeline for dimensionality reduction.

Parameters:
  • n_components (int) – Number of SVD components.

  • drop_first (bool) – Whether to drop the first SVD component.

  • use_highly_variable (bool or None) – Whether to subset to highly variable features.

  • log (bool) – Whether to apply log1p transformation.

  • norm (bool) – Whether to normalize features.

  • z_score (bool) – Whether to z-score features.

  • tfidf (bool) – Whether to apply TF-IDF normalization.

  • svd (bool) – Whether to apply SVD transformation.

  • use_counts (bool) – Use .layers['counts'] instead of .X for data.

  • pcaAlgo (str) – SVD backend (e.g., 'arpack').
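
The flags above map onto the usual stages of an LSI pipeline. The following is a minimal NumPy sketch of those stages, not spamosaic's implementation. The function name lsi_sketch, the order of the steps, the 1e4 scale factor, the per-cell L2 normalization, and the per-component z-scoring are all assumptions made for illustration. It also uses a dense full SVD for simplicity, where the library's pcaAlgo option presumably selects a truncated solver.

```python
import numpy as np


def lsi_sketch(counts, n_components=20, drop_first=True,
               tfidf=True, log=True, norm=True, z_score=True):
    """Illustrative LSI: TF-IDF -> log1p -> L2 norm -> SVD -> z-score.

    Hypothetical helper; the step order and scaling choices are assumptions.
    """
    X = np.asarray(counts, dtype=float)
    eps = 1e-12

    if tfidf:
        # Term frequency: scale each cell (row) by its total count.
        tf = X / np.maximum(X.sum(axis=1, keepdims=True), eps)
        # Inverse document frequency: down-weight features seen in many cells.
        idf = X.shape[0] / np.maximum(X.sum(axis=0, keepdims=True), eps)
        X = tf * idf

    if log:
        # The 1e4 scale factor is an assumed convention, not taken from the docs.
        X = np.log1p(X * 1e4)

    if norm:
        # Assumed meaning of `norm`: unit L2 norm per cell.
        X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)

    # Dense SVD; keep the leading components, scaled by singular values.
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    emb = U[:, :n_components] * S[:n_components]

    if drop_first:
        # Discard the leading component.
        emb = emb[:, 1:]

    if z_score:
        # Assumed meaning of `z_score`: standardize each retained component.
        emb = (emb - emb.mean(axis=0)) / np.maximum(emb.std(axis=0), eps)

    return emb


# Toy input: 30 cells by 50 features of Poisson counts.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(1.0, size=(30, 50))
toy_emb = lsi_sketch(toy_counts, n_components=5)
```

In this sketch, dropping the first component leaves n_components - 1 columns in the output. Whether lsiTransformer returns n_components or n_components - 1 dimensions when drop_first=True is not stated here and should be checked against the source.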