spamosaic.MNN.nn_annoy

spamosaic.MNN.nn_annoy(ds1, ds2, names1, names2, norm=True, knn=20, metric='euclidean', n_trees=10, save_on_disk=False)

Approximate nearest-neighbor search using an Annoy index: for each row of ds1, find its nearest neighbors among the rows of ds2.

Parameters:
  • ds1 (np.ndarray) – Query dataset of shape (N1, D).

  • ds2 (np.ndarray) – Reference dataset of shape (N2, D).

  • names1 (list of str) – Identifiers for rows in ds1.

  • names2 (list of str) – Identifiers for rows in ds2.

  • norm (bool, default=True) – Whether to L2-normalize both datasets before indexing and search.

  • knn (int, default=20) – Number of nearest neighbors to retrieve for each query row.

  • metric (str, default='euclidean') – Distance metric (e.g., 'euclidean', 'manhattan').

  • n_trees (int, default=10) – Number of trees to build in the Annoy index.

  • save_on_disk (bool, default=False) – If True, write the index to disk.

Returns:

Set of nearest-neighbor pairs, each pairing an identifier from names1 with an identifier from names2.

Return type:

set[tuple[str, str]]
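
The input and output contract described above can be illustrated with a minimal sketch. This is not the library's implementation: it swaps Annoy's approximate tree-based search for an exact brute-force Euclidean search in NumPy, so it only suits small arrays. The function name nn_exact_sketch, the row-wise normalization, the zero-row guard, and the capping of knn at N2 are all assumptions made for illustration.

```python
import numpy as np


def nn_exact_sketch(ds1, ds2, names1, names2, norm=True, knn=20):
    """Hypothetical exact stand-in for nn_annoy (Euclidean metric only)."""
    ds1 = np.asarray(ds1, dtype=float)
    ds2 = np.asarray(ds2, dtype=float)
    if norm:
        # Assumption: L2 normalization is applied row-wise; the small floor
        # avoids dividing by zero for all-zero rows.
        ds1 = ds1 / np.maximum(np.linalg.norm(ds1, axis=1, keepdims=True), 1e-12)
        ds2 = ds2 / np.maximum(np.linalg.norm(ds2, axis=1, keepdims=True), 1e-12)
    # Pairwise Euclidean distances between queries and references, shape (N1, N2).
    dists = np.linalg.norm(ds1[:, None, :] - ds2[None, :, :], axis=2)
    # Assumption: never ask for more neighbors than there are reference rows.
    k = min(knn, ds2.shape[0])
    # Indices of the k closest reference rows for each query row.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Translate row indices into (query name, reference name) pairs.
    return {(names1[i], names2[j]) for i in range(ds1.shape[0]) for j in nearest[i]}


queries = np.array([[1.0, 0.0], [0.0, 1.0]])
refs = np.array([[2.0, 0.1], [0.1, 3.0], [-1.0, -1.0]])
pairs = nn_exact_sketch(queries, refs, ["q0", "q1"], ["r0", "r1", "r2"], knn=1)
print(sorted(pairs))  # [('q0', 'r0'), ('q1', 'r1')]
```

Because the result is a set of name pairs, duplicates collapse and the neighbor ranking is not preserved; with knn neighbors per query, the set holds at most N1 × knn pairs.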