spamosaic.MNN
- spamosaic.MNN.mnn(ds1, ds2, names1, names2, knn1=20, knn2=20, approx=True, metric='euclidean', way='hnsw', norm=False)
Compute Mutual Nearest Neighbors (MNN) between two datasets.
- Parameters:
ds1 (np.ndarray) – First dataset (queries).
ds2 (np.ndarray) – Second dataset (references).
names1 (list of str) – Identifiers for ds1.
names2 (list of str) – Identifiers for ds2.
knn1 (int, default=20) – Number of neighbors to search in ds2 for each row of ds1 (ds1 → ds2).
knn2 (int, default=20) – Number of neighbors to search in ds1 for each row of ds2 (ds2 → ds1).
approx (bool, default=True) – Whether to use approximate neighbor search.
metric (str, default='euclidean') – Distance metric to use.
way (str, default='hnsw') – Method for approximate search: ‘hnsw’ or ‘annoy’.
norm (bool, default=False) – Whether to normalize data before Annoy search.
- Returns:
Set of mutual nearest neighbor pairs.
- Return type:
Set[Tuple[str, str]]
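A minimal sketch of the mutual-nearest-neighbor idea, for illustration only. A pair is kept when the ds2 row is among the knn1 nearest neighbors of the ds1 row and the ds1 row is among the knn2 nearest neighbors of the ds2 row. The sketch assumes exact brute-force Euclidean search in NumPy in place of the HNSW/Annoy back ends, and it assumes pairs are ordered (name from names1, name from names2). The helpers knn_pairs and mutual_pairs are hypothetical and not part of spamosaic.

```python
import numpy as np


def knn_pairs(ds1, ds2, names1, names2, knn):
    """Hypothetical helper: exact Euclidean k-NN from each ds1 row into ds2, as name pairs."""
    # Pairwise squared Euclidean distances, shape (N1, N2).
    d2 = ((ds1[:, None, :] - ds2[None, :, :]) ** 2).sum(axis=-1)
    k = min(knn, ds2.shape[0])
    # Column indices of the k smallest distances in each row.
    idx = np.argsort(d2, axis=1)[:, :k]
    return {(names1[i], names2[j]) for i in range(ds1.shape[0]) for j in idx[i]}


def mutual_pairs(ds1, ds2, names1, names2, knn1=20, knn2=20):
    """Hypothetical MNN sketch: keep (a, b) only if each is among the other's neighbors."""
    forward = knn_pairs(ds1, ds2, names1, names2, knn1)   # ds1 -> ds2
    backward = knn_pairs(ds2, ds1, names2, names1, knn2)  # ds2 -> ds1, pairs are (b, a)
    # Flip backward pairs to (name1, name2) order before intersecting.
    return forward & {(b, a) for a, b in backward}


ds1 = np.array([[0.0, 0.0], [10.0, 0.0]])
ds2 = np.array([[0.1, 0.0], [9.9, 0.0], [4.0, 5.0]])
names1, names2 = ["a0", "a1"], ["b0", "b1", "b2"]
# b2's nearest neighbor is a0, but a0's nearest is b0, so (a0, b2) is not mutual at k=1.
pairs = mutual_pairs(ds1, ds2, names1, names2, knn1=1, knn2=1)
```

Because the result is an intersection of two sets, raising knn1 or knn2 can only add pairs, never remove them.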
- spamosaic.MNN.nn(ds1, ds2, names1, names2, knn=50, metric_p=2)
Exact nearest neighbor search using scikit-learn.
- Parameters:
ds1 (np.ndarray) – Query dataset.
ds2 (np.ndarray) – Reference dataset.
names1 (list of str) – Identifiers for ds1.
names2 (list of str) – Identifiers for ds2.
knn (int, default=50) – Number of nearest neighbors.
metric_p (int, default=2) – Minkowski distance parameter (e.g., 1 for Manhattan, 2 for Euclidean).
- Returns:
Set of matched nearest neighbor pairs.
- Return type:
Set[Tuple[str, str]]
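metric_p selects the order p of the Minkowski distance, (Σ_d |x_d − y_d|^p)^(1/p). The NumPy sketch below is a stand-in for the scikit-learn search (where this value presumably maps to the p argument of sklearn.neighbors.NearestNeighbors); it shows that the choice of p can change which reference is nearest. minkowski_knn is a hypothetical helper, not part of spamosaic.

```python
import numpy as np


def minkowski_knn(ds1, ds2, names1, names2, knn=50, metric_p=2):
    """Hypothetical exact k-NN sketch using the Minkowski distance of order metric_p."""
    # ||x - y||_p = (sum_d |x_d - y_d|^p)^(1/p); p=2 is Euclidean, p=1 is Manhattan.
    diff = np.abs(ds1[:, None, :] - ds2[None, :, :])
    dist = (diff ** metric_p).sum(axis=-1) ** (1.0 / metric_p)
    k = min(knn, ds2.shape[0])
    idx = np.argsort(dist, axis=1)[:, :k]
    return {(names1[i], names2[j]) for i in range(ds1.shape[0]) for j in idx[i]}


q = np.array([[0.0, 0.0]])
refs = np.array([[3.0, 3.0], [0.0, 5.0]])
# Euclidean: r0 is at ~4.24, r1 at 5.0.  Manhattan: r0 is at 6.0, r1 at 5.0.
euclid = minkowski_knn(q, refs, ["q"], ["r0", "r1"], knn=1, metric_p=2)
manhattan = minkowski_knn(q, refs, ["q"], ["r0", "r1"], knn=1, metric_p=1)
```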
- spamosaic.MNN.nn_annoy(ds1, ds2, names1, names2, norm=True, knn=20, metric='euclidean', n_trees=10, save_on_disk=False)
Approximate nearest neighbor search using an Annoy index.
- Parameters:
ds1 (np.ndarray) – Query dataset.
ds2 (np.ndarray) – Reference dataset.
names1 (list of str) – Identifiers for ds1.
names2 (list of str) – Identifiers for ds2.
norm (bool, default=True) – Whether to L2-normalize the datasets before indexing.
knn (int, default=20) – Number of nearest neighbors to retrieve.
metric (str, default='euclidean') – Distance metric (‘euclidean’, ‘manhattan’, etc.).
n_trees (int, default=10) – Number of trees to build in the Annoy index.
save_on_disk (bool, default=False) – Whether to write the index to disk.
- Returns:
Set of nearest neighbor pairs.
- Return type:
Set[Tuple[str, str]]
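With norm=True the datasets are L2-normalized before indexing. Assuming this means scaling each row to unit length, the sketch below checks the identity ||u − v||² = 2 − 2·cos(u, v) for unit vectors, which is why Euclidean search on normalized data ranks neighbors by cosine similarity. It uses NumPy only (no Annoy), and l2_normalize_rows is a hypothetical helper.

```python
import numpy as np


def l2_normalize_rows(x, eps=1e-12):
    """Hypothetical helper: scale each row to unit Euclidean length, guarding zero rows."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


rng = np.random.default_rng(0)
a = l2_normalize_rows(rng.normal(size=(4, 8)))  # stand-in for ds1
b = l2_normalize_rows(rng.normal(size=(5, 8)))  # stand-in for ds2

# For unit vectors, squared Euclidean distance is a decreasing function of cosine similarity.
sq_euclid = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
cosine = a @ b.T
```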
- spamosaic.MNN.nn_approx(ds1, ds2, names1, names2, knn=50)
Approximate nearest neighbor search using HNSW (hnswlib).
- Parameters:
ds1 (np.ndarray) – Query dataset (N1, D).
ds2 (np.ndarray) – Reference dataset (N2, D).
names1 (list of str) – Identifiers for rows in ds1.
names2 (list of str) – Identifiers for rows in ds2.
knn (int, default=50) – Number of nearest neighbors to find.
- Returns:
Set of matched (query_name, reference_name) pairs.
- Return type:
Set[Tuple[str, str]]
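hnswlib's Index.knn_query returns a (labels, distances) tuple, where labels is an (N1, knn) array of row indices into the indexed data. Assuming nn_approx maps those indices to names in the obvious way, a minimal sketch of turning such a label matrix into the returned set of (query_name, reference_name) pairs follows. pairs_from_labels and the hard-coded labels array are hypothetical stand-ins for a real query.

```python
import numpy as np


def pairs_from_labels(labels, names1, names2):
    """Hypothetical helper: map an (N1, k) index matrix to (query_name, reference_name) pairs."""
    pairs = set()
    for i, row in enumerate(labels):
        for j in row:
            # int() turns NumPy unsigned integers (hnswlib labels are uint64) into list indices.
            pairs.add((names1[i], names2[int(j)]))
    return pairs


# Stand-in for `labels, distances = index.knn_query(ds1, k=2)`; the values are made up.
labels = np.array([[2, 0], [1, 2]], dtype=np.uint64)
pairs = pairs_from_labels(labels, ["q0", "q1"], ["r0", "r1", "r2"])
```

Since the result is a set, neighbor rank and distance are discarded; only membership survives.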
Functions
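mnn(ds1, ds2, names1, names2[, knn1, ...]) | Compute Mutual Nearest Neighbors (MNN) between two datasets.
nn(ds1, ds2, names1, names2[, knn, metric_p]) | Exact nearest neighbor search using scikit-learn.
nn_annoy(ds1, ds2, names1, names2[, norm, ...]) | Approximate nearest neighbor search using an Annoy index.
nn_approx(ds1, ds2, names1, names2[, knn]) | Approximate nearest neighbor search using HNSW (hnswlib).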