drvi.utils.metrics.nn_alignment_score
- drvi.utils.metrics.nn_alignment_score(all_vars_continues, gt_cat_series=None, gt_one_hot=None)
Compute nearest neighbor alignment scores for all continuous variables.
This function calculates how well the categorical ground truth labels align with the rightmost nearest neighbor of samples in each continuous variable.
- Parameters:
all_vars_continues (ndarray) – Matrix of continuous variables with shape (n_samples, n_variables). Each column represents a different continuous variable.
gt_cat_series (default: None) – Categorical series with ground truth labels.
gt_one_hot (default: None) – One-hot encoded ground truth matrix with shape (n_samples, n_categories).
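As a rough illustration of how the two ground-truth inputs relate, the sketch below builds a one-hot matrix of shape (n_samples, n_categories) from a categorical series using `pandas.get_dummies`. The variable names are illustrative, and whether the function expects exactly this column order or dtype is an assumption rather than something stated above.

```python
import numpy as np
import pandas as pd

# Ground-truth labels as a categorical series (the form described for gt_cat_series).
gt_series = pd.Series(["A", "A", "B", "B"], dtype="category")

# One row per sample, one column per category; columns follow the category order.
# Casting to float is an assumption about what a downstream consumer would accept.
gt_one_hot = pd.get_dummies(gt_series).to_numpy(dtype=float)

print(gt_one_hot.shape)
print(gt_one_hot)
```

Each row contains a single 1, marking the category of that sample.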
- Return type:
np.ndarray
- Returns:
Alignment score matrix with shape (n_variables, n_categories). Element [i, j] represents the alignment score between variable i and category j. Higher values indicate better alignment.
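One way to read a returned matrix of this shape is to pick, for each category, the variable with the highest score. The sketch below uses a hypothetical, hand-written score matrix (not actual output of the function) purely to show the indexing convention of rows as variables and columns as categories.

```python
import numpy as np

# Hypothetical score matrix: 3 variables (rows) x 2 categories (columns).
# These numbers are made up for illustration only.
scores = np.array([[0.9, 0.1],
                   [0.2, 0.7],
                   [0.0, 0.3]])
categories = ["A", "B"]

# For each category (column), find the row index of the best-aligned variable.
best_var = scores.argmax(axis=0)
best_score = scores.max(axis=0)

for cat, i, s in zip(categories, best_var, best_score):
    print(f"category {cat}: variable {i} (score {s:.2f})")
```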
Notes
The scores are adjusted to account for the frequency of the categories.
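To make the idea of a frequency adjustment concrete, here is a minimal sketch of one possible chance-corrected neighbor agreement score. It is not DRVI's implementation: the function name `adjacent_agreement_sketch`, the use of the next sample in sorted order as the neighbor, and the exact correction formula are all assumptions made for illustration. It also assumes at least two categories are present.

```python
import numpy as np

def adjacent_agreement_sketch(values, one_hot):
    """Illustrative only (hypothetical helper, not the library's formula)."""
    # Sort samples along the continuous variable.
    order = np.argsort(values, kind="stable")
    sorted_oh = one_hot[order]
    n = sorted_oh.shape[0]
    counts = sorted_oh.sum(axis=0)

    # Per category: number of adjacent pairs (in sorted order) sharing that category.
    same = (sorted_oh[:-1] * sorted_oh[1:]).sum(axis=0)
    raw = same / np.maximum(counts - 1, 1)

    # Chance level: probability that a random *other* sample is in the category.
    chance = (counts - 1) / (n - 1)

    # Rescale so chance maps to 0 and perfect agreement to 1; clip negatives to 0.
    return np.clip((raw - chance) / (1 - chance), 0, None)

# Labels: three samples of category A followed by three of category B.
one_hot = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)

separated = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])    # sorted order: A A A B B B
interleaved = np.array([0.1, 0.3, 0.5, 0.2, 0.4, 0.6])  # sorted order: A B A B A B

print(adjacent_agreement_sketch(separated, one_hot))
print(adjacent_agreement_sketch(interleaved, one_hot))
```

Without the correction, a very frequent category would score highly on almost any variable simply because most neighbors belong to it; subtracting the chance level removes that advantage.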
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from drvi.utils.metrics import nn_alignment_score
>>> # Simple example: 4 samples, 3 variables, 2 categories
>>> all_vars = np.array([[1.0, 2.0, 0.5], [2.0, 1.0, 0.8], [3.0, 0.5, 1.2], [0.5, 3.0, 0.9]])
>>> gt_series = pd.Series(["A", "A", "B", "B"], dtype="category")
>>> scores = nn_alignment_score(all_vars, gt_cat_series=gt_series)
>>> print(scores.shape)  # (3, 2)
>>> print(scores)