drvi.utils.metrics.spearman_correlataion_score
- drvi.utils.metrics.spearman_correlataion_score(all_vars_continues, gt_cat_series=None, gt_one_hot=None)
Compute Spearman correlation scores between continuous variables and categories.
This function computes the absolute Spearman correlation coefficients between each continuous variable and each categorical ground truth variable (encoded as one-hot). Spearman correlation measures monotonic relationships and is robust to outliers.
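As a rough illustration of what "monotonic" means here, the sketch below compares Pearson and Spearman correlation on a relationship that is monotonic but not linear. It does not call drvi; Spearman is computed as the Pearson correlation of the ranks, and the data values are made up for the example.

```python
import numpy as np
import pandas as pd

# Illustrative data (not from drvi): y grows monotonically with x, but not linearly.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.exp(x)

# Pearson measures linear association, so it falls short of 1 here.
pearson = x.corr(y)

# Spearman is the Pearson correlation of the ranks; a strictly monotonic
# relationship gives identical rank orderings, so the coefficient is 1.
spearman = x.rank().corr(y.rank())

print(f"pearson={pearson:.3f}, spearman={spearman:.3f}")
```

Because only the ranks matter, any strictly increasing transformation of a variable leaves its Spearman correlation unchanged.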
- Parameters:
  - all_vars_continues (ndarray) – Matrix of continuous variables with shape (n_samples, n_variables). Each column represents a different continuous variable.
  - gt_cat_series (default: None) – Categorical series with ground truth labels.
  - gt_one_hot (default: None) – One-hot encoded ground truth matrix with shape (n_samples, n_categories).
- Return type: np.ndarray
- Returns: Absolute Spearman correlation matrix with shape (n_variables, n_categories). Element [i, j] represents the absolute Spearman correlation between variable i and category j. Scores range from 0 to 1.
Notes
Spearman correlation measures the strength and direction of monotonic relationships between variables. This function uses absolute values to focus on the strength of relationships regardless of direction.
Limitations: Spearman correlation is not suitable for discrete targets. More info: https://www.biorxiv.org/content/10.1101/2024.11.06.622266v1.full.pdf (lines 985 to 989).
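The computation described in these notes can be sketched with NumPy and pandas alone. This is a minimal illustration under stated assumptions, not drvi's actual implementation: `abs_spearman_matrix` is a hypothetical helper, the labels are assumed to be one-hot encoded with `pd.get_dummies`, and Spearman is computed as the Pearson correlation of average ranks.

```python
import numpy as np
import pandas as pd


def abs_spearman_matrix(all_vars, one_hot):
    """Hypothetical helper (not part of drvi): absolute Spearman correlation
    between each column of `all_vars` and each column of `one_hot`."""
    # Rank every column; tied values receive their average rank.
    x = pd.DataFrame(all_vars).rank().to_numpy()
    y = pd.DataFrame(one_hot).astype(float).rank().to_numpy()
    # Center the ranks so the dot products below are covariances (up to scale).
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # Pearson correlation of ranks = Spearman correlation.
    cov = x.T @ y
    norm = np.outer(np.sqrt((x**2).sum(axis=0)), np.sqrt((y**2).sum(axis=0)))
    # Drop the sign to keep only the strength of the relationship.
    return np.abs(cov / norm)


all_vars = np.array([[1.0, 2.0, 0.5], [2.0, 1.0, 0.8], [3.0, 0.5, 1.2], [0.5, 3.0, 0.9]])
labels = pd.Series(["A", "A", "B", "B"], dtype="category")
one_hot = pd.get_dummies(labels).to_numpy()  # assumed encoding, shape (4, 2)

scores = abs_spearman_matrix(all_vars, one_hot)
print(scores.shape)  # (3, 2)
```

With only two categories the one-hot columns are complements of each other, so both columns of `scores` are identical once the sign is dropped. A one-hot column takes only two distinct values, so its ranks are heavily tied, which relates to the limitation noted above for discrete targets.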
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from drvi.utils.metrics import spearman_correlataion_score
>>> # Simple example: 3 variables, 2 categories
>>> all_vars = np.array([[1.0, 2.0, 0.5], [2.0, 1.0, 0.8], [3.0, 0.5, 1.2], [0.5, 3.0, 0.9]])
>>> gt_series = pd.Series(["A", "A", "B", "B"], dtype="category")
>>> scores = spearman_correlataion_score(all_vars, gt_cat_series=gt_series)
>>> print(scores.shape)  # (3, 2)
>>> print(scores)