drvi.utils.metrics.spearman_correlataion_score

drvi.utils.metrics.spearman_correlataion_score(all_vars_continues, gt_cat_series=None, gt_one_hot=None)

Compute Spearman correlation scores between continuous variables and categories.

This function computes the absolute Spearman correlation coefficient between each continuous variable and each ground-truth category, where every category is represented as a one-hot (0/1) indicator column. Because Spearman correlation operates on ranks, it captures monotonic relationships and is less sensitive to outliers than Pearson correlation.

Parameters:
  • all_vars_continues (ndarray) – Matrix of continuous variables with shape (n_samples, n_variables). Each column represents a different continuous variable.

  • gt_cat_series (default: None) – Categorical series with ground truth labels.

  • gt_one_hot (default: None) – One-hot encoded ground truth matrix with shape (n_samples, n_categories).

Return type:

ndarray

Returns:

np.ndarray – Absolute Spearman correlation matrix with shape (n_variables, n_categories). Element [i, j] is the absolute Spearman correlation between variable i and category j. Scores range from 0 to 1.

Notes

Spearman correlation measures the strength and direction of monotonic relationships between variables. This function uses absolute values to focus on the strength of relationships regardless of direction.
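The computation described above can be sketched with SciPy. This is an illustrative re-implementation under stated assumptions, not DRVI's actual code: the helper name `abs_spearman_sketch` is hypothetical, and deriving the one-hot matrix with `pandas.get_dummies` is an assumption about how the categories are encoded.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr


def abs_spearman_sketch(all_vars, gt_series):
    """Hypothetical sketch: |Spearman rho| for every (variable, category) pair."""
    # Assumption: categories are encoded as 0/1 indicator columns.
    one_hot = pd.get_dummies(gt_series).to_numpy(dtype=float)
    n_vars, n_cats = all_vars.shape[1], one_hot.shape[1]
    scores = np.empty((n_vars, n_cats))
    for i in range(n_vars):
        for j in range(n_cats):
            rho, _ = spearmanr(all_vars[:, i], one_hot[:, j])
            scores[i, j] = abs(rho)  # keep strength, drop direction
    return scores


all_vars = np.array(
    [[1.0, 2.0, 0.5], [2.0, 1.0, 0.8], [3.0, 0.5, 1.2], [0.5, 3.0, 0.9]]
)
gt_series = pd.Series(["A", "A", "B", "B"], dtype="category")
sketch_scores = abs_spearman_sketch(all_vars, gt_series)
print(sketch_scores.shape)
```

With only two categories the indicator columns are complements of each other, so both columns of the sketch's result carry the same absolute value for a given variable.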

Limitations: Spearman correlation is not suitable for discrete targets. More info: https://www.biorxiv.org/content/10.1101/2024.11.06.622266v1.full.pdf lines 985 to 989
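One way to see a difficulty with discrete targets is the effect of rank ties. The sketch below is an illustration constructed for this page and is not taken from the referenced paper: a variable that perfectly singles out a rare category still scores well below 1, because every sample outside the category shares the same tied rank in the 0/1 indicator.

```python
import numpy as np
from scipy.stats import spearmanr

# Ten samples; the variable increases strictly from sample to sample.
x = np.arange(10, dtype=float)

# Binary indicator for a rare category: only the top-ranked sample belongs to it.
y = np.zeros(10)
y[-1] = 1.0

# The variable separates the category perfectly, yet the nine tied zeros
# in the indicator keep the rank correlation well below 1.
rho, _ = spearmanr(x, y)
print(round(rho, 3))
```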

Examples

>>> import numpy as np
>>> import pandas as pd
>>> # Simple example: 3 variables, 2 categories
>>> all_vars = np.array([[1.0, 2.0, 0.5], [2.0, 1.0, 0.8], [3.0, 0.5, 1.2], [0.5, 3.0, 0.9]])
>>> gt_series = pd.Series(["A", "A", "B", "B"], dtype="category")
>>> scores = spearman_correlataion_score(all_vars, gt_cat_series=gt_series)
>>> print(scores.shape)  # (3, 2)
>>> print(scores)