drvi.utils.metrics.local_mutual_info_score

drvi.utils.metrics.local_mutual_info_score(all_vars_continues, gt_cat_series=None, gt_one_hot=None)

Compute local mutual information scores for all variables and categories.

This function calculates the scaled mutual information between each continuous variable and each ground-truth category. Each score is divided by the entropy of the corresponding ground-truth category.
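A minimal sketch of the scaling described above, assuming each category j is treated as a one-hot indicator column y_j and the score is I(x_i; y_j) / H(y_j), with both terms in nats. The estimator (scikit-learn's k-nearest-neighbour based `mutual_info_classif`) and the helper names `binary_entropy` and `scaled_mi_sketch` are assumptions for illustration, not drvi's implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def binary_entropy(y: np.ndarray) -> float:
    """Entropy in nats of a 0/1 indicator vector (hypothetical helper)."""
    p = float(np.mean(y))
    if p in (0.0, 1.0):
        return 0.0
    return float(-(p * np.log(p) + (1 - p) * np.log(1 - p)))


def scaled_mi_sketch(x: np.ndarray, one_hot: np.ndarray, random_state: int = 0) -> np.ndarray:
    """Hypothetical sketch: score[i, j] = I(x_i; y_j) / H(y_j), not drvi's code."""
    n_vars, n_cats = x.shape[1], one_hot.shape[1]
    scores = np.zeros((n_vars, n_cats))
    for j in range(n_cats):
        y = one_hot[:, j].astype(int)
        h = binary_entropy(y)
        if h == 0.0:
            continue  # constant indicator: leave this column at 0 to avoid dividing by zero
        # k-NN based MI estimate (nats) between every continuous column and the binary label y
        mi = mutual_info_classif(x, y, discrete_features=False, random_state=random_state)
        scores[:, j] = mi / h
    return scores


# Toy data: variable 0 tracks the label, variables 1 and 2 are pure noise
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
one_hot = np.eye(3)[labels]
x = np.column_stack([
    labels + 0.1 * rng.normal(size=300),
    rng.normal(size=300),
    rng.normal(size=300),
])
scores = scaled_mi_sketch(x, one_hot)
print(scores.shape)  # (3, 3)
```

The sketch does not clip its output, so it does not enforce the 0 to 1 range documented for the real function.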

Parameters:
  • all_vars_continues (ndarray) – Matrix of continuous variables with shape (n_samples, n_variables). Each column represents a different continuous variable.

  • gt_cat_series (default: None) – Categorical series of ground-truth labels, one per sample (for example, a pandas Series with dtype "category").

  • gt_one_hot (default: None) – One-hot encoded ground-truth matrix with shape (n_samples, n_categories); an alternative way to pass the ground truth instead of gt_cat_series.
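The two ground-truth parameters describe the same labels in different forms. As a sketch, assuming one column per category in the order of the series' categories (whether drvi builds its one-hot matrix this way internally is an assumption), a matrix of the shape gt_one_hot expects can be derived from a categorical series with pandas:

```python
import numpy as np
import pandas as pd

# Ground-truth labels as a categorical series (same labels as in the Examples section)
gt_series = pd.Series(["A", "A", "B", "B"], dtype="category")

# One column per category, ordered like gt_series.cat.categories; shape (n_samples, n_categories)
gt_one_hot = pd.get_dummies(gt_series).to_numpy(dtype=float)

print(gt_series.cat.categories.tolist())  # ['A', 'B']
print(gt_one_hot.shape)  # (4, 2)
```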

Return type:

ndarray

Returns:

Mutual information score matrix with shape (n_variables, n_categories). Element [i, j] is the scaled mutual information between variable i and category j. Scores range from 0 to 1.
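Because rows index variables and columns index categories, the best-matching variable for each category is the argmax down each column. A short sketch using a made-up score matrix (the values and the category names are hypothetical, not output of this function):

```python
import numpy as np

# Hypothetical score matrix, shape (n_variables=3, n_categories=2); values are illustrative only
scores = np.array([
    [0.10, 0.85],
    [0.70, 0.05],
    [0.02, 0.03],
])
categories = ["A", "B"]

# For each category j, the variable i with the largest scaled MI in column j
best_var = scores.argmax(axis=0)
for j, cat in enumerate(categories):
    print(f"category {cat}: variable {best_var[j]} (score {scores[best_var[j], j]:.2f})")
# category A: variable 1 (score 0.70)
# category B: variable 0 (score 0.85)
```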

Notes

This metric does not currently work as expected; see scikit-learn/scikit-learn#30772 for details.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from drvi.utils.metrics import local_mutual_info_score
>>> # Simple example: 3 variables, 2 categories
>>> all_vars = np.array([[1.0, 2.0, 0.5], [2.0, 1.0, 0.8], [3.0, 0.5, 1.2], [0.5, 3.0, 0.9]])
>>> gt_series = pd.Series(["A", "A", "B", "B"], dtype="category")
>>> scores = local_mutual_info_score(all_vars, gt_cat_series=gt_series)
>>> scores.shape
(3, 2)
>>> print(scores)