drvi.utils.tools.iterate_on_top_differential_vars

drvi.utils.tools.iterate_on_top_differential_vars(traverse_adata, key, title_col='title', order_col='order', gene_symbols=None, score_threshold=0.0)

Create an iterator of top differential variables per latent dimension.

This function processes differential analysis results into a list of the top differentially expressed genes for each latent dimension. Genes are sorted by effect score within each dimension and direction, and the dimensions follow the order given in order_col.

Parameters:
  • traverse_adata (AnnData) – AnnData object with differential analysis results from calculate_differential_vars. Must contain differential effect data for the specified key.

  • key (str) – Key prefix for the differential variables in traverse_adata. Should correspond to a key used in find_differential_effects or calculate_differential_vars. Common value: “combined_score”.

  • title_col (str (default: 'title')) – Column name in traverse_adata.obs containing dimension titles. These titles will be used in the output dimension names.

  • order_col (str (default: 'order')) – Column name in traverse_adata.obs containing dimension ordering.

  • gene_symbols (str | None (default: None)) – Column name in traverse_adata.var containing gene symbols. If None, uses the index of traverse_adata.var (usually gene IDs). Useful for converting between gene IDs and readable gene names.

  • score_threshold (float (default: 0.0)) – Minimum score threshold to include genes in the results. Only genes with scores above this threshold will be included.

Return type:

list[tuple[str, Series]]

Returns:

list[tuple[str, pd.Series]]: List of tuples, where each tuple contains:
  • str: Dimension title with direction indicator (e.g., “Cell Cycle+”, “Cell Cycle-”)
  • pd.Series: Series of gene scores for that dimension/direction, sorted in descending order

The list is sorted by dimension order, with each dimension appearing at most twice (once for positive effects, once for negative effects).

Raises:
  • KeyError – If required columns or differential effect data are missing.

  • ValueError – If the specified key doesn’t exist in the AnnData object.

Notes

The function performs the following steps:
  1. Extracts positive and negative differential effects for the specified key
  2. Maps gene names to symbols if gene_symbols is provided
  3. Filters genes by score threshold
  4. Organizes results by dimension and direction (positive/negative)
  5. Returns a list sorted by dimension order

Output Structure:

Each dimension can appear up to twice in the results: once for positive effects and once for negative effects. The direction is indicated by “+” or “-” appended to the dimension title.

Only dimensions with at least one gene above the threshold are included.
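The filtering and ordering described above can be illustrated with a small pandas sketch. This is not the library's implementation: the helper sketch_top_vars, its inputs (two DataFrames of positive and negative effects with dimensions as rows and genes as columns, plus title and order mappings), and all values are hypothetical and only mimic steps 3 to 5.

```python
import pandas as pd


def sketch_top_vars(pos, neg, titles, order, score_threshold=0.0):
    """Hypothetical helper mimicking steps 3-5; pos/neg rows are dimensions, columns are genes."""
    results = []
    # Visit dimensions in the order given by the (assumed) order mapping.
    for dim in sorted(pos.index, key=lambda d: order[d]):
        for sign, effects in (("+", pos), ("-", neg)):
            scores = effects.loc[dim]
            # Keep genes strictly above the threshold, highest score first.
            scores = scores[scores > score_threshold].sort_values(ascending=False)
            # A dimension/direction with no surviving genes is skipped.
            if len(scores) > 0:
                results.append((f"{titles[dim]}{sign}", scores))
    return results


# Made-up effect scores for two dimensions and three genes.
pos = pd.DataFrame({"G1": [2.0, 0.0], "G2": [0.5, 0.0], "G3": [0.0, 1.5]}, index=["d0", "d1"])
neg = pd.DataFrame({"G1": [0.0, 0.0], "G2": [0.0, 3.0], "G3": [0.0, 0.0]}, index=["d0", "d1"])
titles = {"d0": "DR 2", "d1": "DR 1"}
order = {"d0": 1, "d1": 0}

result = sketch_top_vars(pos, neg, titles, order, score_threshold=0.1)
for title, scores in result:
    print(title, scores.index.tolist())
```

In this toy input, dimension d0 has no negative effects above the threshold, so its negative entry is absent from the result, matching the "at most twice" behaviour described above.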

Examples

>>> from drvi.utils.tools import iterate_on_top_differential_vars
>>> # Basic iteration over top differential variables
>>> # (traverse_adata is assumed to already hold results from calculate_differential_vars)
>>> top_vars = iterate_on_top_differential_vars(traverse_adata, "combined_score")
>>> for dim_title, gene_scores in top_vars:
...     print(f"{dim_title}: {len(gene_scores)} genes")
...     print(f"Top genes: {gene_scores.head().index.tolist()}")

>>> # With custom parameters and gene symbols
>>> top_vars = iterate_on_top_differential_vars(
...     traverse_adata, "max_possible", gene_symbols="gene_symbol", score_threshold=1.0
... )
>>> # Create a summary of results
>>> for dim_title, gene_scores in top_vars:
...     print(f"{dim_title}: {gene_scores.head().index.tolist()}")
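Because the return value is a plain list of (title, Series) tuples, it can be flattened into a single table for downstream inspection. The following is a minimal sketch that uses a hand-made stand-in for the return value; the gene names and scores are invented for illustration, and splitting off the last character of the title assumes the “+”/“-” suffix described under Output Structure.

```python
import pandas as pd

# Hypothetical stand-in for the function's return value (made-up genes and scores).
top_vars = [
    ("Cell Cycle+", pd.Series({"MKI67": 3.2, "TOP2A": 2.7, "CDK1": 1.9})),
    ("Cell Cycle-", pd.Series({"CDKN1A": 1.4})),
    ("Hypoxia+", pd.Series({"VEGFA": 2.1, "CA9": 1.8})),
]

rows = []
for dim_title, gene_scores in top_vars:
    # The last character of the title is assumed to be the direction indicator.
    name, direction = dim_title[:-1], dim_title[-1]
    for gene, score in gene_scores.items():
        rows.append({"dimension": name, "direction": direction, "gene": gene, "score": score})

# One row per (dimension, direction, gene) combination.
summary = pd.DataFrame(rows)
print(summary)
```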