drvi.utils.plotting.plot_relevant_genes_on_umap


drvi.utils.plotting.plot_relevant_genes_on_umap(adata, embed, traverse_adata, traverse_adata_key, layer=None, title_col='title', order_col='order', gene_symbols=None, score_threshold=0.0, dim_subset=None, n_top_genes=10, max_cells_to_plot=None, show=True, **kwargs)

Plot relevant genes on UMAP embedding.

This function creates UMAP visualizations showing how genes associated with specific latent dimensions are expressed across cells. For each dimension, one UMAP is colored by the latent dimension values, and further UMAPs are colored by the expression of that dimension's top genes.

Parameters:
  • adata (AnnData) – AnnData object containing single-cell data with gene expression. This is the original data used for training the model.

  • embed (AnnData) – AnnData object containing latent representations and dimension metadata. Must have UMAP coordinates in embed.obsm["X_umap"] and dimension information in .var columns.

  • traverse_adata (AnnData) – AnnData object containing differential analysis results from calculate_differential_vars. Must contain differential effect data for the specified key.

  • traverse_adata_key (str) – Key prefix for the differential variables in traverse_adata.varm. Should correspond to a key used in find_differential_effects or calculate_differential_vars (e.g., “max_possible”, “min_possible”, “combined_score”).

  • layer (str | None (default: None)) – Layer name in adata to use for gene expression visualization. If None, uses .X. Common options include “counts” and “logcounts”.

  • title_col (str (default: 'title')) – Column name in embed.var that contains dimension titles. These titles will be used to match dimensions between objects.

  • order_col (str (default: 'order')) – Column name in embed.var that specifies the order of dimensions. Results will be sorted by this column. Ignored if dim_subset is provided.

  • gene_symbols (str | None (default: None)) – Column name in adata.var that contains gene symbols. If provided, gene symbols will be used instead of gene indices.

  • score_threshold (float (default: 0.0)) – Threshold value for gene scores. Only genes with scores above this threshold will be visualized.

  • dim_subset (Sequence[str] | None (default: None)) – List of dimensions to plot. If None, all dimensions with significant effects are plotted.

  • n_top_genes (int (default: 10)) – Number of top genes to visualize for each dimension.

  • max_cells_to_plot (int | None (default: None)) – Maximum number of cells to include in the plot. If None, all cells are plotted. Useful for large datasets to improve performance and reduce memory usage.

  • show (bool (default: True)) – Whether to display the plot. If False, returns a generator of Ax or Axes objects.

  • **kwargs – Additional keyword arguments passed to sc.pl.embedding.
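The effect of max_cells_to_plot can be pictured with a small sketch. The source does not say how cells are chosen when the cap applies, so uniform random subsampling without replacement is an assumption here, and choose_cells is a hypothetical helper, not part of drvi.

```python
import random


def choose_cells(n_cells, max_cells_to_plot=None, seed=0):
    """Return the sorted indices of the cells to plot.

    Hypothetical helper: assumes a uniform random subsample when the cap applies.
    """
    # No cap, or fewer cells than the cap: keep every cell.
    if max_cells_to_plot is None or n_cells <= max_cells_to_plot:
        return list(range(n_cells))
    # Otherwise draw a reproducible subsample without replacement.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_cells), max_cells_to_plot))


indices = choose_cells(n_cells=20000, max_cells_to_plot=5000)
print(len(indices))  # prints 5000
```

Whatever the selection rule, the same cells must be kept in both adata and embed so that expression values and UMAP coordinates stay aligned.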

Returns:

None if show=True, in which case the plots are displayed directly. If show=False, a generator that yields Ax or Axes objects one at a time.
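With show=False the result has to be iterated before any axes become available. The following is a minimal sketch of collecting the yielded objects into one flat list. collect_axes is a hypothetical helper, and it assumes that a yielded "Axes" item is a list or tuple of individual axes. A stand-in generator replaces the real call so the snippet runs without drvi.

```python
def collect_axes(plots):
    """Flatten an iterable that yields single axes or sequences of axes."""
    flat = []
    for item in plots:
        if isinstance(item, (list, tuple)):
            # Several panels were yielded together (assumed list/tuple).
            flat.extend(item)
        else:
            # A single axis was yielded.
            flat.append(item)
    return flat


def _fake_plots():
    """Stand-in for plot_relevant_genes_on_umap(..., show=False)."""
    yield "latent-dimension axis"
    yield ["gene axis 1", "gene axis 2"]


axes = collect_axes(_fake_plots())
print(len(axes))  # prints 3
```

In real use, the stand-in would be replaced by the generator returned from plot_relevant_genes_on_umap(adata, embed, traverse_adata, "combined_score", show=False).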

Raises:
  • KeyError – If required data is missing from any of the AnnData objects.

  • ValueError – If the specified key doesn’t exist in traverse_adata.

Notes

The function performs the following steps:

  1. Extracts top differential variables using iterate_on_top_differential_vars.

  2. For each dimension, creates two visualizations:

     (I) a UMAP of latent dimension values across cells

     (II) UMAPs of expression patterns of the top genes for that dimension
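The way score_threshold and n_top_genes interact in the first step can be illustrated with plain dictionaries. This is a sketch of the selection logic as described in the parameter list, not the code of iterate_on_top_differential_vars. The function name, the input layout, and the gene scores are all made up for illustration.

```python
def top_genes_per_dimension(scores, score_threshold=0.0, n_top_genes=10):
    """Pick the highest-scoring genes for each dimension.

    scores maps a dimension title to a {gene: score} dictionary (hypothetical layout).
    """
    selected = {}
    for dim, gene_scores in scores.items():
        # Keep only genes whose score is strictly above the threshold.
        kept = [(gene, s) for gene, s in gene_scores.items() if s > score_threshold]
        # Rank the remaining genes from highest to lowest score.
        kept.sort(key=lambda pair: pair[1], reverse=True)
        # Dimensions with no gene above the threshold are skipped.
        if kept:
            selected[dim] = [gene for gene, _ in kept[:n_top_genes]]
    return selected


example_scores = {
    "DR 5+": {"GeneA": 2.5, "GeneB": 0.4, "GeneC": 1.2},
    "DR 12+": {"GeneD": 0.1},
}
result = top_genes_per_dimension(example_scores, score_threshold=1.0, n_top_genes=2)
print(result)  # prints {'DR 5+': ['GeneA', 'GeneC']}
```

Raising score_threshold removes weakly associated genes and can drop whole dimensions, while n_top_genes only limits how many of the remaining genes are drawn per dimension.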

Interpretation:

  • Latent dimension plots: Show how the dimension varies across cell types

  • Gene expression plots: Show expression patterns of dimension-specific genes

  • Color intensity: Indicates magnitude of values/expression

Common Use Cases:

  • Biological validation: Verify that latent dimensions capture meaningful biology

  • Gene discovery: Identify genes associated with specific processes

  • Model interpretation: Understand what biological processes each dimension represents

  • Quality assessment: Evaluate the biological relevance of the model

Examples

>>> from drvi.utils.plotting import plot_relevant_genes_on_umap
>>> # Basic UMAP visualization with combined scores
>>> plot_relevant_genes_on_umap(adata, embed, traverse_adata, "combined_score")
>>> # With custom parameters
>>> plot_relevant_genes_on_umap(
...     adata,
...     embed,
...     traverse_adata,
...     "max_possible",
...     layer="logcounts",
...     gene_symbols="gene_symbol",
...     score_threshold=1.0,
...     n_top_genes=5,
...     max_cells_to_plot=5000,
... )
>>> # Subset of dimensions
>>> plot_relevant_genes_on_umap(
...     adata, embed, traverse_adata, "combined_score", dim_subset=["DR 5+", "DR 12+", "DR 14+"]
... )