drvi.utils.plotting.plot_relevant_genes_on_umap
- drvi.utils.plotting.plot_relevant_genes_on_umap(adata, embed, traverse_adata, traverse_adata_key, layer=None, title_col='title', order_col='order', gene_symbols=None, score_threshold=0.0, dim_subset=None, n_top_genes=10, max_cells_to_plot=None, show=True, **kwargs)
Plot relevant genes on UMAP embedding.
This function creates UMAP visualizations showing how genes associated with specific latent dimensions are expressed across cells. For each dimension, one UMAP is colored by the latent dimension's values, and the UMAPs of its top genes are colored by gene expression.
- Parameters:
adata (AnnData) – AnnData object containing single-cell data with gene expression. This is the original data used for training the model.
embed (AnnData) – AnnData object containing latent representations and dimension metadata. Must have UMAP coordinates in embed.obsm["X_umap"] and dimension information in .var columns.
traverse_adata (AnnData) – AnnData object containing differential analysis results from calculate_differential_vars. Must contain differential effect data for the specified key.
traverse_adata_key (str) – Key prefix for the differential variables in traverse_adata.varm. Should correspond to a key used in find_differential_effects or calculate_differential_vars (e.g., “max_possible”, “min_possible”, “combined_score”).
layer (str | None (default: None)) – Layer name in adata to use for gene expression visualization. If None, uses .X. Common options include “counts”, “logcounts”, etc.
title_col (str (default: 'title')) – Column name in embed.var that contains dimension titles. These titles will be used to match dimensions between objects.
order_col (str (default: 'order')) – Column name in embed.var that specifies the order of dimensions. Results will be sorted by this column. Ignored if dim_subset is provided.
gene_symbols (str | None (default: None)) – Column name in adata.var that contains gene symbols. If provided, gene symbols will be used instead of gene indices.
score_threshold (float (default: 0.0)) – Threshold value for gene scores. Only genes with scores above this threshold will be visualized.
dim_subset (Sequence[str] | None (default: None)) – List of dimensions to plot. If None, all dimensions with significant effects are plotted.
n_top_genes (int (default: 10)) – Number of top genes to visualize for each dimension.
max_cells_to_plot (int | None (default: None)) – Maximum number of cells to include in the plot. If None, all cells are plotted. Useful for large datasets to improve performance and reduce memory usage.
show (bool (default: True)) – Whether to display the plot. If False, returns a generator of Ax or Axes objects.
**kwargs – Additional keyword arguments passed to sc.pl.embedding.
- Returns:
None if show=True, in which case the plots are displayed directly. If show=False, a generator that yields Ax or Axes objects one at a time.
- Raises:
KeyError – If required data is missing from any of the AnnData objects.
ValueError – If the specified key doesn’t exist in traverse_adata.
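Because missing inputs surface as a KeyError or ValueError only once plotting starts, it can help to check the documented requirements up front. The following is a minimal sketch, not part of drvi: check_plot_inputs is a hypothetical helper, and it assumes that keys in traverse_adata.varm begin with the traverse_adata_key prefix, as the parameter description suggests.

```python
def check_plot_inputs(embed, traverse_adata, traverse_adata_key,
                      title_col="title", order_col="order"):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []

    # The parameter docs require UMAP coordinates in embed.obsm["X_umap"].
    if "X_umap" not in embed.obsm:
        problems.append('embed.obsm["X_umap"] is missing')

    # Dimension titles and ordering are read from columns of embed.var.
    for col in (title_col, order_col):
        if col not in embed.var.columns:
            problems.append(f"embed.var has no column {col!r}")

    # Assumption: differential results are stored under varm keys that
    # start with the given key prefix.
    if not any(str(k).startswith(traverse_adata_key)
               for k in traverse_adata.varm.keys()):
        problems.append(
            f"no key in traverse_adata.varm starts with {traverse_adata_key!r}"
        )
    return problems
```

An empty list means the documented preconditions appear to hold. It does not guarantee that the plotting call itself will succeed.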
Notes
The function performs the following steps:
1. Extracts top differential variables using iterate_on_top_differential_vars
2. For each dimension, creates two visualizations: (I) a UMAP of latent dimension values across cells, and (II) UMAPs of expression patterns of the top genes for that dimension
Interpretation:
Latent dimension plots: Show how the dimension varies across cell types
Gene expression plots: Show expression patterns of dimension-specific genes
Color intensity: Indicates magnitude of values/expression
Common Use Cases:
Biological validation: Verify that latent dimensions capture meaningful biology
Gene discovery: Identify genes associated with specific processes
Model interpretation: Understand what biological processes each dimension represents
Quality assessment: Evaluate the biological relevance of the model
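To make the interaction between score_threshold and n_top_genes concrete, here is an illustrative sketch of the selection rule the parameter descriptions imply: keep genes scoring above the threshold, then take the highest-scoring ones. It is not the library's implementation. select_top_genes and the example scores are made up for illustration.

```python
def select_top_genes(scores, score_threshold=0.0, n_top_genes=10):
    """Illustrative only: filter genes by score, then keep the top n.

    `scores` maps gene name -> score for a single latent dimension.
    """
    # Keep genes whose score is strictly above the threshold.
    kept = [(gene, s) for gene, s in scores.items() if s > score_threshold]
    # Highest score first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:n_top_genes]


# Made-up scores for one dimension.
example_scores = {"GeneA": 2.5, "GeneB": 0.4, "GeneC": 1.2, "GeneD": 3.1}
top = select_top_genes(example_scores, score_threshold=1.0, n_top_genes=2)
print(top)  # [('GeneD', 3.1), ('GeneA', 2.5)]
```

Raising score_threshold can leave a dimension with fewer than n_top_genes genes, or none at all.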
Examples
>>> # Basic UMAP visualization with combined scores
>>> plot_relevant_genes_on_umap(adata, embed, traverse_adata, "combined_score")
>>> # With custom parameters
>>> plot_relevant_genes_on_umap(
...     adata,
...     embed,
...     traverse_adata,
...     "max_possible",
...     layer="logcounts",
...     gene_symbols="gene_symbol",
...     score_threshold=1.0,
...     n_top_genes=5,
...     max_cells_to_plot=5000,
... )
>>> # Subset of dimensions
>>> plot_relevant_genes_on_umap(
...     adata, embed, traverse_adata, "combined_score", dim_subset=["DR 5+", "DR 12+", "DR 14+"]
... )
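With show=False the function returns a generator instead of displaying anything, so nothing is drawn until the generator is consumed. The sketch below shows one way to save each yielded plot to disk. save_yielded_plots is a hypothetical helper, not part of drvi. It assumes each yielded item is a matplotlib Axes, or a possibly nested collection of Axes that share one figure.

```python
from pathlib import Path


def _first_axes(item):
    """Return one Axes-like object from a single Axes or a nested collection."""
    # Assumption: anything with a `.figure` attribute is an Axes.
    while not hasattr(item, "figure"):
        item = next(iter(item))
    return item


def save_yielded_plots(plots, out_dir, prefix="dim"):
    """Consume a generator of Axes (or Axes collections) and save each figure."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, item in enumerate(plots):
        fig = _first_axes(item).figure
        path = out_dir / f"{prefix}_{i:03d}.png"
        fig.savefig(path, dpi=150, bbox_inches="tight")
        saved.append(path)
    return saved


# Hypothetical usage:
# plots = plot_relevant_genes_on_umap(
#     adata, embed, traverse_adata, "combined_score", show=False
# )
# save_yielded_plots(plots, "figures/relevant_genes")
```

Saving inside the loop keeps only one figure in play at a time. Closing each figure after saving would reduce memory use further.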