napistu_torch.visualization.embeddings

Functions

plot_coordinates_with_masks(coordinates, ...)

Plot 2D coordinates with binary masks overlaid separately for each category.

napistu_torch.visualization.embeddings._prepare_filtering_mask(embeddings: torch.Tensor, filtering_mask: torch.Tensor | ndarray | None) → torch.Tensor

napistu_torch.visualization.embeddings.layout_tsne(embeddings: torch.Tensor | ndarray, filtering_mask: torch.Tensor | ndarray | None = None, n_components: int = 2, perplexity: int = 30, random_state: int = 42)

Lay out embeddings in a low-dimensional space (2D by default) using t-SNE with sensible defaults.

For large datasets (more than roughly 10,000 embeddings), t-SNE becomes impractically slow. Use filtering_mask to subset the data, or consider layout_umap instead.

Parameters:

embeddings (Union[torch.Tensor, numpy.ndarray]) – Precomputed node embeddings of shape (num_nodes, embedding_dim).
filtering_mask (Union[torch.Tensor, numpy.ndarray] or None, optional) – Boolean mask of shape (num_nodes,) selecting a subset of embeddings. If None, all embeddings are used. By default None
n_components (int, optional) – Number of dimensions, by default 2
perplexity (int, optional) – Balance between local and global structure, by default 30. Reasonable range: 5-50 depending on dataset size
random_state (int, optional) – Random seed for reproducibility, by default 42

Returns:

Array of shape (n_selected, n_components) containing the low-dimensional coordinates (2D with the default n_components)

Return type:

numpy.ndarray
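
The subsetting advice above can be sketched as follows. This is a minimal illustration, not the library's own example: the embeddings are synthetic, the subset size of 1,000 is an arbitrary choice, and the import is guarded so the mask-building part still runs when napistu-torch is not installed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for real node embeddings: 12,000 nodes, 16 dimensions.
embeddings = rng.normal(size=(12_000, 16)).astype(np.float32)

# Boolean mask of shape (num_nodes,) that keeps a random 1,000 nodes.
n_keep = 1_000  # arbitrary illustrative subset size
keep_idx = rng.choice(embeddings.shape[0], size=n_keep, replace=False)
filtering_mask = np.zeros(embeddings.shape[0], dtype=bool)
filtering_mask[keep_idx] = True

try:
    from napistu_torch.visualization.embeddings import layout_tsne
except ImportError:
    layout_tsne = None  # package not installed; only the mask is built

if layout_tsne is not None:
    coords = layout_tsne(
        embeddings,
        filtering_mask=filtering_mask,
        perplexity=30,
        random_state=42,
    )
    # Per the Returns section, coords should have shape (n_selected, n_components).
    print(coords.shape)
```

Random subsampling is only one way to build the mask. Any boolean array of shape (num_nodes,) works, for example one that selects a single node type.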

napistu_torch.visualization.embeddings.layout_umap(embeddings: torch.Tensor | ndarray, filtering_mask: torch.Tensor | ndarray | None = None, n_components: int = 2, n_neighbors: int = 15, random_state: int = 42)

Lay out embeddings in a low-dimensional space (2D by default) using UMAP with sensible defaults.

UMAP is generally preferred over t-SNE for embeddings: it is faster, more stable, and better at preserving both local and global structure. It scales well to large datasets (100K+ samples).

Note: Requires the umap-learn package. Install with:
    pip install napistu-torch[viz]
or:
    pip install umap-learn

Parameters:

embeddings (Union[torch.Tensor, numpy.ndarray]) – Precomputed node embeddings of shape (num_nodes, embedding_dim).
filtering_mask (Union[torch.Tensor, numpy.ndarray] or None, optional) – Boolean mask of shape (num_nodes,) selecting a subset of embeddings. If None, all embeddings are used. By default None
n_components (int, optional) – Number of dimensions, by default 2
n_neighbors (int, optional) – Size of local neighborhood, by default 15. Reasonable range: 5-50 depending on desired granularity
random_state (int, optional) – Random seed for reproducibility, by default 42

Returns:

Array of shape (n_selected, n_components) containing the low-dimensional coordinates (2D with the default n_components)

Return type:

numpy.ndarray

Raises:

ImportError – If umap-learn is not installed
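
Because umap-learn is an optional dependency, calling code may want to check for it before choosing a layout method. The sketch below is one possible approach, not part of napistu-torch: pick_layout_method is a hypothetical helper, and its 10,000-point threshold simply mirrors the guidance in the layout_tsne description.

```python
import importlib.util


def pick_layout_method(n_points, has_umap=None, tsne_limit=10_000):
    """Hypothetical helper: return "umap" or "tsne" for a dataset size.

    has_umap can be passed explicitly (useful for testing); when None,
    the presence of the umap-learn package (module name "umap") is detected.
    """
    if has_umap is None:
        has_umap = importlib.util.find_spec("umap") is not None

    if has_umap:
        return "umap"  # generally preferred when available
    if n_points <= tsne_limit:
        return "tsne"  # still practical at this size
    raise ImportError(
        "umap-learn is needed for a dataset this large; "
        "install it with: pip install umap-learn"
    )


print(pick_layout_method(500, has_umap=False))
print(pick_layout_method(50_000, has_umap=True))
```

The returned string can then be used to dispatch to layout_tsne or layout_umap, which share the embeddings, filtering_mask, n_components, and random_state parameters.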

napistu_torch.visualization.embeddings.plot_coordinates_with_masks(coordinates, masks, mask_names, figsize=(15, 10), ncols=3, cmap_bg='lightgray', cmap_fg='red', alpha=0.6, s=10)

Plot 2D coordinates with binary masks overlaid separately for each category.

Parameters:

coordinates (array-like, shape (n_points, 2)) – 2D coordinates (e.g., UMAP layout)
masks (array-like, shape (n_points, n_categories)) – Binary mask matrix where each column represents a category
mask_names (list of str) – Names of the categories (one per column of masks)
figsize (tuple, optional) – Figure size (width, height) in inches, by default (15, 10)
ncols (int, optional) – Number of columns in the subplot grid, by default 3
cmap_bg (str, optional) – Color for points where mask is False (0), by default 'lightgray'
cmap_fg (str, optional) – Color for points where mask is True (1), by default 'red'
alpha (float, optional) – Transparency of points, by default 0.6
s (float, optional) – Size of points, by default 10

Returns:

fig (matplotlib.figure.Figure) – The figure object
axes (array of matplotlib.axes.Axes) – Array of subplot axes
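
The masks argument is easiest to get wrong, so here is a minimal sketch of building it from per-point category labels. The coordinates and the three category names are made up for illustration, and the import is guarded so the input-preparation part runs even without napistu-torch installed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real layout, e.g. the output of layout_umap or layout_tsne.
coordinates = rng.normal(size=(300, 2))

# Hypothetical category label for each point.
labels = rng.choice(["metabolite", "protein", "reaction"], size=300)

# One name per category, and one boolean column per name:
# masks has shape (n_points, n_categories).
mask_names = [str(name) for name in np.unique(labels)]
masks = np.column_stack([labels == name for name in mask_names])

try:
    from napistu_torch.visualization.embeddings import plot_coordinates_with_masks
except ImportError:
    plot_coordinates_with_masks = None  # package not installed; inputs only

if plot_coordinates_with_masks is not None:
    fig, axes = plot_coordinates_with_masks(
        coordinates, masks, mask_names, ncols=3
    )
```

Here each point belongs to exactly one category, but nothing in the parameter description requires that: columns of masks may overlap. Since the function returns a matplotlib Figure, the result can be written to disk with fig.savefig.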