napistu_torch.load.stratification

Stratification utilities for edge splitting.

This module provides functions for creating composite edge strata and managing stratification for train/test/val splitting.

Public Functions

create_composite_edge_strata(napistu_graph, stratify_by=STRATIFY_BY.NODE_SPECIES_TYPE)

Create a composite edge attribute by concatenating the endpoint attributes.

ensure_strata_series(edge_strata)

Ensure edge_strata is a pandas Series, converting from DataFrame if needed.

merge_rare_strata(edge_strata, min_count, other_category_name='other')

Merge rare strata into a single category.

validate_edge_strata_alignment(napistu_data, edge_strata)

Validate that strata series is aligned with NapistuData edge_index.

Functions

create_composite_edge_strata(napistu_graph[, ...])

Create a composite edge attribute by concatenating the endpoint attributes.

ensure_strata_series(edge_strata)

Ensure edge_strata is a pandas Series.

merge_rare_strata(edge_strata, min_count[, ...])

Merge rare strata categories into an "other" category.

validate_edge_strata_alignment(napistu_data, ...)

Verify edge_strata from->to aligns with NapistuData edge names.

napistu_torch.load.stratification.create_composite_edge_strata(napistu_graph: NapistuGraph, stratify_by: str = 'node_species_type') → Series

Create a composite edge attribute by concatenating the endpoint attributes.

Parameters:
  • napistu_graph (NapistuGraph) – A NapistuGraph object.

  • stratify_by (str) – The attribute(s) to stratify by. Must be one of the following:

      - STRATIFY_BY.NODE_SPECIES_TYPE - species and node type
      - STRATIFY_BY.NODE_TYPE - node type (species and reactions)
      - STRATIFY_BY.EDGE_SBO_TERMS - SBO terms (upstream and downstream)

Returns:

A series with the composite edge attribute.

Return type:

pd.Series
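
To illustrate what "concatenating the endpoint attributes" can look like, here is a minimal pandas-only sketch. It does not use NapistuGraph: the node table, the edge table, and the " -> " separator (borrowed from the merge_rare_strata example) are assumptions made for illustration, and the library's actual implementation may differ.

```python
import pandas as pd

# Hypothetical node table: node name -> attribute used for stratification.
node_types = pd.Series(
    {"A": "species", "B": "reaction", "C": "species"},
    name="node_type",
)

# Hypothetical edge list with "from" and "to" endpoint columns.
edges = pd.DataFrame({"from": ["A", "B", "A"], "to": ["B", "C", "C"]})

# Look up each endpoint's attribute, then join the pair into one label.
# The " -> " separator is an assumption, not a documented guarantee.
from_attr = edges["from"].map(node_types)
to_attr = edges["to"].map(node_types)
composite = (from_attr + " -> " + to_attr).rename("edge_strata")

print(composite.tolist())
# ['species -> reaction', 'reaction -> species', 'species -> species']
```

Each edge receives one label built from both endpoints, so a stratified split can balance edge types rather than node types alone.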

napistu_torch.load.stratification.ensure_strata_series(edge_strata: Series | DataFrame | None) → Series | None

Ensure edge_strata is a pandas Series.

Converts DataFrame with single column “edge_strata” back to Series, or returns Series as-is if already a Series.

Parameters:

edge_strata (pd.Series or pd.DataFrame or None) – Edge strata data. If DataFrame, must have single column named “edge_strata”.

Returns:

Edge strata as a Series. If None, returns None.

Return type:

pd.Series or None

Raises:

ValueError – If edge_strata is DataFrame but doesn’t have exactly one column named “edge_strata”.
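
The conversion rules described above can be sketched in plain pandas. The function name ensure_series_sketch is hypothetical, and raising TypeError for unsupported inputs is an assumption (the documentation only mentions ValueError); this is an illustration of the documented behavior, not the library's code.

```python
import pandas as pd


def ensure_series_sketch(edge_strata):
    """Illustrative stand-in for the documented behavior (hypothetical name)."""
    # None and Series inputs pass through unchanged.
    if edge_strata is None or isinstance(edge_strata, pd.Series):
        return edge_strata
    if isinstance(edge_strata, pd.DataFrame):
        # A DataFrame must have exactly one column, named "edge_strata".
        if list(edge_strata.columns) != ["edge_strata"]:
            raise ValueError(
                "Expected exactly one column named 'edge_strata', "
                f"got {list(edge_strata.columns)}"
            )
        return edge_strata["edge_strata"]
    # Assumption: other input types are rejected outright.
    raise TypeError(f"Unsupported type: {type(edge_strata).__name__}")


frame = pd.DataFrame({"edge_strata": ["a -> b", "b -> c"]})
series = ensure_series_sketch(frame)
print(type(series).__name__)  # Series
```

A round trip like this is useful when strata are saved to a tabular format and come back as a single-column DataFrame.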

napistu_torch.load.stratification.merge_rare_strata(edge_strata: Series, min_count: int, other_category_name: str = 'other') → Series

Merge rare strata categories into an “other” category.

Categories with fewer than min_count edges are collapsed into a single “other” category. This helps prevent issues with rare relation types that may not have sufficient samples for reliable AUC computation.

Parameters:
  • edge_strata (pd.Series) – Edge strata series with composite edge attributes.

  • min_count (int) – Minimum number of edges required for a category to be kept separate. Categories with fewer edges will be merged into “other”.

  • other_category_name (str, default="other") – Name for the merged category containing rare strata.

Returns:

Edge strata with rare categories merged into “other”.

Return type:

pd.Series

Examples

>>> import pandas as pd
>>> from napistu_torch.load.stratification import merge_rare_strata
>>> edge_strata = pd.Series([
...     "interactor -> interactor",
...     "interactor -> interactor",
...     "stimulator -> product",  # Rare (only 1)
... ])
>>> merged = merge_rare_strata(edge_strata, min_count=2)
>>> merged.unique().tolist()
['interactor -> interactor', 'other']
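
For readers who want to see the merging rule itself, the following pandas-only sketch collapses every category with fewer than min_count members into one label. The name merge_rare_sketch and the sample labels are hypothetical; this shows one way to get the documented behavior and is not necessarily how the library does it.

```python
import pandas as pd


def merge_rare_sketch(edge_strata, min_count, other_category_name="other"):
    """Collapse categories with fewer than min_count members (hypothetical helper)."""
    counts = edge_strata.value_counts()
    # Labels whose frequency falls below the threshold.
    rare = counts[counts < min_count].index
    # Keep common labels; replace rare ones with the shared category name.
    return edge_strata.where(~edge_strata.isin(rare), other_category_name)


strata = pd.Series(["a -> a", "a -> a", "a -> a", "b -> c", "c -> d"])
merged = merge_rare_sketch(strata, min_count=2)
print(merged.value_counts().to_dict())
# {'a -> a': 3, 'other': 2}
```

In this sketch the two singleton categories pool into an "other" group of size 2. Nothing guarantees that the pooled group itself reaches min_count, so it is worth checking its size before relying on per-stratum metrics.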

napistu_torch.load.stratification.validate_edge_strata_alignment(napistu_data: NapistuData, edge_strata: Series) → None

Verify edge_strata from->to aligns with NapistuData edge names.