napistu_torch.load.stratification
Stratification utilities for edge splitting.
This module provides functions for creating composite edge strata and managing stratification for train/test/val splitting.
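Stratification matters because each split should contain roughly the same mix of edge strata. The sketch below illustrates the idea by shuffling and splitting each stratum independently. It is a hypothetical helper (`stratified_split_sketch` and its `val_frac`/`test_frac`/`seed` parameters are assumptions), not the napistu_torch splitter.

```python
import numpy as np
import pandas as pd


def stratified_split_sketch(
    edge_strata: pd.Series,
    val_frac: float = 0.1,
    test_frac: float = 0.1,
    seed: int = 0,
) -> pd.Series:
    """Assign each edge to 'train', 'val' or 'test' within its own stratum.

    Hypothetical helper for illustration; not part of napistu_torch.
    """
    rng = np.random.default_rng(seed)
    labels = np.full(len(edge_strata), "train", dtype=object)
    codes = edge_strata.to_numpy()
    for stratum in pd.unique(codes):
        # Positions of all edges in this stratum, in random order
        idx = rng.permutation(np.flatnonzero(codes == stratum))
        n_val = int(round(len(idx) * val_frac))
        n_test = int(round(len(idx) * test_frac))
        labels[idx[:n_val]] = "val"
        labels[idx[n_val:n_val + n_test]] = "test"
    return pd.Series(labels, index=edge_strata.index, name="split")
```

With 10 edges in stratum "a" and 20 in stratum "b", a 10%/10% split sends 1 and 2 edges respectively to each of validation and test, so both strata stay represented in every split.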
Public Functions
- create_composite_edge_strata(napistu_graph, stratify_by=STRATIFY_BY.NODE_SPECIES_TYPE)
Create a composite edge attribute by concatenating the endpoint attributes.
- ensure_strata_series(edge_strata)
Ensure edge_strata is a pandas Series, converting from DataFrame if needed.
- merge_rare_strata(edge_strata, min_count, other_category_name='other')
Merge rare strata into a single category.
- validate_edge_strata_alignment(napistu_data, edge_strata)
Validate that the strata series is aligned with the NapistuData edge_index.
Functions
- create_composite_edge_strata – Create a composite edge attribute by concatenating the endpoint attributes.
- ensure_strata_series – Ensure edge_strata is a pandas Series.
- merge_rare_strata – Merge rare strata categories into an "other" category.
- validate_edge_strata_alignment – Verify that the edge_strata from -> to pairs align with the NapistuData edge names.
- napistu_torch.load.stratification.create_composite_edge_strata(napistu_graph: NapistuGraph, stratify_by: str = 'node_species_type') → Series
Create a composite edge attribute by concatenating the endpoint attributes.
- Parameters:
napistu_graph (NapistuGraph) – A NapistuGraph object.
stratify_by (str) – The attribute(s) to stratify by. Must be one of the following:
  - STRATIFY_BY.NODE_SPECIES_TYPE – species and node type
  - STRATIFY_BY.NODE_TYPE – node type (species and reactions)
  - STRATIFY_BY.EDGE_SBO_TERMS – SBO terms (upstream and downstream)
- Returns:
A Series with the composite edge attribute, one value per edge.
- Return type:
pd.Series
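The concatenation idea can be sketched in plain pandas. This is a hypothetical illustration, not the library's implementation: it assumes edges are given as a DataFrame with "from" and "to" columns of node names, that `node_attr` maps node name to whichever attribute `stratify_by` selects, and that the separator is " -> " (borrowed from the example labels shown for merge_rare_strata). `composite_strata_sketch` is a made-up name.

```python
import pandas as pd


def composite_strata_sketch(
    edges: pd.DataFrame, node_attr: pd.Series, sep: str = " -> "
) -> pd.Series:
    """Label each edge as '<attr of from><sep><attr of to>'.

    Hypothetical helper: assumes 'from'/'to' columns of node names and a
    node_attr Series indexed by node name.
    """
    # Look up the attribute of each endpoint by node name
    upstream = edges["from"].map(node_attr)
    downstream = edges["to"].map(node_attr)
    # Concatenate both endpoint attributes into one composite label;
    # the "edge_strata" name is an assumed convention
    return (upstream.astype(str) + sep + downstream.astype(str)).rename("edge_strata")
```

For example, a protein feeding a reaction that produces a metabolite yields the strata "protein -> reaction" and "reaction -> metabolite".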
- napistu_torch.load.stratification.ensure_strata_series(edge_strata: Series | DataFrame | None) → Series | None
Ensure edge_strata is a pandas Series.
Converts a DataFrame with a single column named “edge_strata” back to a Series; a Series is returned unchanged.
- Parameters:
edge_strata (pd.Series or pd.DataFrame or None) – Edge strata data. If DataFrame, must have single column named “edge_strata”.
- Returns:
Edge strata as a Series, or None if edge_strata is None.
- Return type:
pd.Series or None
- Raises:
ValueError – If edge_strata is a DataFrame but does not have exactly one column named “edge_strata”.
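The documented contract can be sketched as follows. This is a hypothetical re-implementation for illustration only (`ensure_strata_series_sketch` is a made-up name); the TypeError for unsupported input types is an extra assumption, not something the documentation states.

```python
from typing import Optional, Union

import pandas as pd


def ensure_strata_series_sketch(
    edge_strata: Union[pd.Series, pd.DataFrame, None],
) -> Optional[pd.Series]:
    """Hypothetical re-implementation of the documented behavior."""
    if edge_strata is None:
        return None
    if isinstance(edge_strata, pd.Series):
        return edge_strata  # already a Series: return it unchanged
    if isinstance(edge_strata, pd.DataFrame):
        # Only a single column named "edge_strata" is accepted
        if list(edge_strata.columns) != ["edge_strata"]:
            raise ValueError(
                "edge_strata DataFrame must have exactly one column named "
                f"'edge_strata'; got {list(edge_strata.columns)}"
            )
        return edge_strata["edge_strata"]
    # Assumption: reject anything else rather than guessing a conversion
    raise TypeError(f"Unsupported type for edge_strata: {type(edge_strata).__name__}")
```

A round trip through `Series.to_frame()` recovers the original labels, which is the case this helper exists for (for example, strata stored as a one-column table and reloaded).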
- napistu_torch.load.stratification.merge_rare_strata(edge_strata: Series, min_count: int, other_category_name: str = 'other') → Series
Merge rare strata categories into an “other” category.
Categories with fewer than min_count edges are collapsed into a single category named by other_category_name (“other” by default). This avoids problems with rare relation types that may have too few samples for reliable AUC computation.
- Parameters:
edge_strata (pd.Series) – Edge strata series with composite edge attributes.
min_count (int) – Minimum number of edges required for a category to be kept separate. Categories with fewer edges are merged into the other_category_name category.
other_category_name (str, default="other") – Name for the merged category containing rare strata.
- Returns:
Edge strata with rare categories merged into other_category_name (“other” by default).
- Return type:
pd.Series
Examples
>>> import pandas as pd
>>> from napistu_torch.load.stratification import merge_rare_strata
>>> edge_strata = pd.Series([
...     "interactor -> interactor",
...     "interactor -> interactor",
...     "stimulator -> product",  # Rare (only 1)
... ])
>>> merged = merge_rare_strata(edge_strata, min_count=2)
>>> merged.unique()
array(['interactor -> interactor', 'other'])
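The merging rule (count each stratum, then relabel those below min_count) can be sketched with `value_counts`, `isin` and `where`. This is a hypothetical re-implementation (`merge_rare_strata_sketch` is a made-up name), and it assumes plain string/object labels: on a pandas categorical Series, `where` with a value that is not already a category would raise.

```python
import pandas as pd


def merge_rare_strata_sketch(
    edge_strata: pd.Series, min_count: int, other_category_name: str = "other"
) -> pd.Series:
    """Hypothetical re-implementation: collapse strata seen < min_count times."""
    counts = edge_strata.value_counts()
    # Strata with strictly fewer than min_count edges are considered rare
    rare = counts[counts < min_count].index
    # Keep common labels as-is; replace rare ones with the catch-all name
    return edge_strata.where(~edge_strata.isin(rare), other_category_name)
```

On the three-edge example, only "stimulator -> product" (one edge) falls below min_count=2, so it alone becomes "other"; with min_count=1 nothing is merged.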
- napistu_torch.load.stratification.validate_edge_strata_alignment(napistu_data: NapistuData, edge_strata: Series) → None
Verify that the from -> to pairs of edge_strata align with the NapistuData edge names.
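What "alignment" means can be sketched as an order-sensitive comparison of edge identifiers. This is hypothetical: it assumes edge_strata is indexed by (from, to) pairs and that the NapistuData edge names are available as a list of (from, to) tuples in edge_index column order. Neither `validate_alignment_sketch` nor the `edge_names` argument is part of the documented API.

```python
from typing import Hashable, Sequence, Tuple

import pandas as pd


def validate_alignment_sketch(
    edge_names: Sequence[Tuple[Hashable, Hashable]], edge_strata: pd.Series
) -> None:
    """Raise ValueError unless edge_strata's (from, to) index matches edge_names.

    Hypothetical: assumes a (from, to) index on edge_strata and edge_names
    listed in the same order as the edge_index columns.
    """
    observed = list(edge_strata.index)
    expected = list(edge_names)
    if len(observed) != len(expected):
        raise ValueError(
            f"Length mismatch: {len(observed)} strata vs {len(expected)} edges"
        )
    # Position matters: strata row i must describe edge i
    for i, (obs, exp) in enumerate(zip(observed, expected)):
        if tuple(obs) != tuple(exp):
            raise ValueError(f"Edge {i} misaligned: strata has {obs}, edges have {exp}")
```

The comparison is positional rather than set-based because a split built from misordered strata would silently assign the wrong stratum to each edge.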