napistu_torch.load.napistu_graphs

NapistuGraph to NapistuData conversion utilities.

This module provides functions for converting NapistuGraph objects to NapistuData objects with various configurations and masking strategies.

Public Functions

augment_napistu_graph(sbml_dfs, napistu_graph, sbml_dfs_summary_types=None, ignored_attributes=None, ignored_if_constant_attributes=None, inplace=False): Augment a NapistuGraph with additional vertex and edge attributes from SBML_dfs.
construct_vertex_labeled_napistu_data(sbml_dfs, napistu_graph, splitting_strategy=SPLITTING_STRATEGIES.VERTEX_MASK, label_type=LABEL_TYPE.SPECIES_TYPE, task_type=TASK_TYPES.CLASSIFICATION, name=None, **kwargs): Construct a NapistuData object with vertex labels from a NapistuGraph.
construct_unlabeled_napistu_data(sbml_dfs, napistu_graph, splitting_strategy=SPLITTING_STRATEGIES.NO_MASK, name=None, **kwargs): Construct an unlabeled NapistuData object from a NapistuGraph.
napistu_graph_to_napistu_data(napistu_graph, splitting_strategy, vertex_default_transforms=None, edge_default_transforms=None, labels=None, relation_type=None, name=None, **kwargs): Convert a NapistuGraph to a NapistuData object with optional labels and transforms.

Functions

`augment_napistu_graph`(sbml_dfs, napistu_graph)	Augment the NapistuGraph with information from the SBML_dfs.
`construct_unlabeled_napistu_data`(sbml_dfs, ...)	Construct a NapistuData object from an SBML_dfs and NapistuGraph.
`construct_vertex_labeled_napistu_data`(...[, ...])	Construct a PyG data object for supervised training tasks using a SBML_dfs and NapistuGraph.
`napistu_graph_to_napistu_data`(napistu_graph, ...)	Convert a NapistuGraph to NapistuData object(s) with specified splitting strategy.

napistu_torch.load.napistu_graphs._extract_edge_weights(edge_df: DataFrame) → torch.Tensor | None

Extract original edge weights from edge DataFrame.

Parameters: edge_df (pd.DataFrame) – Edge DataFrame containing weight information
Returns: 1D tensor of original edge weights, or None if no weights found
Return type: Optional[torch.Tensor]

napistu_torch.load.napistu_graphs._get_napistu_graph_names(vertex_df: DataFrame, edge_df: DataFrame) → tuple[Series, DataFrame]

Extract minimal NapistuGraph attributes for NapistuData storage.

This function extracts only the essential attributes needed for debugging and validation, keeping file sizes small while preserving the ability to look up the full NapistuGraph when needed.

Parameters:

vertex_df (pd.DataFrame) – Full vertex DataFrame from NapistuGraph
edge_df (pd.DataFrame) – Full edge DataFrame from NapistuGraph

Returns:

vertex_names: Series of vertex names aligned with tensor rows
edge_names: DataFrame with ‘from’ and ‘to’ columns aligned with tensor columns

Return type:

tuple[pd.Series, pd.DataFrame]

napistu_torch.load.napistu_graphs._ignore_graph_attributes(napistu_graph: NapistuGraph, ignored_attributes: dict[str, list[str]] = {'edges': ['string_wt', 'IntAct_interaction_method_unknown', 'OmniPath_is_directed', 'OmniPath_is_inhibition', 'OmniPath_is_stimulation', 'sbo_term_downstream_SBO:0000336'], 'vertices': ['ontology_reactome', 'ontology_intact', 'ontology_kegg.drug', 'ontology_smiles', 'ontology_other']}) → None

Remove specified attributes from vertices or edges.

This function removes the specified attributes from either vertices or edges. This is generally to restrict the vertex and edge encodings to a manageable size.

napistu_torch.load.napistu_graphs._ignore_if_constant(napistu_graph: NapistuGraph, attributes_to_check: dict[str, dict[str, Any]] = {'edges': {'STRING_database_transferred': 0, 'STRING_neighborhood': 0}, 'vertices': {}}) → None

Remove attributes from vertices or edges if they are constant at a specific value or missing for all vertices/edges.

This function checks the specified attributes and removes an attribute if either:

- All vertices/edges have the specified value (or None/missing)
- All vertices/edges have missing/None values

Parameters:

napistu_graph (NapistuGraph) – The NapistuGraph to check and modify.
attributes_to_check (dict[str, dict[str, Any]], optional) – Dictionary mapping NAPISTU_GRAPH.EDGES/VERTICES to dicts mapping attribute names to values to check against. For example, {“some_attr”: 0} means check if all values are 0 or None. Defaults to the dictionaries shown in the signature above.

Returns:

Modifies the NapistuGraph in place.

Return type:

None
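
The removal rule can be illustrated with a small plain-Python sketch. It is not the napistu_torch implementation: the helper name constant_attributes and the list-of-dicts edge representation are assumptions made for illustration, and NaN handling is left out.

```python
from typing import Any


def constant_attributes(
    records: list[dict[str, Any]], attributes_to_check: dict[str, Any]
) -> list[str]:
    """Return the attribute names whose values are all `value` or None/missing.

    Hypothetical helper: `records` stands in for the per-edge (or per-vertex)
    attributes of a graph.
    """
    to_drop = []
    for attr, value in attributes_to_check.items():
        # dict.get returns None for records that lack the attribute entirely
        observed = [rec.get(attr) for rec in records]
        if all(v is None or v == value for v in observed):
            to_drop.append(attr)
    return to_drop


edges = [
    {"STRING_neighborhood": 0, "STRING_database_transferred": 0, "weight": 1.0},
    {"STRING_neighborhood": None, "STRING_database_transferred": 5, "weight": 2.0},
    {"STRING_database_transferred": 0, "weight": 0.5},
]
checks = {"STRING_neighborhood": 0, "STRING_database_transferred": 0}

dropped = constant_attributes(edges, checks)
print(dropped)  # ['STRING_neighborhood']
```

STRING_database_transferred is kept because one edge carries the value 5; STRING_neighborhood is dropped because every edge has 0, None, or no value at all.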

napistu_torch.load.napistu_graphs._name_napistu_data(splitting_strategy: str, labels: torch.Tensor | None = None, labeling_manager: LabelingManager | None = None) → str

Generate a descriptive name for NapistuData objects based on labeling and splitting strategy.

This function creates a filename-friendly name that includes information about:

- The labeling strategy (if supervised)
- The splitting strategy used

Parameters:

splitting_strategy (str) – The splitting strategy used (e.g., ‘vertex_mask’, ‘edge_mask’, ‘no_mask’, ‘inductive’).
labels (Optional[torch.Tensor], default=None) – The labels for the data. If None, indicates unlabeled data.
labeling_manager (Optional[LabelingManager], default=None) – The labeling manager used for supervised tasks. If provided, it is used to make the name of data objects supporting supervised learning more descriptive.

Returns:

A descriptive name suitable for use as a filename.

Return type:

str

Examples

>>> # Supervised data with vertex masking
>>> name = _name_napistu_data(splitting_strategy="vertex_mask", labels=labels, labeling_manager=labeling_manager)
>>> print(name)  # "supervised_species_type_vertex_mask"

>>> # Unlabeled data with no masking
>>> name = _name_napistu_data(splitting_strategy="no_mask")
>>> print(name)  # "unlabeled"

napistu_torch.load.napistu_graphs._napistu_graph_to_edge_masked_napistu_data(napistu_graph: NapistuGraph, name: str, vertex_default_transforms: Dict[str, Dict] | EncodingManager = {'categorical': {'node_type'}, 'sparse_categorical': {'species_type'}}, vertex_transforms: Dict[str, Dict] | EncodingManager | None = None, edge_default_transforms: Dict[str, Dict] | EncodingManager = {'binary': {'r_isreversible'}, 'categorical': {'direction', 'sbo_term_downstream', 'sbo_term_upstream'}, 'numeric': {'stoichiometry_downstream', 'stoichiometry_upstream', 'weight', 'weight_upstream'}}, edge_transforms: Dict[str, Dict] | EncodingManager | None = None, auto_encode: bool = True, encoders: Dict[str, Any] = {'binary': 'passthrough', 'categorical': OneHotEncoder(drop='if_binary', sparse_output=False), 'numeric': StandardScaler(), 'sparse_categorical': OneHotEncoder(handle_unknown='ignore', sparse_output=False), 'sparse_numeric': SparseContScaler()}, deduplicate_features: bool = True, train_size: float = 0.7, test_size: float = 0.15, val_size: float = 0.15, verbose: bool = True, labels: torch.Tensor | None = None, labeling_manager: LabelingManager | None = None, relation_type: torch.Tensor | None = None, relation_manager: LabelingManager | None = None) → NapistuData

Convert a NapistuGraph to a NapistuData object with edge masks split across train, test, and validation edge sets.

napistu_torch.load.napistu_graphs._napistu_graph_to_inductive_napistu_data(napistu_graph: NapistuGraph, name: str, vertex_default_transforms: Dict[str, Dict] | EncodingManager = {'categorical': {'node_type'}, 'sparse_categorical': {'species_type'}}, vertex_transforms: Dict[str, Dict] | EncodingManager | None = None, edge_default_transforms: Dict[str, Dict] | EncodingManager = {'binary': {'r_isreversible'}, 'categorical': {'direction', 'sbo_term_downstream', 'sbo_term_upstream'}, 'numeric': {'stoichiometry_downstream', 'stoichiometry_upstream', 'weight', 'weight_upstream'}}, edge_transforms: Dict[str, Dict] | EncodingManager | None = None, encoders: Dict[str, Any] = {'binary': 'passthrough', 'categorical': OneHotEncoder(drop='if_binary', sparse_output=False), 'numeric': StandardScaler(), 'sparse_categorical': OneHotEncoder(handle_unknown='ignore', sparse_output=False), 'sparse_numeric': SparseContScaler()}, auto_encode: bool = True, deduplicate_features: bool = True, train_size: float = 0.7, test_size: float = 0.15, val_size: float = 0.15, verbose: bool = True, labels: torch.Tensor | None = None, labeling_manager: LabelingManager | None = None, relation_type: torch.Tensor | None = None, relation_manager: LabelingManager | None = None) → Dict[str, NapistuData]

Create PyG Data objects from a NapistuGraph with an inductive split into train, test, and validation sets.

napistu_torch.load.napistu_graphs._napistu_graph_to_unmasked_napistu_data(napistu_graph: NapistuGraph, name: str, vertex_transforms: Dict[str, Dict] | EncodingManager | None = None, edge_transforms: Dict[str, Dict] | EncodingManager | None = None, vertex_default_transforms: Dict[str, Dict] | EncodingManager = {'categorical': {'node_type'}, 'sparse_categorical': {'species_type'}}, edge_default_transforms: Dict[str, Dict] | EncodingManager = {'binary': {'r_isreversible'}, 'categorical': {'direction', 'sbo_term_downstream', 'sbo_term_upstream'}, 'numeric': {'stoichiometry_downstream', 'stoichiometry_upstream', 'weight', 'weight_upstream'}}, encoders: Dict[str, Any] = {'binary': 'passthrough', 'categorical': OneHotEncoder(drop='if_binary', sparse_output=False), 'numeric': StandardScaler(), 'sparse_categorical': OneHotEncoder(handle_unknown='ignore', sparse_output=False), 'sparse_numeric': SparseContScaler()}, auto_encode: bool = True, deduplicate_features: bool = True, verbose: bool = False, labels: torch.Tensor | None = None, labeling_manager: LabelingManager | None = None, relation_type: torch.Tensor | None = None, relation_manager: LabelingManager | None = None) → NapistuData

Create a PyTorch Geometric Data object from a NapistuGraph without any splitting/masking of vertices or edges.

napistu_torch.load.napistu_graphs._napistu_graph_to_vertex_masked_napistu_data(napistu_graph: NapistuGraph, name: str, vertex_default_transforms: Dict[str, Dict] | EncodingManager = {'categorical': {'node_type'}, 'sparse_categorical': {'species_type'}}, vertex_transforms: Dict[str, Dict] | EncodingManager | None = None, edge_default_transforms: Dict[str, Dict] | EncodingManager = {'binary': {'r_isreversible'}, 'categorical': {'direction', 'sbo_term_downstream', 'sbo_term_upstream'}, 'numeric': {'stoichiometry_downstream', 'stoichiometry_upstream', 'weight', 'weight_upstream'}}, edge_transforms: Dict[str, Dict] | EncodingManager | None = None, encoders: Dict[str, Any] = {'binary': 'passthrough', 'categorical': OneHotEncoder(drop='if_binary', sparse_output=False), 'numeric': StandardScaler(), 'sparse_categorical': OneHotEncoder(handle_unknown='ignore', sparse_output=False), 'sparse_numeric': SparseContScaler()}, auto_encode: bool = True, deduplicate_features: bool = True, train_size: float = 0.7, test_size: float = 0.15, val_size: float = 0.15, verbose: bool = True, labels: torch.Tensor | None = None, labeling_manager: LabelingManager | None = None, relation_type: torch.Tensor | None = None, relation_manager: LabelingManager | None = None) → NapistuData

Create PyG Data objects from a NapistuGraph with vertex masks split across train, test, and validation vertex sets.

napistu_torch.load.napistu_graphs._standardize_graph_dfs_and_encodings(napistu_graph: NapistuGraph, vertex_default_transforms: Dict[str, Dict] | EncodingManager, vertex_transforms: Dict[str, Dict] | EncodingManager | None, edge_default_transforms: Dict[str, Dict] | EncodingManager, edge_transforms: Dict[str, Dict] | EncodingManager | None, auto_encode: bool, encoders: Dict[str, Any] = {'binary': 'passthrough', 'categorical': OneHotEncoder(drop='if_binary', sparse_output=False), 'numeric': StandardScaler(), 'sparse_categorical': OneHotEncoder(handle_unknown='ignore', sparse_output=False), 'sparse_numeric': SparseContScaler()}) → tuple[DataFrame, DataFrame, EncodingManager, EncodingManager]

Standardize the node and edge DataFrames and encoding managers for a NapistuGraph.

This is a common pattern to prepare a NapistuGraph for encoding as matrices of vertex and edge features.

Parameters:

napistu_graph (NapistuGraph) – The NapistuGraph to standardize
vertex_default_transforms (Dict[str, Dict]) – The default vertex transformations to apply
vertex_transforms (Dict[str, Dict]) – Additional vertex transformations to apply
edge_default_transforms (Dict[str, Dict]) – The default edge transformations to apply
edge_transforms (Dict[str, Dict]) – Additional edge transformations to apply
auto_encode (bool) – Whether to automatically select an appropriate encoding for unaccounted for attributes
encoders (Dict) – The encoders to use

Returns:

tuple[pd.DataFrame, pd.DataFrame, EncodingManager, EncodingManager]
- vertex_df (pd.DataFrame) – The vertex DataFrame
- edge_df (pd.DataFrame) – The edge DataFrame
- vertex_encoding_manager (EncodingManager) – The vertex encoding manager
- edge_encoding_manager (EncodingManager) – The edge encoding manager

napistu_torch.load.napistu_graphs.augment_napistu_graph(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph, sbml_dfs_summary_types: list = ['sources', 'ontologies'], ignored_attributes: dict[str, list[str]] = {'edges': ['string_wt', 'IntAct_interaction_method_unknown', 'OmniPath_is_directed', 'OmniPath_is_inhibition', 'OmniPath_is_stimulation', 'sbo_term_downstream_SBO:0000336'], 'vertices': ['ontology_reactome', 'ontology_intact', 'ontology_kegg.drug', 'ontology_smiles', 'ontology_other']}, ignored_if_constant_attributes: dict[str, dict[str, Any]] = {'edges': {'STRING_database_transferred': 0, 'STRING_neighborhood': 0}, 'vertices': {}}, inplace: bool = False) → None

Augment the NapistuGraph with information from the SBML_dfs.

This function adds summaries of the SBML_dfs to the NapistuGraph, and extends the graph with reaction and species data from the SBML_dfs.

Parameters:

sbml_dfs (SBML_dfs) – The SBML_dfs to augment the NapistuGraph with.
napistu_graph (NapistuGraph) – The NapistuGraph to augment.
sbml_dfs_summary_types (list, optional) – Types of summaries to include. Defaults to all valid summary types.
ignored_attributes (dict[str, list[str]], optional) – A dictionary of attribute types and lists of attributes to ignore. Defaults to IGNORED_EDGE_ATTRIBUTES and IGNORED_VERTEX_ATTRIBUTES.
ignored_if_constant_attributes (dict[str, dict[str, Any]], optional) – A dictionary of attribute types and dicts mapping attribute names to values to check against. For example, {“some_attr”: 0} means check if all values are 0 or None. Defaults to IGNORED_IF_CONSTANT_EDGE_ATTRIBUTES and IGNORED_IF_CONSTANT_VERTEX_ATTRIBUTES.
inplace (bool, default=False) – If True, modify the NapistuGraph in place. If False, return a new NapistuGraph with the augmentations.

Returns:

Modifies the NapistuGraph in place when inplace=True.

Return type:

None

napistu_torch.load.napistu_graphs.construct_unlabeled_napistu_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph, splitting_strategy: str = 'no_mask', name: str | None = None, relation_strata_type: str | None = None, min_relation_count: int | None = None, **kwargs) → NapistuData | Dict[str, NapistuData]

Construct a NapistuData object from an SBML_dfs and NapistuGraph.

This function augments the NapistuGraph with SBML_dfs summaries and reaction data, and then encodes the graph into a NapistuData object.

Parameters:

sbml_dfs (SBML_dfs) – The SBML_dfs to augment the NapistuGraph with.
napistu_graph (NapistuGraph) – The NapistuGraph to augment.
splitting_strategy (str, optional) – The splitting strategy to use for the NapistuData object. Defaults to SPLITTING_STRATEGIES.NO_MASK.
name (Optional[str], default=None) – Name for the NapistuData object. If None, uses the default name.
relation_strata_type (Optional[str], default=None) – If provided, creates relation labels based on edge_strata (combinations of edge and from/to vertex attributes). Must be one of VALID_STRATIFY_BY values (e.g., STRATIFY_BY.NODE_SPECIES_TYPE). Creates relation_strata and relation_manager for relation-aware tasks.
min_relation_count (Optional[int], default=None) – If provided, merge rare strata categories with fewer than min_relation_count edges into a single “other relation” category. This helps prevent issues with rare relation types that may not have sufficient samples for reliable AUC computation. If None, no merging is performed.
**kwargs – Additional keyword arguments to pass to napistu_graph_to_napistu_data.

Returns:

NapistuData – A NapistuData object containing the augmented NapistuGraph. If splitting_strategy is ‘inductive’, returns Dict[str, NapistuData] with keys ‘train’, ‘test’, ‘validation’ (or subset thereof).

Return type:

Union[NapistuData, Dict[str, NapistuData]]

Examples

>>> # Unmasked data (no train/test/validation split)
>>> data = construct_unlabeled_napistu_data(sbml_dfs, ng, splitting_strategy='no_mask')
>>> # With relation labels
>>> data = construct_unlabeled_napistu_data(
...     sbml_dfs, ng, splitting_strategy='no_mask', relation_strata_type=STRATIFY_BY.NODE_SPECIES_TYPE
... )
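
As an illustration of the min_relation_count behavior described above, the following plain-Python sketch merges rare categories into a single catch-all label. The helper merge_rare_categories and the example strata strings are hypothetical; the actual merging logic in napistu_torch may differ.

```python
from collections import Counter


def merge_rare_categories(labels, min_count, other_label="other relation"):
    """Replace categories seen fewer than `min_count` times with `other_label`.

    Hypothetical helper used only to illustrate the idea.
    """
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other_label for lab in labels]


# Toy per-edge strata: two common relation types and two singletons
edge_strata = (
    ["protein -> protein"] * 4
    + ["protein -> metabolite"] * 3
    + ["drug -> protein", "complex -> protein"]
)

merged = merge_rare_categories(edge_strata, min_count=3)
merged_counts = Counter(merged)
print(merged_counts["other relation"])  # 2
```

The two singleton strata end up pooled, so every remaining relation category has at least min_count edges, apart from the pooled category itself, which may still be small.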

napistu_torch.load.napistu_graphs.construct_vertex_labeled_napistu_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph, splitting_strategy: str = 'vertex_mask', label_type: str | LabelingManager = 'species_type', task_type: str = 'classification', labeling_managers: Dict[str, LabelingManager] | None = {'node_type': LabelingManager(label_attribute='node_type', exclude_vertex_attributes=['node_type', 'species_type'], augment_summary_types=['sources'], label_names=None), 'species_type': LabelingManager(label_attribute='species_type', exclude_vertex_attributes=['species_type'], augment_summary_types=['sources'], label_names=None)}, name: str | None = None, deduplicate_features: bool = True, **kwargs) → NapistuData | Dict[str, NapistuData]

Construct a PyG data object for supervised training tasks using a SBML_dfs and NapistuGraph.

This function handles the workflow for supervised learning tasks where labels are derived from graph attributes. The process is:

1. Extract labels from the original graph (before augmentation); labels may depend on attributes that exist in the original graph
2. Augment the graph with SBML_dfs data (sources, reactions, species) to add features
3. Remove attributes that should not be encoded as features (e.g., the label attribute itself)
4. Encode the augmented graph (with excluded attributes removed) into NapistuData

Note: Labels are created from the original graph because they may depend on specific attributes that must be present before augmentation. However, the final NapistuData uses the augmented graph to ensure all SBML_dfs-derived features are included.
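
The ordering of the workflow described above can be pictured with a toy, plain-Python sketch. Everything here is an assumption for illustration: vertices are plain dicts, build_supervised_inputs and add_toy_source are hypothetical helpers, and the final encoding step is omitted (the real function produces tensors inside a NapistuData object).

```python
def build_supervised_inputs(vertices, label_attribute, exclude_attributes, augment):
    """Toy version of the label-then-augment-then-filter ordering."""
    # 1. Read labels from the original (un-augmented) vertices
    labels = [v[label_attribute] for v in vertices]
    # 2. Augment copies of the vertices with extra attributes
    augmented = [augment(dict(v)) for v in vertices]
    # 3. Drop attributes that must not leak into the features
    features = [
        {k: val for k, val in v.items() if k not in exclude_attributes}
        for v in augmented
    ]
    # 4. Encoding into tensors would happen here; omitted in this sketch
    return features, labels


def add_toy_source(vertex):
    # Hypothetical augmentation: flag every vertex as coming from one source
    vertex["source_reactome"] = 1
    return vertex


vertices = [
    {"name": "glucose", "node_type": "species", "species_type": "metabolite"},
    {"name": "HK1", "node_type": "species", "species_type": "protein"},
]

features, labels = build_supervised_inputs(
    vertices, "species_type", {"species_type"}, add_toy_source
)
print(labels)               # ['metabolite', 'protein']
print(sorted(features[0]))  # ['name', 'node_type', 'source_reactome']
```

The label attribute is read before augmentation and removed before the feature step, so it cannot appear among the features.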

Parameters:

sbml_dfs (SBML_dfs) – The SBML_dfs to augment the NapistuGraph with.
napistu_graph (NapistuGraph) – The NapistuGraph to augment.
splitting_strategy (str) – The strategy to use for splitting the data into train/test/val sets.
label_type (Union[str, LabelingManager]) – The type of labels to use for the supervised task.
task_type (str) – The type of task to use for the supervised task (classification or regression)
labeling_managers (Optional[Dict[str, LabelingManager]]) – A dictionary of LabelingManager objects for each label type. If label_type is a LabelingManager, this is ignored.
name (Optional[str], default=None) – Name for the NapistuData object. If None, uses the default name.
**kwargs (dict) – Additional keyword arguments to pass to the NapistuData constructor.

Returns:

NapistuData – A NapistuData object containing the augmented NapistuGraph and labels. The labeling manager is embedded in the NapistuData object. If splitting_strategy is ‘inductive’, returns Dict[str, NapistuData] with keys ‘train’, ‘test’, ‘validation’ (or subset thereof).

Return type:

Union[NapistuData, Dict[str, NapistuData]]

Examples

>>> # Vertex masking with default splits
>>> data = construct_vertex_labeled_napistu_data(sbml_dfs, ng, splitting_strategy='vertex_mask')

napistu_torch.load.napistu_graphs.napistu_graph_to_napistu_data(napistu_graph: NapistuGraph, splitting_strategy: str, vertex_default_transforms: Dict[str, Dict] | EncodingManager = {'categorical': {'node_type'}, 'sparse_categorical': {'species_type'}}, vertex_transforms: Dict[str, Dict] | EncodingManager | None = None, edge_default_transforms: Dict[str, Dict] | EncodingManager = {'binary': {'r_isreversible'}, 'categorical': {'direction', 'sbo_term_downstream', 'sbo_term_upstream'}, 'numeric': {'stoichiometry_downstream', 'stoichiometry_upstream', 'weight', 'weight_upstream'}}, edge_transforms: Dict[str, Dict] | EncodingManager | None = None, auto_encode: bool = True, encoders: Dict[str, Any] = {'binary': 'passthrough', 'categorical': OneHotEncoder(drop='if_binary', sparse_output=False), 'numeric': StandardScaler(), 'sparse_categorical': OneHotEncoder(handle_unknown='ignore', sparse_output=False), 'sparse_numeric': SparseContScaler()}, deduplicate_features: bool = True, labels: torch.Tensor | None = None, labeling_manager: LabelingManager | None = None, relation_type: torch.Tensor | None = None, relation_manager: LabelingManager | None = None, name: str | None = None, verbose: bool = True, **strategy_kwargs: Any) → NapistuData | Dict[str, NapistuData]

Convert a NapistuGraph to NapistuData object(s) with specified splitting strategy.

This function transforms a NapistuGraph (representing a biological network) into a NapistuData object (based on PyTorch Geometric Data object) suitable for graph neural network training. Node and edge features are automatically encoded using configurable transformers.

Parameters:

napistu_graph (NapistuGraph) – The NapistuGraph object containing the biological network data to convert. Must have vertices (nodes) and edges with associated attributes.
splitting_strategy (str) – One of: ‘edge_mask’, ‘vertex_mask’, ‘no_mask’, ‘inductive’
vertex_transforms (Optional[Union[Dict[str, Dict], EncodingManager]], default=None) – Optional override configuration for vertex (node) feature encoding. If provided, will be merged with vertex_default_transforms using the merge strategy from compose_configs.
edge_transforms (Optional[Union[Dict[str, Dict], EncodingManager]], default=None) – Optional override configuration for edge feature encoding. If provided, will be merged with edge_default_transforms using the merge strategy from compose_configs.
vertex_default_transforms (Union[Dict[str, Dict], EncodingManager], default=VERTEX_DEFAULT_TRANSFORMS) – Default encoding configuration for vertex features. By default, encodes node_type as a categorical feature and species_type as a sparse categorical feature, both using OneHotEncoder.
edge_default_transforms (Union[Dict[str, Dict], EncodingManager], default=EDGE_DEFAULT_TRANSFORMS) – Default encoding configuration for edge features. By default, encodes:
- direction, sbo_term_upstream, and sbo_term_downstream as categorical features using OneHotEncoder
- stoichiometry_upstream, stoichiometry_downstream, weight, and weight_upstream as numeric features using StandardScaler
- r_isreversible as a binary feature using passthrough
encoders (Dict[str, Any], default=DEFAULT_ENCODERS) – Dictionary of encoders to use for encoding. This is passed to the encoding.compose_encoding_configs function and auto_encode function.
auto_encode (bool, default=True) – If True, automatically encode attributes that are not explicitly encoded (and which are not part of NEVER_ENCODE).
deduplicate_features (bool, default=True) – If True, deduplicate identical features and name the resulting columns using the shortest common prefix of the merged columns.
verbose (bool, default=True) – If True, log detailed information about config composition and encoding.
labels (Optional[torch.Tensor], default=None) – Optional labels tensor to set as ‘y’ attribute in the resulting NapistuData object(s).
labeling_manager (Optional[LabelingManager], default=None) – Labeling manager used to encode the labels. Should be provided when labels are present.
relation_type (Optional[torch.Tensor], default=None) – Optional relation type tensor to set as ‘relation_type’ attribute in the resulting NapistuData object(s).
relation_manager (Optional[LabelingManager], default=None) – Relation labeling manager used to encode edge/relation labels. Should be provided when relation labels are created (e.g., from edge_strata).
name (Optional[str], default=None) – Name for the NapistuData object(s). If None, uses the default name.
**strategy_kwargs (Any) – Strategy-specific arguments, for example:
- For ‘edge_mask’: train_size=0.8, val_size=0.1, test_size=0.1
- For ‘vertex_mask’: train_size=0.8, val_size=0.1, test_size=0.1
- For ‘inductive’: num_hops=2, train_size=0.8, etc.

Returns:

NapistuData object (subclass of PyTorch Geometric Data) containing:

- x (torch.Tensor) – Node features tensor of shape (num_nodes, num_node_features)
- edge_index (torch.Tensor) – Edge connectivity tensor of shape (2, num_edges) with source and target indices
- edge_attr (torch.Tensor) – Edge features tensor of shape (num_edges, num_edge_features)
- edge_weight (torch.Tensor, optional) – 1D tensor of original edge weights for scalar weight-based models
- vertex_feature_names (List[str]) – List of vertex feature names
- edge_feature_names (List[str]) – List of edge feature names
- vertex_feature_name_aliases (Dict[str, str]) – Mapping from vertex feature names to their canonical names (for deduplicated features)
- edge_feature_name_aliases (Dict[str, str]) – Mapping from edge feature names to their canonical names (for deduplicated features)
- y (torch.Tensor, optional) – Labels tensor if labels parameter was provided

If splitting_strategy is ‘inductive’, returns Dict[str, NapistuData] with keys ‘train’, ‘test’, ‘validation’ (or subset thereof).

Return type:

Union[NapistuData, Dict[str, NapistuData]]
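
The deduplicate_features option and the *_feature_name_aliases mappings can be pictured with a small plain-Python sketch. This is a hedged illustration, not the library's code: deduplicate_columns is a hypothetical helper, columns are plain lists, and the merged name is taken as the longest common prefix via os.path.commonprefix, which is an assumption about how the common prefix is chosen.

```python
import os


def deduplicate_columns(columns):
    """Merge identical columns and record aliases for the merged names.

    Hypothetical helper: `columns` maps feature names to lists of values.
    """
    # Group feature names by their (hashable) column contents
    groups = {}
    for name, values in columns.items():
        groups.setdefault(tuple(values), []).append(name)

    deduplicated, aliases = {}, {}
    for values, names in groups.items():
        if len(names) == 1:
            canonical = names[0]
        else:
            # Assumed naming rule: common prefix of the merged names
            canonical = os.path.commonprefix(names).rstrip("_") or names[0]
            for name in names:
                aliases[name] = canonical
        deduplicated[canonical] = list(values)
    return deduplicated, aliases


columns = {
    "sbo_term_upstream_SBO:0000010": [1, 0, 1],
    "sbo_term_downstream_SBO:0000010": [1, 0, 1],
    "weight": [0.2, 0.5, 0.9],
}

deduplicated, aliases = deduplicate_columns(columns)
print(list(deduplicated))  # ['sbo_term', 'weight']
print(aliases["sbo_term_upstream_SBO:0000010"])  # sbo_term
```

A production version would also need to guard against two different groups collapsing to the same canonical name, which this sketch does not handle.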

Examples

>>> # Edge masking with custom split ratios
>>> data = napistu_graph_to_napistu_data(
...     ng,
...     splitting_strategy='edge_mask',
...     train_size=0.7,
...     val_size=0.15,
...     test_size=0.15
... )

>>> # Vertex masking with default splits
>>> data = napistu_graph_to_napistu_data(ng, splitting_strategy='vertex_mask')

>>> # Inductive split with custom parameters
>>> data_dict = napistu_graph_to_napistu_data(
...     ng,
...     splitting_strategy='inductive',
...     num_hops=3,
...     train_size=0.8
... )
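
For intuition about what the train_size, val_size, and test_size arguments control in the masking strategies, here is a minimal plain-Python sketch of splitting items into three disjoint boolean masks. It is an assumption-laden illustration: make_split_masks is a hypothetical helper, the seed argument is invented for reproducibility, and the real implementation operates on tensors and may assign items differently.

```python
import random


def make_split_masks(n_items, train_size=0.7, val_size=0.15, test_size=0.15, seed=0):
    """Return three disjoint boolean masks covering all items exactly once."""
    if abs(train_size + val_size + test_size - 1.0) > 1e-9:
        raise ValueError("train_size, val_size and test_size must sum to 1")

    # Shuffle item indices reproducibly, then carve out contiguous chunks
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(train_size * n_items)
    n_val = round(val_size * n_items)
    train_idx = set(indices[:n_train])
    val_idx = set(indices[n_train:n_train + n_val])

    train_mask = [i in train_idx for i in range(n_items)]
    val_mask = [i in val_idx for i in range(n_items)]
    # Everything not in train or validation goes to test
    test_mask = [not (t or v) for t, v in zip(train_mask, val_mask)]
    return train_mask, val_mask, test_mask


train_mask, val_mask, test_mask = make_split_masks(20)
print(sum(train_mask), sum(val_mask), sum(test_mask))  # 14 3 3
```

Because the test mask is defined as the complement of the other two, rounding never leaves an item unassigned.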