napistu_torch.napistu_data_store

NapistuDataStore - Management system for Napistu data objects.

This module provides a store system for managing NapistuData, VertexTensor, and pandas DataFrame objects related to a single SBML_dfs/NapistuGraph pair.

Classes

NapistuDataStore

Manage data objects related to a single SBML_dfs/NapistuGraph pair.

class napistu_torch.napistu_data_store.NapistuDataStore(store_dir: str | Path)

Bases: object

Manage data objects related to a single SBML_dfs/NapistuGraph pair.

Directory structure:

store_dir/
├── registry.json            # Registry of all objects in this store
├── napistu_raw/             # (optional raw directory)
│   ├── sbml_dfs.pkl         # (optional copy)
│   └── napistu_graph.pkl    # (optional copy)
├── napistu_data/            # organizes NapistuData objects
│   └── (NapistuData .pt files)
├── vertex_tensors/          # organizes VertexTensor objects
│   └── (VertexTensor .pt files)
└── pandas_dfs/              # organizes pandas DataFrames
    └── (DataFrame .parquet files)

Each store manages objects for a single biological network.
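
As a rough illustration of the layout above, the following sketch builds an empty store skeleton with the standard library only. It does not use napistu_torch; the helper name make_store_skeleton and the registry schema (one name-to-metadata mapping per artifact type) are assumptions made for the example, not the library's actual format.

```python
import json
import tempfile
from pathlib import Path

# Subdirectory names taken from the directory layout described above.
ARTIFACT_DIRS = ("napistu_data", "vertex_tensors", "pandas_dfs")


def make_store_skeleton(store_dir: Path) -> Path:
    """Create the artifact subdirectories and an empty registry.json."""
    store_dir.mkdir(parents=True, exist_ok=True)
    for sub in ARTIFACT_DIRS:
        (store_dir / sub).mkdir(exist_ok=True)
    # Hypothetical registry schema: one {name: metadata} mapping per artifact type.
    registry = {sub: {} for sub in ARTIFACT_DIRS}
    registry_path = store_dir / "registry.json"
    registry_path.write_text(json.dumps(registry, indent=2))
    return registry_path


with tempfile.TemporaryDirectory() as tmp:
    registry_path = make_store_skeleton(Path(tmp) / "ecoli")
    created = sorted(p.name for p in registry_path.parent.iterdir())
    registry_keys = sorted(json.loads(registry_path.read_text()))
```

The optional napistu_raw/ directory is left out because it only exists when raw files are copied into the store.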

Public Methods

create(store_dir, read_only=False, sbml_dfs_path=None, napistu_graph_path=None, copy_to_store=False, overwrite=False)

Create a new NapistuDataStore

enable_artifact_creation(sbml_dfs_path, napistu_graph_path, copy_to_store=False)

Convert a read-only store to non-read-only by enabling artifact creation.

ensure_artifacts(artifact_names, artifact_registry=DEFAULT_ARTIFACT_REGISTRY, overwrite=False)

Ensure specified artifacts exist in the store, creating if missing.

get_missing_artifacts(artifact_names)

Check which artifacts are missing from the store.

from_config(config)

Create or load a NapistuDataStore from a DataConfig.

from_huggingface(repo_id, store_dir, revision=None, token=None, sbml_dfs_path=None, napistu_graph_path=None, copy_to_store=False)

Download and create a NapistuDataStore from HuggingFace Hub. Optionally convert from read-only to non-read-only by providing paths.

list_napistu_datas()

List all NapistuData names in the store

list_vertex_tensors()

List all VertexTensor names in the store

list_pandas_dfs()

List all pandas DataFrame names in the store

load_sbml_dfs()

Load the SBML_dfs from disk

load_napistu_data(name, map_location="cpu")

Load a NapistuData object from the store

load_napistu_graph()

Load the NapistuGraph from disk

load_vertex_tensor(name, map_location="cpu")

Load a VertexTensor from the store

load_pandas_df(name)

Load a pandas DataFrame from the store

publish_store_to_huggingface(repo_id, revision=None, overwrite=False, commit_message=None, token=None)

Publish entire store to HuggingFace Hub as a read-only dataset

save_napistu_data(napistu_data, name=None, overwrite=False)

Save a NapistuData object to the store

save_vertex_tensor(vertex_tensor, name=None, overwrite=False)

Save a VertexTensor to the store

save_pandas_df(df, name, overwrite=False)

Save a pandas DataFrame to the store

summary()

Get a summary of the store contents

Private Methods

_load_registry()

Load the registry from disk

_save_registry()

Save the registry to disk

classmethod create(store_dir: str | Path, read_only: bool = False, sbml_dfs_path: str | Path | None = None, napistu_graph_path: str | Path | None = None, copy_to_store: bool = False, overwrite: bool = False) NapistuDataStore

Create a new NapistuDataStore.

Parameters:
  • store_dir (Union[str, Path]) – Root directory for this store

  • read_only (bool, default=False) – If True, the store is read-only: no new artifacts can be created, and sbml_dfs_path, napistu_graph_path, and copy_to_store are ignored.

  • sbml_dfs_path (Optional[Union[str, Path]], default=None) – Path to the SBML_dfs pickle file. Ignored if read_only is True.

  • napistu_graph_path (Optional[Union[str, Path]], default=None) – Path to the NapistuGraph pickle file. Ignored if read_only is True.

  • copy_to_store (bool, default=False) – If True, copy the files into the store directory and store relative paths. If False, store absolute paths to the original files. Ignored if read_only is True.

  • overwrite (bool, default=False) – If True, remove existing store_dir if it exists before creating new store. If False, raise FileExistsError if store_dir already exists.

Returns:

The newly created store

Return type:

NapistuDataStore

Raises:
  • FileExistsError – If a registry.json already exists at store_dir and overwrite=False

  • FileNotFoundError – If the specified napistu files don’t exist (only when read_only=False)

Examples

>>> # Create a new store with external paths
>>> store = NapistuDataStore.create(
...     store_dir='./stores/ecoli',
...     sbml_dfs_path='/data/ecoli_sbml_dfs.pkl',
...     napistu_graph_path='/data/ecoli_ng.pkl',
...     copy_to_store=False
... )
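
The effect of copy_to_store on the recorded paths can be sketched as follows. This is a minimal standard-library illustration, not the library's code: the helper register_raw_file is hypothetical, and the destination file name is passed in explicitly to match the napistu_raw/ layout shown earlier.

```python
import shutil
import tempfile
from pathlib import Path


def register_raw_file(src: Path, store_dir: Path, dest_name: str, copy_to_store: bool) -> str:
    """Return the path string a registry could record for a raw input file."""
    if not src.is_file():
        raise FileNotFoundError(src)
    if copy_to_store:
        raw_dir = store_dir / "napistu_raw"
        raw_dir.mkdir(parents=True, exist_ok=True)
        dest = raw_dir / dest_name
        shutil.copy2(src, dest)
        # Relative to store_dir, so the store stays valid if it is moved.
        return dest.relative_to(store_dir).as_posix()
    # Absolute path to the original file, which stays outside the store.
    return str(src.resolve())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "ecoli_sbml_dfs.pkl"
    src.write_bytes(b"placeholder")
    store_dir = root / "stores" / "ecoli"
    copied = register_raw_file(src, store_dir, "sbml_dfs.pkl", copy_to_store=True)
    external = register_raw_file(src, store_dir, "sbml_dfs.pkl", copy_to_store=False)
    external_is_absolute = Path(external).is_absolute()
```

Recording relative paths for copied files keeps the store portable; recording absolute paths avoids duplicating large pickles but ties the store to the original file locations.
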
classmethod from_config(config: DataConfig, task_config: TaskConfig | None = None, ensure_artifacts: bool = True) NapistuDataStore

Create or load a NapistuDataStore from a DataConfig.

Flow:

  1. If the store exists: load the existing store.

  2. If the store doesn’t exist:

     • If config.hf_repo_id is provided: load the store from HuggingFace Hub. If sbml_dfs_path and napistu_graph_path are also provided, convert it to non-read-only.

     • Otherwise: create a new store locally (requires sbml_dfs_path and napistu_graph_path). This creates a regular store with the provided paths and copies the files into the store if config.copy_to_store is True.

  3. Ensure napistu_data_name, other_artifacts, and task artifacts exist. This step runs whether or not the store was just created, unless ensure_artifacts=False.

Parameters:
  • config (DataConfig) – Configuration with store location, artifact paths, and requirements. If hf_repo_id is provided and store doesn’t exist, the store will be loaded from HuggingFace Hub. Otherwise, both sbml_dfs_path and napistu_graph_path are required to create a new store.

  • task_config (Optional[TaskConfig], default=None) – Optional task configuration. If provided, artifacts required by the task will be added to the required artifacts list.

  • ensure_artifacts (bool, default=True) – Whether to ensure that napistu_data_name, other_artifacts, and task artifacts exist. Set to False when side-loading napistu_data directly (e.g., during testing).

Returns:

Ready-to-use store with all required artifacts (if ensure_artifacts=True)

Return type:

NapistuDataStore

Raises:
  • ValueError – If the store doesn’t exist and neither hf_repo_id nor both paths are provided, or if hf_repo_id is provided but only one of sbml_dfs_path or napistu_graph_path is provided

  • FileNotFoundError – If sbml_dfs_path or napistu_graph_path don’t exist when creating new store

Examples

>>> from napistu_torch.configs import DataConfig
>>> from pathlib import Path
>>>
>>> # Create a regular store with paths
>>> config = DataConfig(
...     store_dir=Path(".store"),
...     sbml_dfs_path=Path("/path/to/sbml_dfs.pkl"),
...     napistu_graph_path=Path("/path/to/napistu_graph.pkl"),
...     copy_to_store=True,
...     napistu_data_name="edge_prediction",
...     other_artifacts=["unlabeled"]
... )
>>> store = NapistuDataStore.from_config(config)
>>>
>>> # Load store from HuggingFace Hub
>>> hf_config = DataConfig(
...     store_dir=Path(".store"),
...     hf_repo_id="username/my-dataset",
...     hf_revision="v1.0",
...     napistu_data_name="edge_prediction"
... )
>>> hf_store = NapistuDataStore.from_config(hf_config)
>>>
>>> # Load from HuggingFace and convert to non-read-only
>>> hf_config_with_paths = DataConfig(
...     store_dir=Path(".store"),
...     hf_repo_id="username/my-dataset",
...     sbml_dfs_path=Path("/path/to/sbml_dfs.pkl"),
...     napistu_graph_path=Path("/path/to/napistu_graph.pkl"),
...     napistu_data_name="edge_prediction"
... )
>>> hf_store = NapistuDataStore.from_config(hf_config_with_paths)
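
The branching described in the flow for from_config can be summarized as a pure decision function. This is only a sketch of the documented behavior: choose_store_action and the returned labels are hypothetical names, and the real method works on a DataConfig rather than on loose arguments.

```python
from typing import Optional


def choose_store_action(
    store_exists: bool,
    hf_repo_id: Optional[str] = None,
    sbml_dfs_path: Optional[str] = None,
    napistu_graph_path: Optional[str] = None,
) -> str:
    """Pick which branch of the documented flow applies."""
    if store_exists:
        return "load_existing"
    n_paths = sum(p is not None for p in (sbml_dfs_path, napistu_graph_path))
    if hf_repo_id is not None:
        if n_paths == 1:
            raise ValueError("provide both sbml_dfs_path and napistu_graph_path, or neither")
        # Both paths given: the downloaded store is converted to non-read-only.
        return "download_and_enable_creation" if n_paths == 2 else "download_read_only"
    if n_paths != 2:
        raise ValueError("need hf_repo_id or both sbml_dfs_path and napistu_graph_path")
    return "create_local"


existing = choose_store_action(store_exists=True)
read_only = choose_store_action(False, hf_repo_id="username/my-dataset")
writable = choose_store_action(
    False,
    hf_repo_id="username/my-dataset",
    sbml_dfs_path="/path/to/sbml_dfs.pkl",
    napistu_graph_path="/path/to/napistu_graph.pkl",
)
local = choose_store_action(
    False,
    sbml_dfs_path="/path/to/sbml_dfs.pkl",
    napistu_graph_path="/path/to/napistu_graph.pkl",
)
try:
    choose_store_action(False)
    missing_inputs_rejected = False
except ValueError:
    missing_inputs_rejected = True
```
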
classmethod from_huggingface(repo_id: str, store_dir: str | Path, revision: str | None = None, token: str | None = None, sbml_dfs_path: str | Path | None = None, napistu_graph_path: str | Path | None = None, copy_to_store: bool = False) NapistuDataStore

Download and create a NapistuDataStore from HuggingFace Hub.

This is a convenience method that uses HFDatasetLoader to download a store from HuggingFace Hub and create a local NapistuDataStore instance.

If sbml_dfs_path and napistu_graph_path are provided, the store will be converted from read-only to non-read-only, allowing artifact creation.

Parameters:
  • repo_id (str) – HuggingFace repository in format “username/repo-name”

  • store_dir (Union[str, Path]) – Local directory where the store will be created

  • revision (Optional[str]) – Git revision (branch, tag, or commit hash). Defaults to “main”

  • token (Optional[str]) – HuggingFace access token for private repositories

  • sbml_dfs_path (Optional[Union[str, Path]]) – Path to SBML_dfs pickle file. If provided along with napistu_graph_path, converts the store from read-only to non-read-only.

  • napistu_graph_path (Optional[Union[str, Path]]) – Path to NapistuGraph pickle file. If provided along with sbml_dfs_path, converts the store from read-only to non-read-only.

  • copy_to_store (bool, default=False) – If True, copy sbml_dfs and napistu_graph into the store directory and store relative paths. If False, store absolute paths to the original files. Only applies when sbml_dfs_path and napistu_graph_path are provided.

Returns:

Loaded store ready to use

Return type:

NapistuDataStore

Raises:
  • ValueError – If only one of sbml_dfs_path or napistu_graph_path is provided

  • FileNotFoundError – If the provided paths don’t exist

Examples

>>> from napistu_torch.napistu_data_store import NapistuDataStore
>>> from pathlib import Path
>>>
>>> # Load read-only store from HuggingFace Hub
>>> store = NapistuDataStore.from_huggingface(
...     repo_id="username/my-dataset",
...     store_dir=Path("./local_store")
... )
>>>
>>> # Load and convert to non-read-only store
>>> store = NapistuDataStore.from_huggingface(
...     repo_id="username/my-dataset",
...     store_dir=Path("./local_store"),
...     sbml_dfs_path=Path("/data/sbml_dfs.pkl"),
...     napistu_graph_path=Path("/data/graph.pkl")
... )
>>>
>>> # Load and copy raw data into store for portability
>>> store = NapistuDataStore.from_huggingface(
...     repo_id="username/my-dataset",
...     store_dir=Path("./local_store"),
...     sbml_dfs_path=Path("./.napistu/sbml_dfs.pkl"),
...     napistu_graph_path=Path("./.napistu/graph.pkl"),
...     copy_to_store=True
... )
>>>
>>> # Use the store
>>> napistu_data = store.load_napistu_data("edge_prediction")
__init__(store_dir: str | Path)

Initialize the NapistuDataStore from an existing registry.

Parameters:

store_dir (Union[str, Path]) – Root directory for this store. Must contain a registry.json file.

Raises:

FileNotFoundError – If the registry.json file does not exist

Examples

>>> # Load an existing store
>>> store = NapistuDataStore('.store')
_load_registry() dict

Load the registry from disk.

_save_registry() None

Save the registry to disk.

_validate_no_duplicate_names(raise_error: bool = True) None

Check for duplicate names across artifact types and warn if found.
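
A duplicate-name check of this kind can be sketched with a Counter over the names registered under each artifact type. The function names below are hypothetical, and raising ValueError when raise_error is True is an assumption; the source only states that duplicates are reported.

```python
import warnings
from collections import Counter
from typing import Dict, List


def find_duplicate_names(names_by_type: Dict[str, List[str]]) -> List[str]:
    """Return names that appear under more than one artifact type."""
    counts = Counter(name for names in names_by_type.values() for name in set(names))
    return sorted(name for name, count in counts.items() if count > 1)


def check_no_duplicate_names(names_by_type: Dict[str, List[str]], raise_error: bool = True) -> None:
    duplicates = find_duplicate_names(names_by_type)
    if not duplicates:
        return
    message = f"Names used by more than one artifact type: {duplicates}"
    if raise_error:
        # Assumed exception type, chosen for the sketch.
        raise ValueError(message)
    warnings.warn(message)


names_by_type = {
    "napistu_data": ["unlabeled", "edge_prediction"],
    "vertex_tensors": ["comprehensive_pathway_memberships"],
    "pandas_dfs": ["species_identifiers", "unlabeled"],
}
duplicates = find_duplicate_names(names_by_type)
try:
    check_no_duplicate_names(names_by_type)
    duplicate_raised = False
except ValueError:
    duplicate_raised = True
```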

enable_artifact_creation(sbml_dfs_path: str | Path, napistu_graph_path: str | Path, copy_to_store: bool = False) None

Convert a read-only store to non-read-only by enabling artifact creation.

Updates the registry to set READ_ONLY to False and store paths to SBML_dfs and NapistuGraph files, allowing the store to create new artifacts.

Parameters:
  • sbml_dfs_path (Union[str, Path]) – Path to SBML_dfs pickle file

  • napistu_graph_path (Union[str, Path]) – Path to NapistuGraph pickle file

  • copy_to_store (bool, default=False) – If True, copy the files into the store directory and store relative paths. If False, store absolute paths to the original files.

Raises:
  • ValueError – If store is already non-read-only

  • FileNotFoundError – If the provided paths don’t exist
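
The registry update performed by enable_artifact_creation can be pictured with the sketch below. It works on a plain dict; the key names "read_only", "sbml_dfs_path", and "napistu_graph_path" are assumed for illustration, and the copy_to_store branch is omitted.

```python
import tempfile
from pathlib import Path


def enable_creation(registry: dict, sbml_dfs_path: Path, napistu_graph_path: Path) -> dict:
    """Return a copy of the registry with artifact creation enabled."""
    # Hypothetical key names; the real registry schema may differ.
    if not registry.get("read_only", False):
        raise ValueError("store is already non-read-only")
    for path in (sbml_dfs_path, napistu_graph_path):
        if not path.is_file():
            raise FileNotFoundError(path)
    updated = dict(registry)
    updated["read_only"] = False
    updated["sbml_dfs_path"] = str(sbml_dfs_path.resolve())
    updated["napistu_graph_path"] = str(napistu_graph_path.resolve())
    return updated


with tempfile.TemporaryDirectory() as tmp:
    sbml = Path(tmp) / "sbml_dfs.pkl"
    graph = Path(tmp) / "napistu_graph.pkl"
    sbml.write_bytes(b"placeholder")
    graph.write_bytes(b"placeholder")
    read_only_registry = {"read_only": True, "sbml_dfs_path": None, "napistu_graph_path": None}
    writable = enable_creation(read_only_registry, sbml, graph)
    try:
        enable_creation(writable, sbml, graph)
        second_call_failed = False
    except ValueError:
        second_call_failed = True
```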

ensure_artifacts(artifact_names: List[str], artifact_registry: Dict[str, napistu_torch.load.artifacts.ArtifactDefinition] = DEFAULT_ARTIFACT_REGISTRY, overwrite: bool = False) None

Ensure specified artifacts exist in the store, creating if missing.

This is the key method for efficient batch artifact creation. It loads the raw data (SBML_dfs and NapistuGraph) ONCE and creates all missing artifacts.

Only checks registry for artifacts not already in store. This allows custom artifacts to exist in store without being in registry.

Parameters:
  • artifact_names (List[str]) – Names of artifacts to ensure exist

  • artifact_registry (Dict[str, ArtifactDefinition], default=DEFAULT_ARTIFACT_REGISTRY) – Registry of artifact definitions. The default registry defines the following artifacts (name, artifact_type, creation_func, description):

      • comprehensive_pathway_memberships (vertex_tensor, _create_comprehensive_pathway_memberships) – VertexTensor containing comprehensive pathway membership features

      • edge_prediction (napistu_data, _create_edge_prediction_data) – Unlabeled NapistuData with train/test/val edge masking

      • edge_strata_by_edge_sbo_terms (pandas_dfs, _create_edge_strata_by_edge_sbo_terms) – Pandas DataFrame containing edge strata by from-to edge SBO terms

      • edge_strata_by_node_species_type (pandas_dfs, _create_edge_strata_by_node_species_type) – Pandas DataFrame containing edge strata by node + species type

      • edge_strata_by_node_type (pandas_dfs, _create_edge_strata_by_node_type) – Pandas DataFrame containing edge strata by node type

      • name_to_sid_map (pandas_dfs, _create_name_to_sid_map) – Pandas DataFrame containing a map of vertex names to species ids

      • relation_prediction (napistu_data, _create_relation_prediction_data) – Unlabeled NapistuData with train/test/val edge masking and relation-type labels

      • species_identifiers (pandas_dfs, _create_species_identifiers) – Pandas DataFrame containing species identifiers

      • species_type_prediction (napistu_data, _create_species_type_prediction_data) – NapistuData containing species type labels with train/test/val vertex masking

      • unlabeled (napistu_data, _create_unlabeled_data) – Unlabeled NapistuData without masking

  • overwrite (bool, default=False) – If True, recreate artifacts even if they exist

Raises:

KeyError – If artifact not in store and not in registry

Examples

>>> # Works with registry artifacts
>>> store.ensure_artifacts(["unlabeled", "edge_prediction"])
>>>
>>> # Also works with custom artifacts already in store
>>> custom_data = construct_custom_pyg_data(...)
>>> store.save_napistu_data(custom_data, name="my_custom_artifact")
>>> store.ensure_artifacts(["my_custom_artifact"])  # Just verifies existence
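
The "load raw data once, then create every missing artifact" pattern behind ensure_artifacts can be sketched without the library. The store here is a plain dict, and load_raw, SKETCH_REGISTRY, and ensure are hypothetical names. Treating overwrite=True as "recreate only artifacts that have a registry definition" is an assumption made so that custom artifacts are never rebuilt.

```python
from typing import Callable, Dict, List

load_counter = {"calls": 0}


def load_raw() -> dict:
    """Stand-in for the expensive load of the SBML_dfs and NapistuGraph."""
    load_counter["calls"] += 1
    return {"n_vertices": 3}


# Hypothetical registry: artifact name -> function that builds it from raw data.
SKETCH_REGISTRY: Dict[str, Callable[[dict], object]] = {
    "unlabeled": lambda raw: ("unlabeled", raw["n_vertices"]),
    "edge_prediction": lambda raw: ("edge_prediction", raw["n_vertices"]),
}


def ensure(store: dict, names: List[str], registry=SKETCH_REGISTRY, overwrite: bool = False) -> List[str]:
    """Create missing artifacts in `store` and return the names that were built."""
    to_create = [n for n in names if n not in store or (overwrite and n in registry)]
    unknown = [n for n in to_create if n not in registry]
    if unknown:
        raise KeyError(f"not in store and not in registry: {unknown}")
    if not to_create:
        return []
    raw = load_raw()  # loaded once, shared by every creation function
    for name in to_create:
        store[name] = registry[name](raw)
    return to_create


store = {"my_custom_artifact": object()}
created = ensure(store, ["unlabeled", "edge_prediction", "my_custom_artifact"])
created_again = ensure(store, ["unlabeled", "edge_prediction"])
try:
    ensure(store, ["not_defined_anywhere"])
    unknown_rejected = False
except KeyError:
    unknown_rejected = True
```

The second call creates nothing and does not touch the raw data, which is the point of checking the store before loading.
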
get_missing_artifacts(artifact_names: List[str], artifact_registry: Dict[str, napistu_torch.load.artifacts.ArtifactDefinition] = DEFAULT_ARTIFACT_REGISTRY) List[str]

Check which artifacts are missing from the store.

Parameters:
  • artifact_names (List[str]) – Names of artifacts to check

  • artifact_registry (Dict[str, ArtifactDefinition], default=DEFAULT_ARTIFACT_REGISTRY) – Registry of artifact definitions

Returns:

Names of artifacts that don’t exist in store

Return type:

List[str]

Raises:

KeyError – If any artifact name is not in the registry

Examples

>>> missing = store.get_missing_artifacts([
...     "unlabeled",
...     "edge_prediction",
...     "custom_artifact"
... ])
>>> print(missing)
['edge_prediction']
list_artifacts(artifact_type: str | None = None) list[str]

List all artifact names in the store.

Parameters:

artifact_type (Optional[str], default=None) – Type of artifact to list. If not provided, all artifact types will be listed.

Returns:

List of artifact names in the store

Return type:

list[str]

list_napistu_datas() list[str]

List all NapistuData names in the store.

Returns:

List of NapistuData names in the store

Return type:

list[str]

list_pandas_dfs() list[str]

List all pandas DataFrame names in the store.

Returns:

List of pandas DataFrame names in the store

Return type:

list[str]

list_vertex_tensors() list[str]

List all VertexTensor names in the store.

Returns:

List of VertexTensor names in the store

Return type:

list[str]

load_artifact(name: str, artifact_type: str) NapistuData | VertexTensor | DataFrame

Load an artifact from the store.

Parameters:
  • name (str) – Name of the artifact to load

  • artifact_type (str) – Type of the artifact to load

Returns:

The loaded artifact

Return type:

Union[NapistuData, VertexTensor, pd.DataFrame]

Raises:

ValueError – If invalid artifact type

load_napistu_data(name: str, map_location: str = 'cpu') NapistuData

Load a NapistuData object from the store.

Parameters:
  • name (str) – Name of the NapistuData to load

  • map_location (str, default="cpu") – Device to map tensors to

Returns:

The loaded NapistuData object

Return type:

NapistuData

Raises:
  • KeyError – If name not found in registry

  • FileNotFoundError – If the .pt file doesn’t exist

load_napistu_graph() NapistuGraph

Load the NapistuGraph from disk.

load_pandas_df(name: str) DataFrame

Load a pandas DataFrame from the store.

Parameters:

name (str) – Name of the pandas DataFrame to load

Returns:

The loaded pandas DataFrame

Return type:

pd.DataFrame

Raises:
  • KeyError – If name not found in registry

  • FileNotFoundError – If the .parquet file doesn’t exist

load_sbml_dfs() SBML_dfs

Load the SBML_dfs from disk.

load_vertex_tensor(name: str, map_location: str = 'cpu') VertexTensor

Load a VertexTensor from the store.

Parameters:
  • name (str) – Name of the VertexTensor to load

  • map_location (str, default="cpu") – Device to map tensors to

Returns:

The loaded VertexTensor object

Return type:

VertexTensor

Raises:
  • KeyError – If name not found in registry

  • FileNotFoundError – If the .pt file doesn’t exist

publish_store_to_huggingface(repo_id: str, revision: str | None = None, overwrite: bool = False, commit_message: str | None = None, token: str | None = None, asset_name: str | None = None, asset_version: str | None = None, tag: str | None = None, tag_message: str | None = None) str

Publish this entire store to HuggingFace Hub.

This is a convenience method that uses HFDatasetPublisher to publish all artifacts from this store to HuggingFace Hub as a dataset repository. The published store will be read-only (sbml_dfs_path and napistu_graph_path set to None).

Parameters:
  • repo_id (str) – Repository ID in format “username/repo-name”

  • revision (Optional[str]) – Git revision (branch, tag, or commit hash). Defaults to “main”

  • overwrite (bool) – Explicitly confirm overwriting existing dataset (default: False)

  • commit_message (Optional[str]) – Custom commit message (default: auto-generated)

  • token (Optional[str]) – HuggingFace API token (default: uses huggingface-cli login token)

  • asset_name (Optional[str]) – Name of the GCS asset used to create the store (for documentation)

  • asset_version (Optional[str]) – Version of the GCS asset used to create the store (for documentation)

  • tag (Optional[str]) – Tag name to create after all assets are uploaded (e.g., “v1.0”)

  • tag_message (Optional[str]) – Optional message for the tag

Returns:

URL to the published dataset on HuggingFace Hub

Return type:

str

Examples

>>> store = NapistuDataStore("path/to/store")
>>> url = store.publish_store_to_huggingface(
...     repo_id="username/my-dataset",
...     asset_name="human_consensus",
...     asset_version="v1.0"
... )
save_napistu_data(napistu_data: NapistuData, name: str | None = None, overwrite: bool = False) None

Save a NapistuData object to the store.

Parameters:
  • napistu_data (NapistuData) – The NapistuData object to save. The method will extract the splitting_strategy and labeling_manager from the object’s attributes.

  • name (str, optional) – Name to use for the registry entry and filename. If not provided, uses the napistu_data.name attribute.

  • overwrite (bool, default=False) – If True, overwrite an existing entry with the same name. If False, raise FileExistsError if the name already exists.

Raises:
  • FileExistsError – If name already exists in registry and overwrite=False

  • ValueError – If the splitting_strategy from the NapistuData object is invalid

save_pandas_df(df: DataFrame, name: str, overwrite: bool = False) None

Save a pandas DataFrame to the store.

Parameters:
  • df (pd.DataFrame) – The pandas DataFrame to save

  • name (str) – Name for storage (registry key and filename stem).

  • overwrite (bool, default=False) – If True, overwrite an existing entry with the same name. If False, raise FileExistsError if the name already exists.

Raises:

FileExistsError – If name already exists in registry and overwrite=False

save_vertex_tensor(vertex_tensor: VertexTensor, name: str | None = None, overwrite: bool = False) None

Save a VertexTensor to the store.

Parameters:
  • vertex_tensor (VertexTensor) – The VertexTensor object to save

  • name (str, optional) – Name for storage (registry key and filename stem). If not provided, uses the vertex_tensor.name attribute.

  • overwrite (bool, default=False) – If True, overwrite an existing entry with the same name. If False, raise FileExistsError if the name already exists.

Raises:

FileExistsError – If name already exists in registry and overwrite=False
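
The name fallback and overwrite guard shared by the save methods can be sketched as below. This is not the library's implementation: NamedArtifact and save_artifact are hypothetical, the registry is a plain dict, and raw bytes stand in for the serialized tensor file.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class NamedArtifact:
    """Hypothetical stand-in for an object that carries its own name."""
    name: str
    payload: bytes


def save_artifact(
    artifact_dir: Path,
    registry: dict,
    artifact: NamedArtifact,
    name: Optional[str] = None,
    overwrite: bool = False,
) -> Path:
    # Fall back to the object's own name when no explicit name is given.
    key = name if name is not None else artifact.name
    if key in registry and not overwrite:
        raise FileExistsError(f"'{key}' already exists; pass overwrite=True to replace it")
    artifact_dir.mkdir(parents=True, exist_ok=True)
    path = artifact_dir / f"{key}.pt"
    path.write_bytes(artifact.payload)  # stand-in for serializing the object
    registry[key] = path.name
    return path


with tempfile.TemporaryDirectory() as tmp:
    tensor_dir = Path(tmp) / "vertex_tensors"
    registry = {}
    first = save_artifact(tensor_dir, registry, NamedArtifact("pathway_memberships", b"v1"))
    saved_name = first.name
    try:
        save_artifact(tensor_dir, registry, NamedArtifact("pathway_memberships", b"v2"))
        duplicate_rejected = False
    except FileExistsError:
        duplicate_rejected = True
    save_artifact(
        tensor_dir, registry, NamedArtifact("other", b"v2"),
        name="pathway_memberships", overwrite=True,
    )
    final_payload = first.read_bytes()
```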

summary() dict

Get a summary of the store contents.

Returns:

Dictionary containing summary information about the store

Return type:

dict

validate() None

Validate the store contents.

validate_artifact_name(name: str, artifact_registry: Dict[str, napistu_torch.load.artifacts.ArtifactDefinition] = DEFAULT_ARTIFACT_REGISTRY, required_type: str | None = None) None

Validate an artifact name by ensuring that it is either already in the store or available from the registry.

Parameters:
  • name (str) – Name of artifact to validate

  • artifact_registry (Dict[str, ArtifactDefinition], default=DEFAULT_ARTIFACT_REGISTRY) – Registry of artifact definitions

  • required_type (Optional[str], default=None) – Type of artifact that is required. If not provided, any type is allowed.

Raises:

KeyError – If artifact is not in store and not in registry
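
The validation rule (a name is acceptable if it is already stored or can be created from the registry) can be sketched with two name-to-type mappings. The function validate_name is hypothetical, and raising ValueError on a type mismatch is an assumption; the source documents only the KeyError case.

```python
from typing import Dict, Optional


def validate_name(
    name: str,
    stored: Dict[str, str],
    registry: Dict[str, str],
    required_type: Optional[str] = None,
) -> None:
    """Both mappings go from artifact name to artifact type."""
    if name in stored:
        artifact_type = stored[name]
    elif name in registry:
        artifact_type = registry[name]
    else:
        raise KeyError(f"'{name}' is not in the store and not in the registry")
    if required_type is not None and artifact_type != required_type:
        # Assumed exception type for a mismatch.
        raise ValueError(f"'{name}' is a {artifact_type}, expected {required_type}")


# Types follow the default artifact definitions listed for ensure_artifacts.
registry_types = {"unlabeled": "napistu_data", "name_to_sid_map": "pandas_dfs"}
stored_types = {"my_custom_artifact": "napistu_data"}

validate_name("my_custom_artifact", stored_types, registry_types)
validate_name("unlabeled", stored_types, registry_types, required_type="napistu_data")
try:
    validate_name("nowhere", stored_types, registry_types)
    unknown_rejected = False
except KeyError:
    unknown_rejected = True
try:
    validate_name("name_to_sid_map", stored_types, registry_types, required_type="napistu_data")
    wrong_type_rejected = False
except ValueError:
    wrong_type_rejected = True
```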

napistu_torch.napistu_data_store._resolve_path(path_str: str, store_dir: Path) Path

Resolve a path string to a normalized absolute Path.

If the path starts with ‘/’, it’s treated as an absolute path. Otherwise, it’s treated as relative to store_dir. All paths are normalized to resolve .. components and symbolic links.

Parameters:
  • path_str (str) – Path string from registry (either absolute or relative)

  • store_dir (Path) – Store directory to resolve relative paths against

Returns:

Resolved and normalized absolute path

Return type:

Path
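
The resolution rule described above can be sketched with pathlib. This is a minimal re-implementation for illustration under POSIX path conventions, not the module's private function; the name resolve_path is chosen for the example.

```python
import tempfile
from pathlib import Path


def resolve_path(path_str: str, store_dir: Path) -> Path:
    """Resolve a registry path string to a normalized absolute Path."""
    path = Path(path_str)
    if not path_str.startswith("/"):
        # Relative entries are interpreted against the store directory.
        path = store_dir / path
    # resolve() collapses ".." components and follows symbolic links.
    return path.resolve()


with tempfile.TemporaryDirectory() as tmp:
    store_dir = Path(tmp)
    relative = resolve_path("napistu_raw/../registry.json", store_dir)
    relative_ok = relative == (store_dir / "registry.json").resolve()
    absolute = resolve_path("/data/ecoli_ng.pkl", store_dir)
    absolute_ok = absolute == Path("/data/ecoli_ng.pkl").resolve()
    both_absolute = relative.is_absolute() and absolute.is_absolute()
```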

napistu_torch.napistu_data_store._validate_create_inputs(registry_path: Path, sbml_dfs_path: Path, napistu_graph_path: Path) None

Validate inputs for creating a new NapistuDataStore.

Parameters:
  • registry_path (Path) – Path where the registry file should be created

  • sbml_dfs_path (Path) – Path to the SBML_dfs pickle file

  • napistu_graph_path (Path) – Path to the NapistuGraph pickle file

Raises:
  • FileExistsError – If a registry already exists at registry_path

  • FileNotFoundError – If the specified napistu files don’t exist