napistu_torch.napistu_data_store
NapistuDataStore - Management system for Napistu data objects.
This module provides a store system for managing NapistuData, VertexTensor, and pandas DataFrame objects related to a single SBML_dfs/NapistuGraph pair.
Classes
- NapistuDataStore
Manage data objects related to a single SBML_dfs/NapistuGraph pair.
- class napistu_torch.napistu_data_store.NapistuDataStore(store_dir: str | Path)
Bases: object
Manage data objects related to a single SBML_dfs/NapistuGraph pair.
Directory structure:
store_dir/
├── registry.json          # Registry of all objects in this store
├── napistu_raw/           # (optional raw directory)
│   ├── sbml_dfs.pkl       # (optional copy)
│   └── napistu_graph.pkl  # (optional copy)
├── napistu_data/          # organizes NapistuData objects
│   └── (NapistuData .pt files)
├── vertex_tensors/        # organizes VertexTensor objects
│   └── (VertexTensor .pt files)
└── pandas_dfs/            # organizes pandas DataFrames
    └── (DataFrame .parquet files)
Each store manages objects for a single biological network.
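As a rough illustration of the layout above, the following sketch reports which of the documented pieces exist under a directory. It uses only the standard library; the helper name describe_store_layout is hypothetical and is not part of napistu_torch, and the files it creates are empty placeholders rather than real artifacts.

```python
import json
import tempfile
from pathlib import Path

# Subdirectory names taken from the documented layout
STORE_SUBDIRS = ("napistu_raw", "napistu_data", "vertex_tensors", "pandas_dfs")


def describe_store_layout(store_dir):
    """Report which parts of the documented layout exist under store_dir."""
    store_dir = Path(store_dir)
    report = {"registry.json": (store_dir / "registry.json").is_file()}
    for name in STORE_SUBDIRS:
        subdir = store_dir / name
        # List file names when the directory is present, otherwise record None
        report[name] = sorted(p.name for p in subdir.iterdir()) if subdir.is_dir() else None
    return report


# Build a throwaway directory that mimics the layout (contents are placeholders)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "registry.json").write_text(json.dumps({}))
    (root / "napistu_data").mkdir()
    (root / "napistu_data" / "unlabeled.pt").write_bytes(b"")
    layout = describe_store_layout(root)
    print(layout)
```

A check like this only inspects the file system; the registry.json file remains the source of truth for what the store considers registered.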
Public Methods
- create(store_dir, read_only=False, sbml_dfs_path=None, napistu_graph_path=None, copy_to_store=False, overwrite=False)
Create a new NapistuDataStore
- enable_artifact_creation(sbml_dfs_path, napistu_graph_path, copy_to_store=False)
Convert a read-only store to non-read-only by enabling artifact creation.
- ensure_artifacts(artifact_names, artifact_registry=DEFAULT_ARTIFACT_REGISTRY, overwrite=False)
Ensure specified artifacts exist in the store, creating if missing.
- get_missing_artifacts(artifact_names)
Check which artifacts are missing from the store.
- from_config(config, task_config=None, ensure_artifacts=True)
Create or load a NapistuDataStore from a DataConfig.
- from_huggingface(repo_id, store_dir, revision=None, token=None, sbml_dfs_path=None, napistu_graph_path=None, copy_to_store=False)
Download and create a NapistuDataStore from HuggingFace Hub. Optionally convert from read-only to non-read-only by providing paths.
- list_napistu_datas()
List all NapistuData names in the store
- list_vertex_tensors()
List all VertexTensor names in the store
- list_pandas_dfs()
List all pandas DataFrame names in the store
- load_sbml_dfs()
Load the SBML_dfs from disk
- load_napistu_data(name, map_location="cpu")
Load a NapistuData object from the store
- load_napistu_graph()
Load the NapistuGraph from disk
- load_vertex_tensor(name, map_location="cpu")
Load a VertexTensor from the store
- load_pandas_df(name)
Load a pandas DataFrame from the store
- publish_store_to_huggingface(repo_id, revision=None, overwrite=False, commit_message=None, token=None, asset_name=None, asset_version=None, tag=None, tag_message=None)
Publish entire store to HuggingFace Hub as a read-only dataset
- save_napistu_data(napistu_data, name=None, overwrite=False)
Save a NapistuData object to the store
- save_vertex_tensor(vertex_tensor, name=None, overwrite=False)
Save a VertexTensor to the store
- save_pandas_df(df, name, overwrite=False)
Save a pandas DataFrame to the store
- summary()
Get a summary of the store contents
Private Methods
- _load_registry()
Load the registry from disk
- _save_registry()
Save the registry to disk
- classmethod create(store_dir: str | Path, read_only: bool = False, sbml_dfs_path: str | Path | None = None, napistu_graph_path: str | Path | None = None, copy_to_store: bool = False, overwrite: bool = False) NapistuDataStore
Create a new NapistuDataStore.
- Parameters:
store_dir (Union[str, Path]) – Root directory for this store
read_only (bool, default=False) – If True, the store is read-only: no new artifacts can be created, and sbml_dfs_path and napistu_graph_path are ignored.
sbml_dfs_path (Optional[Union[str, Path]], default=None) – Path to the SBML_dfs pickle file. Ignored if read_only is True.
napistu_graph_path (Optional[Union[str, Path]], default=None) – Path to the NapistuGraph pickle file. Ignored if read_only is True.
copy_to_store (bool, default=False) – If True, copy the files into the store directory and store relative paths. If False, store absolute paths to the original files. Ignored if read_only is True.
overwrite (bool, default=False) – If True, remove existing store_dir if it exists before creating new store. If False, raise FileExistsError if store_dir already exists.
- Returns:
The newly created store
- Return type:
NapistuDataStore
- Raises:
FileExistsError – If a registry.json already exists at store_dir and overwrite=False
FileNotFoundError – If the specified napistu files don’t exist (only when read_only=False)
Examples
>>> # Create a new store with external paths
>>> store = NapistuDataStore.create(
...     store_dir='./stores/ecoli',
...     sbml_dfs_path='/data/ecoli_sbml_dfs.pkl',
...     napistu_graph_path='/data/ecoli_ng.pkl',
...     copy_to_store=False
... )
- classmethod from_config(config: DataConfig, task_config: TaskConfig | None = None, ensure_artifacts: bool = True) NapistuDataStore
Create or load a NapistuDataStore from a DataConfig.
Flow:
1. If the store exists: load the existing store.
2. If the store doesn’t exist:
   - If config.hf_repo_id is provided: load the store from HuggingFace Hub.
     * If sbml_dfs_path and napistu_graph_path are provided: convert to non-read-only.
   - Otherwise: create a new store locally (requires sbml_dfs_path and napistu_graph_path).
     * Creates a regular store with the provided paths.
     * Copies the files into the store if config.copy_to_store is True.
3. Ensure napistu_data_name, other_artifacts, and task artifacts exist (regardless of how the store was obtained, unless ensure_artifacts=False).
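The branching in this flow can be sketched as a small pure function. This is an illustration under stated assumptions, not the library's implementation: ConfigSketch is a hypothetical stand-in that carries only the DataConfig fields named in this section, plan_store_action and its return labels are invented for the example, and the order of the checks (an existing store short-circuits path validation) is a guess.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ConfigSketch:
    """Hypothetical stand-in holding only the DataConfig fields named above."""
    store_dir: Path
    hf_repo_id: Optional[str] = None
    sbml_dfs_path: Optional[Path] = None
    napistu_graph_path: Optional[Path] = None


def plan_store_action(config, store_exists):
    """Return a label for the branch of the documented flow that applies."""
    has_sbml = config.sbml_dfs_path is not None
    has_graph = config.napistu_graph_path is not None

    # 1. An existing store is simply loaded
    if store_exists:
        return "load_existing"

    # 2a. No local store, but a Hub repository is configured
    if config.hf_repo_id is not None:
        if has_sbml != has_graph:
            raise ValueError("Provide both sbml_dfs_path and napistu_graph_path, or neither")
        return "download_and_enable_creation" if has_sbml else "download_read_only"

    # 2b. No local store and no Hub repository: both raw paths are required
    if not (has_sbml and has_graph):
        raise ValueError("Need hf_repo_id, or both sbml_dfs_path and napistu_graph_path")
    return "create_local"


local_plan = plan_store_action(
    ConfigSketch(Path(".store"), sbml_dfs_path=Path("a.pkl"), napistu_graph_path=Path("b.pkl")),
    store_exists=False,
)
hub_plan = plan_store_action(ConfigSketch(Path(".store"), hf_repo_id="username/my-dataset"), False)
```

Step 3 of the flow (ensuring artifacts) would run after whichever branch is chosen, so it is left out of the sketch.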
- Parameters:
config (DataConfig) – Configuration with store location, artifact paths, and requirements. If hf_repo_id is provided and store doesn’t exist, the store will be loaded from HuggingFace Hub. Otherwise, both sbml_dfs_path and napistu_graph_path are required to create a new store.
task_config (Optional[TaskConfig], default=None) – Optional task configuration. If provided, artifacts required by the task will be added to the required artifacts list.
ensure_artifacts (bool, default=True) – Whether to ensure that napistu_data_name, other_artifacts, and task artifacts exist. Set to False when side-loading napistu_data directly (e.g., during testing).
- Returns:
Ready-to-use store with all required artifacts (if ensure_artifacts=True)
- Return type:
NapistuDataStore
- Raises:
ValueError – If the store doesn’t exist and neither hf_repo_id nor both paths are provided, or if hf_repo_id is provided but only one of sbml_dfs_path or napistu_graph_path is provided
FileNotFoundError – If sbml_dfs_path or napistu_graph_path don’t exist when creating new store
Examples
>>> from napistu_torch.configs import DataConfig
>>> from pathlib import Path
>>>
>>> # Create a regular store with paths
>>> config = DataConfig(
...     store_dir=Path(".store"),
...     sbml_dfs_path=Path("/path/to/sbml_dfs.pkl"),
...     napistu_graph_path=Path("/path/to/napistu_graph.pkl"),
...     copy_to_store=True,
...     napistu_data_name="edge_prediction",
...     other_artifacts=["unlabeled"]
... )
>>> store = NapistuDataStore.from_config(config)
>>>
>>> # Load store from HuggingFace Hub
>>> hf_config = DataConfig(
...     store_dir=Path(".store"),
...     hf_repo_id="username/my-dataset",
...     hf_revision="v1.0",
...     napistu_data_name="edge_prediction"
... )
>>> hf_store = NapistuDataStore.from_config(hf_config)
>>>
>>> # Load from HuggingFace and convert to non-read-only
>>> hf_config_with_paths = DataConfig(
...     store_dir=Path(".store"),
...     hf_repo_id="username/my-dataset",
...     sbml_dfs_path=Path("/path/to/sbml_dfs.pkl"),
...     napistu_graph_path=Path("/path/to/napistu_graph.pkl"),
...     napistu_data_name="edge_prediction"
... )
>>> hf_store = NapistuDataStore.from_config(hf_config_with_paths)
- classmethod from_huggingface(repo_id: str, store_dir: str | Path, revision: str | None = None, token: str | None = None, sbml_dfs_path: str | Path | None = None, napistu_graph_path: str | Path | None = None, copy_to_store: bool = False) NapistuDataStore
Download and create a NapistuDataStore from HuggingFace Hub.
This is a convenience method that uses HFDatasetLoader to download a store from HuggingFace Hub and create a local NapistuDataStore instance.
If sbml_dfs_path and napistu_graph_path are provided, the store will be converted from read-only to non-read-only, allowing artifact creation.
- Parameters:
repo_id (str) – HuggingFace repository in format “username/repo-name”
store_dir (Union[str, Path]) – Local directory where the store will be created
revision (Optional[str]) – Git revision (branch, tag, or commit hash). Defaults to “main”
token (Optional[str]) – HuggingFace access token for private repositories
sbml_dfs_path (Optional[Union[str, Path]]) – Path to SBML_dfs pickle file. If provided along with napistu_graph_path, converts the store from read-only to non-read-only.
napistu_graph_path (Optional[Union[str, Path]]) – Path to NapistuGraph pickle file. If provided along with sbml_dfs_path, converts the store from read-only to non-read-only.
copy_to_store (bool, default=False) – If True, copy sbml_dfs and napistu_graph into the store directory and store relative paths. If False, store absolute paths to the original files. Only applies when sbml_dfs_path and napistu_graph_path are provided.
- Returns:
Loaded store ready to use
- Return type:
NapistuDataStore
- Raises:
ValueError – If only one of sbml_dfs_path or napistu_graph_path is provided
FileNotFoundError – If the provided paths don’t exist
Examples
>>> from napistu_torch.napistu_data_store import NapistuDataStore
>>> from pathlib import Path
>>>
>>> # Load read-only store from HuggingFace Hub
>>> store = NapistuDataStore.from_huggingface(
...     repo_id="username/my-dataset",
...     store_dir=Path("./local_store")
... )
>>>
>>> # Load and convert to non-read-only store
>>> store = NapistuDataStore.from_huggingface(
...     repo_id="username/my-dataset",
...     store_dir=Path("./local_store"),
...     sbml_dfs_path=Path("/data/sbml_dfs.pkl"),
...     napistu_graph_path=Path("/data/graph.pkl")
... )
>>>
>>> # Load and copy raw data into store for portability
>>> store = NapistuDataStore.from_huggingface(
...     repo_id="username/my-dataset",
...     store_dir=Path("./local_store"),
...     sbml_dfs_path=Path("./.napistu/sbml_dfs.pkl"),
...     napistu_graph_path=Path("./.napistu/graph.pkl"),
...     copy_to_store=True
... )
>>>
>>> # Use the store
>>> napistu_data = store.load_napistu_data("edge_prediction")
- __init__(store_dir: str | Path)
Initialize the NapistuDataStore from an existing registry.
- Parameters:
store_dir (Union[str, Path]) – Root directory for this store. Must contain a registry.json file.
- Raises:
FileNotFoundError – If the registry.json file does not exist
Examples
>>> # Load an existing store
>>> store = NapistuDataStore('.store')
- _load_registry() dict
Load the registry from disk.
- _save_registry() None
Save the registry to disk.
- _validate_no_duplicate_names(raise_error: bool = True) None
Check for duplicate names across artifact types and warn if found.
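A check of this kind could look like the sketch below, which groups names by the artifact types that use them. It is an assumption-laden illustration: the function names are hypothetical, the input is a plain dict rather than the store's registry, and ValueError is a guess because the source does not say which exception raise_error=True produces.

```python
import warnings
from collections import defaultdict


def find_duplicate_names(names_by_type):
    """Return {name: [artifact types]} for names used by more than one type."""
    seen = defaultdict(list)
    for artifact_type, names in names_by_type.items():
        for name in names:
            seen[name].append(artifact_type)
    return {name: types for name, types in seen.items() if len(types) > 1}


def check_no_duplicate_names(names_by_type, raise_error=True):
    """Raise or warn when a name is shared across artifact types."""
    duplicates = find_duplicate_names(names_by_type)
    if not duplicates:
        return
    message = f"Names shared across artifact types: {duplicates}"
    if raise_error:
        # ValueError is an assumption; the source does not name the exception type
        raise ValueError(message)
    warnings.warn(message)


names_by_type = {
    "napistu_data": ["unlabeled", "edge_prediction"],
    "vertex_tensors": ["comprehensive_pathway_memberships"],
    "pandas_dfs": ["species_identifiers", "unlabeled"],
}
duplicates = find_duplicate_names(names_by_type)
```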
- enable_artifact_creation(sbml_dfs_path: str | Path, napistu_graph_path: str | Path, copy_to_store: bool = False) None
Convert a read-only store to non-read-only by enabling artifact creation.
Updates the registry to set READ_ONLY to False and store paths to SBML_dfs and NapistuGraph files, allowing the store to create new artifacts.
- Parameters:
sbml_dfs_path (Union[str, Path]) – Path to SBML_dfs pickle file
napistu_graph_path (Union[str, Path]) – Path to NapistuGraph pickle file
copy_to_store (bool, default=False) – If True, copy the files into the store directory and store relative paths. If False, store absolute paths to the original files.
- Raises:
ValueError – If store is already non-read-only
FileNotFoundError – If the provided paths don’t exist
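The copy_to_store behavior described above (relative paths for copied files, absolute paths otherwise) can be sketched with the standard library. The helper record_raw_path is hypothetical, and keeping the source file name inside napistu_raw/ is an assumption; the real method may rename files or record paths differently.

```python
import shutil
import tempfile
from pathlib import Path


def record_raw_path(src, store_dir, copy_to_store=False):
    """Return the path string a registry could record for a raw input file."""
    src = Path(src)
    store_dir = Path(store_dir)
    if not src.is_file():
        raise FileNotFoundError(src)
    if copy_to_store:
        raw_dir = store_dir / "napistu_raw"
        raw_dir.mkdir(parents=True, exist_ok=True)
        dest = raw_dir / src.name  # assumption: the file name is preserved
        shutil.copy2(src, dest)
        # Relative path keeps the store portable if the directory is moved
        return dest.relative_to(store_dir).as_posix()
    # Absolute path to the original file; the store depends on it staying put
    return str(src.resolve())


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "external" / "sbml_dfs.pkl"
    source.parent.mkdir()
    source.write_bytes(b"placeholder")
    store_dir = Path(tmp) / "store"
    store_dir.mkdir()
    external_entry = record_raw_path(source, store_dir, copy_to_store=False)
    copied_entry = record_raw_path(source, store_dir, copy_to_store=True)
    copy_exists = (store_dir / copied_entry).is_file()
```

The trade-off is the usual one: copying costs disk space but makes the store self-contained, while absolute paths avoid duplication at the price of portability.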
- ensure_artifacts(artifact_names: List[str], artifact_registry: Dict[str, ArtifactDefinition] = DEFAULT_ARTIFACT_REGISTRY, overwrite: bool = False) None
Ensure specified artifacts exist in the store, creating if missing.
This is the key method for efficient batch artifact creation. It loads raw data (sbml_dfs, ng) ONCE and creates all missing artifacts.
Only checks registry for artifacts not already in store. This allows custom artifacts to exist in store without being in registry.
- Parameters:
artifact_names (List[str]) – Names of artifacts to ensure exist
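artifact_registry (Dict[str, ArtifactDefinition], default=DEFAULT_ARTIFACT_REGISTRY) – Registry of artifact definitions. The default registry defines the following artifacts (name, artifact_type, description):
  * comprehensive_pathway_memberships (vertex_tensor): VertexTensor containing comprehensive pathway membership features
  * edge_prediction (napistu_data): Unlabeled NapistuData with train/test/val edge masking
  * edge_strata_by_edge_sbo_terms (pandas_dfs): Pandas DataFrame containing edge strata by from-to edge SBO terms
  * edge_strata_by_node_species_type (pandas_dfs): Pandas DataFrame containing edge strata by node + species type
  * edge_strata_by_node_type (pandas_dfs): Pandas DataFrame containing edge strata by node type
  * name_to_sid_map (pandas_dfs): Pandas DataFrame containing a map of vertex names to species ids
  * relation_prediction (napistu_data): Unlabeled NapistuData with train/test/val edge masking and relation-type labels
  * species_identifiers (pandas_dfs): Pandas DataFrame containing species identifiers
  * species_type_prediction (napistu_data): NapistuData containing species type labels with train/test/val vertex masking
  * unlabeled (napistu_data): Unlabeled NapistuData without masking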
overwrite (bool, default=False) – If True, recreate artifacts even if they exist
- Raises:
KeyError – If artifact not in store and not in registry
Examples
>>> # Works with registry artifacts
>>> store.ensure_artifacts(["unlabeled", "edge_prediction"])
>>>
>>> # Also works with custom artifacts already in store
>>> custom_data = construct_custom_pyg_data(...)
>>> store.save_napistu_data(custom_data, name="my_custom_artifact")
>>> store.ensure_artifacts(["my_custom_artifact"])  # Just verifies existence
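The "check the store first, consult the registry only for what is missing, load raw data once" pattern described for ensure_artifacts can be sketched with plain dicts. Everything here is hypothetical scaffolding: the store is a dict, registry values are bare creation callables rather than ArtifactDefinition objects, and the handling of overwrite=True for custom artifacts (left untouched) is an assumption the source does not confirm.

```python
def ensure_artifacts_sketch(names, store, registry, load_raw, overwrite=False):
    """Create missing artifacts in one batch, loading raw inputs at most once."""
    to_create = []
    for name in names:
        if name in store and not (overwrite and name in registry):
            continue  # already present: no registry lookup needed
        if name not in registry:
            raise KeyError(f"{name!r} is neither in the store nor in the registry")
        to_create.append(name)

    if not to_create:
        return []  # nothing missing, so the raw data is never loaded

    raw = load_raw()  # expensive step, done once for the whole batch
    for name in to_create:
        store[name] = registry[name](raw)
    return to_create


load_calls = []


def load_raw():
    """Stand-in for loading sbml_dfs and the NapistuGraph from disk."""
    load_calls.append(1)
    return {"n_vertices": 3}


registry = {
    "unlabeled": lambda raw: ("unlabeled", raw["n_vertices"]),
    "edge_prediction": lambda raw: ("edge_prediction", raw["n_vertices"]),
}
store = {"my_custom_artifact": object()}  # custom artifact, not in the registry

created = ensure_artifacts_sketch(
    ["unlabeled", "edge_prediction", "my_custom_artifact"], store, registry, load_raw
)
```

Calling the function a second time with the same names creates nothing and does not call load_raw again, which is the property that makes batching worthwhile.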
- get_missing_artifacts(artifact_names: List[str], artifact_registry: Dict[str, ArtifactDefinition] = DEFAULT_ARTIFACT_REGISTRY) List[str]
Check which artifacts are missing from the store.
- Parameters:
artifact_names (List[str]) – Names of artifacts to check
artifact_registry (Dict[str, ArtifactDefinition], default=DEFAULT_ARTIFACT_REGISTRY) – Registry of artifact definitions
- Returns:
Names of artifacts that don’t exist in store
- Return type:
List[str]
- Raises:
KeyError – If any artifact name is not in the registry
Examples
>>> missing = store.get_missing_artifacts([
...     "unlabeled",
...     "edge_prediction",
...     "custom_artifact"
... ])
>>> print(missing)
['edge_prediction']
- list_artifacts(artifact_type: str | None = None) list[str]
List all artifact names in the store.
- Parameters:
artifact_type (Optional[str], default=None) – Type of artifact to list. If not provided, all artifact types will be listed.
- Returns:
List of artifact names in the store
- Return type:
list[str]
- list_napistu_datas() list[str]
List all NapistuData names in the store.
- Returns:
List of NapistuData names in the store
- Return type:
list[str]
- list_pandas_dfs() list[str]
List all pandas DataFrame names in the store.
- Returns:
List of pandas DataFrame names in the store
- Return type:
list[str]
- list_vertex_tensors() list[str]
List all VertexTensor names in the store.
- Returns:
List of VertexTensor names in the store
- Return type:
list[str]
- load_artifact(name: str, artifact_type: str) NapistuData | VertexTensor | DataFrame
Load an artifact from the store.
- Parameters:
name (str) – Name of the artifact to load
artifact_type (str) – Type of the artifact to load
- Returns:
The loaded artifact
- Return type:
Union[NapistuData, VertexTensor, pd.DataFrame]
- Raises:
ValueError – If invalid artifact type
- load_napistu_data(name: str, map_location: str = 'cpu') NapistuData
Load a NapistuData object from the store.
- Parameters:
name (str) – Name of the NapistuData to load
map_location (str, default="cpu") – Device to map tensors to
- Returns:
The loaded NapistuData object
- Return type:
NapistuData
- Raises:
KeyError – If name not found in registry
FileNotFoundError – If the .pt file doesn’t exist
- load_napistu_graph() NapistuGraph
Load the NapistuGraph from disk.
- load_pandas_df(name: str) DataFrame
Load a pandas DataFrame from the store.
- Parameters:
name (str) – Name of the pandas DataFrame to load
- Returns:
The loaded pandas DataFrame
- Return type:
pd.DataFrame
- Raises:
KeyError – If name not found in registry
FileNotFoundError – If the .parquet file doesn’t exist
- load_sbml_dfs() SBML_dfs
Load the SBML_dfs from disk.
- load_vertex_tensor(name: str, map_location: str = 'cpu') VertexTensor
Load a VertexTensor from the store.
- Parameters:
name (str) – Name of the VertexTensor to load
map_location (str, default="cpu") – Device to map tensors to
- Returns:
The loaded VertexTensor object
- Return type:
VertexTensor
- Raises:
KeyError – If name not found in registry
FileNotFoundError – If the .pt file doesn’t exist
- publish_store_to_huggingface(repo_id: str, revision: str | None = None, overwrite: bool = False, commit_message: str | None = None, token: str | None = None, asset_name: str | None = None, asset_version: str | None = None, tag: str | None = None, tag_message: str | None = None) str
Publish this entire store to HuggingFace Hub.
This is a convenience method that uses HFDatasetPublisher to publish all artifacts from this store to HuggingFace Hub as a dataset repository. The published store will be read-only (sbml_dfs_path and napistu_graph_path set to None).
- Parameters:
repo_id (str) – Repository ID in format “username/repo-name”
revision (Optional[str]) – Git revision (branch, tag, or commit hash). Defaults to “main”
overwrite (bool) – Explicitly confirm overwriting existing dataset (default: False)
commit_message (Optional[str]) – Custom commit message (default: auto-generated)
token (Optional[str]) – HuggingFace API token (default: uses huggingface-cli login token)
asset_name (Optional[str]) – Name of the GCS asset used to create the store (for documentation)
asset_version (Optional[str]) – Version of the GCS asset used to create the store (for documentation)
tag (Optional[str]) – Tag name to create after all assets are uploaded (e.g., “v1.0”)
tag_message (Optional[str]) – Optional message for the tag
- Returns:
URL to the published dataset on HuggingFace Hub
- Return type:
str
Examples
>>> store = NapistuDataStore("path/to/store")
>>> url = store.publish_store_to_huggingface(
...     repo_id="username/my-dataset",
...     asset_name="human_consensus",
...     asset_version="v1.0"
... )
- save_napistu_data(napistu_data: NapistuData, name: str | None = None, overwrite: bool = False) None
Save a NapistuData object to the store.
- Parameters:
napistu_data (NapistuData) – The NapistuData object to save. The method will extract the splitting_strategy and labeling_manager from the object’s attributes.
name (str, optional) – Name to use for the registry entry and filename. If not provided, uses the napistu_data.name attribute.
overwrite (bool, default=False) – If True, overwrite an existing entry with the same name. If False, raise FileExistsError if the name already exists.
- Raises:
FileExistsError – If name already exists in registry and overwrite=False
ValueError – If the splitting_strategy from the NapistuData object is invalid
- save_pandas_df(df: DataFrame, name: str, overwrite: bool = False) None
Save a pandas DataFrame to the store.
- Parameters:
df (pd.DataFrame) – The pandas DataFrame to save
name (str) – Name for storage (registry key and filename stem).
overwrite (bool, default=False) – If True, overwrite an existing entry with the same name. If False, raise FileExistsError if the name already exists.
- Raises:
FileExistsError – If name already exists in registry and overwrite=False
- save_vertex_tensor(vertex_tensor: VertexTensor, name: str | None = None, overwrite: bool = False) None
Save a VertexTensor to the store.
- Parameters:
vertex_tensor (VertexTensor) – The VertexTensor object to save
name (str, optional) – Name for storage (registry key and filename stem). If not provided, uses the vertex_tensor.name attribute.
overwrite (bool, default=False) – If True, overwrite an existing entry with the same name. If False, raise FileExistsError if the name already exists.
- Raises:
FileExistsError – If name already exists in registry and overwrite=False
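The save methods above share two rules: the name may fall back to the object's own name attribute (for NapistuData and VertexTensor), and an existing name is only replaced when overwrite=True. The sketch below illustrates both with a dict standing in for the registry; the helper names are hypothetical, and raising ValueError when no name can be determined is an assumption.

```python
from types import SimpleNamespace


def resolve_entry_name(obj, name=None):
    """Use the explicit name if given, otherwise the object's name attribute."""
    if name is not None:
        return name
    obj_name = getattr(obj, "name", None)
    if obj_name is None:
        # Assumed behavior; the source does not describe this case
        raise ValueError("No name given and the object has no name attribute")
    return obj_name


def register_entry(registry, obj, name=None, overwrite=False):
    """Add obj to the registry, refusing to replace an entry unless asked to."""
    key = resolve_entry_name(obj, name)
    if key in registry and not overwrite:
        raise FileExistsError(f"{key!r} already exists; pass overwrite=True to replace it")
    registry[key] = obj
    return key


registry = {}
tensor = SimpleNamespace(name="pathway_features")  # stand-in for a VertexTensor
first_key = register_entry(registry, tensor)
second_key = register_entry(registry, tensor, name="renamed")
```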
- summary() dict
Get a summary of the store contents.
- Returns:
Dictionary containing summary information about the store
- Return type:
dict
- validate() None
Validate the store contents.
- validate_artifact_name(name: str, artifact_registry: Dict[str, ArtifactDefinition] = DEFAULT_ARTIFACT_REGISTRY, required_type: str | None = None) None
Validate an artifact name by ensuring that it is either already in the store or available from the registry.
- Parameters:
name (str) – Name of artifact to validate
artifact_registry (Dict[str, ArtifactDefinition], default=DEFAULT_ARTIFACT_REGISTRY) – Registry of artifact definitions
required_type (Optional[str], default=None) – Type of artifact that is required. If not provided, any type is allowed.
- Raises:
KeyError – If artifact is not in store and not in registry
- napistu_torch.napistu_data_store._resolve_path(path_str: str, store_dir: Path) Path
Resolve a path string to a normalized absolute Path.
If the path starts with ‘/’, it’s treated as an absolute path. Otherwise, it’s treated as relative to store_dir. All paths are normalized to resolve .. components and symbolic links.
- Parameters:
path_str (str) – Path string from registry (either absolute or relative)
store_dir (Path) – Store directory to resolve relative paths against
- Returns:
Resolved and normalized absolute path
- Return type:
Path
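The resolution rule described above can be approximated with pathlib. This is a sketch of the stated behavior on POSIX-style paths, not the module's actual code; resolve_path_sketch is a hypothetical name, and Path.resolve() is used because it returns an absolute path while collapsing .. components and following symbolic links.

```python
import tempfile
from pathlib import Path


def resolve_path_sketch(path_str, store_dir):
    """Resolve a registry path string to a normalized absolute Path."""
    if path_str.startswith("/"):
        candidate = Path(path_str)  # absolute: use as given
    else:
        candidate = Path(store_dir) / path_str  # relative: anchor at the store
    # resolve() makes the path absolute, collapses "..", and follows symlinks
    return candidate.resolve()


with tempfile.TemporaryDirectory() as tmp:
    store_dir = Path(tmp)
    (store_dir / "napistu_raw").mkdir()
    relative = resolve_path_sketch("napistu_raw/../registry.json", store_dir)
    expected_relative = (store_dir / "registry.json").resolve()
    absolute = resolve_path_sketch(str(store_dir / "registry.json"), store_dir)
```

Normalizing both kinds of path to the same form means two registry entries that point at the same file compare equal regardless of how they were written.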
- napistu_torch.napistu_data_store._validate_create_inputs(registry_path: Path, sbml_dfs_path: Path, napistu_graph_path: Path) None
Validate inputs for creating a new NapistuDataStore.
- Parameters:
registry_path (Path) – Path where the registry file should be created
sbml_dfs_path (Union[str, Path]) – Path to the SBML_dfs pickle file
napistu_graph_path (Union[str, Path]) – Path to the NapistuGraph pickle file
- Raises:
FileExistsError – If a registry already exists at registry_path
FileNotFoundError – If the specified napistu files don’t exist