napistu_torch.load.artifacts
Artifact registry for predefined NapistuData and VertexTensor objects.
This module defines all standard artifacts that can be created from SBML_dfs and NapistuGraph objects. Each artifact has a creation function that encapsulates the ETL logic.
Classes
- ArtifactDefinition
Definition of an artifact that can be created from SBML_dfs and NapistuGraph.
Public Functions
- create_artifact(name, sbml_dfs, napistu_graph, artifact_registry=None)
Create an artifact by name using the registry.
- ensure_stratify_by_artifact_name(stratify_by)
Ensure the stratify_by value is an artifact name.
- get_artifact_info(name, artifact_registry=None)
Get information about an artifact.
- list_available_artifacts(artifact_registry=None)
List all available artifact names in the registry.
- validate_artifact_registry(artifact_registry)
Validate the artifact registry.
To add a new artifact:
1. Create a creation function (e.g., create_my_artifact).
2. Add an ArtifactDefinition to the _ARTIFACTS list.
3. The registry will be built automatically.
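The three steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the module's actual code: ArtifactDefinitionSketch is a plain dataclass standing in for the pydantic-based ArtifactDefinition, create_my_artifact and its return value are hypothetical, and the dict comprehension is only an assumption about how the registry might be derived from the list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ArtifactDefinitionSketch:
    """Stand-in with the same four fields as ArtifactDefinition (no pydantic)."""

    name: str
    artifact_type: str
    creation_func: Callable
    description: str = ""


# Step 1: a creation function (hypothetical). The documented creation
# functions take an SBML_dfs and a NapistuGraph; any objects work here.
def create_my_artifact(sbml_dfs, napistu_graph):
    return {"n_vertices": len(napistu_graph)}


# Step 2: add a definition to the list of artifacts.
_ARTIFACTS: List[ArtifactDefinitionSketch] = [
    ArtifactDefinitionSketch(
        name="my_artifact",
        artifact_type="napistu_data",  # the one type label shown in these docs
        creation_func=create_my_artifact,
        description="Hypothetical example artifact",
    ),
]

# Step 3: the registry is derived from the list (assumed: keyed by name).
registry: Dict[str, ArtifactDefinitionSketch] = {d.name: d for d in _ARTIFACTS}
```

Keying the registry by the definition's own name is what makes the "registry keys match definition names" check in validate_artifact_registry hold by construction.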
- class napistu_torch.load.artifacts.ArtifactDefinition(*, name: str, artifact_type: str, creation_func: Callable, description: str = '')
Bases: BaseModel
Metadata for a predefined artifact.
- classmethod validate_artifact_type(v)
Validate artifact type.
- classmethod validate_name(v)
Validate artifact name format.
- artifact_type: str
- creation_func: Callable
- description: str
- model_config = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- name: str
- napistu_torch.load.artifacts._create_comprehensive_pathway_memberships(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> VertexTensor
Create comprehensive source membership tensor.
- Parameters:
sbml_dfs (SBML_dfs) – SBML data structure
napistu_graph (NapistuGraph) – Napistu graph
- Returns:
Comprehensive pathway membership features for all vertices
- Return type:
VertexTensor
- napistu_torch.load.artifacts._create_edge_prediction_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> NapistuData
Create edge prediction data with edge masking.
- Parameters:
sbml_dfs (SBML_dfs) – SBML data structure
napistu_graph (NapistuGraph) – Napistu graph
- Returns:
Edge prediction data with train/val/test edge masks
- Return type:
NapistuData
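To make "train/val/test edge masks" concrete, here is a minimal pure-Python sketch of splitting edges into three disjoint boolean masks. It is not the library's implementation: the function name, the split fractions, and the unstratified random shuffle are all assumptions for illustration, and the real artifact stores its masks on a NapistuData object rather than in lists.

```python
import random
from typing import Dict, List


def split_edge_masks(
    n_edges: int,
    train_frac: float = 0.8,  # hypothetical fraction
    val_frac: float = 0.1,  # hypothetical fraction
    seed: int = 0,
) -> Dict[str, List[bool]]:
    """Assign every edge to exactly one of train/val/test."""
    rng = random.Random(seed)
    order = list(range(n_edges))
    rng.shuffle(order)

    n_train = int(n_edges * train_frac)
    n_val = int(n_edges * val_frac)
    splits = {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],  # whatever remains
    }

    masks: Dict[str, List[bool]] = {}
    for split_name, edge_indices in splits.items():
        mask = [False] * n_edges
        for i in edge_indices:
            mask[i] = True
        masks[split_name] = mask
    return masks


masks = split_edge_masks(10)
```

Because the three index slices partition one shuffled list, each edge position is True in exactly one mask.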
- napistu_torch.load.artifacts._create_edge_strata_by_edge_sbo_terms(napistu_graph: NapistuGraph, min_relation_count: int = 1000) -> DataFrame
Create edge strata by edge SBO terms.
- Parameters:
napistu_graph (NapistuGraph) – Napistu graph
min_relation_count (int) – Minimum number of edges required for a category to be kept separate. Categories with fewer edges will be merged into “other relation”.
- Returns:
Edge strata DataFrame
- Return type:
pd.DataFrame
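The effect of min_relation_count can be illustrated with a small sketch. This is an assumption-laden stand-in: it works on a plain list of labels with collections.Counter rather than on a NapistuGraph or DataFrame, the example labels are made up, and merge_rare_categories is a hypothetical helper, not a function in this module.

```python
from collections import Counter
from typing import List


def merge_rare_categories(
    labels: List[str],
    min_count: int,
    other_label: str = "other relation",
) -> List[str]:
    """Keep categories with at least min_count members; merge the rest."""
    counts = Counter(labels)
    return [
        label if counts[label] >= min_count else other_label
        for label in labels
    ]


# Made-up relation labels, one per edge.
edge_relations = ["catalyst"] * 5 + ["inhibitor"] * 3 + ["modifier"] * 1
strata = merge_rare_categories(edge_relations, min_count=3)
```

With min_count=3, "catalyst" and "inhibitor" stay separate and the single "modifier" edge is relabeled "other relation".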
- napistu_torch.load.artifacts._create_edge_strata_by_node_species_type(napistu_graph: NapistuGraph) -> DataFrame
Create edge strata by node species type.
- Parameters:
napistu_graph (NapistuGraph) – Napistu graph
- Returns:
Edge strata
- Return type:
pd.DataFrame
- napistu_torch.load.artifacts._create_edge_strata_by_node_type(napistu_graph: NapistuGraph) -> DataFrame
Create edge strata by node type.
- Parameters:
napistu_graph (NapistuGraph) – Napistu graph
- Returns:
Edge strata
- Return type:
pd.DataFrame
- napistu_torch.load.artifacts._create_name_to_sid_map(napistu_graph: NapistuGraph) -> DataFrame
Create a map of vertex names to species ids.
- napistu_torch.load.artifacts._create_relation_prediction_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> NapistuData
Create relation prediction data with edge masking and relation-type labels.
- Parameters:
sbml_dfs (SBML_dfs) – SBML data structure
napistu_graph (NapistuGraph) – Napistu graph
- Returns:
Relation prediction data with train/test/val edge masking and relation-type labels
- Return type:
NapistuData
- napistu_torch.load.artifacts._create_species_identifiers(sbml_dfs: SBML_dfs) -> DataFrame
Create species identifiers.
- napistu_torch.load.artifacts._create_species_type_prediction_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> NapistuData
Create supervised data for species type classification.
- Parameters:
sbml_dfs (SBML_dfs) – SBML data structure
napistu_graph (NapistuGraph) – Napistu graph
- Returns:
Supervised node classification data with species type labels
- Return type:
NapistuData
- napistu_torch.load.artifacts._create_unlabeled_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> NapistuData
Create unlabeled data with no masking.
- Parameters:
sbml_dfs (SBML_dfs) – SBML data structure
napistu_graph (NapistuGraph) – Napistu graph
- Returns:
Unlabeled data suitable for full-graph training
- Return type:
NapistuData
- napistu_torch.load.artifacts.create_artifact(name: str, sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph, artifact_registry: dict[str, ArtifactDefinition] = None) -> NapistuData | VertexTensor | DataFrame
Create an artifact by name using the registry.
- Parameters:
name (str) – Name of artifact to create
sbml_dfs (SBML_dfs) – SBML data structure
napistu_graph (NapistuGraph) – Napistu graph
artifact_registry (dict[str, ArtifactDefinition]) – Artifact registry to use. If not provided, the default registry will be used.
- Returns:
The created artifact
- Return type:
Union[NapistuData, VertexTensor, pd.DataFrame]
- Raises:
KeyError – If artifact name not in registry
Examples
>>> sbml_dfs = SBML_dfs.from_pickle("data.pkl")
>>> napistu_graph = NapistuGraph.from_pickle("graph.pkl")
>>> artifact = create_artifact("unlabeled", sbml_dfs, napistu_graph)
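The lookup-then-create behavior described above, including the KeyError for unknown names and the fallback to a default registry, might look roughly like this. It is a simplified sketch under stated assumptions: the registry here maps names directly to creation callables (the documented registry maps names to ArtifactDefinition objects), and DEFAULT_REGISTRY_SKETCH, create_artifact_sketch, and the lambda are hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical default registry: artifact name -> creation callable.
DEFAULT_REGISTRY_SKETCH: Dict[str, Callable[[Any, Any], Any]] = {
    "unlabeled": lambda sbml_dfs, napistu_graph: ("unlabeled", napistu_graph),
}


def create_artifact_sketch(
    name: str,
    sbml_dfs: Any,
    napistu_graph: Any,
    artifact_registry: Optional[Dict[str, Callable[[Any, Any], Any]]] = None,
) -> Any:
    """Look up a creation function by name and call it."""
    # Fall back to the default registry when none is supplied.
    registry = (
        DEFAULT_REGISTRY_SKETCH if artifact_registry is None else artifact_registry
    )
    if name not in registry:
        raise KeyError(
            f"Unknown artifact '{name}'. Available: {sorted(registry)}"
        )
    return registry[name](sbml_dfs, napistu_graph)


result = create_artifact_sketch("unlabeled", None, "graph-placeholder")
```

Passing a custom artifact_registry makes it possible to exercise the same lookup path with test-only artifacts without touching the default registry.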
- napistu_torch.load.artifacts.ensure_stratify_by_artifact_name(stratify_by: str) -> str
Ensure the stratify_by value is an artifact name.
This supports naming either by short-hand alias or the full artifact name.
- Parameters:
stratify_by (str) – Stratify by value
- Returns:
Artifact name
- Return type:
str
- Raises:
ValueError – If invalid stratify_by value
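As an illustration of accepting either a short-hand alias or a full artifact name, consider the sketch below. The alias keys and the artifact names are assumptions inferred from the creation function names on this page; the actual aliases and names are not listed here, so treat every string literal as hypothetical.

```python
from typing import Dict, Set

# Hypothetical full artifact names (inferred from creation function names).
KNOWN_STRATA_ARTIFACTS: Set[str] = {
    "edge_strata_by_edge_sbo_terms",
    "edge_strata_by_node_species_type",
    "edge_strata_by_node_type",
}

# Hypothetical short-hand aliases -> full artifact names.
STRATIFY_BY_ALIASES: Dict[str, str] = {
    "edge_sbo_terms": "edge_strata_by_edge_sbo_terms",
    "node_species_type": "edge_strata_by_node_species_type",
    "node_type": "edge_strata_by_node_type",
}


def ensure_stratify_by_artifact_name_sketch(stratify_by: str) -> str:
    """Return the full artifact name for an alias or a full name."""
    if stratify_by in KNOWN_STRATA_ARTIFACTS:
        return stratify_by  # already a full artifact name
    if stratify_by in STRATIFY_BY_ALIASES:
        return STRATIFY_BY_ALIASES[stratify_by]
    raise ValueError(
        f"Invalid stratify_by value '{stratify_by}'. Expected one of "
        f"{sorted(KNOWN_STRATA_ARTIFACTS | set(STRATIFY_BY_ALIASES))}"
    )
```

Normalizing to the full name up front means downstream code only ever needs to handle one spelling per artifact.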
- napistu_torch.load.artifacts.get_artifact_info(name: str, artifact_registry: dict[str, ArtifactDefinition] = None) -> ArtifactDefinition
Get information about an artifact.
- Parameters:
name (str) – Artifact name
artifact_registry (dict[str, ArtifactDefinition]) – Artifact registry to use. If not provided, the default registry will be used.
- Returns:
Artifact metadata including type and description
- Return type:
ArtifactDefinition
- Raises:
KeyError – If artifact not in registry
Examples
>>> info = get_artifact_info("unlabeled")
>>> print(info.artifact_type)
napistu_data
>>> print(info.description)
Unlabeled learning data without masking
- napistu_torch.load.artifacts.list_available_artifacts(artifact_registry: dict[str, ArtifactDefinition] = None) -> List[str]
List all available artifact names in the registry.
- Parameters:
artifact_registry (dict[str, ArtifactDefinition]) – Artifact registry to use. If not provided, the default registry will be used.
- Returns:
Sorted list of artifact names
- Return type:
List[str]
Examples
>>> artifacts = list_available_artifacts()
>>> print(artifacts)
['comprehensive_pathway_memberships', 'edge_prediction', 'supervised_species_type', 'unlabeled']
- napistu_torch.load.artifacts.validate_artifact_registry(artifact_registry: dict[str, ArtifactDefinition]) -> None
Validate the artifact registry.
Ensures:
- Registry is not empty
- Registry keys match definition names
- No duplicate names
- Raises:
ValueError – If validation fails
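The three documented checks could be expressed along these lines. This is only a sketch: DefinitionSketch is a minimal stand-in carrying just a name field, the error messages are invented, and the real function may order or phrase its checks differently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DefinitionSketch:
    """Minimal stand-in for ArtifactDefinition (only the name is needed)."""

    name: str


def validate_artifact_registry_sketch(registry: Dict[str, DefinitionSketch]) -> None:
    """Raise ValueError if any of the three documented checks fails."""
    # Check 1: registry is not empty.
    if not registry:
        raise ValueError("Artifact registry is empty")

    # Check 2: each key matches its definition's name.
    for key, definition in registry.items():
        if key != definition.name:
            raise ValueError(
                f"Registry key '{key}' does not match definition name "
                f"'{definition.name}'"
            )

    # Check 3: no duplicate names. Once check 2 passes this cannot fail
    # (dict keys are unique); it is kept to mirror the documented list.
    names = [definition.name for definition in registry.values()]
    if len(names) != len(set(names)):
        raise ValueError("Duplicate artifact names in registry")


validate_artifact_registry_sketch({"unlabeled": DefinitionSketch("unlabeled")})
```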