napistu_torch.load.artifacts

Artifact registry for predefined NapistuData and VertexTensor objects.

This module defines all standard artifacts that can be created from SBML_dfs and NapistuGraph objects. Each artifact has a creation function that encapsulates the ETL logic.

Classes

ArtifactDefinition

Definition of an artifact that can be created from SBML_dfs and NapistuGraph.

Public Functions

create_artifact(name, sbml_dfs, napistu_graph, artifact_registry=None)

Create an artifact by name using the registry.

ensure_stratify_by_artifact_name(stratify_by)

Ensure the stratify_by value is an artifact name.

get_artifact_info(name, artifact_registry=None)

Get information about an artifact.

list_available_artifacts(artifact_registry=None)

List all available artifact names in the registry.

validate_artifact_registry(artifact_registry)

Validate the artifact registry.

To add a new artifact:

1. Create a creation function (e.g., create_my_artifact)
2. Add an ArtifactDefinition to the _ARTIFACTS list
3. The registry will be built automatically
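The three steps above can be sketched as follows. This is a minimal illustration, not the module's actual code: `ArtifactDefinitionSketch`, `_ARTIFACTS_SKETCH`, and `registry_sketch` are hypothetical stand-ins (the real ArtifactDefinition is a pydantic BaseModel), and the body of `create_my_artifact` is a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ArtifactDefinitionSketch:
    """Hypothetical stand-in for ArtifactDefinition, using the same four fields."""

    name: str
    artifact_type: str
    creation_func: Callable[..., Any]
    description: str = ""


def create_my_artifact(sbml_dfs: Any, napistu_graph: Any) -> Dict[str, Any]:
    """Step 1: a creation function that wraps the ETL logic (placeholder body)."""
    return {"sbml_dfs": sbml_dfs, "napistu_graph": napistu_graph}


# Step 2: add a definition to the list of artifacts (assumed list name).
_ARTIFACTS_SKETCH: List[ArtifactDefinitionSketch] = [
    ArtifactDefinitionSketch(
        name="my_artifact",
        artifact_type="napistu_data",
        creation_func=create_my_artifact,
        description="Illustrative artifact",
    ),
]

# Step 3: the registry is derived from the list, keyed by artifact name.
registry_sketch = {definition.name: definition for definition in _ARTIFACTS_SKETCH}
```

Keying the registry by `name` is what lets artifacts be requested by string, as create_artifact does.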


class napistu_torch.load.artifacts.ArtifactDefinition(*, name: str, artifact_type: str, creation_func: Callable, description: str = '')

Bases: BaseModel

Metadata for a predefined artifact.

classmethod validate_artifact_type(v)

Validate artifact type.

classmethod validate_name(v)

Validate artifact name format.

artifact_type: str
creation_func: Callable
description: str
model_config = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str
napistu_torch.load.artifacts._create_comprehensive_pathway_memberships(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> VertexTensor

Create comprehensive source membership tensor.

Parameters:
  • sbml_dfs (SBML_dfs) – SBML data structure

  • napistu_graph (NapistuGraph) – Napistu graph

Returns:

Comprehensive pathway membership features for all vertices

Return type:

VertexTensor

napistu_torch.load.artifacts._create_edge_prediction_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> NapistuData

Create edge prediction data with edge masking.

Parameters:
  • sbml_dfs (SBML_dfs) – SBML data structure

  • napistu_graph (NapistuGraph) – Napistu graph

Returns:

Edge prediction data with train/val/test edge masks

Return type:

NapistuData

napistu_torch.load.artifacts._create_edge_strata_by_edge_sbo_terms(napistu_graph: NapistuGraph, min_relation_count: int = 1000) -> DataFrame

Create edge strata by edge SBO terms.

Parameters:
  • napistu_graph (NapistuGraph) – Napistu graph

  • min_relation_count (int) – Minimum number of edges required for a category to be kept separate. Categories with fewer edges will be merged into “other relation”.

Returns:

Edge strata DataFrame

Return type:

pd.DataFrame
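The effect of min_relation_count can be illustrated with a small standard-library sketch. It assumes the merge is a simple count threshold over relation labels; `merge_rare_categories` and the example labels are hypothetical and not part of napistu_torch, which returns a DataFrame rather than a list.

```python
from collections import Counter
from typing import List


def merge_rare_categories(
    labels: List[str], min_count: int, other_label: str = "other relation"
) -> List[str]:
    """Keep categories with at least min_count members; relabel the rest."""
    counts = Counter(labels)
    return [label if counts[label] >= min_count else other_label for label in labels]


# Illustrative labels only: three, two, and one edge per category.
labels = ["catalysis"] * 3 + ["inhibition"] * 2 + ["stimulation"]
merged = merge_rare_categories(labels, min_count=2)
```

With `min_count=2`, the single "stimulation" edge is relabeled as "other relation" while the two larger categories stay separate.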

napistu_torch.load.artifacts._create_edge_strata_by_node_species_type(napistu_graph: NapistuGraph) -> DataFrame

Create edge strata by node species type.

Parameters:

napistu_graph (NapistuGraph) – Napistu graph

Returns:

Edge strata

Return type:

pd.DataFrame

napistu_torch.load.artifacts._create_edge_strata_by_node_type(napistu_graph: NapistuGraph) -> DataFrame

Create edge strata by node type.

Parameters:

napistu_graph (NapistuGraph) – Napistu graph

Returns:

Edge strata

Return type:

pd.DataFrame

napistu_torch.load.artifacts._create_name_to_sid_map(napistu_graph: NapistuGraph) -> DataFrame

Create a map of vertex names to species ids.

napistu_torch.load.artifacts._create_relation_prediction_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> NapistuData

Create relation prediction data with edge masking and relation-type labels.

Parameters:
  • sbml_dfs (SBML_dfs) – SBML data structure

  • napistu_graph (NapistuGraph) – Napistu graph

Returns:

Relation prediction data with train/test/val edge masking and relation-type labels

Return type:

NapistuData

napistu_torch.load.artifacts._create_species_identifiers(sbml_dfs: SBML_dfs) -> DataFrame

Create species identifiers.

napistu_torch.load.artifacts._create_species_type_prediction_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> NapistuData

Create supervised data for species type classification.

Parameters:
  • sbml_dfs (SBML_dfs) – SBML data structure

  • napistu_graph (NapistuGraph) – Napistu graph

Returns:

Supervised node classification data with species type labels

Return type:

NapistuData

napistu_torch.load.artifacts._create_unlabeled_data(sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph) -> NapistuData

Create unlabeled data with no masking.

Parameters:
  • sbml_dfs (SBML_dfs) – SBML data structure

  • napistu_graph (NapistuGraph) – Napistu graph

Returns:

Unlabeled data suitable for full-graph training

Return type:

NapistuData

napistu_torch.load.artifacts.create_artifact(name: str, sbml_dfs: SBML_dfs, napistu_graph: NapistuGraph, artifact_registry: dict[str, ArtifactDefinition] = None) -> NapistuData | VertexTensor | DataFrame

Create an artifact by name using the registry.

Parameters:
  • name (str) – Name of artifact to create

  • sbml_dfs (SBML_dfs) – SBML data structure

  • napistu_graph (NapistuGraph) – Napistu graph

  • artifact_registry (dict[str, ArtifactDefinition]) – Artifact registry to use. If not provided, the default registry will be used.

Returns:

The created artifact

Return type:

Union[NapistuData, VertexTensor, pd.DataFrame]

Raises:

KeyError – If artifact name not in registry
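The lookup-and-dispatch behavior described here can be sketched as below. This is an assumption-laden illustration rather than the library's implementation: `create_artifact_sketch` and `DEFAULT_REGISTRY_SKETCH` are hypothetical, and the registry maps names directly to callables instead of ArtifactDefinition objects.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical default registry: artifact name -> creation function.
DEFAULT_REGISTRY_SKETCH: Dict[str, Callable[[Any, Any], Any]] = {
    "unlabeled": lambda sbml_dfs, napistu_graph: ("unlabeled", sbml_dfs, napistu_graph),
}


def create_artifact_sketch(
    name: str,
    sbml_dfs: Any,
    napistu_graph: Any,
    artifact_registry: Optional[Dict[str, Callable[[Any, Any], Any]]] = None,
) -> Any:
    """Look up a creation function by name and call it with both inputs."""
    # Fall back to the default registry when none is supplied.
    registry = DEFAULT_REGISTRY_SKETCH if artifact_registry is None else artifact_registry
    if name not in registry:
        available = ", ".join(sorted(registry))
        raise KeyError(f"Unknown artifact '{name}'. Available: {available}")
    return registry[name](sbml_dfs, napistu_graph)
```

Listing the available names in the error message is a design choice of this sketch; the documented contract is only that a KeyError is raised.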

Examples

>>> sbml_dfs = SBML_dfs.from_pickle("data.pkl")
>>> napistu_graph = NapistuGraph.from_pickle("graph.pkl")
>>> artifact = create_artifact("unlabeled", sbml_dfs, napistu_graph)

napistu_torch.load.artifacts.ensure_stratify_by_artifact_name(stratify_by: str) -> str

Ensure the stratify_by value is an artifact name.

This supports naming either by short-hand alias or the full artifact name.

Parameters:

stratify_by (str) – Stratify by value

Returns:

Artifact name

Return type:

str

Raises:

ValueError – If invalid stratify_by value
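A minimal sketch of alias resolution, assuming a plain alias-to-name mapping. The alias "node_type", the artifact name "edge_strata_by_node_type", and the helper `ensure_artifact_name_sketch` are all hypothetical; the real aliases and names are defined by napistu_torch and are not listed on this page.

```python
from typing import Dict, Set

# Hypothetical alias table and set of full artifact names.
ALIASES_SKETCH: Dict[str, str] = {"node_type": "edge_strata_by_node_type"}
ARTIFACT_NAMES_SKETCH: Set[str] = {"edge_strata_by_node_type"}


def ensure_artifact_name_sketch(stratify_by: str) -> str:
    """Return the full artifact name for either an alias or a full name."""
    if stratify_by in ARTIFACT_NAMES_SKETCH:
        return stratify_by  # already a full artifact name
    if stratify_by in ALIASES_SKETCH:
        return ALIASES_SKETCH[stratify_by]  # expand the short-hand alias
    valid = sorted(ARTIFACT_NAMES_SKETCH | set(ALIASES_SKETCH))
    raise ValueError(f"Invalid stratify_by value '{stratify_by}'. Expected one of: {valid}")
```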

napistu_torch.load.artifacts.get_artifact_info(name: str, artifact_registry: dict[str, ArtifactDefinition] = None) -> ArtifactDefinition

Get information about an artifact.

Parameters:
  • name (str) – Artifact name

  • artifact_registry (dict[str, ArtifactDefinition]) – Artifact registry to use. If not provided, the default registry will be used.

Returns:

Artifact metadata including type and description

Return type:

ArtifactDefinition

Raises:

KeyError – If artifact not in registry

Examples

>>> info = get_artifact_info("unlabeled")
>>> print(info.artifact_type)
napistu_data
>>> print(info.description)
Unlabeled learning data without masking

napistu_torch.load.artifacts.list_available_artifacts(artifact_registry: dict[str, ArtifactDefinition] = None) -> List[str]

List all available artifact names in the registry.

Parameters:

artifact_registry (dict[str, ArtifactDefinition]) – Artifact registry to use. If not provided, the default registry will be used.

Returns:

Sorted list of artifact names

Return type:

List[str]

Examples

>>> artifacts = list_available_artifacts()
>>> print(artifacts)
['comprehensive_pathway_memberships', 'edge_prediction', 'supervised_species_type', 'unlabeled']

napistu_torch.load.artifacts.validate_artifact_registry(artifact_registry: dict[str, ArtifactDefinition]) -> None

Validate the artifact registry.

Ensures:

  • Registry is not empty

  • Registry keys match definition names

  • No duplicate names

Raises:

ValueError – If validation fails
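The three documented checks can be sketched as follows. This is a hedged illustration, not the library's code: `validate_registry_sketch` is hypothetical, and `SimpleNamespace` objects stand in for ArtifactDefinition instances, of which only the `name` attribute is used.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Any, Dict


def validate_registry_sketch(registry: Dict[str, Any]) -> None:
    """Raise ValueError if the registry is empty, has duplicate names, or mismatched keys."""
    if not registry:
        raise ValueError("Artifact registry is empty")

    # Two different keys may point at definitions that share a name.
    name_counts = Counter(definition.name for definition in registry.values())
    duplicates = sorted(name for name, count in name_counts.items() if count > 1)
    if duplicates:
        raise ValueError(f"Duplicate artifact names: {duplicates}")

    # Each key must equal the name stored on its definition.
    for key, definition in registry.items():
        if key != definition.name:
            raise ValueError(f"Registry key '{key}' does not match name '{definition.name}'")


# A registry that passes all three checks.
validate_registry_sketch({"unlabeled": SimpleNamespace(name="unlabeled")})
```

The duplicate check runs before the key check in this sketch so that both failure modes produce distinct messages; the order used by the real function is not stated in the documentation.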