napistu_torch.evaluation.manager

Manager for organizing experiments’ metadata, data, models, and evaluation results.

This module provides managers for accessing experiment artifacts, loading models, and managing experiment metadata for both local and remote (HuggingFace) experiments.

Classes

EvaluationManager

Base class for evaluation managers.

LocalEvaluationManager

Manager for local experiments with file system access.

RemoteEvaluationManager

Manager for remote experiments stored on HuggingFace Hub.

Public Functions

find_best_checkpoint(checkpoint_dir)

Find the best checkpoint in a directory based on validation metrics.

Functions

find_best_checkpoint(checkpoint_dir)

Get the best checkpoint from a directory of checkpoints.

Classes

EvaluationManager()

Base class for evaluation managers.

LocalEvaluationManager(experiment_dir)

Manager for post-training evaluation of a locally-stored model.

RemoteEvaluationManager(repo_id, model_loader)

Manager for evaluation of models loaded from HuggingFace Hub.

class napistu_torch.evaluation.manager.EvaluationManager

Bases: ABC

Base class for evaluation managers.

Provides a unified interface for accessing experiment artifacts, loading models, and managing experiment metadata. Both local and remote (HuggingFace) evaluation managers share this common interface built around the RunManifest abstraction.

manifest

The experiment manifest containing metadata and configuration

Type:

RunManifest

napistu_data_store

Data store for accessing NapistuData objects and other artifacts

Type:

Optional[NapistuDataStore]

experiment_dict

Cached experiment dictionary with model, data module, trainer, etc.

Type:

Optional[dict]

Properties (derived from manifest)
----------------------------------
experiment_config

The experiment configuration

Type:

ExperimentConfig

experiment_name

Name of the experiment

Type:

Optional[str]

wandb_run_id

WandB run ID

Type:

Optional[str]

wandb_run_url

WandB run URL

Type:

Optional[str]

wandb_project

WandB project name

Type:

Optional[str]

wandb_entity

WandB entity (username/team)

Type:

Optional[str]

Public Methods
--------------
get_experiment_dict

Get the experiment dictionary with model, data module, trainer, etc.

get_store

Get the NapistuDataStore for this experiment’s data

get_summary_string

Generate a descriptive summary string from experiment metadata

get_run_summary

Get summary metrics from WandB for this experiment

load_model_from_checkpoint(checkpoint_name=None)

Load a trained model from a checkpoint file (abstract, subclass-specific)

load_napistu_data(napistu_data_name=None)

Load the NapistuData object used for this experiment

get_experiment_dict(skip_wandb: bool = False) → dict

Get the experiment dictionary with all experiment components.

The experiment dictionary contains the model, data module, trainer, run manifest, and WandB logger. This is lazily loaded and cached.

Parameters:

skip_wandb (bool, optional) – If True, skip creating WandB logger. Useful for remote models to avoid creating directories. Default is False.

Returns:

Experiment dictionary containing:

  • data_module : Union[FullGraphDataModule, EdgeBatchDataModule]

  • model : pl.LightningModule (e.g., EdgePredictionLightning)

  • trainer : NapistuTrainer

  • run_manifest : RunManifest

  • wandb_logger : Optional[WandbLogger]

Return type:

dict

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> experiment_dict = manager.get_experiment_dict()
>>> model = experiment_dict[EXPERIMENT_DICT.MODEL]
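
The lazy-load-and-cache behavior described for get_experiment_dict can be pictured with a minimal sketch. LazyExperimentHolder and _build_experiment_dict are hypothetical names used only for illustration; the real manager builds the model, data module, and trainer, and its internals may differ.

```python
from typing import Optional


class LazyExperimentHolder:
    """Hypothetical stand-in illustrating lazy loading with a cache."""

    def __init__(self) -> None:
        self.experiment_dict: Optional[dict] = None
        self.build_count = 0  # counts how often the expensive build runs

    def _build_experiment_dict(self, skip_wandb: bool) -> dict:
        # Placeholder for the expensive construction step (assumption).
        self.build_count += 1
        return {
            "model": object(),
            "wandb_logger": None if skip_wandb else "logger-placeholder",
        }

    def get_experiment_dict(self, skip_wandb: bool = False) -> dict:
        # Build on first access, then reuse the cached dictionary.
        if self.experiment_dict is None:
            self.experiment_dict = self._build_experiment_dict(skip_wandb)
        return self.experiment_dict


holder = LazyExperimentHolder()
first = holder.get_experiment_dict(skip_wandb=True)
second = holder.get_experiment_dict()
```

In this sketch the second call returns the cached dictionary unchanged, so the skip_wandb value of the first call wins. Whether the library behaves the same way is not stated in this reference.
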
get_run_summary() → dict

Get summary metrics from WandB for this experiment.

Retrieves the summary metrics (final values) from the WandB run associated with this experiment.

Returns:

Dictionary containing summary metrics from WandB (e.g., final validation AUC, training loss, etc.)

Return type:

dict

Raises:
  • ValueError – If WandB run ID is not available

  • RuntimeError – If WandB API access fails

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> summary = manager.get_run_summary()
>>> print(summary["val_auc"])  # Final validation AUC
get_store() → NapistuDataStore

Get the NapistuDataStore for this experiment’s data.

The data store is lazily loaded and cached. For LocalEvaluationManager, it loads from experiment_config.data.store_dir on first access. For RemoteEvaluationManager, it’s already loaded during initialization.

Returns:

The data store instance for this experiment

Return type:

NapistuDataStore

Raises:

ValueError – If no data store is available

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> store = manager.get_store()
>>> napistu_data = store.load_napistu_data("edge_prediction")
>>>
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>> store = manager.get_store()
get_summary_string() → str

Generate a descriptive summary string from experiment metadata.

Examples:

  • “model: sage-octopus-baseline (sage-dot_product_h128_l3) | WandB: abc123”

  • “model: transe-256-hidden”

Returns:

Formatted summary string

Return type:

str
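
One way to produce strings shaped like the two documented examples is sketched below. build_summary_string and its parameters are hypothetical; which manifest fields actually feed the summary is an assumption and is not confirmed by this reference.

```python
from typing import Optional


def build_summary_string(
    experiment_name: str,
    architecture: Optional[str] = None,
    wandb_run_id: Optional[str] = None,
) -> str:
    """Hypothetical formatter mirroring the two documented example strings."""
    summary = f"model: {experiment_name}"
    if architecture:
        # Optional architecture descriptor in parentheses (assumed field).
        summary += f" ({architecture})"
    if wandb_run_id:
        # Optional WandB run ID appended after a separator.
        summary += f" | WandB: {wandb_run_id}"
    return summary


full = build_summary_string(
    "sage-octopus-baseline", "sage-dot_product_h128_l3", "abc123"
)
minimal = build_summary_string("transe-256-hidden")
```
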

abstractmethod load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object

Load a trained model from a checkpoint file.

This method is implemented differently for local and remote managers:

  • LocalEvaluationManager: discovers and loads from checkpoint directory

  • RemoteEvaluationManager: loads the single published checkpoint from HuggingFace

Parameters:

checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint identifier (interpretation depends on subclass)

Returns:

The loaded model in evaluation mode

Return type:

LightningModule

Raises:

NotImplementedError – This is an abstract method and must be implemented by subclasses
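
The split between an abstract base method and two subclass implementations can be sketched with Python's abc module. ManagerSketch, LocalSketch, and RemoteSketch are hypothetical classes that return placeholder strings instead of models; they only illustrate the pattern, not the library's code.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional, Union


class ManagerSketch(ABC):
    """Hypothetical base class; not the real EvaluationManager."""

    @abstractmethod
    def load_model_from_checkpoint(
        self, checkpoint_name: Optional[Union[Path, str]] = None
    ) -> object:
        ...


class LocalSketch(ManagerSketch):
    def load_model_from_checkpoint(self, checkpoint_name=None) -> object:
        # A local manager may choose among several checkpoints.
        return f"local model from {checkpoint_name or 'best checkpoint'}"


class RemoteSketch(ManagerSketch):
    def load_model_from_checkpoint(self, checkpoint_name=None) -> object:
        # A remote manager holds a single published checkpoint.
        if checkpoint_name is not None:
            raise ValueError("checkpoint_name is not supported for remote models")
        return "remote model from published checkpoint"


local_result = LocalSketch().load_model_from_checkpoint("last")
remote_result = RemoteSketch().load_model_from_checkpoint()
```

In this sketch, Python refuses to instantiate ManagerSketch directly because its abstract method has no implementation.
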

load_napistu_data(napistu_data_name: str | None = None) → NapistuData

Load the NapistuData object used for this experiment.

Loads the NapistuData object from the experiment’s data store. If no name is provided, uses the name from the experiment configuration.

Parameters:

napistu_data_name (Optional[str], default=None) – Name of the NapistuData object to load. If None, uses the name from the experiment configuration.

Returns:

The loaded NapistuData object

Return type:

NapistuData

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> # Load using name from config
>>> data = manager.load_napistu_data()
>>> # Load specific artifact
>>> data = manager.load_napistu_data("edge_prediction")
property experiment_config: ExperimentConfig

Get the experiment configuration from the manifest.

experiment_dict: dict | None
property experiment_name: str | None

Get the experiment name from the manifest.

manifest: RunManifest
napistu_data_store: NapistuDataStore | None
property wandb_entity: str | None

Get the WandB entity from the manifest.

property wandb_project: str | None

Get the WandB project name from the manifest.

property wandb_run_id: str | None

Get the WandB run ID from the manifest.

property wandb_run_url: str | None

Get the WandB run URL from the manifest.

class napistu_torch.evaluation.manager.LocalEvaluationManager(experiment_dir: Path | str)

Bases: EvaluationManager

Manager for post-training evaluation of a locally-stored model.

This class provides a unified interface for accessing experiment artifacts, loading models from checkpoints, publishing to HuggingFace Hub, and managing experiment metadata. It loads the experiment manifest from a local directory and provides convenient access to checkpoints, WandB information, and data stores.

Parameters:

experiment_dir (Union[Path, str]) – Path to the experiment directory containing the manifest file and checkpoints. Must contain a run_manifest.yaml file.

experiment_dir

Path to the experiment directory

Type:

Path

manifest

The experiment manifest containing metadata and configuration

Type:

RunManifest

checkpoint_dir

Directory containing model checkpoints

Type:

Path

best_checkpoint_path

Path to the best checkpoint (highest validation AUC)

Type:

Optional[Path]

best_checkpoint_val_auc

Validation AUC of the best checkpoint

Type:

Optional[float]

napistu_data_store

The data store instance (lazily loaded)

Type:

Optional[NapistuDataStore]

experiment_dict

Cached experiment dictionary (lazily loaded)

Type:

Optional[dict]

Public Methods
--------------
load_model_from_checkpoint(checkpoint_name=None)

Load a trained model from a checkpoint file

publish_to_huggingface(repo_id, checkpoint_path=None, commit_message=None, overwrite=False, token=None, tag=None, tag_message=None)

Publish this experiment’s model to HuggingFace Hub

Private Methods
---------------
_resolve_checkpoint_path(checkpoint_name=None)

Resolve a checkpoint name or path to an actual checkpoint file path

Examples

>>> # Load an experiment
>>> manager = LocalEvaluationManager("experiments/my_run")
>>>
>>> # Load the model from best checkpoint
>>> model = manager.load_model_from_checkpoint()
>>>
>>> # Load from specific checkpoint
>>> model = manager.load_model_from_checkpoint("last")
>>> model = manager.load_model_from_checkpoint("best-epoch=50-val_auc=0.85.ckpt")
>>>
>>> # Get experiment summary
>>> summary = manager.get_summary_string()
>>> print(summary)  # "model: sage-octopus-baseline (sage-dot_product_h128_l3) | WandB: abc123"
>>>
>>> # Publish to HuggingFace
>>> url = manager.publish_to_huggingface("username/model-name")
__init__(experiment_dir: Path | str)

Initialize LocalEvaluationManager from an experiment directory.

Parameters:

experiment_dir (Union[Path, str]) – Path to experiment directory containing manifest and checkpoints. Must contain a run_manifest.yaml file.

Raises:
  • FileNotFoundError – If experiment directory or manifest file doesn’t exist

  • ValueError – If manifest file is invalid or cannot be parsed

_resolve_checkpoint_path(checkpoint_name: Path | str | None = None) → Path

Resolve a checkpoint name or path to an actual checkpoint file path.

Handles various input formats:

  • None: Uses the best checkpoint (highest validation AUC)

  • String matching a checkpoint filename in checkpoint_dir (e.g., “last.ckpt”, “best-epoch=50-val_auc=0.85.ckpt”)

  • The string “last”: Resolves to “last.ckpt” in checkpoint_dir

  • A Path object or string path to a checkpoint file

Parameters:

checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint name or path. If None, uses best checkpoint. If a string, first checks if it matches a file in checkpoint_dir, otherwise treats it as a file path.

Returns:

Resolved path to the checkpoint file

Return type:

Path

Raises:
  • ValueError – If no checkpoint is found and none is provided

  • FileNotFoundError – If the specified checkpoint file doesn’t exist
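
The resolution order listed above can be sketched as a standalone function. resolve_checkpoint_sketch is a hypothetical re-implementation written from this description alone; it takes the directory and best checkpoint as arguments instead of reading them from a manager, and the real method may differ in detail.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_checkpoint_sketch(
    checkpoint_dir: Path,
    best_checkpoint_path: Optional[Path],
    checkpoint_name: Optional[Union[Path, str]] = None,
) -> Path:
    """Hypothetical re-implementation of the documented resolution order."""
    if checkpoint_name is None:
        # No name given: fall back to the best checkpoint, if one is known.
        if best_checkpoint_path is None:
            raise ValueError("No checkpoint found and none was provided")
        return best_checkpoint_path

    if isinstance(checkpoint_name, str):
        # "last" is shorthand for "last.ckpt" inside checkpoint_dir.
        filename = "last.ckpt" if checkpoint_name == "last" else checkpoint_name
        candidate = checkpoint_dir / filename
        if candidate.is_file():
            return candidate

    # Otherwise treat the value as a path to a checkpoint file.
    path = Path(checkpoint_name).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    return path


# Demonstrate with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    (ckpt_dir / "last.ckpt").touch()
    best = ckpt_dir / "best-epoch=50-val_auc=0.85.ckpt"
    best.touch()
    resolved_default = resolve_checkpoint_sketch(ckpt_dir, best)
    resolved_last = resolve_checkpoint_sketch(ckpt_dir, best, "last")
```
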

load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object

Load a trained model from a checkpoint file.

The checkpoint name can be:

  • None: Uses the best checkpoint (highest validation AUC)

  • A string matching a checkpoint filename in checkpoint_dir (e.g., “last.ckpt”, “best-epoch=50-val_auc=0.85.ckpt”)

  • The string “last”: Resolves to “last.ckpt” in checkpoint_dir

  • A Path object or string path to a checkpoint file

Parameters:

checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint name or path. If None, uses best checkpoint. If a string, first checks if it matches a file in checkpoint_dir, otherwise treats it as a file path.

Returns:

The loaded model in evaluation mode

Return type:

LightningModule

Raises:
  • ValueError – If no checkpoint is found and none is provided

  • FileNotFoundError – If the specified checkpoint file doesn’t exist

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>>
>>> # Load from best checkpoint
>>> model = manager.load_model_from_checkpoint()
>>>
>>> # Load from last checkpoint
>>> model = manager.load_model_from_checkpoint("last")
>>>
>>> # Load from specific checkpoint by name
>>> model = manager.load_model_from_checkpoint("best-epoch=50-val_auc=0.85.ckpt")
>>>
>>> # Load from absolute path
>>> model = manager.load_model_from_checkpoint("/path/to/checkpoint.ckpt")
publish_to_huggingface(repo_id: str, checkpoint_path: Path | None = None, commit_message: str | None = None, overwrite: bool = False, token: str | None = None, tag: str | None = None, tag_message: str | None = None) → str

Publish this experiment’s model to HuggingFace Hub.

Creates a private repository if it doesn’t exist. Repositories can be made public manually on huggingface.co after curation.

Parameters:
  • repo_id (str) – Repository ID in format “username/repo-name”

  • checkpoint_path (Optional[Path]) – Checkpoint to publish. If None, uses best checkpoint.

  • commit_message (Optional[str]) – Custom commit message (default: auto-generated)

  • overwrite (bool) – Explicitly confirm overwriting existing model (default: False)

  • token (Optional[str]) – HuggingFace API token (default: uses huggingface-cli login token)

  • tag (Optional[str]) – Tag name to create after all assets are uploaded (e.g., “v1.0”)

  • tag_message (Optional[str]) – Optional message for the tag

Returns:

URL to the published model on HuggingFace Hub

Return type:

str

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> # First upload
>>> url = manager.publish_to_huggingface("shackett/napistu-sage-octopus")
>>> # Update same repo
>>> url = manager.publish_to_huggingface("shackett/napistu-sage-octopus", overwrite=True)
class napistu_torch.evaluation.manager.RemoteEvaluationManager(repo_id: str, model_loader: HFModelLoader, data_store: NapistuDataStore | None = None)

Bases: EvaluationManager

Manager for evaluation of models loaded from HuggingFace Hub.

This class provides evaluation capabilities for models published to HuggingFace, loading the model checkpoint, configuration, and optionally data from remote repositories. It shares the same interface as LocalEvaluationManager but with remote-specific implementation details.

Parameters:
  • repo_id (str) – HuggingFace model repository ID (e.g., “username/model-name”)

  • model_loader (HFModelLoader) – Loader instance with downloaded model artifacts

  • data_store (Optional[NapistuDataStore]) – Data store for accessing NapistuData objects (optional)

repo_id

HuggingFace repository ID

Type:

str

revision

Git revision (branch, tag, or commit) used for loading

Type:

str

manifest

Reconstructed run manifest from HuggingFace artifacts

Type:

RunManifest

checkpoint

Loaded model checkpoint

Type:

Checkpoint

checkpoint_path

Path to cached checkpoint file

Type:

Path

napistu_data_store

The data store instance (may be None)

Type:

Optional[NapistuDataStore]

experiment_dict

Cached experiment dictionary (lazily loaded)

Type:

Optional[dict]

Public Methods
--------------
from_huggingface(repo_id, data_store_dir, revision=None, data_repo_id=None, ...)

Load a model and optionally data from HuggingFace Hub

load_model_from_checkpoint(checkpoint_name=None)

Load the published model checkpoint

Properties (Remote-specific, raise AttributeError)
--------------------------------------------------
experiment_dir

Not available for remote models

checkpoint_dir

Not available for remote models

best_checkpoint_path

Not available for remote models (only one checkpoint exists)
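
A property that raises AttributeError behaves, from the caller's side, like a missing attribute. The sketch below shows that effect with a hypothetical RemoteManagerSketch class; it is an illustration of the pattern, not the library's implementation.

```python
from pathlib import Path


class RemoteManagerSketch:
    """Hypothetical class showing a local-only attribute rejected remotely."""

    def __init__(self, repo_id: str) -> None:
        self.repo_id = repo_id

    @property
    def checkpoint_dir(self) -> Path:
        # Local-only concept: signal unavailability with AttributeError.
        raise AttributeError("checkpoint_dir is not available for remote models")


manager = RemoteManagerSketch("username/model-name")
# hasattr() swallows the AttributeError and reports the attribute as absent.
has_checkpoint_dir = hasattr(manager, "checkpoint_dir")
```

Code that must work with both manager types can therefore guard local-only access with hasattr() or a try/except AttributeError block.
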

Examples

>>> # Load model with an existing local data store
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>>
>>> # Load model and download data from HuggingFace
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store"),
...     data_repo_id="shackett/octopus-consensus-v1"
... )
>>>
>>> # Load model and use it
>>> model = manager.load_model_from_checkpoint()
>>> summary = manager.get_summary_string()
classmethod from_huggingface(repo_id: str, data_store_dir: str | Path, revision: str | None = None, data_repo_id: str | None = None, data_revision: str | None = None, model_cache_dir: Path | None = None, token: str | None = None) → RemoteEvaluationManager

Load a model and data from HuggingFace Hub.

Parameters:
  • repo_id (str) – Model repository ID (e.g., “shackett/sage-octopus”)

  • data_store_dir (Union[str, Path]) – Directory for the data store. Can be a string (e.g., “~/data/store”) or Path. Tildes (~) will be expanded to the user’s home directory. If it exists, will be loaded as-is. If it doesn’t exist, will be downloaded from HuggingFace using data_repo_id (or config.data.hf_repo_id if not provided).

  • revision (Optional[str]) – Model revision (branch/tag/commit). Default: “main”

  • data_repo_id (Optional[str]) – Data repository ID. If None and data_store_dir doesn’t exist, will try to use config.data.hf_repo_id.

  • data_revision (Optional[str]) – Data revision (branch/tag/commit). Default: uses config.data.hf_revision or “main” if not specified.

  • model_cache_dir (Optional[Path]) – Where to cache model files. Default: HF default cache

  • token (Optional[str]) – HuggingFace token for private repos

Returns:

Manager instance with model and data loaded

Return type:

RemoteEvaluationManager

Raises:

ValueError – If data_store_dir doesn’t exist and no HF repo info is available

Examples

>>> # Use existing local data store
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./existing_store")
... )
>>>
>>> # Download data from HF to new location
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./new_store"),
...     data_repo_id="shackett/octopus-consensus-v1"
... )
>>>
>>> # Let config determine data repo (if available)
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./new_store")
... )
>>>
>>> # Pinned versions
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     revision="v1.0",
...     data_store_dir=Path("./store"),
...     data_repo_id="shackett/octopus-consensus-v1",
...     data_revision="v1.0"
... )
__init__(repo_id: str, model_loader: HFModelLoader, data_store: NapistuDataStore | None = None)

Initialize RemoteEvaluationManager from HuggingFace artifacts.

Parameters:
  • repo_id (str) – HuggingFace repository ID

  • model_loader (HFModelLoader) – Loader instance with downloaded model artifacts

  • data_store (Optional[NapistuDataStore]) – Data store for accessing NapistuData objects

get_run_summary(from_wandb: bool = False) → dict

Get summary metrics from HuggingFace (default) or WandB for this experiment.

By default, retrieves the summary metrics from the HuggingFace repository by loading the wandb_run_info.yaml file. This avoids needing WandB API access for remote models. Optionally, can load directly from WandB if from_wandb=True.

Parameters:

from_wandb (bool, optional) – If True, load summary from WandB API instead of HuggingFace. Default is False (load from HuggingFace).

Returns:

Dictionary containing summary metrics (e.g., final validation AUC, training loss, etc.)

Return type:

dict

Raises:
  • RuntimeError – If HuggingFace API access fails or run info is not available

  • ValueError – If from_wandb=True but WandB run ID is not available

  • RuntimeError – If WandB API access fails when from_wandb=True

Examples

>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>> # Load from HuggingFace (default)
>>> summary = manager.get_run_summary()
>>> print(summary["val_auc"])  # Final validation AUC
>>>
>>> # Load from WandB instead
>>> summary = manager.get_run_summary(from_wandb=True)
load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object

Load the published model checkpoint from HuggingFace Hub.

RemoteEvaluationManager only contains the single published checkpoint, so checkpoint_name should not be provided.

Parameters:

checkpoint_name (Optional[Union[Path, str]], default=None) – Must be None. Remote models only have one checkpoint.

Returns:

The loaded model in evaluation mode

Return type:

LightningModule

Raises:

ValueError – If checkpoint_name is provided (not supported for remote models)

Examples

>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>> model = manager.load_model_from_checkpoint()
publish_to_huggingface(*args, **kwargs) → str

Not supported for remote models (already published).

property best_checkpoint_path: Path | None

Not available for remote models.

property best_checkpoint_val_auc: float | None

Not available for remote models.

property checkpoint_dir: Path

Not available for remote models.

property experiment_dir: Path

Not available for remote models.

property repo_url: str

URL to the HuggingFace model repository.

napistu_torch.evaluation.manager._parse_checkpoint_filename(filename: str | Path) → tuple[int, float] | None

Extract epoch number and validation AUC from checkpoint filename.

Parameters:

filename (str | Path) – Checkpoint filename like “best-epoch=120-val_auc=0.7604.ckpt”

Returns:

  • epoch (int) – Epoch number

  • val_auc (float) – Validation AUC

Examples

>>> _parse_checkpoint_filename("best-epoch=120-val_auc=0.7604.ckpt")
(120, 0.7604)
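
A minimal sketch of this kind of parsing, assuming a regular expression inferred from the documented example filename. parse_checkpoint_filename_sketch and its pattern are hypothetical, and returning None for a non-matching name is an assumption based on the `| None` in the signature.

```python
import re
from pathlib import Path
from typing import Optional, Tuple, Union

# Assumed pattern, inferred from the documented example filename.
_CHECKPOINT_PATTERN = re.compile(r"epoch=(\d+)-val_auc=(\d+(?:\.\d+)?)")


def parse_checkpoint_filename_sketch(
    filename: Union[str, Path],
) -> Optional[Tuple[int, float]]:
    """Return (epoch, val_auc), or None if the name does not match."""
    match = _CHECKPOINT_PATTERN.search(Path(filename).name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


parsed = parse_checkpoint_filename_sketch("best-epoch=120-val_auc=0.7604.ckpt")
unparsed = parse_checkpoint_filename_sketch("last.ckpt")
```
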

napistu_torch.evaluation.manager._resolve_data_store_for_remote(data_store_dir: Path, experiment_config: ExperimentConfig, data_repo_id: str | None = None, data_revision: str | None = None, token: str | None = None) → NapistuDataStore

Resolve and load a NapistuDataStore for remote evaluation.

Handles three scenarios:

  1. data_store_dir exists -> load existing store

  2. data_store_dir missing + HF repo info available -> download from HF

  3. data_store_dir missing + no HF info -> error

Parameters:
  • data_store_dir (Path) – Directory for the data store (may or may not exist)

  • experiment_config (ExperimentConfig) – Experiment configuration (may contain HF repo info)

  • data_repo_id (Optional[str]) – HuggingFace data repository ID (overrides config)

  • data_revision (Optional[str]) – Data revision (overrides config)

  • token (Optional[str]) – HuggingFace token for private repos

Returns:

Loaded data store

Return type:

NapistuDataStore

Raises:

ValueError – If data_store_dir doesn’t exist and no HF repo info is available

Examples

>>> # Existing store
>>> store = _resolve_data_store_for_remote(
...     data_store_dir=Path("./existing_store"),
...     experiment_config=config
... )
>>>
>>> # Download from HF
>>> store = _resolve_data_store_for_remote(
...     data_store_dir=Path("./new_store"),
...     experiment_config=config,
...     data_repo_id="username/data-repo"
... )
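
The three scenarios can be expressed as a small decision function. plan_data_store_resolution is a hypothetical helper that only reports which branch applies; it does not load or download anything, and config_repo_id stands in for the config.data.hf_repo_id value mentioned under from_huggingface.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple, Union


def plan_data_store_resolution(
    data_store_dir: Union[str, Path],
    config_repo_id: Optional[str] = None,
    data_repo_id: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Hypothetical planner returning (action, repo_id) for the scenarios."""
    # Expand a leading tilde to the user's home directory.
    store_dir = Path(data_store_dir).expanduser()
    if store_dir.exists():
        # Scenario 1: an existing store is loaded as-is.
        return ("load_existing", None)
    # An explicit data_repo_id overrides the value from the config.
    repo_id = data_repo_id or config_repo_id
    if repo_id is not None:
        # Scenario 2: the store would be downloaded from HuggingFace.
        return ("download", repo_id)
    # Scenario 3: nothing to load and nowhere to download from.
    raise ValueError(f"{store_dir} does not exist and no HF repo info is available")


with tempfile.TemporaryDirectory() as tmp:
    existing_plan = plan_data_store_resolution(tmp)
    missing_dir = Path(tmp) / "missing_store"
    download_plan = plan_data_store_resolution(
        missing_dir,
        config_repo_id="username/config-repo",
        data_repo_id="username/data-repo",
    )
```
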
napistu_torch.evaluation.manager.find_best_checkpoint(checkpoint_dir: Path) → tuple[Path, float] | None

Get the best checkpoint from a directory of checkpoints, based on validation metrics. Per the signature, the result is a (path, score) tuple, or None.
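
A minimal sketch of selecting the highest-scoring checkpoint, assuming filenames embed the score as val_auc=<float> as in the examples elsewhere in this reference. find_best_checkpoint_sketch is hypothetical; the library's selection and tie-breaking rules are not documented here.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Assumed filename convention: the score appears as "val_auc=<float>".
_VAL_AUC = re.compile(r"val_auc=(\d+(?:\.\d+)?)")


def find_best_checkpoint_sketch(
    checkpoint_dir: Path,
) -> Optional[Tuple[Path, float]]:
    """Return (path, val_auc) for the highest-scoring checkpoint, or None."""
    scored = []
    for path in checkpoint_dir.glob("*.ckpt"):
        match = _VAL_AUC.search(path.name)
        if match is not None:  # skip files without a score, e.g. last.ckpt
            scored.append((path, float(match.group(1))))
    if not scored:
        return None
    return max(scored, key=lambda item: item[1])


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    for name in (
        "best-epoch=50-val_auc=0.85.ckpt",
        "best-epoch=120-val_auc=0.7604.ckpt",
        "last.ckpt",
    ):
        (ckpt_dir / name).touch()
    best_path, best_auc = find_best_checkpoint_sketch(ckpt_dir)
    best_name = best_path.name

with tempfile.TemporaryDirectory() as empty:
    nothing = find_best_checkpoint_sketch(Path(empty))
```
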