napistu_torch.evaluation.manager

Manager for organizing experiments’ metadata, data, models, and evaluation results.

This module provides managers for accessing experiment artifacts, loading models, and managing experiment metadata for both local and remote (HuggingFace) experiments.

Classes

EvaluationManager: Base class for evaluation managers.
LocalEvaluationManager: Manager for local experiments with file system access.
RemoteEvaluationManager: Manager for remote experiments stored on HuggingFace Hub.

Public Functions

find_best_checkpoint(checkpoint_dir): Find the best checkpoint in a directory based on validation metrics.

Functions

find_best_checkpoint(checkpoint_dir)

Get the best checkpoint from a directory of checkpoints.

Classes

`EvaluationManager`()	Base class for evaluation managers.
`LocalEvaluationManager`(experiment_dir)	Manager for post-training evaluation of a locally-stored model.
`RemoteEvaluationManager`(repo_id, model_loader)	Manager for evaluation of models loaded from HuggingFace Hub.

class napistu_torch.evaluation.manager.EvaluationManager

Bases: ABC

Base class for evaluation managers.

Provides a unified interface for accessing experiment artifacts, loading models, and managing experiment metadata. Both local and remote (HuggingFace) evaluation managers share this common interface built around the RunManifest abstraction.

manifest

The experiment manifest containing metadata and configuration

Type:: RunManifest

napistu_data_store

Data store for accessing NapistuData objects and other artifacts

Type:: Optional[NapistuDataStore]

experiment_dict

Cached experiment dictionary with model, data module, trainer, etc.

Type:: Optional[dict]

Properties (derived from manifest)

experiment_config

The experiment configuration

Type:: ExperimentConfig

experiment_name

Name of the experiment

Type:: Optional[str]

wandb_run_id

WandB run ID

Type:: Optional[str]

wandb_run_url

WandB run URL

Type:: Optional[str]

wandb_project

WandB project name

Type:: Optional[str]

wandb_entity

WandB entity (username/team)

Type:: Optional[str]

Public Methods

get_experiment_dict: Get the experiment dictionary with model, data module, trainer, etc.

get_store: Get the NapistuDataStore for this experiment’s data

get_summary_string: Generate a descriptive summary string from experiment metadata

get_run_summary: Get summary metrics from WandB for this experiment

load_model_from_checkpoint(checkpoint_name=None): Load a trained model from a checkpoint file (abstract, subclass-specific)

load_napistu_data(napistu_data_name=None): Load the NapistuData object used for this experiment

get_experiment_dict(skip_wandb: bool = False) → dict

Get the experiment dictionary with all experiment components.

The experiment dictionary contains the model, data module, trainer, run manifest, and WandB logger. This is lazily loaded and cached.

Parameters:: skip_wandb (bool, optional) – If True, skip creating WandB logger. Useful for remote models to avoid creating directories. Default is False.
Returns:: Experiment dictionary containing:
- data_module : Union[FullGraphDataModule, EdgeBatchDataModule]
- model : pl.LightningModule (e.g., EdgePredictionLightning)
- trainer : NapistuTrainer
- run_manifest : RunManifest
- wandb_logger : Optional[WandbLogger]
Return type:: dict

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> experiment_dict = manager.get_experiment_dict()
>>> model = experiment_dict[EXPERIMENT_DICT.MODEL]

get_run_summary() → dict

Get summary metrics from WandB for this experiment.

Retrieves the summary metrics (final values) from the WandB run associated with this experiment.

Returns:

Dictionary containing summary metrics from WandB (e.g., final validation AUC, training loss, etc.)

Return type:

dict

Raises:

ValueError – If WandB run ID is not available
RuntimeError – If WandB API access fails

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> summary = manager.get_run_summary()
>>> print(summary["val_auc"])  # Final validation AUC

get_store() → NapistuDataStore

Get the NapistuDataStore for this experiment’s data.

The data store is lazily loaded and cached. For LocalEvaluationManager, it loads from experiment_config.data.store_dir on first access. For RemoteEvaluationManager, it’s already loaded during initialization.

Returns:: The data store instance for this experiment
Return type:: NapistuDataStore
Raises:: ValueError – If no data store is available

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> store = manager.get_store()
>>> napistu_data = store.load_napistu_data("edge_prediction")
>>>
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>> store = manager.get_store()
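
The lazy-load-and-cache behavior described for get_store can be sketched in isolation. This is a minimal illustration, not the library's implementation: LazyStoreHolder and its loader argument are hypothetical names, and the loader stands in for whatever reads the store from experiment_config.data.store_dir.

```python
from typing import Callable, Optional


class LazyStoreHolder:
    """Hypothetical stand-in that loads a store on first access and caches it."""

    def __init__(
        self,
        loader: Optional[Callable[[], object]] = None,
        store: Optional[object] = None,
    ) -> None:
        self._loader = loader
        # A preloaded store mirrors the "already loaded during initialization" case.
        self.napistu_data_store = store

    def get_store(self) -> object:
        if self.napistu_data_store is None:
            if self._loader is None:
                raise ValueError("No data store is available")
            # First access: load once, then reuse the cached instance.
            self.napistu_data_store = self._loader()
        return self.napistu_data_store
```

Repeated calls return the same object, so the potentially expensive load happens at most once per manager.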

get_summary_string() → str

Generate a descriptive summary string from experiment metadata.

Examples:
- “model: sage-octopus-baseline (sage-dot_product_h128_l3) | WandB: abc123”
- “model: transe-256-hidden”

Returns:: Formatted summary string
Return type:: str
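
One way to produce strings of the shape shown in the examples is sketched here. The function name and its three inputs (name, architecture, wandb_run_id) are assumptions made for illustration; the real method derives its fields from the manifest and may use different ones.

```python
from typing import Optional


def summary_string_sketch(
    name: str,
    architecture: Optional[str] = None,
    wandb_run_id: Optional[str] = None,
) -> str:
    """Build a "model: ... (...) | WandB: ..." string, skipping missing parts."""
    text = f"model: {name}"
    if architecture:
        # Assumed: the parenthesised part is an architecture descriptor.
        text += f" ({architecture})"
    if wandb_run_id:
        text += f" | WandB: {wandb_run_id}"
    return text
```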

abstractmethod load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object

Load a trained model from a checkpoint file.

This method is implemented differently for local and remote managers:
- LocalEvaluationManager: discovers and loads from checkpoint directory
- RemoteEvaluationManager: loads the single published checkpoint from HuggingFace

Parameters:: checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint identifier (interpretation depends on subclass)
Returns:: The loaded model in evaluation mode
Return type:: LightningModule
Raises:: NotImplementedError – This is an abstract method and must be implemented by subclasses
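
The split between an abstract base method and subclass-specific loading can be illustrated with a small, self-contained sketch. The class names below are hypothetical and the return values are placeholder strings rather than real models.

```python
from abc import ABC, abstractmethod
from typing import Optional


class ManagerSketch(ABC):
    """Hypothetical base class: subclasses must supply the loading logic."""

    @abstractmethod
    def load_model_from_checkpoint(self, checkpoint_name: Optional[str] = None) -> object:
        ...


class LocalSketch(ManagerSketch):
    def load_model_from_checkpoint(self, checkpoint_name: Optional[str] = None) -> object:
        # A local manager may choose among several checkpoints.
        return f"model from {checkpoint_name or 'best checkpoint'}"


class RemoteSketch(ManagerSketch):
    def load_model_from_checkpoint(self, checkpoint_name: Optional[str] = None) -> object:
        # A remote manager has one published checkpoint, so a name is rejected.
        if checkpoint_name is not None:
            raise ValueError("Remote models expose a single published checkpoint")
        return "model from published checkpoint"
```

With an ABC, Python refuses to instantiate a class that still has unimplemented abstract methods (a TypeError at construction time), so code that accepts either manager can call load_model_from_checkpoint() without checking which subclass it holds.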

load_napistu_data(napistu_data_name: str | None = None) → NapistuData

Load the NapistuData object used for this experiment.

Loads the NapistuData object from the experiment’s data store. If no name is provided, uses the name from the experiment configuration.

Parameters:: napistu_data_name (Optional[str], default=None) – Name of the NapistuData object to load. If None, uses the name from the experiment configuration.
Returns:: The loaded NapistuData object
Return type:: NapistuData

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> # Load using name from config
>>> data = manager.load_napistu_data()
>>> # Load specific artifact
>>> data = manager.load_napistu_data("edge_prediction")


property experiment_config: ExperimentConfig: Get the experiment configuration from the manifest.

experiment_dict: dict | None

property experiment_name: str | None: Get the experiment name from the manifest.

manifest: RunManifest

napistu_data_store: NapistuDataStore | None

property wandb_entity: str | None: Get the WandB entity from the manifest.

property wandb_project: str | None: Get the WandB project name from the manifest.

property wandb_run_id: str | None: Get the WandB run ID from the manifest.

property wandb_run_url: str | None: Get the WandB run URL from the manifest.

class napistu_torch.evaluation.manager.LocalEvaluationManager(experiment_dir: Path | str)

Bases: EvaluationManager

Manager for post-training evaluation of a locally-stored model.

This class provides a unified interface for accessing experiment artifacts, loading models from checkpoints, publishing to HuggingFace Hub, and managing experiment metadata. It loads the experiment manifest from a local directory and provides convenient access to checkpoints, WandB information, and data stores.

Parameters:: experiment_dir (Union[Path, str]) – Path to the experiment directory containing the manifest file and checkpoints. Must contain a run_manifest.yaml file.

experiment_dir

Path to the experiment directory

Type:: Path

manifest

The experiment manifest containing metadata and configuration

Type:: RunManifest

checkpoint_dir

Directory containing model checkpoints

Type:: Path

best_checkpoint_path

Path to the best checkpoint (highest validation AUC)

Type:: Optional[Path]

best_checkpoint_val_auc

Validation AUC of the best checkpoint

Type:: Optional[float]

napistu_data_store

The data store instance (lazily loaded)

Type:: Optional[NapistuDataStore]

experiment_dict

Cached experiment dictionary (lazily loaded)

Type:: Optional[dict]

Public Methods

load_model_from_checkpoint(checkpoint_name=None): Load a trained model from a checkpoint file

publish_to_huggingface(repo_id, checkpoint_path=None, commit_message=None, overwrite=False, token=None): Publish this experiment’s model to HuggingFace Hub

Private Methods

_resolve_checkpoint_path(checkpoint_name=None): Resolve a checkpoint name or path to an actual checkpoint file path

Examples

>>> # Load an experiment
>>> manager = LocalEvaluationManager("experiments/my_run")
>>>
>>> # Load the model from best checkpoint
>>> model = manager.load_model_from_checkpoint()
>>>
>>> # Load from specific checkpoint
>>> model = manager.load_model_from_checkpoint("last")
>>> model = manager.load_model_from_checkpoint("best-epoch=50-val_auc=0.85.ckpt")
>>>
>>> # Get experiment summary
>>> summary = manager.get_summary_string()
>>> print(summary)  # "model: sage-octopus-baseline (sage-dot_product_h128_l3) | WandB: abc123"
>>>
>>> # Publish to HuggingFace
>>> url = manager.publish_to_huggingface("username/model-name")

__init__(experiment_dir: Path | str)

Initialize LocalEvaluationManager from an experiment directory.

Parameters:

experiment_dir (Union[Path, str]) – Path to experiment directory containing manifest and checkpoints. Must contain a run_manifest.yaml file.

Raises:

FileNotFoundError – If experiment directory or manifest file doesn’t exist
ValueError – If manifest file is invalid or cannot be parsed

_resolve_checkpoint_path(checkpoint_name: Path | str | None = None) → Path

Resolve a checkpoint name or path to an actual checkpoint file path.

Handles various input formats:
- None: Uses the best checkpoint (highest validation AUC)
- String matching a checkpoint filename in checkpoint_dir (e.g., “last.ckpt”, “best-epoch=50-val_auc=0.85.ckpt”)
- The string “last”: Resolves to “last.ckpt” in checkpoint_dir
- A Path object or string path to a checkpoint file

Parameters:

checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint name or path. If None, uses best checkpoint. If a string, first checks if it matches a file in checkpoint_dir, otherwise treats it as a file path.

Returns:

Resolved path to the checkpoint file

Return type:

Path

Raises:

ValueError – If no checkpoint is found and none is provided
FileNotFoundError – If the specified checkpoint file doesn’t exist
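
The resolution order described above can be sketched as a standalone function. This is an illustration under assumptions, not the method's actual body: the function name is hypothetical, and checkpoint_dir and best_checkpoint_path are passed in explicitly instead of being read from the manager.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_checkpoint_path_sketch(
    checkpoint_dir: Path,
    best_checkpoint_path: Optional[Path],
    checkpoint_name: Optional[Union[Path, str]] = None,
) -> Path:
    """Resolve None, a filename, "last", or a path to a checkpoint file."""
    if checkpoint_name is None:
        # No name given: fall back to the best checkpoint, if one is known.
        if best_checkpoint_path is None:
            raise ValueError("No checkpoint found and none was provided")
        return best_checkpoint_path

    if isinstance(checkpoint_name, str):
        # "last" is shorthand for "last.ckpt"; other strings are tried as
        # filenames inside checkpoint_dir first.
        filename = "last.ckpt" if checkpoint_name == "last" else checkpoint_name
        candidate = checkpoint_dir / filename
        if candidate.is_file():
            return candidate

    # Otherwise treat the value as a file path.
    path = Path(checkpoint_name)
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    return path
```

Checking checkpoint_dir before treating a string as a path keeps short names such as "last.ckpt" unambiguous even when a file of the same name exists in the working directory.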

load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object

Load a trained model from a checkpoint file.

The checkpoint name can be:
- None: Uses the best checkpoint (highest validation AUC)
- A string matching a checkpoint filename in checkpoint_dir (e.g., “last.ckpt”, “best-epoch=50-val_auc=0.85.ckpt”)
- The string “last”: Resolves to “last.ckpt” in checkpoint_dir
- A Path object or string path to a checkpoint file

Parameters:

checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint name or path. If None, uses best checkpoint. If a string, first checks if it matches a file in checkpoint_dir, otherwise treats it as a file path.

Returns:

The loaded model in evaluation mode

Return type:

LightningModule

Raises:

ValueError – If no checkpoint is found and none is provided
FileNotFoundError – If the specified checkpoint file doesn’t exist

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>>
>>> # Load from best checkpoint
>>> model = manager.load_model_from_checkpoint()
>>>
>>> # Load from last checkpoint
>>> model = manager.load_model_from_checkpoint("last")
>>>
>>> # Load from specific checkpoint by name
>>> model = manager.load_model_from_checkpoint("best-epoch=50-val_auc=0.85.ckpt")
>>>
>>> # Load from absolute path
>>> model = manager.load_model_from_checkpoint("/path/to/checkpoint.ckpt")

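publish_to_huggingface(repo_id, checkpoint_path=None, commit_message=None, overwrite=False, token=None, ...) → str

Publish this experiment’s model to HuggingFace Hub.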

Creates a private repository if it doesn’t exist. Repositories can be made public manually on huggingface.co after curation.

Parameters:

repo_id (str) – Repository ID in format “username/repo-name”
checkpoint_path (Optional[Path]) – Checkpoint to publish. If None, uses best checkpoint.
commit_message (Optional[str]) – Custom commit message (default: auto-generated)
overwrite (bool) – Explicitly confirm overwriting existing model (default: False)
token (Optional[str]) – HuggingFace API token (default: uses huggingface-cli login token)
tag (Optional[str]) – Tag name to create after all assets are uploaded (e.g., “v1.0”)
tag_message (Optional[str]) – Optional message for the tag

Returns:

URL to the published model on HuggingFace Hub

Return type:

str

Examples

>>> manager = LocalEvaluationManager("experiments/my_run")
>>> # First upload
>>> url = manager.publish_to_huggingface("shackett/napistu-sage-octopus")
>>> # Update same repo
>>> url = manager.publish_to_huggingface("shackett/napistu-sage-octopus", overwrite=True)


class napistu_torch.evaluation.manager.RemoteEvaluationManager(repo_id: str, model_loader: HFModelLoader, data_store: NapistuDataStore | None = None)

Bases: EvaluationManager

Manager for evaluation of models loaded from HuggingFace Hub.

This class provides evaluation capabilities for models published to HuggingFace, loading the model checkpoint, configuration, and optionally data from remote repositories. It shares the same interface as LocalEvaluationManager but with remote-specific implementation details.

Parameters:

repo_id (str) – HuggingFace model repository ID (e.g., “username/model-name”)
model_loader (HFModelLoader) – Loader instance with downloaded model artifacts
data_store (Optional[NapistuDataStore]) – Data store for accessing NapistuData objects (optional)

repo_id

HuggingFace repository ID

Type:: str

revision

Git revision (branch, tag, or commit) used for loading

Type:: str

manifest

Reconstructed run manifest from HuggingFace artifacts

Type:: RunManifest

checkpoint

Loaded model checkpoint

Type:: Checkpoint

checkpoint_path

Path to cached checkpoint file

Type:: Path

napistu_data_store

The data store instance (may be None)

Type:: Optional[NapistuDataStore]

experiment_dict

Cached experiment dictionary (lazily loaded)

Type:: Optional[dict]

Public Methods

from_huggingface(repo_id, revision=None, data_repo_id=None, data_store_dir=None, ...): Load a model and optionally data from HuggingFace Hub

load_model_from_checkpoint(checkpoint_name=None): Load the published model checkpoint

Properties (Remote-specific, raise AttributeError)

experiment_dir: Not available for remote models

checkpoint_dir: Not available for remote models

best_checkpoint_path: Not available for remote models (only one checkpoint exists)

Examples

>>> # Load model only
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus"
... )
>>>
>>> # Load model with data
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_repo_id="shackett/octopus-consensus-v1"
... )
>>>
>>> # Load model and use it
>>> model = manager.load_model_from_checkpoint()
>>> summary = manager.get_summary_string()

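from_huggingface(repo_id, revision=None, data_repo_id=None, data_store_dir=None, ...) → RemoteEvaluationManager

Load a model and data from HuggingFace Hub.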

Parameters:

repo_id (str) – Model repository ID (e.g., “shackett/sage-octopus”)
data_store_dir (Union[str, Path]) – Directory for the data store. Can be a string (e.g., “~/data/store”) or Path. Tildes (~) will be expanded to the user’s home directory. If it exists, will be loaded as-is. If it doesn’t exist, will be downloaded from HuggingFace using data_repo_id (or config.data.hf_repo_id if not provided).
revision (Optional[str]) – Model revision (branch/tag/commit). Default: “main”
data_repo_id (Optional[str]) – Data repository ID. If None and data_store_dir doesn’t exist, will try to use config.data.hf_repo_id.
data_revision (Optional[str]) – Data revision (branch/tag/commit). Default: uses config.data.hf_revision or “main” if not specified.
model_cache_dir (Optional[Path]) – Where to cache model files. Default: HF default cache
token (Optional[str]) – HuggingFace token for private repos

Returns:

Manager instance with model and data loaded

Return type:

RemoteEvaluationManager

Raises:

ValueError – If data_store_dir doesn’t exist and no HF repo info is available

Examples

>>> # Use existing local data store
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./existing_store")
... )
>>>
>>> # Download data from HF to new location
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./new_store"),
...     data_repo_id="shackett/octopus-consensus-v1"
... )
>>>
>>> # Let config determine data repo (if available)
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./new_store")
... )
>>>
>>> # Pinned versions
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     revision="v1.0",
...     data_store_dir=Path("./store"),
...     data_repo_id="shackett/octopus-consensus-v1",
...     data_revision="v1.0"
... )

__init__(repo_id: str, model_loader: HFModelLoader, data_store: NapistuDataStore | None = None)

Initialize RemoteEvaluationManager from HuggingFace artifacts.

Parameters:

repo_id (str) – HuggingFace repository ID
model_loader (HFModelLoader) – Loader instance with downloaded model artifacts
data_store (Optional[NapistuDataStore]) – Data store for accessing NapistuData objects

get_run_summary(from_wandb: bool = False) → dict

Get summary metrics from HuggingFace (default) or WandB for this experiment.

By default, retrieves the summary metrics from the HuggingFace repository by loading the wandb_run_info.yaml file. This avoids needing WandB API access for remote models. Optionally, can load directly from WandB if from_wandb=True.

Parameters:

from_wandb (bool, optional) – If True, load summary from WandB API instead of HuggingFace. Default is False (load from HuggingFace).

Returns:

Dictionary containing summary metrics (e.g., final validation AUC, training loss, etc.)

Return type:

dict

Raises:

RuntimeError – If HuggingFace API access fails or run info is not available
ValueError – If from_wandb=True but WandB run ID is not available
RuntimeError – If WandB API access fails when from_wandb=True

Examples

>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>> # Load from HuggingFace (default)
>>> summary = manager.get_run_summary()
>>> print(summary["val_auc"])  # Final validation AUC
>>>
>>> # Load from WandB instead
>>> summary = manager.get_run_summary(from_wandb=True)

load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object

Load the published model checkpoint from HuggingFace Hub.

RemoteEvaluationManager only contains the single published checkpoint, so checkpoint_name should not be provided.

Parameters:: checkpoint_name (Optional[Union[Path, str]], default=None) – Must be None. Remote models only have one checkpoint.
Returns:: The loaded model in evaluation mode
Return type:: LightningModule
Raises:: ValueError – If checkpoint_name is provided (not supported for remote models)

Examples

>>> manager = RemoteEvaluationManager.from_huggingface("shackett/sage-octopus")
>>> model = manager.load_model_from_checkpoint()

publish_to_huggingface(*args, **kwargs) → str: Not supported for remote models (already published).


property best_checkpoint_path: Path | None: Not available for remote models.

property best_checkpoint_val_auc: float | None: Not available for remote models.

property checkpoint_dir: Path: Not available for remote models.

property experiment_dir: Path: Not available for remote models.

property repo_url: str: URL to the HuggingFace model repository.

napistu_torch.evaluation.manager._parse_checkpoint_filename(filename: str | Path) → tuple[int, float] | None

Extract epoch number and validation AUC from checkpoint filename.

Parameters:

filename (str | Path) – Checkpoint filename like “best-epoch=120-val_auc=0.7604.ckpt”

Returns:

epoch (int) – Epoch number
val_auc (float) – Validation AUC

Examples

>>> _parse_checkpoint_filename("best-epoch=120-val_auc=0.7604.ckpt")
(120, 0.7604)
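
A regular expression is one plausible way to extract these two values. The sketch below assumes filenames end in "epoch=<int>-val_auc=<float>.ckpt", as in the example above; the function name and pattern are illustrative, and the library's own parsing may differ.

```python
import re
from pathlib import Path
from typing import Optional, Tuple, Union

# Assumed filename shape: "<prefix>epoch=<int>-val_auc=<float>.ckpt"
_CKPT_PATTERN = re.compile(r"epoch=(\d+)-val_auc=(\d+(?:\.\d+)?)\.ckpt$")


def parse_checkpoint_filename_sketch(
    filename: Union[str, Path],
) -> Optional[Tuple[int, float]]:
    """Return (epoch, val_auc), or None when the name does not match."""
    match = _CKPT_PATTERN.search(Path(filename).name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))
```

Returning None for names such as "last.ckpt" lets callers skip files that carry no metric instead of handling an exception.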

napistu_torch.evaluation.manager._resolve_data_store_for_remote(data_store_dir: Path, experiment_config: ExperimentConfig, data_repo_id: str | None = None, data_revision: str | None = None, token: str | None = None) → NapistuDataStore

Resolve and load a NapistuDataStore for remote evaluation.

Handles three scenarios:
1. data_store_dir exists -> load existing store
2. data_store_dir missing + HF repo info available -> download from HF
3. data_store_dir missing + no HF info -> error

Parameters:

data_store_dir (Path) – Directory for the data store (may or may not exist)
experiment_config (ExperimentConfig) – Experiment configuration (may contain HF repo info)
data_repo_id (Optional[str]) – HuggingFace data repository ID (overrides config)
data_revision (Optional[str]) – Data revision (overrides config)
token (Optional[str]) – HuggingFace token for private repos

Returns:

Loaded data store

Return type:

NapistuDataStore

Raises:

ValueError – If data_store_dir doesn’t exist and no HF repo info is available

Examples

>>> # Existing store
>>> store = _resolve_data_store_for_remote(
...     data_store_dir=Path("./existing_store"),
...     experiment_config=config
... )
>>>
>>> # Download from HF
>>> store = _resolve_data_store_for_remote(
...     data_store_dir=Path("./new_store"),
...     experiment_config=config,
...     data_repo_id="username/data-repo"
... )
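
The three scenarios, together with the documented precedence of explicit arguments over config values, can be sketched as a pure decision function. Everything here is illustrative: the function name, the config_repo_id and config_revision parameters (standing in for config.data.hf_repo_id and config.data.hf_revision), and the returned action labels are assumptions, and no download is performed.

```python
from pathlib import Path
from typing import Optional, Tuple, Union


def plan_data_store_resolution(
    data_store_dir: Union[str, Path],
    config_repo_id: Optional[str] = None,
    config_revision: Optional[str] = None,
    data_repo_id: Optional[str] = None,
    data_revision: Optional[str] = None,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Decide whether to load an existing store or download one."""
    # Expand "~" so paths like "~/data/store" resolve to the home directory.
    store_dir = Path(data_store_dir).expanduser()

    # Scenario 1: the directory exists, so it is used as-is.
    if store_dir.exists():
        return ("load_existing", None, None)

    # Scenario 2: explicit arguments take precedence over config values.
    repo_id = data_repo_id or config_repo_id
    if repo_id is None:
        # Scenario 3: nothing on disk and nowhere to download from.
        raise ValueError(
            f"{store_dir} does not exist and no HuggingFace data repo is available"
        )
    revision = data_revision or config_revision or "main"
    return ("download", repo_id, revision)
```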

napistu_torch.evaluation.manager.find_best_checkpoint(checkpoint_dir: Path) → tuple[Path, float] | None: Find the best checkpoint (highest validation AUC) in a directory of checkpoints.
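
A minimal sketch of selecting a best checkpoint by validation AUC is shown below. It assumes the metric is encoded in the filename as "val_auc=<float>.ckpt" and that files without it (for example "last.ckpt") are skipped; the function name and these rules are illustrative, not a description of the library's code.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed: the metric is embedded in the filename, e.g. "...val_auc=0.85.ckpt".
_VAL_AUC = re.compile(r"val_auc=(\d+(?:\.\d+)?)\.ckpt$")


def find_best_checkpoint_sketch(checkpoint_dir: Path) -> Optional[Tuple[Path, float]]:
    """Return (path, val_auc) for the highest-AUC checkpoint, or None."""
    best: Optional[Tuple[Path, float]] = None
    for path in sorted(Path(checkpoint_dir).glob("*.ckpt")):
        match = _VAL_AUC.search(path.name)
        if match is None:
            continue  # no metric in the name, so the file cannot be ranked
        val_auc = float(match.group(1))
        if best is None or val_auc > best[1]:
            best = (path, val_auc)
    return best
```

An empty directory, or one containing only unranked files, yields None, which matches the optional return type in the signature.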