napistu_torch.evaluation.manager
Manager for organizing experiments’ metadata, data, models, and evaluation results.
This module provides managers for accessing experiment artifacts, loading models, and managing experiment metadata for both local and remote (HuggingFace) experiments.
Classes
- EvaluationManager
Base class for evaluation managers.
- LocalEvaluationManager
Manager for local experiments with file system access.
- RemoteEvaluationManager
Manager for remote experiments stored on HuggingFace Hub.
Public Functions
- find_best_checkpoint(checkpoint_dir)
Find the best checkpoint in a directory based on validation metrics.
- class napistu_torch.evaluation.manager.EvaluationManager
Bases: ABC
Base class for evaluation managers.
Provides a unified interface for accessing experiment artifacts, loading models, and managing experiment metadata. Both local and remote (HuggingFace) evaluation managers share this common interface built around the RunManifest abstraction.
- manifest
The experiment manifest containing metadata and configuration
- Type:
RunManifest
- napistu_data_store
Data store for accessing NapistuData objects and other artifacts
- Type:
Optional[NapistuDataStore]
- experiment_dict
Cached experiment dictionary with model, data module, trainer, etc.
- Type:
Optional[dict]
Properties (derived from manifest)
- experiment_config
The experiment configuration
- Type:
ExperimentConfig
- experiment_name
Name of the experiment
- Type:
Optional[str]
- wandb_run_id
WandB run ID
- Type:
Optional[str]
- wandb_run_url
WandB run URL
- Type:
Optional[str]
- wandb_project
WandB project name
- Type:
Optional[str]
- wandb_entity
WandB entity (username/team)
- Type:
Optional[str]
Public Methods
- get_experiment_dict
Get the experiment dictionary with model, data module, trainer, etc.
- get_store
Get the NapistuDataStore for this experiment’s data
- get_summary_string
Generate a descriptive summary string from experiment metadata
- get_run_summary
Get summary metrics from WandB for this experiment
- load_model_from_checkpoint(checkpoint_name=None)
Load a trained model from a checkpoint file (abstract, subclass-specific)
- load_napistu_data(napistu_data_name=None)
Load the NapistuData object used for this experiment
- get_experiment_dict(skip_wandb: bool = False) → dict
Get the experiment dictionary with all experiment components.
The experiment dictionary contains the model, data module, trainer, run manifest, and WandB logger. This is lazily loaded and cached.
- Parameters:
skip_wandb (bool, optional) – If True, skip creating WandB logger. Useful for remote models to avoid creating directories. Default is False.
- Returns:
Experiment dictionary containing:
- data_module : Union[FullGraphDataModule, EdgeBatchDataModule]
- model : pl.LightningModule (e.g., EdgePredictionLightning)
- trainer : NapistuTrainer
- run_manifest : RunManifest
- wandb_logger : Optional[WandbLogger]
- Return type:
dict
Examples
>>> manager = LocalEvaluationManager("experiments/my_run")
>>> experiment_dict = manager.get_experiment_dict()
>>> model = experiment_dict[EXPERIMENT_DICT.MODEL]
- get_run_summary() → dict
Get summary metrics from WandB for this experiment.
Retrieves the summary metrics (final values) from the WandB run associated with this experiment.
- Returns:
Dictionary containing summary metrics from WandB (e.g., final validation AUC, training loss, etc.)
- Return type:
dict
- Raises:
ValueError – If WandB run ID is not available
RuntimeError – If WandB API access fails
Examples
>>> manager = LocalEvaluationManager("experiments/my_run")
>>> summary = manager.get_run_summary()
>>> print(summary["val_auc"])  # Final validation AUC
- get_store() → NapistuDataStore
Get the NapistuDataStore for this experiment’s data.
The data store is lazily loaded and cached. For LocalEvaluationManager, it loads from experiment_config.data.store_dir on first access. For RemoteEvaluationManager, it’s already loaded during initialization.
- Returns:
The data store instance for this experiment
- Return type:
NapistuDataStore
- Raises:
ValueError – If no data store is available
Examples
>>> manager = LocalEvaluationManager("experiments/my_run")
>>> store = manager.get_store()
>>> napistu_data = store.load_napistu_data("edge_prediction")
>>>
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>> store = manager.get_store()
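The lazy-load-and-cache behavior described for get_store can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the class name `LazyStoreSketch`, the `loader` callable, and the `load_count` counter are hypothetical stand-ins for whatever reads the store from experiment_config.data.store_dir.

```python
from typing import Callable, Optional


class LazyStoreSketch:
    """Hypothetical holder that loads a store once and reuses it afterwards."""

    def __init__(self, loader: Callable[[], Optional[object]]):
        self._loader = loader  # stand-in for reading the configured store directory
        self.napistu_data_store: Optional[object] = None
        self.load_count = 0  # only here to make the caching visible

    def get_store(self) -> object:
        # Load on first access only; later calls reuse the cached instance.
        if self.napistu_data_store is None:
            self.load_count += 1
            self.napistu_data_store = self._loader()
        if self.napistu_data_store is None:
            raise ValueError("No data store is available")
        return self.napistu_data_store


holder = LazyStoreSketch(loader=lambda: {"edge_prediction": "placeholder"})
first = holder.get_store()
second = holder.get_store()
print(first is second, holder.load_count)
```

The second call returns the same object without invoking the loader again, which is the property the cached attribute is meant to provide.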
- get_summary_string() → str
Generate a descriptive summary string from experiment metadata.
Examples:
- “model: sage-octopus-baseline (sage-dot_product_h128_l3) | WandB: abc123”
- “model: transe-256-hidden”
- Returns:
Formatted summary string
- Return type:
str
- abstractmethod load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object
Load a trained model from a checkpoint file.
This method is implemented differently for local and remote managers:
- LocalEvaluationManager: discovers and loads from checkpoint directory
- RemoteEvaluationManager: loads the single published checkpoint from HuggingFace
- Parameters:
checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint identifier (interpretation depends on subclass)
- Returns:
The loaded model in evaluation mode
- Return type:
LightningModule
- Raises:
NotImplementedError – This is an abstract method and must be implemented by subclasses
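The split between a shared interface and subclass-specific loading can be illustrated with Python's abc module. The sketch below is an assumption-laden toy: `ManagerSketch`, `LocalSketch`, `RemoteSketch`, and `describe` are hypothetical names, and the returned strings merely stand in for loaded models.

```python
from abc import ABC, abstractmethod
from typing import Optional


class ManagerSketch(ABC):
    """Hypothetical base class with one abstract loader and one shared method."""

    @abstractmethod
    def load_model_from_checkpoint(self, checkpoint_name: Optional[str] = None) -> object:
        ...

    def describe(self) -> str:
        # Shared logic can rely on the subclass-specific loader.
        return f"loaded: {self.load_model_from_checkpoint()}"


class LocalSketch(ManagerSketch):
    def load_model_from_checkpoint(self, checkpoint_name: Optional[str] = None) -> object:
        # A local manager may choose among several checkpoints.
        return f"local:{checkpoint_name or 'best'}"


class RemoteSketch(ManagerSketch):
    def load_model_from_checkpoint(self, checkpoint_name: Optional[str] = None) -> object:
        # A remote manager has a single published checkpoint, so a name is rejected.
        if checkpoint_name is not None:
            raise ValueError("Remote models only have one checkpoint")
        return "remote:published"


print(LocalSketch().describe())
print(RemoteSketch().describe())
```

Note that with `abc.abstractmethod`, Python refuses to instantiate the base class itself (a `TypeError`), so callers always work with a concrete subclass.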
- load_napistu_data(napistu_data_name: str | None = None) → NapistuData
Load the NapistuData object used for this experiment.
Loads the NapistuData object from the experiment’s data store. If no name is provided, uses the name from the experiment configuration.
- Parameters:
napistu_data_name (Optional[str], default=None) – Name of the NapistuData object to load. If None, uses the name from the experiment configuration.
- Returns:
The loaded NapistuData object
- Return type:
NapistuData
Examples
>>> manager = LocalEvaluationManager("experiments/my_run")
>>> # Load using name from config
>>> data = manager.load_napistu_data()
>>> # Load specific artifact
>>> data = manager.load_napistu_data("edge_prediction")
- property experiment_config: ExperimentConfig
Get the experiment configuration from the manifest.
- experiment_dict: dict | None
- property experiment_name: str | None
Get the experiment name from the manifest.
- manifest: RunManifest
- napistu_data_store: NapistuDataStore | None
- property wandb_entity: str | None
Get the WandB entity from the manifest.
- property wandb_project: str | None
Get the WandB project name from the manifest.
- property wandb_run_id: str | None
Get the WandB run ID from the manifest.
- property wandb_run_url: str | None
Get the WandB run URL from the manifest.
- class napistu_torch.evaluation.manager.LocalEvaluationManager(experiment_dir: Path | str)
Bases: EvaluationManager
Manager for post-training evaluation of a locally-stored model.
This class provides a unified interface for accessing experiment artifacts, loading models from checkpoints, publishing to HuggingFace Hub, and managing experiment metadata. It loads the experiment manifest from a local directory and provides convenient access to checkpoints, WandB information, and data stores.
- Parameters:
experiment_dir (Union[Path, str]) – Path to the experiment directory containing the manifest file and checkpoints. Must contain a run_manifest.yaml file.
- experiment_dir
Path to the experiment directory
- Type:
Path
- manifest
The experiment manifest containing metadata and configuration
- Type:
RunManifest
- checkpoint_dir
Directory containing model checkpoints
- Type:
Path
- best_checkpoint_path
Path to the best checkpoint (highest validation AUC)
- Type:
Optional[Path]
- best_checkpoint_val_auc
Validation AUC of the best checkpoint
- Type:
Optional[float]
- napistu_data_store
The data store instance (lazily loaded)
- Type:
Optional[NapistuDataStore]
- experiment_dict
Cached experiment dictionary (lazily loaded)
- Type:
Optional[dict]
Public Methods
- load_model_from_checkpoint(checkpoint_name=None)
Load a trained model from a checkpoint file
- publish_to_huggingface(repo_id, checkpoint_path=None, commit_message=None, overwrite=False, token=None)
Publish this experiment’s model to HuggingFace Hub
Private Methods
- _resolve_checkpoint_path(checkpoint_name=None)
Resolve a checkpoint name or path to an actual checkpoint file path
Examples
>>> # Load an experiment
>>> manager = LocalEvaluationManager("experiments/my_run")
>>>
>>> # Load the model from best checkpoint
>>> model = manager.load_model_from_checkpoint()
>>>
>>> # Load from specific checkpoint
>>> model = manager.load_model_from_checkpoint("last")
>>> model = manager.load_model_from_checkpoint("best-epoch=50-val_auc=0.85.ckpt")
>>>
>>> # Get experiment summary
>>> summary = manager.get_summary_string()
>>> print(summary)  # "model: sage-octopus-baseline (sage-dot_product_h128_l3) | WandB: abc123"
>>>
>>> # Publish to HuggingFace
>>> url = manager.publish_to_huggingface("username/model-name")
- __init__(experiment_dir: Path | str)
Initialize LocalEvaluationManager from an experiment directory.
- Parameters:
experiment_dir (Union[Path, str]) – Path to experiment directory containing manifest and checkpoints. Must contain a run_manifest.yaml file.
- Raises:
FileNotFoundError – If experiment directory or manifest file doesn’t exist
ValueError – If manifest file is invalid or cannot be parsed
- _resolve_checkpoint_path(checkpoint_name: Path | str | None = None) → Path
Resolve a checkpoint name or path to an actual checkpoint file path.
Handles various input formats:
- None: Uses the best checkpoint (highest validation AUC)
- String matching a checkpoint filename in checkpoint_dir (e.g., “last.ckpt”, “best-epoch=50-val_auc=0.85.ckpt”)
- The string “last”: Resolves to “last.ckpt” in checkpoint_dir
- A Path object or string path to a checkpoint file
- Parameters:
checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint name or path. If None, uses best checkpoint. If a string, first checks if it matches a file in checkpoint_dir, otherwise treats it as a file path.
- Returns:
Resolved path to the checkpoint file
- Return type:
Path
- Raises:
ValueError – If no checkpoint is found and none is provided
FileNotFoundError – If the specified checkpoint file doesn’t exist
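The resolution order described above can be sketched as a standalone function. This is a hedged illustration under stated assumptions, not the method's actual body: `resolve_checkpoint_path_sketch` is a hypothetical name, and the checkpoint directory and best checkpoint are passed in explicitly rather than read from a manager instance.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_checkpoint_path_sketch(
    checkpoint_name: Optional[Union[Path, str]],
    checkpoint_dir: Path,
    best_checkpoint_path: Optional[Path],
) -> Path:
    # None falls back to the best checkpoint, if one is known.
    if checkpoint_name is None:
        if best_checkpoint_path is None:
            raise ValueError("No checkpoint found and none provided")
        return best_checkpoint_path
    # Strings are first tried as filenames inside checkpoint_dir ("last" -> "last.ckpt").
    if isinstance(checkpoint_name, str):
        name = "last.ckpt" if checkpoint_name == "last" else checkpoint_name
        candidate = checkpoint_dir / name
        if candidate.is_file():
            return candidate
    # Otherwise the value is treated as a path to a checkpoint file.
    path = Path(checkpoint_name)
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    (ckpt_dir / "last.ckpt").touch()
    resolved = resolve_checkpoint_path_sketch("last", ckpt_dir, best_checkpoint_path=None)
    print(resolved.name)
```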
- load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object
Load a trained model from a checkpoint file.
The checkpoint name can be:
- None: Uses the best checkpoint (highest validation AUC)
- A string matching a checkpoint filename in checkpoint_dir (e.g., “last.ckpt”, “best-epoch=50-val_auc=0.85.ckpt”)
- The string “last”: Resolves to “last.ckpt” in checkpoint_dir
- A Path object or string path to a checkpoint file
- Parameters:
checkpoint_name (Optional[Union[Path, str]], default=None) – Checkpoint name or path. If None, uses best checkpoint. If a string, first checks if it matches a file in checkpoint_dir, otherwise treats it as a file path.
- Returns:
The loaded model in evaluation mode
- Return type:
LightningModule
- Raises:
ValueError – If no checkpoint is found and none is provided
FileNotFoundError – If the specified checkpoint file doesn’t exist
Examples
>>> manager = LocalEvaluationManager("experiments/my_run")
>>>
>>> # Load from best checkpoint
>>> model = manager.load_model_from_checkpoint()
>>>
>>> # Load from last checkpoint
>>> model = manager.load_model_from_checkpoint("last")
>>>
>>> # Load from specific checkpoint by name
>>> model = manager.load_model_from_checkpoint("best-epoch=50-val_auc=0.85.ckpt")
>>>
>>> # Load from absolute path
>>> model = manager.load_model_from_checkpoint("/path/to/checkpoint.ckpt")
- publish_to_huggingface(repo_id: str, checkpoint_path: Path | None = None, commit_message: str | None = None, overwrite: bool = False, token: str | None = None, tag: str | None = None, tag_message: str | None = None) → str
Publish this experiment’s model to HuggingFace Hub.
Creates a private repository if it doesn’t exist. Repositories can be made public manually on huggingface.co after curation.
- Parameters:
repo_id (str) – Repository ID in format “username/repo-name”
checkpoint_path (Optional[Path]) – Checkpoint to publish. If None, uses best checkpoint.
commit_message (Optional[str]) – Custom commit message (default: auto-generated)
overwrite (bool) – Explicitly confirm overwriting existing model (default: False)
token (Optional[str]) – HuggingFace API token (default: uses huggingface-cli login token)
tag (Optional[str]) – Tag name to create after all assets are uploaded (e.g., “v1.0”)
tag_message (Optional[str]) – Optional message for the tag
- Returns:
URL to the published model on HuggingFace Hub
- Return type:
str
Examples
>>> manager = LocalEvaluationManager("experiments/my_run")
>>> # First upload
>>> url = manager.publish_to_huggingface("shackett/napistu-sage-octopus")
>>> # Update same repo
>>> url = manager.publish_to_huggingface("shackett/napistu-sage-octopus", overwrite=True)
- class napistu_torch.evaluation.manager.RemoteEvaluationManager(repo_id: str, model_loader: HFModelLoader, data_store: NapistuDataStore | None = None)
Bases: EvaluationManager
Manager for evaluation of models loaded from HuggingFace Hub.
This class provides evaluation capabilities for models published to HuggingFace, loading the model checkpoint, configuration, and optionally data from remote repositories. It shares the same interface as LocalEvaluationManager but with remote-specific implementation details.
- Parameters:
repo_id (str) – HuggingFace model repository ID (e.g., “username/model-name”)
model_loader (HFModelLoader) – Loader instance with downloaded model artifacts
data_store (Optional[NapistuDataStore]) – Data store for accessing NapistuData objects (optional)
- repo_id
HuggingFace repository ID
- Type:
str
- revision
Git revision (branch, tag, or commit) used for loading
- Type:
str
- manifest
Reconstructed run manifest from HuggingFace artifacts
- Type:
RunManifest
- checkpoint
Loaded model checkpoint
- Type:
- checkpoint_path
Path to cached checkpoint file
- Type:
Path
- napistu_data_store
The data store instance (may be None)
- Type:
Optional[NapistuDataStore]
- experiment_dict
Cached experiment dictionary (lazily loaded)
- Type:
Optional[dict]
Public Methods
- from_huggingface(repo_id, data_store_dir, revision=None, data_repo_id=None, ...)
Load a model and optionally data from HuggingFace Hub
- load_model_from_checkpoint(checkpoint_name=None)
Load the published model checkpoint
Properties (remote-specific, raise AttributeError)
- experiment_dir
Not available for remote models
- checkpoint_dir
Not available for remote models
- best_checkpoint_path
Not available for remote models (only one checkpoint exists)
Examples
>>> # Load model with an existing local data store
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>>
>>> # Load model and download data from a HuggingFace data repository
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store"),
...     data_repo_id="shackett/octopus-consensus-v1"
... )
>>>
>>> # Load model and use it
>>> model = manager.load_model_from_checkpoint()
>>> summary = manager.get_summary_string()
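Properties that raise AttributeError interact with `hasattr` and `getattr` in a way worth knowing when writing code that accepts either manager type. The sketch below uses a hypothetical class, `RemoteOnlySketch`, purely to show that standard Python behavior; it is not the library's class.

```python
from pathlib import Path


class RemoteOnlySketch:
    """Hypothetical stand-in exposing one real attribute and one unavailable property."""

    def __init__(self, checkpoint_path: Path):
        self.checkpoint_path = checkpoint_path

    @property
    def checkpoint_dir(self) -> Path:
        # Mirrors the documented behavior: local-only attributes are not available.
        raise AttributeError("checkpoint_dir is not available for remote models")


remote = RemoteOnlySketch(Path("cache/model.ckpt"))
# hasattr() reports False because the property raises AttributeError.
print(hasattr(remote, "checkpoint_dir"))
# getattr() with a default offers a safe fallback for manager-agnostic code.
print(getattr(remote, "checkpoint_dir", None))
```

Code that must work with both local and remote managers can therefore probe for local-only attributes without a try/except block.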
- classmethod from_huggingface(repo_id: str, data_store_dir: str | Path, revision: str | None = None, data_repo_id: str | None = None, data_revision: str | None = None, model_cache_dir: Path | None = None, token: str | None = None) → RemoteEvaluationManager
Load a model and data from HuggingFace Hub.
- Parameters:
repo_id (str) – Model repository ID (e.g., “shackett/sage-octopus”)
data_store_dir (Union[str, Path]) – Directory for the data store. Can be a string (e.g., “~/data/store”) or Path. Tildes (~) will be expanded to the user’s home directory. If it exists, will be loaded as-is. If it doesn’t exist, will be downloaded from HuggingFace using data_repo_id (or config.data.hf_repo_id if not provided).
revision (Optional[str]) – Model revision (branch/tag/commit). Default: “main”
data_repo_id (Optional[str]) – Data repository ID. If None and data_store_dir doesn’t exist, will try to use config.data.hf_repo_id.
data_revision (Optional[str]) – Data revision (branch/tag/commit). Default: uses config.data.hf_revision or “main” if not specified.
model_cache_dir (Optional[Path]) – Where to cache model files. Default: HF default cache
token (Optional[str]) – HuggingFace token for private repos
- Returns:
Manager instance with model and data loaded
- Return type:
RemoteEvaluationManager
- Raises:
ValueError – If data_store_dir doesn’t exist and no HF repo info is available
Examples
>>> # Use existing local data store
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./existing_store")
... )
>>>
>>> # Download data from HF to new location
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./new_store"),
...     data_repo_id="shackett/octopus-consensus-v1"
... )
>>>
>>> # Let config determine data repo (if available)
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./new_store")
... )
>>>
>>> # Pinned versions
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     revision="v1.0",
...     data_store_dir=Path("./store"),
...     data_repo_id="shackett/octopus-consensus-v1",
...     data_revision="v1.0"
... )
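The tilde expansion mentioned for data_store_dir corresponds to what `pathlib.Path.expanduser` does. The helper below, `normalize_store_dir_sketch`, is a hypothetical name used only to illustrate that normalization step; how the library performs it internally is not shown in this reference.

```python
from pathlib import Path
from typing import Union


def normalize_store_dir_sketch(data_store_dir: Union[str, Path]) -> Path:
    # Accept either a string or a Path and expand a leading "~" to the home directory.
    return Path(data_store_dir).expanduser()


expanded = normalize_store_dir_sketch("~/data/store")
print(expanded == Path.home() / "data" / "store")
print(normalize_store_dir_sketch(Path("./store")))
```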
- __init__(repo_id: str, model_loader: HFModelLoader, data_store: NapistuDataStore | None = None)
Initialize RemoteEvaluationManager from HuggingFace artifacts.
- Parameters:
repo_id (str) – HuggingFace repository ID
model_loader (HFModelLoader) – Loader instance with downloaded model artifacts
data_store (Optional[NapistuDataStore]) – Data store for accessing NapistuData objects
- get_run_summary(from_wandb: bool = False) → dict
Get summary metrics from HuggingFace (default) or WandB for this experiment.
By default, retrieves the summary metrics from the HuggingFace repository by loading the wandb_run_info.yaml file. This avoids needing WandB API access for remote models. Optionally, can load directly from WandB if from_wandb=True.
- Parameters:
from_wandb (bool, optional) – If True, load summary from WandB API instead of HuggingFace. Default is False (load from HuggingFace).
- Returns:
Dictionary containing summary metrics (e.g., final validation AUC, training loss, etc.)
- Return type:
dict
- Raises:
RuntimeError – If HuggingFace API access fails or run info is not available
ValueError – If from_wandb=True but WandB run ID is not available
RuntimeError – If WandB API access fails when from_wandb=True
Examples
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>> # Load from HuggingFace (default)
>>> summary = manager.get_run_summary()
>>> print(summary["val_auc"])  # Final validation AUC
>>>
>>> # Load from WandB instead
>>> summary = manager.get_run_summary(from_wandb=True)
- load_model_from_checkpoint(checkpoint_name: Path | str | None = None) → object
Load the published model checkpoint from HuggingFace Hub.
RemoteEvaluationManager only contains the single published checkpoint, so checkpoint_name should not be provided.
- Parameters:
checkpoint_name (Optional[Union[Path, str]], default=None) – Must be None. Remote models only have one checkpoint.
- Returns:
The loaded model in evaluation mode
- Return type:
LightningModule
- Raises:
ValueError – If checkpoint_name is provided (not supported for remote models)
Examples
>>> manager = RemoteEvaluationManager.from_huggingface(
...     repo_id="shackett/sage-octopus",
...     data_store_dir=Path("./store")
... )
>>> model = manager.load_model_from_checkpoint()
- publish_to_huggingface(*args, **kwargs) → str
Not supported for remote models (already published).
- property best_checkpoint_path: Path | None
Not available for remote models.
- property best_checkpoint_val_auc: float | None
Not available for remote models.
- property checkpoint_dir: Path
Not available for remote models.
- property experiment_dir: Path
Not available for remote models.
- property repo_url: str
URL to the HuggingFace model repository.
- napistu_torch.evaluation.manager._parse_checkpoint_filename(filename: str | Path) → tuple[int, float] | None
Extract epoch number and validation AUC from checkpoint filename.
- Parameters:
filename (str | Path) – Checkpoint filename like “best-epoch=120-val_auc=0.7604.ckpt”
- Returns:
epoch (int) – Epoch number
val_auc (float) – Validation AUC
Examples
>>> _parse_checkpoint_filename("best-epoch=120-val_auc=0.7604.ckpt")
(120, 0.7604)
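One way such a filename could be parsed is with a regular expression. The sketch below is an illustration only: the function name `parse_checkpoint_filename_sketch` and the exact pattern are assumptions, and returning None for a non-matching name is inferred from the `| None` in the signature rather than confirmed by the documentation.

```python
import re
from pathlib import Path
from typing import Optional, Tuple, Union

# Assumed pattern for names like "best-epoch=120-val_auc=0.7604.ckpt".
_CKPT_PATTERN = re.compile(r"epoch=(\d+)-val_auc=(\d+(?:\.\d+)?)")


def parse_checkpoint_filename_sketch(
    filename: Union[str, Path],
) -> Optional[Tuple[int, float]]:
    """Return (epoch, val_auc), or None when the name does not match the pattern."""
    match = _CKPT_PATTERN.search(Path(filename).name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


print(parse_checkpoint_filename_sketch("best-epoch=120-val_auc=0.7604.ckpt"))
print(parse_checkpoint_filename_sketch("last.ckpt"))
```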
- napistu_torch.evaluation.manager._resolve_data_store_for_remote(data_store_dir: Path, experiment_config: ExperimentConfig, data_repo_id: str | None = None, data_revision: str | None = None, token: str | None = None) → NapistuDataStore
Resolve and load a NapistuDataStore for remote evaluation.
Handles three scenarios:
1. data_store_dir exists -> load existing store
2. data_store_dir missing + HF repo info available -> download from HF
3. data_store_dir missing + no HF info -> error
- Parameters:
data_store_dir (Path) – Directory for the data store (may or may not exist)
experiment_config (ExperimentConfig) – Experiment configuration (may contain HF repo info)
data_repo_id (Optional[str]) – HuggingFace data repository ID (overrides config)
data_revision (Optional[str]) – Data revision (overrides config)
token (Optional[str]) – HuggingFace token for private repos
- Returns:
Loaded data store
- Return type:
NapistuDataStore
- Raises:
ValueError – If data_store_dir doesn’t exist and no HF repo info is available
Examples
>>> # Existing store
>>> store = _resolve_data_store_for_remote(
...     data_store_dir=Path("./existing_store"),
...     experiment_config=config
... )
>>>
>>> # Download from HF
>>> store = _resolve_data_store_for_remote(
...     data_store_dir=Path("./new_store"),
...     experiment_config=config,
...     data_repo_id="username/data-repo"
... )
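The three scenarios can be expressed as a small decision function. The following is a sketch under explicit assumptions: `resolve_store_dir_sketch` and `fake_download` are hypothetical, the download step is injected as a callable instead of calling HuggingFace, and the function returns a directory path rather than a NapistuDataStore.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def resolve_store_dir_sketch(
    data_store_dir: Path,
    config_repo_id: Optional[str],
    download: Callable[[str, Path], None],
    data_repo_id: Optional[str] = None,
) -> Path:
    # Scenario 1: the directory already exists, so it is used as-is.
    if data_store_dir.exists():
        return data_store_dir
    # An explicit data_repo_id overrides the repository named in the config.
    repo_id = data_repo_id or config_repo_id
    # Scenario 3: nothing on disk and no repository to download from.
    if repo_id is None:
        raise ValueError(f"{data_store_dir} does not exist and no HF repo info is available")
    # Scenario 2: fetch the store into the requested directory.
    download(repo_id, data_store_dir)
    return data_store_dir


calls: List[str] = []


def fake_download(repo_id: str, target: Path) -> None:
    # Hypothetical downloader: records the repo and creates the directory.
    calls.append(repo_id)
    target.mkdir(parents=True)


with tempfile.TemporaryDirectory() as tmp:
    new_store = Path(tmp) / "new_store"
    resolve_store_dir_sketch(new_store, "config/repo", fake_download, data_repo_id="username/data-repo")
    resolve_store_dir_sketch(new_store, None, fake_download)  # exists now, so no download
    print(calls)
```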
- napistu_torch.evaluation.manager.find_best_checkpoint(checkpoint_dir: Path) → tuple[Path, float] | None
Get the best checkpoint from a directory of checkpoints.
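A minimal sketch of selecting the best checkpoint by validation AUC is shown below. It assumes the metric is encoded in the filename as `val_auc=<value>` (as in the examples elsewhere on this page) and that an empty or metric-free directory yields None; `find_best_checkpoint_sketch` is a hypothetical name and this is not the library's implementation.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed convention: the validation AUC is embedded in the checkpoint filename.
_VAL_AUC = re.compile(r"val_auc=(\d+(?:\.\d+)?)")


def find_best_checkpoint_sketch(checkpoint_dir: Path) -> Optional[Tuple[Path, float]]:
    """Return (path, val_auc) for the highest-AUC checkpoint, or None if none qualify."""
    scored: List[Tuple[Path, float]] = []
    for path in checkpoint_dir.glob("*.ckpt"):
        match = _VAL_AUC.search(path.name)
        if match:  # files such as last.ckpt carry no metric and are skipped
            scored.append((path, float(match.group(1))))
    if not scored:
        return None
    return max(scored, key=lambda item: item[1])


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    for name in (
        "best-epoch=50-val_auc=0.85.ckpt",
        "best-epoch=120-val_auc=0.7604.ckpt",
        "last.ckpt",
    ):
        (ckpt_dir / name).touch()
    best_path, best_auc = find_best_checkpoint_sketch(ckpt_dir)
    print(best_path.name, best_auc)
```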