napistu_torch.data.compare_napistu_data

Utilities for comparing and validating NapistuData compatibility.

This module provides functions for validating that two NapistuData objects are compatible for training/inference, including feature alignment, split consistency, and structural compatibility checks.

Public Functions

validate_same_data(current_summary, reference_summary, allow_missing_keys=None, verbose=False): Validate that data summaries from reference and current data are compatible.


napistu_torch.data.compare_napistu_data._is_comparable_value(val1: Any, val2: Any) → bool

Check if two values should be compared for equality.

Only compares simple scalar types (int, float, str, bool). Skips lists, dicts, None, and other complex types.

Parameters:

val1 (Any) – First value
val2 (Any) – Second value

Returns:

True if values should be compared, False otherwise

Return type:

bool
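
The rule above can be sketched as a simple type check. This is a minimal illustration, not the library's implementation: `is_comparable_value_sketch` is a hypothetical name, and requiring both values to be scalars is an assumption.

```python
from typing import Any

# Hypothetical stand-in for the private helper; not the library's code.
_SCALAR_TYPES = (int, float, str, bool)


def is_comparable_value_sketch(val1: Any, val2: Any) -> bool:
    """Return True only when both values are simple scalars."""
    # Assumption: a pair is comparable only if *both* sides are scalars.
    return isinstance(val1, _SCALAR_TYPES) and isinstance(val2, _SCALAR_TYPES)


print(is_comparable_value_sketch(3, 3.0))          # both are scalars
print(is_comparable_value_sketch("a", None))       # None is skipped
print(is_comparable_value_sketch([1, 2], [1, 2]))  # lists are skipped
```

Filtering on type first keeps list- and dict-valued summary entries out of a plain equality check, so they can be handled by dedicated validators.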

napistu_torch.data.compare_napistu_data._validate_feature_aliases(current_aliases: Dict[str, str] | None, reference_aliases: Dict[str, str] | None, reference_feature_names: List[str] | None, feature_type: str, verbose: bool = False) → None

Validate feature aliases match and reference valid canonical features.

Parameters:

current_aliases (Optional[Dict[str, str]]) – Aliases from current data
reference_aliases (Optional[Dict[str, str]]) – Aliases from reference
reference_feature_names (Optional[List[str]]) – Feature names from reference (for validating canonical references)
feature_type (str) – Type of features (‘vertex’ or ‘edge’) for error messages
verbose (bool) – Whether to print verbose output

Return type:

None

Raises:

ValueError – If aliases don’t match or reference invalid canonical features
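
The two checks described above (aliases agree, and every alias points at a known canonical feature) might look like the sketch below. `validate_aliases_sketch` is a hypothetical name. Treating `None` as "no aliases" and reading the dictionary as alias → canonical name are both assumptions, not documented behavior.

```python
from typing import Dict, List, Optional


def validate_aliases_sketch(
    current_aliases: Optional[Dict[str, str]],
    reference_aliases: Optional[Dict[str, str]],
    reference_feature_names: Optional[List[str]],
    feature_type: str,
) -> None:
    """Hypothetical stand-in; raises ValueError on any alias problem."""
    # Assumption: None and an empty dict both mean "no aliases".
    current = current_aliases or {}
    reference = reference_aliases or {}

    if current != reference:
        raise ValueError(
            f"{feature_type} feature aliases differ: "
            f"current={current}, reference={reference}"
        )

    # Assumption: each alias maps to a canonical feature name.
    if reference_feature_names is not None:
        known = set(reference_feature_names)
        dangling = {a: c for a, c in reference.items() if c not in known}
        if dangling:
            raise ValueError(
                f"{feature_type} aliases point at unknown features: {dangling}"
            )


names = ["degree", "weight"]
validate_aliases_sketch({"w": "weight"}, {"w": "weight"}, names, "edge")  # passes

try:
    # Aliases agree with each other, but "mass" is not a reference feature.
    validate_aliases_sketch({"w": "mass"}, {"w": "mass"}, names, "edge")
except ValueError as err:
    print(err)
```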

napistu_torch.data.compare_napistu_data._validate_feature_names(current_names: List[str] | None, reference_names: List[str] | None, feature_type: str, verbose: bool = False) → None

Validate feature names match exactly (identical order).

Parameters:

current_names (Optional[List[str]]) – Feature names from current data
reference_names (Optional[List[str]]) – Feature names from reference
feature_type (str) – Type of features (‘vertex’ or ‘edge’) for error messages
verbose (bool) – Whether to print verbose output

Return type:

None

Raises:

ValueError – If feature names don’t match exactly or are in different order
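
Order matters because feature names typically label tensor columns by position, so the same names in a different order would feed the wrong columns to a trained model. A minimal sketch of an order-sensitive check follows. `validate_names_sketch` is a hypothetical name, and the handling of `None` is an assumption.

```python
from typing import List, Optional


def validate_names_sketch(
    current_names: Optional[List[str]],
    reference_names: Optional[List[str]],
    feature_type: str,
) -> None:
    """Hypothetical stand-in; raises ValueError unless the lists are identical."""
    # List equality is order-sensitive, so it covers content and order at once.
    # Assumption: two None values count as a match.
    if current_names == reference_names:
        return

    # Assumption: a None on only one side is reported as a mismatch.
    if current_names is None or reference_names is None:
        raise ValueError(f"{feature_type} feature names present on one side only")

    # Same names, different positions: report the reordering explicitly.
    if sorted(current_names) == sorted(reference_names):
        raise ValueError(f"{feature_type} feature names match but order differs")

    missing = [n for n in reference_names if n not in current_names]
    extra = [n for n in current_names if n not in reference_names]
    raise ValueError(
        f"{feature_type} feature names differ: missing={missing}, extra={extra}"
    )


validate_names_sketch(["a", "b"], ["a", "b"], "vertex")  # passes

try:
    validate_names_sketch(["b", "a"], ["a", "b"], "vertex")
except ValueError as err:
    print(err)
```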

napistu_torch.data.compare_napistu_data._validate_keys(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], allow_missing_keys: List[str]) → None

Validate that required keys are present in both summaries.

Parameters:

current_summary (Dict[str, Any]) – Data summary from current NapistuData
reference_summary (Dict[str, Any]) – Data summary from reference
allow_missing_keys (List[str]) – Keys allowed to be missing in either summary

Raises:

ValueError – If required keys are missing
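
One plausible reading of "required keys" is sketched below: any key that appears in either summary and is not listed in `allow_missing_keys` must appear in both. This definition is an assumption, as is the hypothetical name `validate_keys_sketch`. The library may instead use a fixed list of required keys.

```python
from typing import Any, Dict, List


def validate_keys_sketch(
    current_summary: Dict[str, Any],
    reference_summary: Dict[str, Any],
    allow_missing_keys: List[str],
) -> None:
    """Hypothetical stand-in; raises ValueError when a required key is absent."""
    # Assumption: "required" means any key seen in either summary that is
    # not explicitly allowed to be missing.
    all_keys = set(current_summary) | set(reference_summary)
    required = all_keys - set(allow_missing_keys)

    missing_current = sorted(required - set(current_summary))
    missing_reference = sorted(required - set(reference_summary))
    if missing_current or missing_reference:
        raise ValueError(
            f"missing from current: {missing_current}; "
            f"missing from reference: {missing_reference}"
        )


reference = {"num_nodes": 10, "num_edges": 20, "train_mask_hash": "abc"}
current = {"num_nodes": 10, "num_edges": 20}

# train_mask_hash is allowed to be absent, so this passes.
validate_keys_sketch(current, reference, ["train_mask_hash"])

try:
    validate_keys_sketch(current, reference, [])
except ValueError as err:
    print(err)
```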

napistu_torch.data.compare_napistu_data._validate_mask_hashes(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], verbose: bool = False) → None

Validate train/val/test mask hashes match (warns on mismatch).

This checks whether the data splits are identical between reference and current data. Mismatches are logged as warnings rather than errors since different splits may be intentional for evaluation.

Parameters:

current_summary (Dict[str, Any]) – Data summary from current NapistuData
reference_summary (Dict[str, Any]) – Data summary from reference
verbose (bool) – Whether to print verbose output

Return type:

None
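
The warn-instead-of-raise behavior can be sketched with the standard `logging` module. The function names are hypothetical, the key names come from the default `allow_missing_keys` list documented for `validate_same_data`, and skipping a hash that is absent on either side is an assumption.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

# Key names taken from the documented default allow_missing_keys list.
MASK_HASH_KEYS = ("train_mask_hash", "val_mask_hash", "test_mask_hash")


def mismatched_mask_keys(
    current_summary: Dict[str, Any], reference_summary: Dict[str, Any]
) -> List[str]:
    """Return the mask-hash keys whose values differ between the summaries."""
    mismatched = []
    for key in MASK_HASH_KEYS:
        current_hash = current_summary.get(key)
        reference_hash = reference_summary.get(key)
        # Assumption: a hash missing on either side is skipped, not flagged.
        if current_hash is None or reference_hash is None:
            continue
        if current_hash != reference_hash:
            mismatched.append(key)
    return mismatched


def validate_mask_hashes_sketch(
    current_summary: Dict[str, Any], reference_summary: Dict[str, Any]
) -> None:
    """Hypothetical stand-in; logs a warning per mismatch and never raises."""
    for key in mismatched_mask_keys(current_summary, reference_summary):
        logger.warning("%s differs between reference and current data", key)


reference = {"train_mask_hash": "abc", "val_mask_hash": "def"}
current = {"train_mask_hash": "abc", "val_mask_hash": "xyz"}
validate_mask_hashes_sketch(current, reference)  # logs one warning, raises nothing
```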

napistu_torch.data.compare_napistu_data._validate_relation_labels(reference_labels: List[str] | None, current_labels: List[str] | None) → None

Validate relation type labels match exactly (identical order).

Parameters:

reference_labels (Optional[List[str]]) – Relation labels from reference
current_labels (Optional[List[str]]) – Relation labels from current data

Return type:

None

Raises:

ValueError – If relation labels don’t match exactly or are in different order

napistu_torch.data.compare_napistu_data._validate_structural_attributes(current_summary: Dict[str, Any], reference_summary: Dict[str, Any]) → None

Validate that structural numeric attributes match.

Checks that core attributes like num_nodes, num_edges, etc. have identical values in both summaries.

Parameters:

current_summary (Dict[str, Any]) – Data summary from current NapistuData
reference_summary (Dict[str, Any]) – Data summary from reference

Return type:

None

Raises:

ValueError – If any structural attributes don’t match
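
A minimal sketch of such a check is shown below. `validate_structural_sketch` is a hypothetical name, the key list is limited to attributes named in this page, and two choices are assumptions rather than documented behavior: comparing only keys present in both summaries, and collecting every mismatch before raising.

```python
from typing import Any, Dict, List

# Attribute names mentioned in this module's docs; the real list may be longer.
STRUCTURAL_KEYS = ("num_nodes", "num_edges", "num_features")


def validate_structural_sketch(
    current_summary: Dict[str, Any], reference_summary: Dict[str, Any]
) -> None:
    """Hypothetical stand-in; raises ValueError listing every mismatch."""
    problems: List[str] = []
    for key in STRUCTURAL_KEYS:
        # Assumption: only attributes present in both summaries are compared.
        if key not in current_summary or key not in reference_summary:
            continue
        if current_summary[key] != reference_summary[key]:
            problems.append(
                f"{key}: current={current_summary[key]}, "
                f"reference={reference_summary[key]}"
            )
    if problems:
        raise ValueError("structural mismatch: " + "; ".join(problems))


reference = {"num_nodes": 100, "num_edges": 250, "num_features": 8}
validate_structural_sketch(dict(reference), reference)  # passes

try:
    validate_structural_sketch({**reference, "num_edges": 249}, reference)
except ValueError as err:
    print(err)
```

Reporting all mismatches in one error saves a round trip when several attributes differ at once.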

napistu_torch.data.compare_napistu_data.validate_same_data(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], allow_missing_keys: List[str] | None = None, verbose: bool = False) → None

Validate that data summaries from reference and current data are compatible.

Performs comprehensive validation, including:

- Structural attributes (num_nodes, num_edges, num_features)
- Feature names and ordering (vertex and edge)
- Feature aliases and canonical mappings
- Relation type labels
- Train/val/test split consistency (warnings only)

Parameters:

current_summary (Dict[str, Any]) – Data summary from current NapistuData (e.g., inference data)
reference_summary (Dict[str, Any]) – Data summary from reference (e.g., checkpoint training data)
allow_missing_keys (Optional[List[str]]) – Keys that are allowed to be missing in either summary. If present in both, values must still match. Defaults to [num_edge_features, num_unique_relations, num_unique_classes, train_mask_hash, val_mask_hash, test_mask_hash]
verbose (bool) – Whether to print verbose output

Return type:

None

Raises:

ValueError – If summaries are incompatible in any way

Examples

>>> from napistu_torch.data.compare_napistu_data import validate_same_data
>>> current_summary = napistu_data.get_summary("validation")
>>> reference_summary = checkpoint.get_data_summary()
>>> validate_same_data(current_summary, reference_summary)