napistu_torch.data.compare_napistu_data

Utilities for comparing and validating NapistuData compatibility.

This module provides functions for validating that two NapistuData objects are compatible for training/inference, including feature alignment, split consistency, and structural compatibility checks.

Public Functions

validate_same_data(current_summary, reference_summary, allow_missing_keys=None, verbose=False)

Validate that data summaries from reference and current data are compatible.

napistu_torch.data.compare_napistu_data._is_comparable_value(val1: Any, val2: Any) → bool

Check if two values should be compared for equality.

Only compares simple scalar types (int, float, str, bool). Skips lists, dicts, None, and other complex types.

Parameters:
  • val1 (Any) – First value

  • val2 (Any) – Second value

Returns:

True if values should be compared, False otherwise

Return type:

bool
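
The scalar-only rule described above can be sketched as follows. This is an illustration, not the library's implementation: the name is_comparable_sketch is hypothetical, and it assumes that both values must be scalars for a comparison to take place.

```python
from typing import Any

# Assumption: "simple scalar types" means exactly these four built-ins.
SCALAR_TYPES = (int, float, str, bool)


def is_comparable_sketch(val1: Any, val2: Any) -> bool:
    """Return True only when both values are simple scalars."""
    return isinstance(val1, SCALAR_TYPES) and isinstance(val2, SCALAR_TYPES)


print(is_comparable_sketch(100, 100))     # True
print(is_comparable_sketch("a", [1, 2]))  # False: lists are skipped
print(is_comparable_sketch(None, 3))      # False: None is skipped
```

Skipping complex values at this stage leaves list-valued entries such as feature names to the dedicated validators documented below.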

napistu_torch.data.compare_napistu_data._validate_feature_aliases(current_aliases: Dict[str, str] | None, reference_aliases: Dict[str, str] | None, reference_feature_names: List[str] | None, feature_type: str, verbose: bool = False) → None

Validate that feature aliases match and that they refer to valid canonical features.

Parameters:
  • current_aliases (Optional[Dict[str, str]]) – Aliases from current data

  • reference_aliases (Optional[Dict[str, str]]) – Aliases from reference

  • reference_feature_names (Optional[List[str]]) – Feature names from reference (for validating canonical references)

  • feature_type (str) – Type of features (‘vertex’ or ‘edge’) for error messages

  • verbose (bool) – Whether to print verbose output

Return type:

None

Raises:

ValueError – If aliases don’t match or reference invalid canonical features
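
A minimal sketch of the two alias checks, under stated assumptions: each alias dictionary maps an alias to the canonical feature name it stands for, and a missing dictionary is treated as empty. The function name check_aliases_sketch and the error messages are hypothetical and may differ from what the library does.

```python
from typing import Dict, List, Optional


def check_aliases_sketch(
    current_aliases: Optional[Dict[str, str]],
    reference_aliases: Optional[Dict[str, str]],
    reference_feature_names: Optional[List[str]],
    feature_type: str,
) -> None:
    """Raise ValueError if aliases differ or point at unknown features."""
    # Assumption: a missing alias mapping is equivalent to an empty one.
    current = current_aliases or {}
    reference = reference_aliases or {}

    # Check 1: both sides must define exactly the same alias mapping.
    if current != reference:
        raise ValueError(f"{feature_type} feature aliases do not match")

    # Check 2: every canonical target must be a known reference feature.
    # Assumption: dictionary values are the canonical feature names.
    known = set(reference_feature_names or [])
    unknown = sorted(name for name in reference.values() if name not in known)
    if unknown:
        raise ValueError(
            f"{feature_type} aliases refer to unknown canonical features: {unknown}"
        )


# Passes silently: the alias matches and points at a known feature.
check_aliases_sketch({"w": "weight"}, {"w": "weight"}, ["weight"], "edge")
```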

napistu_torch.data.compare_napistu_data._validate_feature_names(current_names: List[str] | None, reference_names: List[str] | None, feature_type: str, verbose: bool = False) → None

Validate feature names match exactly (identical order).

Parameters:
  • current_names (Optional[List[str]]) – Feature names from current data

  • reference_names (Optional[List[str]]) – Feature names from reference

  • feature_type (str) – Type of features (‘vertex’ or ‘edge’) for error messages

  • verbose (bool) – Whether to print verbose output

Return type:

None

Raises:

ValueError – If feature names don’t match exactly or are in different order
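
The order-sensitive comparison can be illustrated with the sketch below. It is not the library's code: check_feature_names_sketch is a hypothetical name, and treating a list that is missing on only one side as a mismatch is an assumption.

```python
from typing import List, Optional


def check_feature_names_sketch(
    current_names: Optional[List[str]],
    reference_names: Optional[List[str]],
    feature_type: str,
) -> None:
    """Raise ValueError unless both name lists are identical, order included."""
    if current_names == reference_names:
        return
    # Assumption: a list present on only one side counts as a mismatch.
    if current_names is None or reference_names is None:
        raise ValueError(f"{feature_type} feature names are missing on one side")
    # Same names, different positions: report the ordering problem explicitly.
    if sorted(current_names) == sorted(reference_names):
        raise ValueError(f"{feature_type} feature names are in a different order")
    missing = sorted(set(reference_names) - set(current_names))
    extra = sorted(set(current_names) - set(reference_names))
    raise ValueError(
        f"{feature_type} feature names differ: missing={missing}, extra={extra}"
    )


try:
    check_feature_names_sketch(["b", "a"], ["a", "b"], "vertex")
except ValueError as err:
    print(err)  # vertex feature names are in a different order
```

Distinguishing a reordering from a genuinely different feature set makes the error message more actionable, since a reordering can usually be fixed by re-sorting columns.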

napistu_torch.data.compare_napistu_data._validate_keys(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], allow_missing_keys: List[str]) → None

Validate that required keys are present in both summaries.

Parameters:
  • current_summary (Dict[str, Any]) – Data summary from current NapistuData

  • reference_summary (Dict[str, Any]) – Data summary from reference

  • allow_missing_keys (List[str]) – Keys allowed to be missing in either summary

Raises:

ValueError – If required keys are missing
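
One way to picture the key check is shown below. The sketch assumes that a key is "required" when it appears in either summary and is not listed in allow_missing_keys; the library may define the required set differently. The name check_keys_sketch is hypothetical.

```python
from typing import Any, Dict, List


def check_keys_sketch(
    current_summary: Dict[str, Any],
    reference_summary: Dict[str, Any],
    allow_missing_keys: List[str],
) -> None:
    """Raise ValueError if a required key is absent from either summary."""
    # Assumption: required keys are all observed keys minus the allowed ones.
    all_keys = set(current_summary) | set(reference_summary)
    required = all_keys - set(allow_missing_keys)

    missing_current = sorted(required - set(current_summary))
    missing_reference = sorted(required - set(reference_summary))
    if missing_current or missing_reference:
        raise ValueError(
            f"Missing required keys: current={missing_current}, "
            f"reference={missing_reference}"
        )


# Passes: train_mask_hash is absent from the current summary but allowed to be.
check_keys_sketch(
    {"num_nodes": 10},
    {"num_nodes": 10, "train_mask_hash": "abc"},
    allow_missing_keys=["train_mask_hash"],
)
```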

napistu_torch.data.compare_napistu_data._validate_mask_hashes(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], verbose: bool = False) → None

Validate train/val/test mask hashes match (warns on mismatch).

This checks whether the data splits are identical between reference and current data. Mismatches are logged as warnings rather than errors since different splits may be intentional for evaluation.

Parameters:
  • current_summary (Dict[str, Any]) – Data summary from current NapistuData

  • reference_summary (Dict[str, Any]) – Data summary from reference

  • verbose (bool) – Whether to print verbose output

Return type:

None
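
The warn-instead-of-raise behavior might look like the sketch below. The key names come from the allow_missing_keys defaults documented on this page; everything else is an assumption, including the hypothetical name warn_on_mask_mismatch_sketch, skipping hashes that are absent on either side, and returning the mismatched keys (the documented function returns None).

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

MASK_HASH_KEYS = ("train_mask_hash", "val_mask_hash", "test_mask_hash")


def warn_on_mask_mismatch_sketch(
    current_summary: Dict[str, Any],
    reference_summary: Dict[str, Any],
) -> List[str]:
    """Log a warning for each split whose hash differs; never raise."""
    mismatched = []
    for key in MASK_HASH_KEYS:
        current_hash = current_summary.get(key)
        reference_hash = reference_summary.get(key)
        # Assumption: a hash missing on either side is skipped, not flagged.
        if current_hash is None or reference_hash is None:
            continue
        if current_hash != reference_hash:
            logger.warning("%s differs between current and reference data", key)
            mismatched.append(key)
    # Returned only so the sketch is easy to inspect.
    return mismatched


print(warn_on_mask_mismatch_sketch(
    {"train_mask_hash": "aaa", "val_mask_hash": "bbb"},
    {"train_mask_hash": "aaa", "val_mask_hash": "ccc"},
))  # ['val_mask_hash']
```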

napistu_torch.data.compare_napistu_data._validate_relation_labels(reference_labels: List[str] | None, current_labels: List[str] | None) → None

Validate relation type labels match exactly (identical order).

Parameters:
  • reference_labels (Optional[List[str]]) – Relation labels from reference

  • current_labels (Optional[List[str]]) – Relation labels from current data

Return type:

None

Raises:

ValueError – If relation labels don’t match exactly or are in different order

napistu_torch.data.compare_napistu_data._validate_structural_attributes(current_summary: Dict[str, Any], reference_summary: Dict[str, Any]) → None

Validate that structural numeric attributes match.

Checks that core attributes like num_nodes, num_edges, etc. have identical values in both summaries.

Parameters:
  • current_summary (Dict[str, Any]) – Data summary from current NapistuData

  • reference_summary (Dict[str, Any]) – Data summary from reference

Return type:

None

Raises:

ValueError – If any structural attributes don’t match
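
A hedged sketch of the structural comparison follows. The three keys are the ones named on this page; the full set checked by the library is not listed here, so STRUCTURAL_KEYS is an assumption, as are the hypothetical name check_structure_sketch and the choice to collect every mismatch before raising.

```python
from typing import Any, Dict

# Assumption: only the attributes named in the documentation are compared.
STRUCTURAL_KEYS = ("num_nodes", "num_edges", "num_features")


def check_structure_sketch(
    current_summary: Dict[str, Any],
    reference_summary: Dict[str, Any],
) -> None:
    """Raise ValueError listing every structural attribute that differs."""
    mismatches = []
    for key in STRUCTURAL_KEYS:
        # Compare only attributes present on both sides.
        if key in current_summary and key in reference_summary:
            current_value = current_summary[key]
            reference_value = reference_summary[key]
            if current_value != reference_value:
                mismatches.append(
                    f"{key}: current={current_value}, reference={reference_value}"
                )
    if mismatches:
        raise ValueError("Structural attributes differ: " + "; ".join(mismatches))


try:
    check_structure_sketch(
        {"num_nodes": 100, "num_edges": 250},
        {"num_nodes": 100, "num_edges": 300},
    )
except ValueError as err:
    print(err)  # Structural attributes differ: num_edges: current=250, reference=300
```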

napistu_torch.data.compare_napistu_data.validate_same_data(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], allow_missing_keys: List[str] | None = None, verbose: bool = False) → None

Validate that data summaries from reference and current data are compatible.

Performs comprehensive validation, including:
  • Structural attributes (num_nodes, num_edges, num_features)

  • Feature names and ordering (vertex and edge)

  • Feature aliases and canonical mappings

  • Relation type labels

  • Train/val/test split consistency (warnings only)

Parameters:
  • current_summary (Dict[str, Any]) – Data summary from current NapistuData (e.g., inference data)

  • reference_summary (Dict[str, Any]) – Data summary from reference (e.g., checkpoint training data)

  • allow_missing_keys (Optional[List[str]]) – Keys that are allowed to be missing in either summary. If present in both, values must still match. Defaults to [num_edge_features, num_unique_relations, num_unique_classes, train_mask_hash, val_mask_hash, test_mask_hash]

  • verbose (bool) – Whether to print verbose output

Return type:

None

Raises:

ValueError – If summaries are incompatible in any way

Examples

>>> from napistu_torch.data.compare_napistu_data import validate_same_data
>>> current_summary = napistu_data.get_summary("validation")
>>> reference_summary = checkpoint.get_data_summary()
>>> validate_same_data(current_summary, reference_summary)