napistu_torch.data.compare_napistu_data
Utilities for comparing and validating NapistuData compatibility.
This module provides functions for validating that two NapistuData objects are compatible for training/inference, including feature alignment, split consistency, and structural compatibility checks.
Public Functions
- validate_same_data(current_summary, reference_summary, allow_missing_keys=None, verbose=False)
Validate that data summaries from reference and current data are compatible.
Functions
- napistu_torch.data.compare_napistu_data._is_comparable_value(val1: Any, val2: Any) → bool
Check if two values should be compared for equality.
Only compares simple scalar types (int, float, str, bool). Skips lists, dicts, None, and other complex types.
- Parameters:
val1 (Any) – First value
val2 (Any) – Second value
- Returns:
True if values should be compared, False otherwise
- Return type:
bool
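A minimal sketch of the scalar-only rule described above. `is_comparable_value` is a hypothetical stand-in, not the module's private helper, and it assumes that *both* values must be simple scalars before they are compared.

```python
from typing import Any

# Scalar types considered safe to compare directly (from the docstring above).
SCALAR_TYPES = (int, float, str, bool)


def is_comparable_value(val1: Any, val2: Any) -> bool:
    """Return True only if both values are simple scalars.

    Assumption: lists, dicts, None and other complex objects are skipped
    (not compared) when either side has such a type.
    """
    return isinstance(val1, SCALAR_TYPES) and isinstance(val2, SCALAR_TYPES)
```

One consequence of an `isinstance` check like this: `numpy.int64` is not a subclass of Python `int`, so NumPy integer scalars would be skipped, whereas `numpy.float64` does subclass `float` and would be compared.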
- napistu_torch.data.compare_napistu_data._validate_feature_aliases(current_aliases: Dict[str, str] | None, reference_aliases: Dict[str, str] | None, reference_feature_names: List[str] | None, feature_type: str, verbose: bool = False) → None
Validate that feature aliases match and that they reference valid canonical features.
- Parameters:
current_aliases (Optional[Dict[str, str]]) – Aliases from current data
reference_aliases (Optional[Dict[str, str]]) – Aliases from reference
reference_feature_names (Optional[List[str]]) – Feature names from reference (for validating canonical references)
feature_type (str) – Type of features (‘vertex’ or ‘edge’) for error messages
verbose (bool) – Whether to print verbose output
- Return type:
None
- Raises:
ValueError – If aliases don’t match or reference invalid canonical features
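The alias check can be sketched as two steps: exact dict equality, then a lookup of every canonical target in the reference feature names. This is a hypothetical illustration; it assumes each dict maps an alias to a canonical feature name and that `None` means "no aliases", neither of which is confirmed by the signature alone.

```python
from typing import Dict, List, Optional


def validate_feature_aliases(
    current_aliases: Optional[Dict[str, str]],
    reference_aliases: Optional[Dict[str, str]],
    reference_feature_names: Optional[List[str]],
    feature_type: str,
) -> None:
    """Hypothetical sketch: aliases must match and point at real features.

    Assumes alias -> canonical mappings and treats None like an empty dict.
    """
    current = current_aliases or {}
    reference = reference_aliases or {}
    if current != reference:
        raise ValueError(
            f"{feature_type} feature aliases differ: "
            f"current={current}, reference={reference}"
        )
    # Every alias must resolve to a canonical feature present in the reference.
    if reference_feature_names is not None:
        known = set(reference_feature_names)
        invalid = {a: c for a, c in reference.items() if c not in known}
        if invalid:
            raise ValueError(
                f"{feature_type} aliases reference unknown canonical features: {invalid}"
            )
```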
- napistu_torch.data.compare_napistu_data._validate_feature_names(current_names: List[str] | None, reference_names: List[str] | None, feature_type: str, verbose: bool = False) → None
Validate that feature names match exactly, in identical order.
- Parameters:
current_names (Optional[List[str]]) – Feature names from current data
reference_names (Optional[List[str]]) – Feature names from reference
feature_type (str) – Type of features (‘vertex’ or ‘edge’) for error messages
verbose (bool) – Whether to print verbose output
- Return type:
None
- Raises:
ValueError – If feature names don’t match exactly or are in different order
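A hypothetical sketch of an order-sensitive name check that also reports *where* the lists diverge. How the real helper treats `None` is an assumption here: two `None` values pass, one `None` fails.

```python
from typing import List, Optional


def validate_feature_names(
    current_names: Optional[List[str]],
    reference_names: Optional[List[str]],
    feature_type: str,
) -> None:
    """Hypothetical sketch: names must be identical, including their order."""
    if current_names == reference_names:
        return
    if current_names is None or reference_names is None:
        raise ValueError(f"{feature_type} feature names present in only one summary")
    if len(current_names) != len(reference_names):
        raise ValueError(
            f"{feature_type} feature count differs: "
            f"{len(current_names)} (current) vs {len(reference_names)} (reference)"
        )
    # Same length but unequal: report the first position that disagrees.
    for i, (cur, ref) in enumerate(zip(current_names, reference_names)):
        if cur != ref:
            raise ValueError(
                f"{feature_type} feature mismatch at index {i}: {cur!r} != {ref!r}"
            )
```

Order matters because a model's input columns are positional: the same names in a different order would feed each feature into the wrong weights.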
- napistu_torch.data.compare_napistu_data._validate_keys(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], allow_missing_keys: List[str]) → None
Validate that required keys are present in both summaries.
- Parameters:
current_summary (Dict[str, Any]) – Data summary from current NapistuData
reference_summary (Dict[str, Any]) – Data summary from reference
allow_missing_keys (List[str]) – Keys allowed to be missing in either summary
- Raises:
ValueError – If required keys are missing
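A minimal sketch of the key-presence check, assuming that "required" means any key that appears in one summary but not the other and is not listed in `allow_missing_keys`. The function name is hypothetical.

```python
from typing import Any, Dict, List


def validate_keys(
    current_summary: Dict[str, Any],
    reference_summary: Dict[str, Any],
    allow_missing_keys: List[str],
) -> None:
    """Hypothetical sketch: keys seen in only one summary must be allow-listed."""
    allowed = set(allow_missing_keys)
    # Symmetric difference = keys present in exactly one of the two summaries.
    missing = (set(current_summary) ^ set(reference_summary)) - allowed
    if missing:
        raise ValueError(f"Required keys missing from one summary: {sorted(missing)}")
```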
- napistu_torch.data.compare_napistu_data._validate_mask_hashes(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], verbose: bool = False) → None
Validate that train/val/test mask hashes match (warns on mismatch).
This checks whether the data splits are identical between reference and current data. Mismatches are logged as warnings rather than errors since different splits may be intentional for evaluation.
- Parameters:
current_summary (Dict[str, Any]) – Data summary from current NapistuData
reference_summary (Dict[str, Any]) – Data summary from reference
verbose (bool) – Whether to print verbose output
- Return type:
None
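A hypothetical sketch of warn-only split checking with the standard `logging` module. The key names come from the documented `allow_missing_keys` defaults; returning the mismatched keys is an addition for illustration, since the documented helper returns `None`.

```python
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)

# Key names taken from the documented allow_missing_keys defaults.
MASK_HASH_KEYS = ("train_mask_hash", "val_mask_hash", "test_mask_hash")


def check_mask_hashes(
    current_summary: Dict[str, Any], reference_summary: Dict[str, Any]
) -> List[str]:
    """Hypothetical sketch: log a warning (never raise) when split hashes differ.

    Keys absent from either summary are skipped.
    """
    mismatched = []
    for key in MASK_HASH_KEYS:
        if key not in current_summary or key not in reference_summary:
            continue
        if current_summary[key] != reference_summary[key]:
            logger.warning("%s differs between current and reference data", key)
            mismatched.append(key)
    return mismatched
```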
- napistu_torch.data.compare_napistu_data._validate_relation_labels(reference_labels: List[str] | None, current_labels: List[str] | None) → None
Validate that relation type labels match exactly, in identical order.
- Parameters:
reference_labels (Optional[List[str]]) – Relation labels from reference
current_labels (Optional[List[str]]) – Relation labels from current data
- Return type:
None
- Raises:
ValueError – If relation labels don’t match exactly or are in different order
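A hypothetical sketch that distinguishes "same labels, different order" from "different labels" in the error message. It assumes a label's list position presumably serves as its integer relation id, which is why a reordering is treated as an error rather than tolerated.

```python
from typing import List, Optional


def validate_relation_labels(
    reference_labels: Optional[List[str]],
    current_labels: Optional[List[str]],
) -> None:
    """Hypothetical sketch: labels must match exactly, order included."""
    if reference_labels == current_labels:
        return
    if (
        reference_labels is not None
        and current_labels is not None
        and sorted(reference_labels) == sorted(current_labels)
    ):
        # Same labels, but reordering would remap relation ids (assumption).
        raise ValueError(
            "Relation labels contain the same entries but in a different order: "
            f"{current_labels} (current) vs {reference_labels} (reference)"
        )
    raise ValueError(
        f"Relation labels differ: {current_labels} (current) vs {reference_labels} (reference)"
    )
```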
- napistu_torch.data.compare_napistu_data._validate_structural_attributes(current_summary: Dict[str, Any], reference_summary: Dict[str, Any]) → None
Validate that structural numeric attributes match.
Checks that core attributes like num_nodes, num_edges, etc. have identical values in both summaries.
- Parameters:
current_summary (Dict[str, Any]) – Data summary from current NapistuData
reference_summary (Dict[str, Any]) – Data summary from reference
- Return type:
None
- Raises:
ValueError – If any structural attributes don’t match
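A hypothetical sketch of the structural comparison. `STRUCTURAL_KEYS` is an assumed list built from the attributes named in this page; the real module may check a different set.

```python
from typing import Any, Dict

# Hypothetical list of structural keys; the real module may check others.
STRUCTURAL_KEYS = ("num_nodes", "num_edges", "num_features", "num_edge_features")


def validate_structural_attributes(
    current_summary: Dict[str, Any], reference_summary: Dict[str, Any]
) -> None:
    """Hypothetical sketch: collect every structural mismatch, then raise once."""
    mismatches = {}
    for key in STRUCTURAL_KEYS:
        # Keys missing from either summary are left to the key-presence check.
        if key in current_summary and key in reference_summary:
            if current_summary[key] != reference_summary[key]:
                mismatches[key] = (current_summary[key], reference_summary[key])
    if mismatches:
        details = ", ".join(
            f"{key}: {cur} (current) vs {ref} (reference)"
            for key, (cur, ref) in mismatches.items()
        )
        raise ValueError(f"Structural attributes differ: {details}")
```

Collecting all mismatches before raising means a single error reports every differing attribute, instead of forcing the user to fix and rerun one attribute at a time.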
- napistu_torch.data.compare_napistu_data.validate_same_data(current_summary: Dict[str, Any], reference_summary: Dict[str, Any], allow_missing_keys: List[str] | None = None, verbose: bool = False) → None
Validate that data summaries from reference and current data are compatible.
Performs comprehensive validation, including:
- Structural attributes (num_nodes, num_edges, num_features)
- Feature names and ordering (vertex and edge)
- Feature aliases and canonical mappings
- Relation type labels
- Train/val/test split consistency (warnings only)
- Parameters:
current_summary (Dict[str, Any]) – Data summary from current NapistuData (e.g., inference data)
reference_summary (Dict[str, Any]) – Data summary from reference (e.g., checkpoint training data)
allow_missing_keys (Optional[List[str]]) – Keys that are allowed to be missing in either summary. If present in both, values must still match. Defaults to [num_edge_features, num_unique_relations, num_unique_classes, train_mask_hash, val_mask_hash, test_mask_hash]
verbose (bool) – Whether to print verbose output
- Return type:
None
- Raises:
ValueError – If summaries are incompatible in any way
Examples
>>> current_summary = napistu_data.get_summary("validation")
>>> reference_summary = checkpoint.get_data_summary()
>>> validate_same_data(current_summary, reference_summary)
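To make the `allow_missing_keys` semantics concrete without a real NapistuData object or checkpoint, here is a self-contained sketch on plain dicts. `validate_summaries` and the `vertex_feature_names` key are hypothetical; the sketch collapses the dedicated feature-name, alias and relation-label checks into one exact-equality comparison (list equality is order-sensitive, dict equality is not) and does not verify that aliases point at real features.

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)

# Copied from the documented allow_missing_keys default.
DEFAULT_ALLOW_MISSING = [
    "num_edge_features",
    "num_unique_relations",
    "num_unique_classes",
    "train_mask_hash",
    "val_mask_hash",
    "test_mask_hash",
]
MASK_HASH_KEYS = {"train_mask_hash", "val_mask_hash", "test_mask_hash"}


def validate_summaries(
    current: Dict[str, Any],
    reference: Dict[str, Any],
    allow_missing_keys: Optional[List[str]] = None,
) -> None:
    """Hypothetical end-to-end sketch of the documented checks."""
    allowed = set(
        DEFAULT_ALLOW_MISSING if allow_missing_keys is None else allow_missing_keys
    )
    # 1. Keys present in only one summary must be allow-listed.
    missing = (set(current) ^ set(reference)) - allowed
    if missing:
        raise ValueError(f"Missing required keys: {sorted(missing)}")
    # 2. Shared keys must agree, even allow-listed ones; split hashes only warn.
    for key in sorted(set(current) & set(reference)):
        if current[key] == reference[key]:
            continue
        if key in MASK_HASH_KEYS:
            logger.warning("%s differs; splits are not identical", key)
        else:
            raise ValueError(
                f"{key} differs: {current[key]} (current) vs {reference[key]} (reference)"
            )
```

For example, dropping `train_mask_hash` from the current summary passes because it is allow-listed, changing its value only logs a warning, and reordering `vertex_feature_names` raises.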