napistu_torch.load.encoding_manager
Configuration management for DataFrame encoding transformations.
This module provides configuration management for DataFrame encoding transformations, allowing flexible specification of how columns should be encoded.
Classes
- EncodingManager
Configuration manager for DataFrame encoding transformations.
- TransformConfig
Configuration for a single transform.
- EncodingConfig
Complex encoding configuration format.
- SimpleEncodingConfig
Simple encoding configuration format.
Public Functions
- detect_config_format(config)
Detect whether a config dict is in simple or complex format.
Functions
- detect_config_format(config)
Detect whether a config dict is in simple or complex format.
Classes
- EncodingConfig
Complete encoding configuration with conflict validation.
- EncodingManager
Configuration manager for DataFrame encoding transformations.
- SimpleEncodingConfig
Simple encoding configuration format validator.
- TransformConfig
Configuration for a single transformation.
- class napistu_torch.load.encoding_manager.EncodingConfig(root: RootModelRootType = PydanticUndefined)
Bases: RootModel[Dict[str, TransformConfig]]
Complete encoding configuration with conflict validation.
- Parameters:
root (Dict[str, TransformConfig]) – Dictionary mapping transform names to their configurations.
- check_no_column_conflicts()
Ensure no column appears in multiple transforms.
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
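The conflict rule enforced by check_no_column_conflicts (no column may appear in more than one transform) can be illustrated with a small standalone sketch. This is not the library's validator: the helper name find_column_conflicts and the plain-dict input are assumptions made for illustration, and 'passthrough' is used only as a placeholder transformer.

```python
from collections import defaultdict
from typing import Any, Dict, List


def find_column_conflicts(config: Dict[str, Dict[str, Any]]) -> Dict[str, List[str]]:
    """Return {column: [transform names]} for columns claimed by more than one transform.

    Hypothetical helper; the real check lives inside EncodingConfig.
    """
    owners: Dict[str, List[str]] = defaultdict(list)
    for transform_name, spec in config.items():
        for column in spec["columns"]:
            owners[column].append(transform_name)
    # Keep only the columns that have more than one owner
    return {col: names for col, names in owners.items() if len(names) > 1}


config = {
    "categorical": {"columns": ["col1", "col2"], "transformer": "passthrough"},
    "numerical": {"columns": ["col2", "col3"], "transformer": "passthrough"},
}
conflicts = find_column_conflicts(config)
```

Here col2 is claimed by both transforms, so a validator built on this check would reject the configuration.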
- class napistu_torch.load.encoding_manager.EncodingManager(config: Dict[str, Dict] | Dict[str, set], encoders: Dict[str, Any] | None = None)
Bases: object
Configuration manager for DataFrame encoding transformations.
This class manages encoding configurations, validates them, and provides utilities for inspecting and composing configurations.
- Parameters:
config (Dict[str, Dict] or Dict[str, set]) –
Encoding configuration dictionary. Supports two formats:
- Complex format (when encoders=None):
Each key is a transform name and each value is a dict with 'columns' and 'transformer' keys. Example:
    {
        'categorical': {
            'columns': ['col1', 'col2'],
            'transformer': OneHotEncoder()
        },
        'numerical': {
            'columns': ['col3'],
            'transformer': StandardScaler()
        }
    }
- Simple format (when encoders is provided):
Each key is an encoding type and each value is a set/list of column names. Example:
    {
        'categorical': {'col1', 'col2'},
        'numerical': {'col3'}
    }
encoders (Dict[str, Any], optional) –
Mapping from encoding type to transformer instance. Only used with simple format. If provided, config is treated as simple format and converted to complex format internally. Example:
    {
        'categorical': OneHotEncoder(),
        'numerical': StandardScaler()
    }
- config_
The validated configuration dictionary (always in complex format).
- Type:
Dict[str, Dict]
- compose(override_config, verbose=False)
Compose this configuration with another configuration using merge strategy.
- ensure(config, encoders=None)
Class method to ensure config is an EncodingManager instance. Supports both simple and complex dict formats via encoders parameter.
- get_config()
Get the encoding configuration dictionary.
- get_encoding_table()
Get a summary table of all configured transformations.
- log_summary()
Log a summary of all configured transformations.
- validate(config)
Validate a configuration dictionary.
Private Methods
- _create_encoding_table(config)
Create transform table from validated config.
- Raises:
ValueError – If the configuration is invalid or has column conflicts.
Examples
Complex format:
>>> from sklearn.preprocessing import OneHotEncoder, StandardScaler
>>> from napistu_torch.load.encoding_manager import EncodingManager
>>>
>>> config_dict = {
...     'categorical': {
...         'columns': ['category'],
...         'transformer': OneHotEncoder(sparse_output=False)
...     },
...     'numerical': {
...         'columns': ['value'],
...         'transformer': StandardScaler()
...     }
... }
>>>
>>> config = EncodingManager(config_dict)
>>> config.log_summary()
>>> print(config.get_encoding_table())
Simple format:
>>> simple_spec = {
...     'categorical': {'category'},
...     'numerical': {'value'}
... }
>>> encoders = {
...     'categorical': OneHotEncoder(sparse_output=False),
...     'numerical': StandardScaler()
... }
>>> config = EncodingManager(simple_spec, encoders=encoders)
>>> print(config.get_encoding_table())
- classmethod ensure(config: dict | EncodingManager, encoders: Dict[str, Any] | None = None) -> EncodingManager
Ensure that config is an EncodingManager object.
If config is a dict, it will be converted to an EncodingManager. If it’s already an EncodingManager, it will be returned as-is.
- Parameters:
config (Union[dict, EncodingManager]) – Either a dict (simple or complex format) or an EncodingManager object.
encoders (Dict[str, Any], optional) – Mapping from encoding type to transformer instance. Only used when config is a dict in simple format. Ignored if config is already an EncodingManager.
- Returns:
The EncodingManager object
- Return type:
EncodingManager
- Raises:
ValueError – If config is neither a dict nor an EncodingManager
Examples
Complex format dict:
>>> config = EncodingManager.ensure({
...     "foo": {"columns": ["bar"], "transformer": StandardScaler()}
... })
>>> isinstance(config, EncodingManager)
True
Simple format dict:
>>> config = EncodingManager.ensure(
...     {"categorical": {"col1", "col2"}},
...     encoders={"categorical": OneHotEncoder()}
... )
>>> isinstance(config, EncodingManager)
True
EncodingManager passthrough:
>>> manager = EncodingManager({"foo": {"columns": ["bar"], "transformer": StandardScaler()}})
>>> result = EncodingManager.ensure(manager)
>>> result is manager
True
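The "ensure" behaviour documented above (return an existing instance unchanged, build one from a dict, reject anything else) is a common coercion pattern. The sketch below shows that pattern on a hypothetical stand-in class named Manager. It is not EncodingManager and performs no validation; it only mirrors the documented branching.

```python
from typing import Any, Dict, Optional, Union


class Manager:
    """Hypothetical stand-in for EncodingManager; it only stores its inputs."""

    def __init__(self, config: Dict[str, Any], encoders: Optional[Dict[str, Any]] = None):
        self.config = config
        self.encoders = encoders

    @classmethod
    def ensure(
        cls, config: Union[dict, "Manager"], encoders: Optional[Dict[str, Any]] = None
    ) -> "Manager":
        if isinstance(config, cls):
            # Passthrough: the same object is returned and encoders is ignored
            return config
        if isinstance(config, dict):
            return cls(config, encoders=encoders)
        raise ValueError(
            f"config must be a dict or {cls.__name__}, got {type(config).__name__}"
        )


manager = Manager({"foo": {"columns": ["bar"], "transformer": "passthrough"}})
same = Manager.ensure(manager)
built = Manager.ensure({"categorical": {"col1"}}, encoders={"categorical": "passthrough"})
```

Callers that accept either a dict or a manager can call ensure once at the top of a function and work with a single type afterwards.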
- static _convert_simple_to_complex(simple_spec: Dict[str, set], encoders: Dict[str, Any]) -> Dict[str, Dict]
Convert simple spec format to complex format.
- Parameters:
simple_spec (Dict[str, set]) – Mapping from encoding type to set of column names.
encoders (Dict[str, Any]) – Mapping from encoding type to transformer instance.
- Returns:
Complex format configuration.
- Return type:
Dict[str, Dict]
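As a rough illustration of the conversion described above, the sketch below pairs each encoding type's columns with the encoder registered under the same key. The function name simple_to_complex is hypothetical, and two behaviours are assumptions rather than documented facts: columns are sorted to get a deterministic order out of a set, and a missing encoder raises ValueError.

```python
from typing import Any, Dict, Iterable


def simple_to_complex(
    simple_spec: Dict[str, Iterable[str]], encoders: Dict[str, Any]
) -> Dict[str, Dict[str, Any]]:
    """Pair each encoding type's columns with its transformer (illustrative only)."""
    missing = set(simple_spec) - set(encoders)
    if missing:
        # Assumption: an encoding type without an encoder is treated as an error
        raise ValueError(f"No encoder provided for: {sorted(missing)}")
    return {
        name: {"columns": sorted(columns), "transformer": encoders[name]}
        for name, columns in simple_spec.items()
    }


complex_config = simple_to_complex(
    {"categorical": {"col1", "col2"}, "numerical": {"col3"}},
    {"categorical": "passthrough", "numerical": "passthrough"},
)
```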
- __init__(config: Dict[str, Dict] | Dict[str, set], encoders: Dict[str, Any] | None = None)
- _create_encoding_table(config: Dict[str, TransformConfig]) -> DataFrame
Create transform table from validated config.
- Parameters:
config (Dict[str, TransformConfig]) – Dictionary mapping transform names to TransformConfig objects.
- Returns:
DataFrame with columns ‘transform_name’, ‘column’, and ‘transformer_type’.
- Return type:
pd.DataFrame
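The table layout described above (one row per transform and column pair) can be sketched without pandas as a list of row dicts, which could then be handed to pd.DataFrame. The helper encoding_rows and the DummyScaler class are hypothetical, and reporting string transformers such as 'passthrough' verbatim is an assumption.

```python
from typing import Any, Dict, List


def encoding_rows(config: Dict[str, Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten a complex-format config into one row per (transform, column)."""
    rows = []
    for transform_name, spec in config.items():
        transformer = spec["transformer"]
        # Assumption: strings are reported as-is, other objects by class name
        if isinstance(transformer, str):
            transformer_type = transformer
        else:
            transformer_type = type(transformer).__name__
        for column in spec["columns"]:
            rows.append(
                {
                    "transform_name": transform_name,
                    "column": column,
                    "transformer_type": transformer_type,
                }
            )
    return rows


class DummyScaler:
    """Hypothetical stand-in for an sklearn transformer."""


rows = encoding_rows(
    {
        "numerical": {"columns": ["a", "b"], "transformer": DummyScaler()},
        "other": {"columns": ["c"], "transformer": "passthrough"},
    }
)
```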
- compose(override_config: EncodingConfig, verbose: bool = False) -> EncodingConfig
Compose this configuration with another configuration using merge strategy.
Merges configs at the transform level. For cross-config column conflicts, the override config takes precedence while preserving non-conflicted columns from this (base) config.
- Parameters:
override_config (EncodingConfig) – Configuration to merge in, taking precedence over this config.
verbose (bool, default=False) – If True, log detailed information about conflicts and final transformations.
- Returns:
New EncodingConfig instance with the composed configuration.
- Return type:
EncodingConfig
Examples
>>> base = EncodingConfig({'num': {'columns': ['a', 'b'], 'transformer': StandardScaler()}})
>>> override = EncodingConfig({'cat': {'columns': ['c'], 'transformer': OneHotEncoder()}})
>>> composed = base.compose(override)
>>> print(composed)  # EncodingConfig(transforms=2, columns=3)
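One way to read the merge strategy described above is sketched below on plain dicts: columns claimed by the override are removed from base transforms, base transforms left with no columns are dropped, and override transforms are then added. The function merge_configs and the placeholder transformer strings 'scale' and 'onehot' are hypothetical. Replacing a same-named base transform wholesale and dropping emptied transforms are assumptions, since the documentation does not spell out those cases.

```python
from typing import Any, Dict

Config = Dict[str, Dict[str, Any]]


def merge_configs(base: Config, override: Config) -> Config:
    """Merge two complex-format configs, letting override win column conflicts."""
    override_columns = {col for spec in override.values() for col in spec["columns"]}
    merged: Config = {}
    for name, spec in base.items():
        if name in override:
            # Assumption: a same-named transform is replaced wholesale
            continue
        # Keep only the base columns that the override does not claim
        kept = [col for col in spec["columns"] if col not in override_columns]
        if kept:
            merged[name] = {"columns": kept, "transformer": spec["transformer"]}
    for name, spec in override.items():
        merged[name] = {"columns": list(spec["columns"]), "transformer": spec["transformer"]}
    return merged


base = {"num": {"columns": ["a", "b"], "transformer": "scale"}}
override = {"cat": {"columns": ["b", "c"], "transformer": "onehot"}}
merged = merge_configs(base, override)
```

In this example column b moves to the override's transform while column a stays with the base transform.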
- get_config() -> Dict[str, Dict]
Get the encoding configuration dictionary.
- Returns:
The validated configuration dictionary in complex format.
- Return type:
Dict[str, Dict]
- get_encoding_table() -> DataFrame
Get a summary table of all configured transformations.
- Returns:
DataFrame with columns ‘transform_name’, ‘column’, and ‘transformer_type’ showing which columns are assigned to which transformers.
- Return type:
pd.DataFrame
Examples
>>> config = EncodingManager(config_dict)
>>> table = config.get_encoding_table()
>>> print(table)
   transform_name  column  transformer_type
0     categorical    col1     OneHotEncoder
1     categorical    col2     OneHotEncoder
2       numerical    col3    StandardScaler
- log_summary() -> None
Log a summary of all configured transformations.
Logs one message per transformation showing the transformer type and the columns it will transform.
Examples
>>> config = EncodingManager(config_dict)
>>> config.log_summary()
INFO:__main__:categorical (OneHotEncoder): ['col1', 'col2']
INFO:__main__:numerical (StandardScaler): ['col3']
- validate(config: Dict[str, Dict]) -> Dict[str, Dict]
Validate a configuration dictionary.
- Parameters:
config (Dict[str, Dict]) – Configuration dictionary to validate.
- Returns:
The validated configuration dictionary (same as input if valid).
- Return type:
Dict[str, Dict]
- Raises:
ValueError – If configuration structure is invalid or column conflicts exist.
Examples
>>> config_mgr = EncodingConfig({})
>>> validated = config_mgr.validate(config_dict)
- class napistu_torch.load.encoding_manager.SimpleEncodingConfig(root: RootModelRootType = PydanticUndefined)
Bases: RootModel[Dict[str, Union[List[str], set]]]
Simple encoding configuration format validator.
Validates that each value is a list or set of column names (strings).
- Parameters:
root (Dict[str, Union[List[str], set]]) – Dictionary mapping transform names to column name collections.
- validate_all_values_are_column_collections()
Ensure all values are lists or sets of strings.
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class napistu_torch.load.encoding_manager.TransformConfig(*, columns: Annotated[list[str], MinLen(min_length=1)], transformer: Any)
Bases: BaseModel
Configuration for a single transformation.
- Parameters:
columns (List[str]) – Column names to transform. Must be non-empty strings.
transformer (Any) – sklearn transformer object or ‘passthrough’.
- classmethod validate_columns(v)
- classmethod validate_transformer(v)
- columns: list[str]
- model_config = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- transformer: Any
- napistu_torch.load.encoding_manager._find_cross_config_conflicts(base_table: DataFrame, override_table: DataFrame) -> Dict[str, Dict]
Find columns that appear in both config tables.
- napistu_torch.load.encoding_manager._merge_configs(base_config: Dict, override_config: Dict, cross_conflicts: Dict) -> Dict
Merge configs with merge strategy.
- napistu_torch.load.encoding_manager.detect_config_format(config: Dict) -> str
Detect whether a config dict is in simple or complex format.
- Parameters:
config (Dict) – Configuration dictionary to analyze.
- Returns:
ENCODING_CONFIG_FORMAT.SIMPLE or ENCODING_CONFIG_FORMAT.COMPLEX
- Return type:
str
- Raises:
ValueError – If config doesn’t match either format specification.
Examples
>>> detect_config_format({'categorical': ['col1', 'col2']})
'simple'
>>> detect_config_format({'categorical': {'columns': ['col1'], 'transformer': OneHotEncoder()}})
'complex'
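A plausible way to tell the two formats apart is to inspect the shape of the values, as in the sketch below. The function detect_format is a hypothetical reimplementation, not the library's code. The documentation does not say how an empty dict is handled, so rejecting it here is an assumption, and the 'simple' and 'complex' return strings are taken from the examples above.

```python
from typing import Any, Dict


def detect_format(config: Dict[str, Any]) -> str:
    """Classify a config dict as 'simple' or 'complex' by the shape of its values."""
    values = list(config.values())
    # Complex format: every value is a dict with 'columns' and 'transformer' keys
    if values and all(
        isinstance(v, dict) and {"columns", "transformer"} <= set(v) for v in values
    ):
        return "complex"
    # Simple format: every value is a list or set of column-name strings
    if values and all(
        isinstance(v, (list, set)) and all(isinstance(c, str) for c in v) for v in values
    ):
        return "simple"
    raise ValueError("config matches neither the simple nor the complex format")


simple_result = detect_format({"categorical": ["col1", "col2"]})
complex_result = detect_format(
    {"categorical": {"columns": ["col1"], "transformer": "passthrough"}}
)
```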