napistu_torch.load.encoding_manager

Configuration management for DataFrame encoding transformations.

This module lets callers specify how DataFrame columns should be encoded, using either a simple or a complex configuration format.

Classes

EncodingManager

Configuration manager for DataFrame encoding transformations.

TransformConfig

Configuration for a single transform.

EncodingConfig

Complex encoding configuration format.

SimpleEncodingConfig

Simple encoding configuration format.

Public Functions

detect_config_format(config)

Detect whether a config dict is in simple or complex format.

Classes

EncodingConfig([root])

Complete encoding configuration with conflict validation.

EncodingManager(config[, encoders])

Configuration manager for DataFrame encoding transformations.

SimpleEncodingConfig([root])

Simple encoding configuration format validator.

TransformConfig(*, columns, transformer)

Configuration for a single transformation.

class napistu_torch.load.encoding_manager.EncodingConfig(root: RootModelRootType = PydanticUndefined)

Bases: RootModel[Dict[str, TransformConfig]]

Complete encoding configuration with conflict validation.

Parameters:

root (Dict[str, TransformConfig]) – Dictionary mapping transform names to their configurations.

check_no_column_conflicts()

Ensure no column appears in multiple transforms.
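
A minimal sketch of the kind of check described above, written against plain dicts rather than the pydantic model. The helper name find_column_conflicts is hypothetical and not part of napistu_torch; the real validator may detect and report conflicts differently.

```python
from collections import defaultdict
from typing import Dict, List


def find_column_conflicts(config: Dict[str, dict]) -> Dict[str, List[str]]:
    """Return {column: [transform names]} for columns used by more than one transform.

    Hypothetical helper for illustration only.
    """
    owners = defaultdict(list)
    for transform_name, spec in config.items():
        for column in spec["columns"]:
            owners[column].append(transform_name)
    # Keep only columns claimed by two or more transforms.
    return {col: names for col, names in owners.items() if len(names) > 1}


config = {
    "categorical": {"columns": ["col1", "col2"], "transformer": "passthrough"},
    "numerical": {"columns": ["col2", "col3"], "transformer": "passthrough"},
}
conflicts = find_column_conflicts(config)
print(conflicts)
```

Here col2 is claimed by both transforms, so a validator built on this idea would reject the configuration.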

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class napistu_torch.load.encoding_manager.EncodingManager(config: Dict[str, Dict] | Dict[str, set], encoders: Dict[str, Any] | None = None)

Bases: object

Configuration manager for DataFrame encoding transformations.

This class manages encoding configurations, validates them, and provides utilities for inspecting and composing configurations.

Parameters:
  • config (Dict[str, Dict] or Dict[str, set]) –

    Encoding configuration dictionary. Supports two formats:

    Complex format (when encoders=None):

    Each key is a transform name and each value is a dict with 'columns' and 'transformer' keys. Example:

        {
            'categorical': {
                'columns': ['col1', 'col2'],
                'transformer': OneHotEncoder()
            },
            'numerical': {
                'columns': ['col3'],
                'transformer': StandardScaler()
            }
        }

    Simple format (when encoders is provided):

    Each key is an encoding type and each value is a set/list of column names. Example:

        {
            'categorical': {'col1', 'col2'},
            'numerical': {'col3'}
        }

  • encoders (Dict[str, Any], optional) –

    Mapping from encoding type to transformer instance. Only used with simple format. If provided, config is treated as simple format and converted to complex format internally. Example:

        {
            'categorical': OneHotEncoder(),
            'numerical': StandardScaler()
        }

config_

The validated configuration dictionary (always in complex format).

Type:

Dict[str, Dict]

compose(override_config, verbose=False)

Compose this configuration with another configuration using a merge strategy.

ensure(config, encoders=None)

Class method to ensure config is an EncodingManager instance. Supports both simple and complex dict formats via encoders parameter.

get_config()

Get the encoding configuration dictionary.

get_encoding_table()

Get a summary table of all configured transformations.

log_summary()

Log a summary of all configured transformations.

validate(config)

Validate a configuration dictionary.

Private Methods

_create_encoding_table(config)

Create transform table from validated config.

Raises:

ValueError – If the configuration is invalid or has column conflicts.

Examples

Complex format:

>>> from sklearn.preprocessing import OneHotEncoder, StandardScaler
>>>
>>> config_dict = {
...     'categorical': {
...         'columns': ['category'],
...         'transformer': OneHotEncoder(sparse_output=False)
...     },
...     'numerical': {
...         'columns': ['value'],
...         'transformer': StandardScaler()
...     }
... }
>>>
>>> config = EncodingManager(config_dict)
>>> config.log_summary()
>>> print(config.get_encoding_table())

Simple format:

>>> simple_spec = {
...     'categorical': {'category'},
...     'numerical': {'value'}
... }
>>> encoders = {
...     'categorical': OneHotEncoder(sparse_output=False),
...     'numerical': StandardScaler()
... }
>>> config = EncodingManager(simple_spec, encoders=encoders)
>>> print(config.get_encoding_table())
classmethod ensure(config: dict | EncodingManager, encoders: Dict[str, Any] | None = None) → EncodingManager

Ensure that config is an EncodingManager object.

If config is a dict, it will be converted to an EncodingManager. If it’s already an EncodingManager, it will be returned as-is.

Parameters:
  • config (Union[dict, EncodingManager]) – Either a dict (simple or complex format) or an EncodingManager object.

  • encoders (Dict[str, Any], optional) – Mapping from encoding type to transformer instance. Only used when config is a dict in simple format. Ignored if config is already an EncodingManager.

Returns:

The EncodingManager object

Return type:

EncodingManager

Raises:

ValueError – If config is neither a dict nor an EncodingManager

Examples

Complex format dict:

>>> config = EncodingManager.ensure({
...     "foo": {"columns": ["bar"], "transformer": StandardScaler()}
... })
>>> isinstance(config, EncodingManager)
True

Simple format dict:

>>> config = EncodingManager.ensure(
...     {"categorical": {"col1", "col2"}},
...     encoders={"categorical": OneHotEncoder()}
... )
>>> isinstance(config, EncodingManager)
True

EncodingManager passthrough:

>>> manager = EncodingManager({"foo": {"columns": ["bar"], "transformer": StandardScaler()}})
>>> result = EncodingManager.ensure(manager)
>>> result is manager
True
static _convert_simple_to_complex(simple_spec: Dict[str, set], encoders: Dict[str, Any]) → Dict[str, Dict]

Convert simple spec format to complex format.

Parameters:
  • simple_spec (Dict[str, set]) – Mapping from encoding type to set of column names.

  • encoders (Dict[str, Any]) – Mapping from encoding type to transformer instance.

Returns:

Complex format configuration.

Return type:

Dict[str, Dict]
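
The conversion can be pictured with the sketch below, which works on plain dicts. It illustrates the mapping described above and is not the library's code: the function name, the sorted column order, and the error raised for a missing encoder are all assumptions. The string 'passthrough' stands in for real transformer instances.

```python
from typing import Any, Dict, Iterable


def convert_simple_to_complex(
    simple_spec: Dict[str, Iterable[str]], encoders: Dict[str, Any]
) -> Dict[str, Dict[str, Any]]:
    """Pair each encoding type's columns with its transformer (illustrative only)."""
    complex_config = {}
    for encoding_type, columns in simple_spec.items():
        if encoding_type not in encoders:
            # Assumed behaviour: fail loudly when no encoder is supplied.
            raise ValueError(f"No encoder provided for encoding type '{encoding_type}'")
        complex_config[encoding_type] = {
            # Sets are unordered, so sort for a deterministic column list (assumption).
            "columns": sorted(columns),
            "transformer": encoders[encoding_type],
        }
    return complex_config


simple_spec = {"categorical": {"col1", "col2"}, "numerical": {"col3"}}
encoders = {"categorical": "passthrough", "numerical": "passthrough"}
complex_config = convert_simple_to_complex(simple_spec, encoders)
print(complex_config)
```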

__init__(config: Dict[str, Dict] | Dict[str, set], encoders: Dict[str, Any] | None = None)
_create_encoding_table(config: Dict[str, TransformConfig]) → DataFrame

Create transform table from validated config.

Parameters:

config (Dict[str, TransformConfig]) – Dictionary mapping transform names to TransformConfig objects.

Returns:

DataFrame with columns ‘transform_name’, ‘column’, and ‘transformer_type’.

Return type:

pd.DataFrame
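
As a rough illustration of the table's shape, the sketch below builds one row per (transform, column) pair from a plain-dict config, using only the standard library. The helper encoding_table_rows and the DummyScaler class are hypothetical, and how the real method reports a 'passthrough' transformer is an assumption.

```python
from typing import Any, Dict, List


class DummyScaler:
    """Stand-in for a transformer object such as sklearn's StandardScaler."""


def encoding_table_rows(config: Dict[str, Dict[str, Any]]) -> List[Dict[str, str]]:
    """Build one row per (transform, column) pair (illustrative only)."""
    rows = []
    for transform_name, spec in config.items():
        transformer = spec["transformer"]
        # Assumption: strings such as 'passthrough' are reported as-is,
        # transformer objects by their class name.
        if isinstance(transformer, str):
            transformer_type = transformer
        else:
            transformer_type = type(transformer).__name__
        for column in spec["columns"]:
            rows.append(
                {
                    "transform_name": transform_name,
                    "column": column,
                    "transformer_type": transformer_type,
                }
            )
    return rows


config = {
    "numerical": {"columns": ["col3"], "transformer": DummyScaler()},
    "other": {"columns": ["col4", "col5"], "transformer": "passthrough"},
}
rows = encoding_table_rows(config)
for row in rows:
    print(row)
```

Passing rows to pd.DataFrame would give a table with the three columns named above.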

compose(override_config: EncodingConfig, verbose: bool = False) → EncodingConfig

Compose this configuration with another configuration using a merge strategy.

Merges configs at the transform level. For cross-config column conflicts, the override config takes precedence while preserving non-conflicted columns from this (base) config.

Parameters:
  • override_config (EncodingConfig) – Configuration to merge in, taking precedence over this config.

  • verbose (bool, default=False) – If True, log detailed information about conflicts and final transformations.

Returns:

New EncodingConfig instance with the composed configuration.

Return type:

EncodingConfig

Examples

>>> base = EncodingManager({'num': {'columns': ['a', 'b'], 'transformer': StandardScaler()}})
>>> override = EncodingConfig({'cat': {'columns': ['c'], 'transformer': OneHotEncoder()}})
>>> composed = base.compose(override)
>>> print(composed)  # EncodingConfig(transforms=2, columns=3)
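
The merge rule described for compose (override wins on conflicting columns, base keeps its remaining columns) can be sketched on plain dicts as follows. This is one possible reading, not the library's implementation: merge_configs is a hypothetical name, the string transformers are placeholders, and replacing a base transform wholesale when the override reuses its name is an assumption.

```python
from typing import Any, Dict


def merge_configs(
    base: Dict[str, Dict[str, Any]], override: Dict[str, Dict[str, Any]]
) -> Dict[str, Dict[str, Any]]:
    """Merge two complex-format configs, letting override win (illustrative only)."""
    override_columns = {c for spec in override.values() for c in spec["columns"]}
    merged = {}
    for name, spec in base.items():
        if name in override:
            # Assumption: a same-named override transform replaces the base one.
            continue
        # Drop columns the override claims; keep the rest of the base transform.
        kept = [c for c in spec["columns"] if c not in override_columns]
        if kept:
            merged[name] = {"columns": kept, "transformer": spec["transformer"]}
    for name, spec in override.items():
        merged[name] = {"columns": list(spec["columns"]), "transformer": spec["transformer"]}
    return merged


base = {"num": {"columns": ["a", "b"], "transformer": "scaler"}}
override = {"cat": {"columns": ["b", "c"], "transformer": "onehot"}}
merged = merge_configs(base, override)
print(merged)
```

Column b moves to the override's transform, while a stays with the base transform.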
get_config() → Dict[str, Dict]

Get the encoding configuration dictionary.

Returns:

The validated configuration dictionary in complex format.

Return type:

Dict[str, Dict]

get_encoding_table() → DataFrame

Get a summary table of all configured transformations.

Returns:

DataFrame with columns ‘transform_name’, ‘column’, and ‘transformer_type’ showing which columns are assigned to which transformers.

Return type:

pd.DataFrame

Examples

>>> config = EncodingManager(config_dict)
>>> table = config.get_encoding_table()
>>> print(table)
   transform_name    column transformer_type
0     categorical      col1    OneHotEncoder
1     categorical      col2    OneHotEncoder
2       numerical      col3   StandardScaler
log_summary() → None

Log a summary of all configured transformations.

Logs one message per transformation showing the transformer type and the columns it will transform.

Examples

>>> config = EncodingManager(config_dict)
>>> config.log_summary()
INFO:__main__:categorical (OneHotEncoder): ['col1', 'col2']
INFO:__main__:numerical (StandardScaler): ['col3']
validate(config: Dict[str, Dict]) → Dict[str, Dict]

Validate a configuration dictionary.

Parameters:

config (Dict[str, Dict]) – Configuration dictionary to validate.

Returns:

The validated configuration dictionary (same as input if valid).

Return type:

Dict[str, Dict]

Raises:

ValueError – If configuration structure is invalid or column conflicts exist.

Examples

>>> config_mgr = EncodingManager(config_dict)
>>> validated = config_mgr.validate(config_dict)
class napistu_torch.load.encoding_manager.SimpleEncodingConfig(root: RootModelRootType = PydanticUndefined)

Bases: RootModel[Dict[str, Union[List[str], set]]]

Simple encoding configuration format validator.

Validates that each value is a list or set of column names (strings).

Parameters:

root (Dict[str, Union[List[str], set]]) – Dictionary mapping transform names to column name collections.

validate_all_values_are_column_collections()

Ensure all values are lists or sets of strings.

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class napistu_torch.load.encoding_manager.TransformConfig(*, columns: Annotated[list[str], MinLen(min_length=1)], transformer: Any)

Bases: BaseModel

Configuration for a single transformation.

Parameters:
  • columns (List[str]) – Column names to transform. Must be non-empty strings.

  • transformer (Any) – sklearn transformer object or ‘passthrough’.

classmethod validate_columns(v)
classmethod validate_transformer(v)
columns: list[str]
model_config = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

transformer: Any
napistu_torch.load.encoding_manager._find_cross_config_conflicts(base_table: DataFrame, override_table: DataFrame) → Dict[str, Dict]

Find columns that appear in both config tables.

napistu_torch.load.encoding_manager._merge_configs(base_config: Dict, override_config: Dict, cross_conflicts: Dict) → Dict

Merge the base and override configs using the merge strategy; cross_conflicts identifies the columns that appear in both.

napistu_torch.load.encoding_manager.detect_config_format(config: Dict) → str

Detect whether a config dict is in simple or complex format.

Parameters:

config (Dict) – Configuration dictionary to analyze.

Returns:

ENCODING_CONFIG_FORMAT.SIMPLE or ENCODING_CONFIG_FORMAT.COMPLEX

Return type:

str

Raises:

ValueError – If config doesn’t match either format specification.

Examples

>>> detect_config_format({'categorical': ['col1', 'col2']})
'simple'
>>> detect_config_format({'categorical': {'columns': ['col1'], 'transformer': OneHotEncoder()}})
'complex'
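
One way such detection could work is sketched below. The function detect_format is a hypothetical stand-in for detect_config_format; the actual rules, error messages, and handling of empty or mixed configs may differ, and the return values are the plain strings shown in the examples above.

```python
from typing import Any, Dict


def detect_format(config: Dict[str, Any]) -> str:
    """Classify a config dict as 'simple' or 'complex' (illustrative only)."""
    values = list(config.values())
    # Complex format: every value is a dict with 'columns' and 'transformer' keys.
    if all(isinstance(v, dict) and {"columns", "transformer"} <= set(v) for v in values):
        return "complex"
    # Simple format: every value is a list or set of column-name strings.
    if all(
        isinstance(v, (list, set)) and all(isinstance(c, str) for c in v)
        for v in values
    ):
        return "simple"
    raise ValueError("Config matches neither the simple nor the complex format")


simple_result = detect_format({"categorical": ["col1", "col2"]})
complex_result = detect_format(
    {"categorical": {"columns": ["col1"], "transformer": "passthrough"}}
)
print(simple_result, complex_result)
```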