API Reference

mlschema

Production-grade orchestration for translating pandas DataFrames into validated JSON field schemas.

The package exports a single facade, mlschema.MLSchema (alias mlform.MLForm), which wraps the internal Service/Registry subsystem. Client code typically:

  1. Registers concrete field strategies.
  2. Builds a JSON-serialisable schema from a DataFrame.
Public surface
  • MLSchema — canonical entry point
  • mlschema.core.Strategy — extension contract (advanced)
  • mlschema.core.BaseField — Pydantic base model (advanced)
  • All runtime errors derive from mlschema.core.MLSchemaError.
Example
from mlschema import MLSchema
from mlschema.strategies import NumberStrategy
import pandas as pd

ms = MLSchema()
ms.register(NumberStrategy())

df = pd.DataFrame({"age": [22, 37, 29]})
schema = ms.build(df)

MLSchema()

Facade that orchestrates strategy registration and schema generation.

The class wraps an internal mlschema.core.app.Service instance and surfaces a minimal, stable API for client code. It is therefore the canonical entry point when integrating mlschema into an application or pipeline.

Attributes:

Name Type Description
field_service

Internal service component that performs the heavy lifting (registry management and JSON payload generation).

build(df: DataFrame) -> dict[str, list[dict[str, Any]]]

Translate a DataFrame into a JSON-serialisable form schema.

Parameters:

Name Type Description Default
df DataFrame

Source data whose columns will be analysed and mapped to field definitions.

required

Returns:

Type Description
dict[str, list[dict[str, Any]]]

Dictionary with the schema information, where keys are field names

Raises:

Type Description
EmptyDataFrameError

If the DataFrame is empty.

FallbackStrategyMissingError

If no fallback strategy is available for the DataFrame.

PydanticCustomError

If there are validation errors in the schema.

register(strategy: Strategy) -> None

Register a new strategy.

Parameters:

Name Type Description Default
strategy Strategy

Instance of a concrete mlschema.core.app.Strategy.

required

Raises:

Type Description
StrategyNameAlreadyRegisteredError

If a strategy with the same name is already registered.

StrategyDtypeAlreadyRegisteredError

If a strategy with the same dtype is already registered.

unregister(strategy: Strategy) -> None

Remove a previously registered strategy.

Parameters:

Name Type Description Default
strategy Strategy

Strategy instance to be removed from the registry.

required

update(strategy: Strategy) -> None

Replace an existing strategy in-place.

If either the type_name or any of the advertised dtypes already exist, they are overwritten with the supplied strategy.

Parameters:

Name Type Description Default
strategy Strategy

Instance of Strategy to update.

required
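
The documented difference between register(), update() and unregister() can be illustrated with a toy registry. This is only a sketch of the described behaviour, not mlschema's internal implementation: ToyStrategy and ToyRegistry are hypothetical stand-ins, and a plain ValueError stands in for the library's StrategyNameAlreadyRegisteredError and StrategyDtypeAlreadyRegisteredError.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStrategy:
    """Hypothetical stand-in for a Strategy: a name plus the dtypes it claims."""

    type_name: str
    dtypes: tuple


class ToyRegistry:
    """Illustrative registry only; not mlschema's internal implementation."""

    def __init__(self) -> None:
        self._by_name = {}
        self._by_dtype = {}

    def _store(self, strategy: ToyStrategy) -> None:
        self._by_name[strategy.type_name] = strategy
        for dtype in strategy.dtypes:
            self._by_dtype[dtype] = strategy

    def register(self, strategy: ToyStrategy) -> None:
        # Refuse conflicts, mirroring the errors documented for register().
        if strategy.type_name in self._by_name:
            raise ValueError(f"type_name {strategy.type_name!r} already registered")
        for dtype in strategy.dtypes:
            if dtype in self._by_dtype:
                raise ValueError(f"dtype {dtype!r} already registered")
        self._store(strategy)

    def unregister(self, strategy: ToyStrategy) -> None:
        # Drop the name and every dtype entry owned by that name.
        self._by_name.pop(strategy.type_name, None)
        self._by_dtype = {
            d: s for d, s in self._by_dtype.items()
            if s.type_name != strategy.type_name
        }

    def update(self, strategy: ToyStrategy) -> None:
        # Overwrite whatever was stored under the same name or dtypes.
        self.unregister(strategy)
        self._store(strategy)


registry = ToyRegistry()
registry.register(ToyStrategy("number", ("int64", "float64")))

conflict = ""
try:
    registry.register(ToyStrategy("number", ("int32",)))  # duplicate name
except ValueError as exc:
    conflict = str(exc)

# update() replaces the existing entry instead of raising.
registry.update(ToyStrategy("number", ("int64", "int32")))
```

In this sketch a second register() with the same type_name fails, while update() silently replaces the earlier entry and its dtype mappings.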

mlschema.core

Core abstractions and error contracts for MLSchema.

This module defines the extension surface on which all custom behaviour is built. Integrators subclass the abstractions below to introduce new data types or override default processing logic, and they trap the accompanying exceptions to maintain deterministic error handling across the pipeline.

BaseField

Bases: BaseModel

Standard metadata present in all fields.

Extend this class to define custom field types.

Attributes:

Name Type Description
title Annotated[str, Field(min_length=1, max_length=100)]

Human-readable field identifier (1-100 characters).

description Annotated[str | None, Field(max_length=500)]

Optional description (max. 500 characters).

required bool

True if the original column contains no null values.

EmptyDataFrameError(df)

Bases: FieldServiceError

Conflict: Service received a DataFrame with zero rows or columns.

FallbackStrategyMissingError(dtype_str: str)

Bases: FieldServiceError

Conflict: No registered strategy matches the column dtype, and the fallback "text" strategy is absent.
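
The fallback rule described above can be sketched as a simple lookup. This is an assumption about the general shape of the logic, not the library's code: resolve_strategy is a hypothetical helper, the registry is reduced to a dtype-to-type_name dictionary, and LookupError stands in for FallbackStrategyMissingError.

```python
def resolve_strategy(dtype_name: str, by_dtype: dict, fallback: str = "text") -> str:
    """Return the type_name that handles dtype_name, or the fallback if present."""
    if dtype_name in by_dtype:
        return by_dtype[dtype_name]
    # No direct match: use the fallback only if some dtype maps to it.
    if fallback in by_dtype.values():
        return fallback
    raise LookupError(
        f"no strategy for dtype {dtype_name!r} and no {fallback!r} fallback"
    )


mapping = {"int64": "number", "object": "text"}
direct = resolve_strategy("int64", mapping)              # matched directly
via_fallback = resolve_strategy("timedelta64[ns]", mapping)  # falls back to "text"

missing = ""
try:
    # Neither a match nor a registered "text" strategy.
    resolve_strategy("timedelta64[ns]", {"int64": "number"})
except LookupError as exc:
    missing = str(exc)
```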

FieldRegistryError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: InvalidValueError

Domain-root for registry-layer validation failures.

Triggered when an operation on the strategy registry receives an invalid, conflicting, or otherwise disallowed value.

Parameters:

Name Type Description Default
param str

Logical argument name that caused the fault.

required
value Any

Offending value (already normalised by the caller).

required
message str | None

Human-readable description. If None, a neutral default is autogenerated.

None
context dict[str, Any] | None

Arbitrary diagnostics—module, strategy ID, etc.

None

Attributes:

Name Type Description
param str

Same as the param constructor argument.

value Any

Same as the value constructor argument.

context dict[str, Any] | None

Same as the context constructor argument.

Example
if type_name in registry:
    raise FieldRegistryError(
        param="type_name",
        value=type_name,
        message=f"Strategy {type_name!r} already exists",
        context={"existing_cls": registry[type_name]},
    )

FieldServiceError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: InvalidValueError

Domain-root for service-layer validation failures.

Triggered when runtime data or configuration supplied to the Service component is missing, malformed, or otherwise unusable.

Parameters:

Name Type Description Default
param str

Logical argument name that caused the fault (e.g., "dataframe", "dtype", "fallback_strategy").

required
value Any

Offending value—typically the incoming DataFrame, a dtype string, or a strategy identifier.

required
message str | None

Human-readable description. If None, a neutral default is autogenerated.

None
context dict[str, Any] | None

Arbitrary diagnostics for logs or metrics (row/column counts, offending dtype, etc.).

None

Attributes:

Name Type Description
param str

Mirrors the param constructor argument.

value Any

Mirrors the value constructor argument.

context dict[str, Any] | None

Mirrors the context constructor argument.

Example
if df.empty:
    raise FieldServiceError(
        param="dataframe",
        value=df,
        message="Input DataFrame is empty",
        context={"rows": 0, "cols": 0},
    )

InvalidValueError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: MLSchemaError, ValueError

Standard signal for configuration or user-input violations.

Raised when a supplied argument, configuration value, or runtime artefact fails validation. Subclasses narrow the scope to specific domains (e.g., registry vs. service faults).

Parameters:

  • param: Logical argument name that triggered the failure ("dtype", "type_name", …).
  • value: Offending value, already normalised by the caller.
  • message: Human-readable description. If None, a neutral default is autogenerated.
  • context: Arbitrary diagnostics for observability pipelines (e.g., {"strategy": "NumberStrategy"}).

Attributes:

  • param: Same as the param constructor argument.
  • value: Same as the value constructor argument.
  • context: Same as the context constructor argument.

Examples
if dtype_key in registry:
    raise InvalidValueError(
        param="dtype",
        value=dtype_key,
        message=f"dtype {dtype_key!r} already mapped",
        context={"registered_strategy": registry[dtype_key]},
    )

MLSchemaError(message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: Exception

Project-root for every mlschema runtime failure.

All domain-specific exceptions ultimately derive from this class, enabling both narrow and broad interception patterns:

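import logging

logger = logging.getLogger(__name__)

try:
    schema = ms.build(df)  # ms and df as in the package-level example
except MLSchemaError as exc:  # catch-all
    logger.error("Schema failure: %s", exc, exc_info=True)
    # HTTPException is not part of mlschema; assumed to come from the host web framework.
    raise HTTPException(422, detail=str(exc)) from exc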

Attributes:

Name Type Description
context dict[str, Any] | None

Optional, machine-friendly diagnostics (e.g., offending dtype, column name, strategy ID). Contents are stable only for the public leaf exceptions; treat additional keys as informational.
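
The narrow and broad interception patterns follow directly from the inheritance chain listed in the "Bases" entries. The sketch below rebuilds that chain with empty stand-in classes so it runs without mlschema installed; the real classes take the constructor arguments documented in this reference, which the stand-ins omit. The handle function and its return labels are hypothetical.

```python
# Stand-ins mirroring the documented "Bases" chain; the real classes live in
# mlschema.core and have richer constructors.
class MLSchemaError(Exception): ...
class InvalidValueError(MLSchemaError, ValueError): ...
class FieldServiceError(InvalidValueError): ...
class FieldRegistryError(InvalidValueError): ...
class EmptyDataFrameError(FieldServiceError): ...
class StrategyNameAlreadyRegisteredError(FieldRegistryError): ...


def handle(exc: Exception) -> str:
    """Order handlers from most specific to most general."""
    try:
        raise exc
    except EmptyDataFrameError:
        return "empty-input"        # narrow: one leaf exception
    except FieldRegistryError:
        return "registry-conflict"  # mid-level: any registry fault
    except MLSchemaError:
        return "schema-failure"     # broad: everything else from the package


labels = [
    handle(EmptyDataFrameError()),
    handle(StrategyNameAlreadyRegisteredError()),
    handle(InvalidValueError()),
]
```

Because InvalidValueError also derives from ValueError, code that already catches ValueError would intercept these failures as well.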

Strategy(*, type_name: str, schema_cls: type[BaseField], dtypes: Sequence[str | Any])

Abstract base class for all MLSchema field strategies.

Each concrete strategy maps a single pandas dtype (or group of dtypes) to a validated JSON field specification. Strategies are opt-in: they influence schema generation only after being registered via MLSchema.register().

Usage contract
  • Do not mutate the incoming Series; treat it as read-only.
  • Subclasses should override attributes_from_series() to emit extra metadata, but must avoid the reserved keys: "title", "type", "required", "description".
  • Registration is not idempotent: registering a duplicate type_name raises StrategyNameAlreadyRegisteredError, so replace an existing strategy via MLSchema.update().

Parameters:

Name Type Description Default
type_name str

Identifier for the strategy type.

required
schema_cls type[BaseField]

Pydantic class that models the field.

required
dtypes Sequence[str | Any]

Sequence of dtypes (instances or names) to which the strategy applies.

required

dtypes: tuple[str, ...] property

Tuple of supported dtype names.

schema_cls: type[BaseField] property

Pydantic class used to serialize the schema.

type_name: str property

Identifier for the strategy type.

attributes_from_series(series: Series) -> dict

Calculate field-specific attributes.

This method can be overridden by subclasses to add implementation-specific metadata to the schema.

Parameters:

Name Type Description Default
series Series

DataFrame column to analyze.

required

Returns:

Type Description
dict

Dictionary with additional attributes; never includes the standard keys title, type, required, description.

build_dict(series: Series) -> dict

Create the JSON representation of the schema.

Combines the standard attributes with those returned by attributes_from_series and serializes the result with the associated Pydantic class.

Parameters:

Name Type Description Default
series Series

DataFrame column to analyze.

required

Returns:

Type Description
dict

JSON with the field schema.
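
The way build_dict combines standard attributes with the output of attributes_from_series can be sketched without pandas or Pydantic. Everything here is an assumption for illustration: merge_field and RESERVED_KEYS are hypothetical names, a plain list replaces the Series, the validation step performed by the Pydantic class is left out, and rejecting reserved keys with ValueError is one possible policy rather than documented behaviour.

```python
from typing import Any, Sequence

# Keys the usage contract reserves for the standard attributes.
RESERVED_KEYS = frozenset({"title", "type", "required", "description"})


def merge_field(
    title: str, type_name: str, values: Sequence[Any], extras: dict
) -> dict:
    """Combine standard attributes with strategy-specific extras."""
    clash = RESERVED_KEYS.intersection(extras)
    if clash:
        raise ValueError(f"extras use reserved keys: {sorted(clash)}")
    standard = {
        "title": title,
        "type": type_name,
        # "required" is True when the column contains no null values.
        "required": all(v is not None for v in values),
    }
    return {**standard, **extras}


complete = merge_field("age", "number", [22, 37, 29], {"step": 1})
with_nulls = merge_field("age", "number", [22, None, 29], {"step": 1})

clash_message = ""
try:
    merge_field("age", "number", [22], {"title": "override"})
except ValueError as exc:
    clash_message = str(exc)
```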

StrategyDtypeAlreadyRegisteredError(dtype_key: str)

Bases: FieldRegistryError

Conflict: two strategies contend for the same dtype key.

StrategyNameAlreadyRegisteredError(type_name: str)

Bases: FieldRegistryError

Conflict: two strategies contend for the same type_name.

mlschema.strategies

Strategies sub-package for MLSchema

This namespace aggregates the concrete Strategy implementations that map pandas dtypes to validated JSON field definitions. All classes inherit from mlschema.core.Strategy and are opt-in: they become active only after an explicit MLSchema.register() call.

Strategies Available
Class Description
BooleanStrategy Strategy for handling boolean data types.
CategoryStrategy Strategy for handling categorical data types.
DateStrategy Strategy for handling date and datetime data types.
NumberStrategy Strategy for handling numeric data types.
TextStrategy Strategy for handling text and string data types.

Design notes

Principle Description
Single-responsibility Each strategy handles one logical field type.
Pluggable New strategies register via MLSchema.register(), replace via MLSchema.update(), and deregister via MLSchema.unregister().
Declarative output Strategies emit validated BaseField subclasses, ensuring schema integrity from ingestion to UI rendering.

BooleanStrategy()

Bases: Strategy

Instance of Strategy for boolean fields.

Name

boolean

Dtypes
Name Type
bool BooleanDtype
boolean BooleanDtype
Model Attributes
Name Type Description
type Literal["boolean"] Fixed type for the strategy.
value bool | None The current value of the field.

CategoryStrategy()

Bases: Strategy

Instance of Strategy for category fields.

Name

category

Dtypes
Name Type
category CategoricalDtype
Model Attributes
Name Type Description
type Literal["category"] Fixed type for the strategy.
options list[str] List of allowed categories.
value str | None Current value of the field.
Model Restrictions
Description Error Type Error Message
value in options PydanticCustomError value {value} must be in options {options}

attributes_from_series(series: Series) -> dict

Derives the list of options from the series.

Parameters:

Name Type Description Default
series Series

Pandas series with categorical values.

required

Returns:

Type Description
dict

Dictionary with the options key and the list of unique values.

DateStrategy()

Bases: Strategy

Instance of Strategy for date fields.

Name

date

Dtypes
Name Type
datetime64[ns] DatetimeTZDtype
datetime64 DatetimeDtype
Model Attributes
Name Type Description
type Literal["date"] Fixed type for the strategy.
value date | None The current value of the field.
min date | None Minimum allowed date.
max date | None Maximum allowed date.
step PositiveInt Increment in days.
Model Restrictions
Description Error Type Error Message
min ≤ max PydanticCustomError min {min} must be ≤ max {max}
value ≥ min PydanticCustomError value {value} must be ≥ min {min}
value ≤ max PydanticCustomError value {value} must be ≤ max {max}

NumberStrategy()

Bases: Strategy

Instance of Strategy for number fields.

Name

number

Dtypes
Name Type
int64 Int64Dtype
float64 Float64Dtype
int32 Int32Dtype
float32 Float32Dtype
Model Attributes
Name Type Description
type Literal["number"] Fixed type for the strategy.
value int | float | None The current value of the field.
step float | int Increment for numeric values.
min int | float | None Minimum allowed value.
max int | float | None Maximum allowed value.
unit str | None Unit of measurement for the numeric value.
placeholder str | None Placeholder text for the field.
Model Restrictions
Description Error Type Error Message
min ≤ max PydanticCustomError min {min} must be ≤ max {max}
value ≥ min PydanticCustomError value {value} must be ≥ min {min}
value ≤ max PydanticCustomError value {value} must be ≤ max {max}
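
The three restrictions in the table amount to simple ordering checks between min, max and value. The sketch below expresses them with a standard-library dataclass to stay dependency-free; the real model is a Pydantic class that raises PydanticCustomError, so NumberFieldSketch, error_for and the use of ValueError are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Optional, Union

Number = Union[int, float]


@dataclass
class NumberFieldSketch:
    """Dependency-free stand-in for the number field's ordering checks."""

    value: Optional[Number] = None
    min: Optional[Number] = None
    max: Optional[Number] = None

    def __post_init__(self) -> None:
        # Each check applies only when both operands are set.
        if self.min is not None and self.max is not None and self.min > self.max:
            raise ValueError(f"min {self.min} must be ≤ max {self.max}")
        if self.value is not None and self.min is not None and self.value < self.min:
            raise ValueError(f"value {self.value} must be ≥ min {self.min}")
        if self.value is not None and self.max is not None and self.value > self.max:
            raise ValueError(f"value {self.value} must be ≤ max {self.max}")


def error_for(**kwargs) -> Optional[str]:
    """Return the validation message for the given attributes, or None if valid."""
    try:
        NumberFieldSketch(**kwargs)
    except ValueError as exc:
        return str(exc)
    return None


valid = error_for(value=5, min=0, max=10)
bad_range = error_for(min=10, max=0)
below_min = error_for(value=-1, min=0)
above_max = error_for(value=11, max=10)
```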

attributes_from_series(series: Series) -> dict

Derives the step attribute from the dtype.

Parameters:

Name Type Description Default
series Series

Pandas series with numeric values.

required

Returns:

Type Description
dict

Dictionary with the step key.

TextStrategy()

Bases: Strategy

Instance of Strategy for text fields.

Name

text

Dtypes
Name Type
object object
string StringDtype
Model Attributes
Name Type Description
type Literal["text"] Fixed type for the strategy.
value str | None The current value of the field.
placeholder str | None Placeholder text for the field.
min_length int | None Minimum length of the text.
max_length int | None Maximum length of the text.
pattern str | None Regular expression pattern for validation.
Model Restrictions
Description Error Type Error Message
min_length ≤ max_length PydanticCustomError minLength {minLength} must be ≤ maxLength {maxLength}
value length ≥ min_length PydanticCustomError value length {value_length} must be ≥ minLength {minLength}
value length ≤ max_length PydanticCustomError value length {value_length} must be ≤ maxLength {maxLength}