API Reference

mlschema

Strict pandas DataFrame-to-JSON field-list inference.

BaseField

Bases: BaseModel

Standard metadata present in all fields.

Aligns with mlform's BaseFieldConfig. Extend this class to define custom field types.

Attributes:

Name Type Description
label Annotated[str, Field(min_length=1, max_length=100)]

Human-readable field identifier (1-100 chars).

description Annotated[str | None, Field(max_length=500)]

Optional help text (max 500 chars).

required bool

Field is mandatory (mlform default: false).

mappedTo MappedToTarget

Backend feature name or model input position.

valuePath str | list[str] | None

Key path used when reading the field value on submit.

defaultValue Any | None

Initial value for the field.

FieldContext(name: str, dtype: str, required: bool, index: int, mappedTo: MappedToTarget, infer_field: FieldInfer) dataclass

Column metadata passed to field builders.

Attributes:

Name Type Description
name str

Column name converted to a string for stable JSON labels.

dtype str

Normalised pandas dtype name for the source series.

required bool

True when the source series contains no null values.

index int

Zero-based column position in the input DataFrame.

mappedTo MappedToTarget

Backend feature name or original model input position.

infer_field FieldInfer

Recursive callback for builders that need to infer sub-fields, such as the builtin series builder.
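The attribute descriptions above can be illustrated with a small stdlib-only sketch. `ColumnContext` and `describe_columns` are hypothetical stand-ins written for this example, not part of mlschema; plain lists replace pandas Series, only `None` counts as null, the `infer_field` callback is omitted, and using the column name as `mappedTo` is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class ColumnContext:
    """Hypothetical stand-in for FieldContext (infer_field omitted)."""

    name: str
    dtype: str
    required: bool
    index: int
    mappedTo: Union[str, int]


def describe_columns(
    columns: dict[Any, list[Any]], dtypes: dict[Any, str]
) -> list[ColumnContext]:
    """Build one context per column, in insertion order."""
    contexts = []
    for index, (raw_name, cells) in enumerate(columns.items()):
        contexts.append(
            ColumnContext(
                name=str(raw_name),  # names become strings for stable labels
                dtype=dtypes[raw_name],
                required=all(cell is not None for cell in cells),  # no nulls
                index=index,  # zero-based column position
                mappedTo=str(raw_name),  # assumption: map to the feature name
            )
        )
    return contexts


contexts = describe_columns(
    {"age": [31, None], 7: [True, False]},
    {"age": "float64", 7: "bool"},
)
```

In this sketch the `age` column is not required because it holds a null, while the integer-named column gets the string name `"7"` and index 1.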

FieldKind(name: str, model: type[BaseField], infer: FieldBuilder) dataclass

Strict field-kind definition.

Attributes:

Name Type Description
name str

Field discriminator extracted from the model's kind default.

model type[BaseField]

Pydantic model used to validate and serialise generated fields.

infer FieldBuilder

Builder callable that can emit fields for this kind.

infer_schema(df: DataFrame, *, builders: Sequence[FieldBuilder] = (), kinds: Sequence[FieldKind] = (), overrides: Mapping[str, Mapping[str, Any]] | None = None, onehot_separator: str = '__') -> list[FieldDict]

Infer a strict field-list schema from a pandas DataFrame.

Parameters:

Name Type Description Default
df DataFrame

Source DataFrame. Each column becomes one field in output order.

required
builders Sequence[FieldBuilder]

Optional callables that can customize inference for already registered kinds. These run before custom kind builders and builtins.

()
kinds Sequence[FieldKind]

Optional strict custom field kinds created with kind(). Each kind contributes a Pydantic validator model and an inference builder.

()
overrides Mapping[str, Mapping[str, Any]] | None

Optional mapping of column name to final field patch. Patches are applied after builder inference and before Pydantic validation.

None
onehot_separator str

Separator for one-hot columns, e.g. feature__value.

'__'

Returns:

Type Description
list[FieldDict]

JSON-serialisable list of validated field dictionaries.

Raises:

Type Description
EmptyDataFrameError

If df has no rows or no columns.

FieldKindAlreadyRegisteredError

If two kinds share the same name.

FieldBuilderError

If an override targets a missing column, a builder returns invalid data, or no builder matches a column.

UnknownFieldKindError

If a builder emits a kind with no registered model.

ValidationError

If generated or overridden field data violates the target field model.
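The builder precedence described above (each builder either returns a field dict or declines with None, and a fallback claims whatever is left) can be sketched without the library. Everything below is a hypothetical stand-in: a column is reduced to a `(name, dtype)` pair, the `"boolean"` and `"text"` kind values are assumed, and `run_chain` is not mlschema's implementation.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-ins: a "column" is just (name, dtype), not a pandas Series.
Column = tuple[str, str]
Builder = Callable[[Column], Optional[dict[str, Any]]]


def boolean_like(col: Column) -> Optional[dict[str, Any]]:
    name, dtype = col
    if dtype in {"bool", "boolean"}:
        return {"kind": "boolean", "label": name}
    return None  # decline so the next builder can try


def text_fallback(col: Column) -> dict[str, Any]:
    name, _dtype = col
    return {"kind": "text", "label": name}  # accepts any remaining column


def run_chain(col: Column, builders: list[Builder]) -> dict[str, Any]:
    """Return the first non-None result from an ordered list of builders."""
    for build in builders:
        field = build(col)
        if field is not None:
            return field
    # The documented API raises FieldBuilderError here; ValueError is a stand-in.
    raise ValueError(f"no builder matched column {col[0]!r}")


chain = [boolean_like, text_fallback]
flag_field = run_chain(("is_active", "bool"), chain)
note_field = run_chain(("notes", "object"), chain)
```

Ordering the list from most specific to most general is what lets a catch-all builder sit safely at the end.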

mlschema.core

Core strict inference API and field contracts for MLSchema.

BaseField

Bases: BaseModel

Standard metadata present in all fields.

Aligns with mlform's BaseFieldConfig. Extend this class to define custom field types.

Attributes:

Name Type Description
label Annotated[str, Field(min_length=1, max_length=100)]

Human-readable field identifier (1-100 chars).

description Annotated[str | None, Field(max_length=500)]

Optional help text (max 500 chars).

required bool

Field is mandatory (mlform default: false).

mappedTo MappedToTarget

Backend feature name or model input position.

valuePath str | list[str] | None

Key path used when reading the field value on submit.

defaultValue Any | None

Initial value for the field.

EmptyDataFrameError(df: DataFrame)

Bases: FieldServiceError

Raised when inference receives a DataFrame with no rows or columns.

Parameters:

Name Type Description Default
df DataFrame

DataFrame that failed the minimum shape requirement.

required

Attributes:

Name Type Description
param str

Always "dataframe".

value Any

The invalid DataFrame.

context dict[str, Any] | None

Row and column counts at failure time.

FieldBuilderError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: FieldKindError

Raised when a field builder returns unusable data.

Examples:

Builder failures include returning a non-dict value, omitting kind, referencing missing override columns, or leaving a column unmatched after all builders have run.

FieldContext(name: str, dtype: str, required: bool, index: int, mappedTo: MappedToTarget, infer_field: FieldInfer) dataclass

Column metadata passed to field builders.

Attributes:

Name Type Description
name str

Column name converted to a string for stable JSON labels.

dtype str

Normalised pandas dtype name for the source series.

required bool

True when the source series contains no null values.

index int

Zero-based column position in the input DataFrame.

mappedTo MappedToTarget

Backend feature name or original model input position.

infer_field FieldInfer

Recursive callback for builders that need to infer sub-fields, such as the builtin series builder.

FieldKind(name: str, model: type[BaseField], infer: FieldBuilder) dataclass

Strict field-kind definition.

Attributes:

Name Type Description
name str

Field discriminator extracted from the model's kind default.

model type[BaseField]

Pydantic model used to validate and serialise generated fields.

infer FieldBuilder

Builder callable that can emit fields for this kind.

FieldKindAlreadyRegisteredError(kind_name: str)

Bases: FieldKindError

Raised when two field kinds use the same kind name.

Parameters:

Name Type Description Default
kind_name str

Duplicate field-kind discriminator.

required

Attributes:

Name Type Description
param str

Always "kind".

value Any

The duplicate kind name.

context dict[str, Any] | None

Contains {"offender": kind_name}.

FieldKindError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: InvalidValueError

Base error for invalid field-kind definitions.

Parameters:

Name Type Description Default
param str

Logical argument or field name that failed validation.

required
value Any

Offending value supplied by the caller.

required
message str | None

Optional human-readable message. A default is generated when omitted.

None
context dict[str, Any] | None

Optional machine-readable diagnostics.

None

Attributes:

Name Type Description
param str

Name of the failing logical parameter.

value Any

Invalid value.

context dict[str, Any] | None

Optional diagnostics inherited from MLSchemaError.

FieldServiceError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: InvalidValueError

Base error for invalid runtime inputs supplied to inference.

Parameters:

Name Type Description Default
param str

Logical input name that failed validation.

required
value Any

Offending value supplied to the inference API.

required
message str | None

Optional human-readable message. A default is generated when omitted.

None
context dict[str, Any] | None

Optional machine-readable diagnostics.

None

Attributes:

Name Type Description
param str

Name of the failing logical input.

value Any

Invalid value.

context dict[str, Any] | None

Optional diagnostics inherited from MLSchemaError.

InvalidValueError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: MLSchemaError, ValueError

Standard signal for configuration or user-input violations.

Raised when a supplied argument, configuration value, or runtime artefact fails validation. Subclasses narrow the scope to specific domains (e.g., registry vs. service faults).

Args:

param: Logical argument name that triggered the failure ("dtype", "kind", …).
value: Offending value already normalised by the caller.
message: Human-readable description. If None, a neutral default is auto-generated.
context: Arbitrary diagnostics for observability pipelines (e.g., {"builder": "number_builder"}).

Attributes:

param: Same as the param constructor argument.
value: Same as the value constructor argument.
context: Same as the context constructor argument.

Examples
if kind_name in models:
    raise InvalidValueError(
        param="kind",
        value=kind_name,
        message=f"kind {kind_name!r} already mapped",
        context={"registered_kind": models[kind_name]},
    )

MLSchemaError(message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: Exception

Root exception for every mlschema runtime failure.

All domain-specific exceptions ultimately derive from this class, enabling both narrow and broad interception patterns:

try:
    schema = infer_schema(df)
except MLSchemaError as exc:  # catch-all
    logger.error("Schema failure: %s", exc, exc_info=True)
    raise HTTPException(422, detail=str(exc)) from exc

Attributes:

Name Type Description
context dict[str, Any] | None

Optional, machine-friendly diagnostics (e.g., offending dtype, column name, builder ID). Contents are stable only for the public leaf exceptions; treat additional keys as informational.
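The narrow and broad interception patterns follow from the documented base classes. The sketch below redefines simplified stand-ins with the same names and bases as documented on this page, so it runs without mlschema installed; the default messages and the `classify` helper are assumptions made for illustration.

```python
from typing import Any, Optional


# Stand-ins mirroring the documented hierarchy; NOT imported from mlschema.
class MLSchemaError(Exception):
    def __init__(
        self, message: Optional[str] = None, *, context: Optional[dict[str, Any]] = None
    ) -> None:
        super().__init__(message or "mlschema failure")
        self.context = context


class InvalidValueError(MLSchemaError, ValueError):
    def __init__(self, param: str, value: Any, message: Optional[str] = None,
                 *, context: Optional[dict[str, Any]] = None) -> None:
        super().__init__(message or f"invalid value for {param!r}: {value!r}",
                         context=context)
        self.param = param
        self.value = value


class FieldKindError(InvalidValueError):
    pass


class UnknownFieldKindError(FieldKindError):
    def __init__(self, kind_name: str) -> None:
        super().__init__("kind", kind_name, context={"offender": kind_name})


def classify(exc: Exception) -> str:
    """Report which handler catches exc, from narrowest to broadest."""
    try:
        raise exc
    except FieldKindError:
        return "kind"
    except ValueError:  # InvalidValueError also derives from ValueError
        return "value"
    except MLSchemaError:
        return "root"


narrow = classify(UnknownFieldKindError("slider"))
middle = classify(InvalidValueError("dtype", "complex128"))
broad = classify(MLSchemaError("boom"))
```

Because InvalidValueError derives from both MLSchemaError and ValueError, callers that only know about ValueError still intercept it.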

UnknownFieldKindError(kind_name: str)

Bases: FieldKindError

Raised when a builder emits an unregistered field kind.

Parameters:

Name Type Description Default
kind_name str

Field-kind discriminator found in builder output.

required

Attributes:

Name Type Description
param str

Always "kind".

value Any

Unknown kind name.

context dict[str, Any] | None

Contains {"offender": kind_name}.

infer_schema(df: DataFrame, *, builders: Sequence[FieldBuilder] = (), kinds: Sequence[FieldKind] = (), overrides: Mapping[str, Mapping[str, Any]] | None = None, onehot_separator: str = '__') -> list[FieldDict]

Infer a strict field-list schema from a pandas DataFrame.

Parameters:

Name Type Description Default
df DataFrame

Source DataFrame. Each column becomes one field in output order.

required
builders Sequence[FieldBuilder]

Optional callables that can customize inference for already registered kinds. These run before custom kind builders and builtins.

()
kinds Sequence[FieldKind]

Optional strict custom field kinds created with kind(). Each kind contributes a Pydantic validator model and an inference builder.

()
overrides Mapping[str, Mapping[str, Any]] | None

Optional mapping of column name to final field patch. Patches are applied after builder inference and before Pydantic validation.

None
onehot_separator str

Separator for one-hot columns, e.g. feature__value.

'__'

Returns:

Type Description
list[FieldDict]

JSON-serialisable list of validated field dictionaries.

Raises:

Type Description
EmptyDataFrameError

If df has no rows or no columns.

FieldKindAlreadyRegisteredError

If two kinds share the same name.

FieldBuilderError

If an override targets a missing column, a builder returns invalid data, or no builder matches a column.

UnknownFieldKindError

If a builder emits a kind with no registered model.

ValidationError

If generated or overridden field data violates the target field model.
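The `overrides` step, where patches are applied after inference and an unknown column is an error, might look roughly like the following. `apply_overrides` is a hypothetical helper written for this sketch; the shallow `dict.update` merge is an assumption, and KeyError stands in for the documented FieldBuilderError.

```python
from typing import Any, Mapping, Optional


def apply_overrides(
    inferred: Mapping[str, Mapping[str, Any]],
    overrides: Optional[Mapping[str, Mapping[str, Any]]],
) -> dict[str, dict[str, Any]]:
    """Return copies of the inferred fields with per-column patches merged in."""
    patched = {name: dict(field) for name, field in inferred.items()}
    for column, patch in (overrides or {}).items():
        if column not in patched:
            # Stand-in for FieldBuilderError: the override targets no column.
            raise KeyError(f"override targets missing column {column!r}")
        patched[column].update(patch)  # assumption: shallow merge, patch wins
    return patched


inferred = {"age": {"kind": "number", "label": "age"}}
patched = apply_overrides(inferred, {"age": {"label": "Age (years)"}})
```

Validation would run on the patched dictionaries, which is why an override can also introduce an invalid field.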

mlschema.strategies

Builtin field builders and Pydantic field models.

boolean_builder(_series: Series, ctx: FieldContext) -> FieldDict | None

Infer a boolean field for pandas boolean columns.

Parameters:

Name Type Description Default
_series Series

Source column. The value is unused because dtype metadata in ctx is sufficient for boolean inference.

required
ctx FieldContext

Column metadata including normalised dtype, name, and required flag.

required

Returns:

Type Description
FieldDict | None

A strict field dict for BooleanField when ctx.dtype is bool or boolean; otherwise None so the next builder can try.

builtin_kinds() -> tuple[FieldKind, ...]

Return builtin strict field kinds in inference order.

Returns:

Type Description
tuple[FieldKind, ...]

Tuple of builtin FieldKind values ordered from most specific to most general. series runs before dtype-based builders, and text runs last as the fallback.

category_builder(series: Series, ctx: FieldContext) -> FieldDict | None

Infer a category field from a pandas categorical column.

Parameters:

Name Type Description Default
series Series

Source column whose options are extracted from categorical metadata or non-null unique values.

required
ctx FieldContext

Column metadata including normalised dtype, name, and required flag.

required

Returns:

Type Description
FieldDict | None

A strict field dict for CategoryField when ctx.dtype is category; otherwise None.
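Extracting options from non-null unique values can be sketched in plain Python. `unique_options` is a hypothetical helper; it treats only `None` as null and keeps first-seen order, both of which are assumptions rather than documented behaviour.

```python
from typing import Any, Hashable, Iterable


def unique_options(values: Iterable[Any]) -> list[Hashable]:
    """Collect distinct non-null values, preserving first-seen order."""
    seen: dict[Hashable, None] = {}
    for value in values:
        if value is not None:  # skip nulls so they never become an option
            seen.setdefault(value, None)
    return list(seen)


options = unique_options(["red", None, "blue", "red"])
```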

date_builder(_series: Series, ctx: FieldContext) -> FieldDict | None

Infer a date field for pandas datetime columns.

Parameters:

Name Type Description Default
_series Series

Source column. The value is unused because dtype metadata in ctx is sufficient for date inference.

required
ctx FieldContext

Column metadata including normalised dtype, name, and required flag.

required

Returns:

Type Description
FieldDict | None

A strict field dict for DateField when ctx.dtype is a supported datetime dtype; otherwise None.

number_builder(series: Series, ctx: FieldContext) -> FieldDict | None

Infer a number field for supported numeric columns.

Parameters:

Name Type Description Default
series Series

Source column inspected with pandas' dtype helpers to decide the generated step.

required
ctx FieldContext

Column metadata including normalised dtype, name, and required flag.

required

Returns:

Type Description
FieldDict | None

A strict field dict for NumberField when ctx.dtype is supported; otherwise None.

series_builder(series: Series, ctx: FieldContext) -> FieldDict | None

Infer a series field for columns containing two-element cells.

Parameters:

Name Type Description Default
series Series

Source column containing tuple, list, or dict cells.

required
ctx FieldContext

Column metadata plus the recursive infer_field callback used for sub-field inference.

required

Returns:

Type Description
FieldDict | None

A strict field dict for SeriesField when all non-null cells are two-element compound values; otherwise None.
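The matching condition for series columns can be expressed as a small predicate. This is a stdlib-only sketch with hypothetical names: cells come from a plain list, only `None` counts as null, and rejecting a column with no non-null cells is an assumption.

```python
from typing import Any, Iterable


def is_pair_cell(cell: Any) -> bool:
    """True for a tuple, list, or dict holding exactly two elements."""
    return isinstance(cell, (tuple, list, dict)) and len(cell) == 2


def looks_like_series(cells: Iterable[Any]) -> bool:
    """True when every non-null cell is a two-element compound value."""
    non_null = [cell for cell in cells if cell is not None]
    # Assumption: an all-null column gives no evidence, so it does not match.
    return bool(non_null) and all(is_pair_cell(cell) for cell in non_null)


mixed_pairs = looks_like_series([(1, 2), None, [3, 4], {"x": 1, "y": 2}])
triples = looks_like_series([(1, 2, 3)])
strings = looks_like_series(["ab"])
```

A two-character string has length 2 but is not a tuple, list, or dict, so the type check keeps it from matching.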

text_builder(_series: Series, ctx: FieldContext) -> FieldDict

Infer a text field for any remaining column.

Parameters:

Name Type Description Default
_series Series

Source column. The value is unused because this builder is a fallback and accepts any dtype not claimed earlier.

required
ctx FieldContext

Column metadata including name and required flag.

required

Returns:

Type Description
FieldDict

A strict field dict for TextField.