API Reference
mlschema
Strict pandas DataFrame to JSON field-list inference.
BaseField
Bases: BaseModel
Standard metadata present in all fields.
Aligns with mlform's BaseFieldConfig. Extend this class to define custom field types.
Attributes:
| Name | Type | Description |
|---|---|---|
| `label` | `Annotated[str, Field(min_length=1, max_length=100)]` | Human-readable field identifier (1-100 chars). |
| `description` | `Annotated[str \| None, Field(max_length=500)]` | Optional help text (max 500 chars). |
| `required` | `bool` | Field is mandatory (mlform default: false). |
| `mappedTo` | `MappedToTarget` | Backend feature name or model input position. |
| `valuePath` | `str \| list[str] \| None` | Key path used when reading the field value on submit. |
| `defaultValue` | `Any \| None` | Initial value for the field. |
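The length limits in the table above can be illustrated without importing mlschema or Pydantic. The following is a minimal sketch using a standard-library dataclass; `FieldStandIn` is a hypothetical stand-in, not the library's `BaseField`, and the type given to `mappedTo` is an assumption based only on the description "feature name or model input position".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class FieldStandIn:
    """Hypothetical stdlib stand-in mirroring the documented BaseField limits."""

    label: str
    description: str | None = None
    required: bool = False  # the table above documents the mlform default as false
    mappedTo: str | int | None = None  # assumption: a feature name or a position
    valuePath: str | list[str] | None = None
    defaultValue: Any | None = None

    def __post_init__(self) -> None:
        # label: 1-100 characters, as stated in the attribute table.
        if not 1 <= len(self.label) <= 100:
            raise ValueError("label must be 1-100 characters long")
        # description: optional, at most 500 characters.
        if self.description is not None and len(self.description) > 500:
            raise ValueError("description must be at most 500 characters long")
```

In the real library these checks would presumably be enforced by Pydantic through the `Annotated[..., Field(...)]` types rather than by hand-written code.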
FieldContext(name: str, dtype: str, required: bool, index: int, mappedTo: MappedToTarget, infer_field: FieldInfer)
dataclass
Column metadata passed to field builders.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Column name converted to a string for stable JSON labels. |
| `dtype` | `str` | Normalised pandas dtype name for the source series. |
| `required` | `bool` |  |
| `index` | `int` | Zero-based column position in the input DataFrame. |
| `mappedTo` | `MappedToTarget` | Backend feature name or original model input position. |
| `infer_field` | `FieldInfer` | Recursive callback for builders that need to infer sub-fields, such as the builtin series builder. |
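To make the role of the context object concrete, here is a minimal sketch of a builder that reads only column metadata. `ContextStandIn` and `percent_builder` are hypothetical names that do not come from mlschema, a plain list stands in for the pandas Series, and the `"number"` discriminator and the output keys are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ContextStandIn:
    """Hypothetical stand-in with the same attribute names as FieldContext."""

    name: str
    dtype: str
    required: bool
    index: int
    mappedTo: Any
    infer_field: Callable[..., Any]


def percent_builder(values: list, ctx: ContextStandIn) -> Optional[dict]:
    """Claim float columns whose name ends in '_pct'; decline everything else."""
    if not (ctx.dtype.startswith("float") and ctx.name.endswith("_pct")):
        return None  # returning None lets a later builder try the column
    return {
        "kind": "number",  # assumption: discriminator string of the number kind
        "label": ctx.name,
        "required": ctx.required,
        "mappedTo": ctx.mappedTo,
    }
```

Returning `None` for columns the builder does not recognise mirrors the `FieldDict | None` return type documented for the builtin builders.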
FieldKind(name: str, model: type[BaseField], infer: FieldBuilder)
dataclass
Strict field-kind definition.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Field discriminator extracted from the model's |
| `model` | `type[BaseField]` | Pydantic model used to validate and serialise generated fields. |
| `infer` | `FieldBuilder` | Builder callable that can emit fields for this kind. |
infer_schema(df: DataFrame, *, builders: Sequence[FieldBuilder] = (), kinds: Sequence[FieldKind] = (), overrides: Mapping[str, Mapping[str, Any]] | None = None, onehot_separator: str = '__') -> list[FieldDict]
Infer a strict field-list schema from a pandas DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Source DataFrame. Each column becomes one field in output order. | required |
| `builders` | `Sequence[FieldBuilder]` | Optional callables that can customize inference for already registered kinds. These run before custom kind builders and builtins. | `()` |
| `kinds` | `Sequence[FieldKind]` | Optional strict custom field kinds created with | `()` |
| `overrides` | `Mapping[str, Mapping[str, Any]] \| None` | Optional mapping of column name to final field patch. Patches are applied after builder inference and before Pydantic validation. | `None` |
| `onehot_separator` | `str` | Separator for one-hot columns, e.g. | `'__'` |
Returns:
| Type | Description |
|---|---|
| `list[FieldDict]` | JSON-serialisable list of validated field dictionaries. |
Raises:
| Type | Description |
|---|---|
| `EmptyDataFrameError` | If |
| `FieldKindAlreadyRegisteredError` | If two kinds share the same name. |
| `FieldBuilderError` | If an override targets a missing column, a builder returns invalid data, or no builder matches a column. |
| `UnknownFieldKindError` | If a builder emits a kind with no registered model. |
| `ValidationError` | If generated or overridden field data violates the target field model. |
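The ordering described above (builders are tried in sequence, overrides are patched in afterwards, and unmatched columns or bad builder output are errors) can be sketched with plain dictionaries. This is an illustrative stand-in under stated assumptions, not the mlschema implementation: `infer_columns`, `bool_builder`, `text_fallback`, and `BuilderErrorStandIn` are hypothetical, a dict of lists replaces the DataFrame, and the kind strings are assumed.

```python
from __future__ import annotations

from typing import Callable, Mapping, Optional, Sequence

# Hypothetical builder signature for this sketch: (values, column_name) -> dict or None.
Builder = Callable[[list, str], Optional[dict]]


class BuilderErrorStandIn(ValueError):
    """Local stand-in for the documented FieldBuilderError."""


def bool_builder(values: list, name: str) -> Optional[dict]:
    # Claim the column only when every value is a Python bool.
    if values and all(isinstance(v, bool) for v in values):
        return {"kind": "boolean", "label": name}
    return None


def text_fallback(values: list, name: str) -> Optional[dict]:
    # Fallback that accepts any column not claimed earlier.
    return {"kind": "text", "label": name}


def infer_columns(
    columns: Mapping[str, list],
    builders: Sequence[Builder],
    overrides: Optional[Mapping[str, Mapping]] = None,
) -> list[dict]:
    overrides = overrides or {}
    for name in overrides:
        if name not in columns:
            raise BuilderErrorStandIn(f"override targets missing column {name!r}")
    fields = []
    for name, values in columns.items():
        field = None
        for build in builders:  # first builder that returns a value wins
            field = build(values, name)
            if field is not None:
                break
        if field is None:
            raise BuilderErrorStandIn(f"no builder matched column {name!r}")
        if not isinstance(field, dict) or "kind" not in field:
            raise BuilderErrorStandIn(f"builder returned invalid data for {name!r}")
        # Patch after inference; model validation would follow this step.
        fields.append({**field, **overrides.get(name, {})})
    return fields
```

For example, `infer_columns({"active": [True, False], "note": ["a", "b"]}, [bool_builder, text_fallback], {"note": {"label": "Note"}})` yields one boolean field and one text field whose label has been patched to `"Note"`.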
mlschema.core
Core strict inference API and field contracts for MLSchema.
BaseField
Bases: BaseModel
Standard metadata present in all fields.
Aligns with mlform's BaseFieldConfig. Extend this class to define custom field types.
Attributes:
| Name | Type | Description |
|---|---|---|
| `label` | `Annotated[str, Field(min_length=1, max_length=100)]` | Human-readable field identifier (1-100 chars). |
| `description` | `Annotated[str \| None, Field(max_length=500)]` | Optional help text (max 500 chars). |
| `required` | `bool` | Field is mandatory (mlform default: false). |
| `mappedTo` | `MappedToTarget` | Backend feature name or model input position. |
| `valuePath` | `str \| list[str] \| None` | Key path used when reading the field value on submit. |
| `defaultValue` | `Any \| None` | Initial value for the field. |
EmptyDataFrameError(df: DataFrame)
Bases: FieldServiceError
Raised when inference receives a DataFrame with no rows or columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame that failed the minimum shape requirement. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `param` | `str` | Always |
| `value` | `Any` | The invalid DataFrame. |
| `context` | `dict[str, Any] \| None` | Row and column counts at failure time. |
FieldBuilderError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: FieldKindError
Raised when a field builder returns unusable data.
Examples:
Builder failures include returning a non-dict value, omitting kind,
referencing missing override columns, or leaving a column unmatched after
all builders have run.
FieldContext(name: str, dtype: str, required: bool, index: int, mappedTo: MappedToTarget, infer_field: FieldInfer)
dataclass
Column metadata passed to field builders.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Column name converted to a string for stable JSON labels. |
| `dtype` | `str` | Normalised pandas dtype name for the source series. |
| `required` | `bool` |  |
| `index` | `int` | Zero-based column position in the input DataFrame. |
| `mappedTo` | `MappedToTarget` | Backend feature name or original model input position. |
| `infer_field` | `FieldInfer` | Recursive callback for builders that need to infer sub-fields, such as the builtin series builder. |
FieldKind(name: str, model: type[BaseField], infer: FieldBuilder)
dataclass
Strict field-kind definition.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | Field discriminator extracted from the model's |
| `model` | `type[BaseField]` | Pydantic model used to validate and serialise generated fields. |
| `infer` | `FieldBuilder` | Builder callable that can emit fields for this kind. |
FieldKindAlreadyRegisteredError(kind_name: str)
Bases: FieldKindError
Raised when two field kinds use the same kind name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `kind_name` | `str` | Duplicate field-kind discriminator. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `param` | `str` | Always |
| `value` | `Any` | The duplicate kind name. |
| `context` | `dict[str, Any] \| None` | Contains |
FieldKindError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: InvalidValueError
Base error for invalid field-kind definitions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `param` | `str` | Logical argument or field name that failed validation. | required |
| `value` | `Any` | Offending value supplied by the caller. | required |
| `message` | `str \| None` | Optional human-readable message. A default is generated when omitted. | `None` |
| `context` | `dict[str, Any] \| None` | Optional machine-readable diagnostics. | `None` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `param` | `str` | Name of the failing logical parameter. |
| `value` | `Any` | Invalid value. |
| `context` | `dict[str, Any] \| None` | Optional diagnostics inherited from |
FieldServiceError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: InvalidValueError
Base error for invalid runtime inputs supplied to inference.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `param` | `str` | Logical input name that failed validation. | required |
| `value` | `Any` | Offending value supplied to the inference API. | required |
| `message` | `str \| None` | Optional human-readable message. A default is generated when omitted. | `None` |
| `context` | `dict[str, Any] \| None` | Optional machine-readable diagnostics. | `None` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `param` | `str` | Name of the failing logical input. |
| `value` | `Any` | Invalid value. |
| `context` | `dict[str, Any] \| None` | Optional diagnostics inherited from |
InvalidValueError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: MLSchemaError, ValueError
Standard signal for configuration or user-input violations.
Raised when a supplied argument, configuration value, or runtime artefact fails validation. Subclasses narrow the scope to specific domains (e.g., registry vs. service faults).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `param` | `str` | Logical argument name that triggered the failure (`"dtype"`, `"kind"`, …). | required |
| `value` | `Any` | Offending value already normalised by the caller. | required |
| `message` | `str \| None` | Human-readable description. If `None`, a neutral default is auto-generated. | `None` |
| `context` | `dict[str, Any] \| None` | Arbitrary diagnostics for observability pipelines (e.g., `{"builder": "number_builder"}`). | `None` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `param` | `str` | Same as the `param` constructor argument. |
| `value` | `Any` | Same as the `value` constructor argument. |
| `context` | `dict[str, Any] \| None` | Same as the `context` constructor argument. |
Examples:

    # Reject a kind name that already has a registered model.
    if kind_name in models:
        raise InvalidValueError(
            param="kind",
            value=kind_name,
            message=f"kind {kind_name!r} already mapped",
            context={"registered_kind": models[kind_name]},
        )

MLSchemaError(message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: Exception
Root exception for every mlschema runtime failure.
All domain-specific exceptions ultimately derive from this class, enabling both narrow and broad interception patterns:

    try:
        schema = infer_schema(df)
    except MLSchemaError as exc:  # catch-all
        logger.error("Schema failure: %s", exc, exc_info=True)
        raise HTTPException(422, detail=str(exc)) from exc

Attributes:
| Name | Type | Description |
|---|---|---|
| `context` | `dict[str, Any] \| None` | Optional, machine-friendly diagnostics (e.g., offending |
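The narrow and broad interception patterns follow directly from the documented base classes. The sketch below re-declares a small part of the hierarchy locally so that it runs without mlschema installed; the classes are stand-ins that copy only the documented bases and constructor signatures, the default messages are assumptions, and `classify` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any, Optional


class MLSchemaError(Exception):
    """Local stand-in: root of the hierarchy."""

    def __init__(self, message: Optional[str] = None, *, context: Optional[dict] = None):
        super().__init__(message or type(self).__name__)  # assumed default message
        self.context = context


class InvalidValueError(MLSchemaError, ValueError):
    """Local stand-in: also a ValueError, as the documented bases state."""

    def __init__(self, param: str, value: Any, message: Optional[str] = None,
                 *, context: Optional[dict] = None):
        super().__init__(message or f"invalid value for {param!r}: {value!r}", context=context)
        self.param = param
        self.value = value


class FieldServiceError(InvalidValueError):
    """Local stand-in: invalid runtime inputs."""


class FieldKindError(InvalidValueError):
    """Local stand-in: invalid field-kind definitions."""


class FieldBuilderError(FieldKindError):
    """Local stand-in: unusable builder output."""


def classify(exc: Exception) -> str:
    """Check the most specific class first, then fall back to broader ones."""
    if isinstance(exc, FieldBuilderError):
        return "builder"
    if isinstance(exc, InvalidValueError):
        return "invalid-value"
    if isinstance(exc, MLSchemaError):
        return "mlschema"
    return "other"
```

Because `InvalidValueError` also derives from `ValueError`, code that already catches `ValueError` would intercept these failures as well, which may or may not be desirable in a given handler.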
UnknownFieldKindError(kind_name: str)
Bases: FieldKindError
Raised when a builder emits an unregistered field kind.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `kind_name` | `str` | Field-kind discriminator found in builder output. | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| `param` | `str` | Always |
| `value` | `Any` | Unknown kind name. |
| `context` | `dict[str, Any] \| None` | Contains |
infer_schema(df: DataFrame, *, builders: Sequence[FieldBuilder] = (), kinds: Sequence[FieldKind] = (), overrides: Mapping[str, Mapping[str, Any]] | None = None, onehot_separator: str = '__') -> list[FieldDict]
Infer a strict field-list schema from a pandas DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | Source DataFrame. Each column becomes one field in output order. | required |
| `builders` | `Sequence[FieldBuilder]` | Optional callables that can customize inference for already registered kinds. These run before custom kind builders and builtins. | `()` |
| `kinds` | `Sequence[FieldKind]` | Optional strict custom field kinds created with | `()` |
| `overrides` | `Mapping[str, Mapping[str, Any]] \| None` | Optional mapping of column name to final field patch. Patches are applied after builder inference and before Pydantic validation. | `None` |
| `onehot_separator` | `str` | Separator for one-hot columns, e.g. | `'__'` |
Returns:
| Type | Description |
|---|---|
| `list[FieldDict]` | JSON-serialisable list of validated field dictionaries. |
Raises:
| Type | Description |
|---|---|
| `EmptyDataFrameError` | If |
| `FieldKindAlreadyRegisteredError` | If two kinds share the same name. |
| `FieldBuilderError` | If an override targets a missing column, a builder returns invalid data, or no builder matches a column. |
| `UnknownFieldKindError` | If a builder emits a kind with no registered model. |
| `ValidationError` | If generated or overridden field data violates the target field model. |
mlschema.strategies
Builtin field builders and Pydantic field models.
boolean_builder(_series: Series, ctx: FieldContext) -> FieldDict | None
Infer a boolean field for pandas boolean columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `_series` | `Series` | Source column. The value is unused because dtype metadata in | required |
| `ctx` | `FieldContext` | Column metadata including normalised dtype, name, and required flag. | required |
Returns:
| Type | Description |
|---|---|
| `FieldDict \| None` | A strict field dict for |
| `FieldDict \| None` |  |
builtin_kinds() -> tuple[FieldKind, ...]
category_builder(series: Series, ctx: FieldContext) -> FieldDict | None
Infer a category field from a pandas categorical column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `Series` | Source column whose options are extracted from categorical metadata or non-null unique values. | required |
| `ctx` | `FieldContext` | Column metadata including normalised dtype, name, and required flag. | required |
Returns:
| Type | Description |
|---|---|
| `FieldDict \| None` | A strict field dict for |
| `FieldDict \| None` | otherwise |
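The "non-null unique values" path mentioned for `series` can be illustrated with a plain list. This is a minimal sketch, not the builder's code: `unique_options` is a hypothetical helper, `None` stands in for pandas' missing values, and keeping first-appearance order is an assumption.

```python
from __future__ import annotations

from typing import Hashable, Iterable


def unique_options(values: Iterable[Hashable]) -> list:
    """Return distinct non-null values in order of first appearance."""
    # dict.fromkeys keeps insertion order and drops repeated keys.
    return list(dict.fromkeys(v for v in values if v is not None))
```

Calling `unique_options(["red", None, "blue", "red"])` gives `["red", "blue"]`.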
date_builder(_series: Series, ctx: FieldContext) -> FieldDict | None
Infer a date field for pandas datetime columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `_series` | `Series` | Source column. The value is unused because dtype metadata in | required |
| `ctx` | `FieldContext` | Column metadata including normalised dtype, name, and required flag. | required |
Returns:
| Type | Description |
|---|---|
| `FieldDict \| None` | A strict field dict for |
| `FieldDict \| None` | datetime dtype; otherwise |
number_builder(series: Series, ctx: FieldContext) -> FieldDict | None
Infer a number field for supported numeric columns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `Series` | Source column inspected with pandas' dtype helpers to decide the generated | required |
| `ctx` | `FieldContext` | Column metadata including normalised dtype, name, and required flag. | required |
Returns:
| Type | Description |
|---|---|
| `FieldDict \| None` | A strict field dict for |
| `FieldDict \| None` | otherwise |
series_builder(series: Series, ctx: FieldContext) -> FieldDict | None
Infer a series field for columns containing two-element cells.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `series` | `Series` | Source column containing tuple, list, or dict cells. | required |
| `ctx` | `FieldContext` | Column metadata plus the recursive | required |
Returns:
| Type | Description |
|---|---|
| `FieldDict \| None` | A strict field dict for |
| `FieldDict \| None` | two-element compound values; otherwise |
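A check for "two-element cells" might look like the sketch below. It is only an illustration of the idea under stated assumptions: the helper names are hypothetical, a list replaces the pandas Series, `None` stands in for missing cells, and skipping missing cells is an assumption rather than documented behaviour.

```python
from __future__ import annotations

from typing import Any, Iterable


def is_two_element_cell(cell: Any) -> bool:
    """True for a tuple, list, or dict holding exactly two items."""
    return isinstance(cell, (tuple, list, dict)) and len(cell) == 2


def looks_like_series_column(cells: Iterable[Any]) -> bool:
    """True when at least one cell is present and every present cell has two elements."""
    present = [cell for cell in cells if cell is not None]
    return bool(present) and all(is_two_element_cell(cell) for cell in present)
```

A column such as `[(1, 2.5), [2, 3.0], None]` would pass this check, while `[(1, 2, 3)]` or `["ab"]` would not.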
text_builder(_series: Series, ctx: FieldContext) -> FieldDict
Infer a text field for any remaining column.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `_series` | `Series` | Source column. The value is unused because this builder is a fallback and accepts any dtype not claimed earlier. | required |
| `ctx` | `FieldContext` | Column metadata including name and required flag. | required |
Returns:
| Type | Description |
|---|---|
| `FieldDict` | A strict field dict for |