API Reference

mlschema

Production-grade orchestration for translating pandas DataFrames into validated JSON field schemas.

The package exports a single façade, mlschema.MLSchema (alias mlform.MLForm), which wraps the internal Service/Registry subsystem. Client code typically:

  1. Registers concrete field strategies.
  2. Builds a JSON-serialisable schema from a DataFrame.
Public surface
  • MLSchema — canonical entry point
  • mlschema.core.Strategy — extension contract (advanced)
  • mlschema.core.BaseField — Pydantic base model (advanced)
  • All runtime errors derive from mlschema.core.MLSchemaError.
Example
from mlschema import MLSchema
from mlschema.strategies import NumberStrategy
import pandas as pd

ms = MLSchema()
ms.register(NumberStrategy())

df = pd.DataFrame({"age": [22, 37, 29]})
schema = ms.build(df)

MLSchema()

Facade that orchestrates strategy registration and schema generation.

The class wraps an internal mlschema.core.app.Service instance and surfaces a minimal, stable API for client code. It is therefore the canonical entry point when integrating mlschema into an application or pipeline.

Attributes:

Name Type Description
field_service

Internal service component that performs the heavy lifting (registry management and JSON payload generation).

build(df: DataFrame) -> dict[str, list[dict[str, Any]]]

Translate a DataFrame into a JSON-serialisable form schema.

Parameters:

Name Type Description Default
df DataFrame

Source data whose columns will be analysed and mapped to field definitions.

required

Returns:

Type Description
dict[str, list[dict[str, Any]]]

Dictionary with the schema information, where keys are field names.

Raises:

Type Description
EmptyDataFrameError

If the DataFrame is empty.

FallbackStrategyMissingError

If no fallback strategy is available for the DataFrame.

PydanticCustomError

If there are validation errors in the schema.

register(strategy: Strategy) -> None

Register a new strategy.

Parameters:

Name Type Description Default
strategy Strategy

Instance of a concrete mlschema.core.app.Strategy.

required

Raises:

Type Description
StrategyNameAlreadyRegisteredError

If a strategy with the same name is already registered.

StrategyDtypeAlreadyRegisteredError

If a strategy with the same dtype is already registered.

unregister(strategy: Strategy) -> None

Remove a previously registered strategy.

Parameters:

Name Type Description Default
strategy Strategy

Strategy instance to be removed from the registry.

required

update(strategy: Strategy) -> None

Replace an existing strategy in-place.

If either the type_name or any of the advertised dtypes already exist, they are overwritten with the supplied strategy.

Parameters:

Name Type Description Default
strategy Strategy

Instance of Strategy to update.

required
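The register, unregister, and update semantics described above can be sketched with a toy registry. This is only an illustration built on plain dictionaries: ToyRegistry, DuplicateNameError, and DuplicateDtypeError are hypothetical stand-ins, not the library's Registry or its error classes, and the silent no-op when unregistering an unknown name is an assumption.

```python
class DuplicateNameError(ValueError):
    """Stand-in for StrategyNameAlreadyRegisteredError."""


class DuplicateDtypeError(ValueError):
    """Stand-in for StrategyDtypeAlreadyRegisteredError."""


class ToyRegistry:
    """Hypothetical registry keyed by type_name and by dtype name."""

    def __init__(self):
        self.name_to_dtypes = {}  # type_name -> tuple of dtype names
        self.dtype_to_name = {}   # dtype name -> owning type_name

    def register(self, type_name, dtypes):
        # Refuse conflicts instead of silently overwriting.
        if type_name in self.name_to_dtypes:
            raise DuplicateNameError(type_name)
        for dtype in dtypes:
            if dtype in self.dtype_to_name:
                raise DuplicateDtypeError(dtype)
        self._store(type_name, dtypes)

    def update(self, type_name, dtypes):
        # Overwrite any existing entry with the same name or dtypes.
        self.unregister(type_name)
        for dtype in dtypes:
            owner = self.dtype_to_name.pop(dtype, None)
            if owner is not None:
                kept = tuple(d for d in self.name_to_dtypes[owner] if d != dtype)
                self.name_to_dtypes[owner] = kept
        self._store(type_name, dtypes)

    def unregister(self, type_name):
        # Assumption: removing an unknown name is a no-op.
        for dtype in self.name_to_dtypes.pop(type_name, ()):
            self.dtype_to_name.pop(dtype, None)

    def _store(self, type_name, dtypes):
        self.name_to_dtypes[type_name] = tuple(dtypes)
        for dtype in dtypes:
            self.dtype_to_name[dtype] = type_name


registry = ToyRegistry()
registry.register("number", ("int64", "float64"))

duplicate_rejected = False
try:
    registry.register("number", ("int32",))  # same type_name: rejected
except DuplicateNameError:
    duplicate_rejected = True

registry.update("number", ("int64",))  # replaces the earlier entry
```

The point of the sketch is the contrast: register fails loudly on a name or dtype clash, while update is the explicit way to replace an entry.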

mlschema.core

Core abstractions and error contracts for MLSchema.

This module defines the extension surface on which all custom behaviour is built. Integrators subclass the abstractions below to introduce new data types or override default processing logic, and they trap the accompanying exceptions to maintain deterministic error handling across the pipeline.

BaseField

Bases: BaseModel

Standard metadata present in all fields.

Aligns with mlform's BaseFieldConfig. Extend this class to define custom field types.

Attributes:

Name Type Description
label Annotated[str, Field(min_length=1, max_length=100)]

Human-readable field identifier (1-100 chars).

description Annotated[str | None, Field(max_length=500)]

Optional help text (max 500 chars).

required bool

Field is mandatory (mlform default: false).

disabled bool | None

Field is disabled (mlform default: false).

hidden bool | None

Field is hidden (mlform default: false).

readOnly bool | None

Field is read-only (mlform default: false).

disabledWhen Any | None

Declarative condition to disable the field.

hiddenWhen Any | None

Declarative condition to hide the field.

readOnlyWhen Any | None

Declarative condition to make field read-only.

asyncValidationDebounceMs int | None

Debounce in ms for async validation.

inactiveFieldPolicy Literal['include', 'omit', 'reset-on-hide'] | None

Behaviour when field becomes inactive.

valuePath str | list[str] | None

Key path used when reading the field value on submit.

defaultValue Any | None

Initial value for the field.

ui dict[str, Any] | None

Arbitrary UI-layer props forwarded to the component.

BaseReport

Bases: BaseModel

Standard metadata present in all reports.

Aligns with mlform's BaseReportConfig. Extend this class to define custom report types.

Attributes:

Name Type Description
label Annotated[str | None, Field(max_length=100)]

Human-readable report title (max 100 chars).

description Annotated[str | None, Field(max_length=500)]

Optional help text (max 500 chars).

source str | None

Key used to locate the report payload in the submit result.

ui dict[str, Any] | None

Arbitrary UI-layer props forwarded to the component.

EmptyDataFrameError(df)

Bases: FieldServiceError

Conflict: Service received a DataFrame with zero rows or columns.

FallbackStrategyMissingError(dtype_str: str)

Bases: FieldServiceError

Conflict: No registered strategy matches the column dtype, and the fallback "text" strategy is absent.

FieldRegistryError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: InvalidValueError

Domain-root for registry-layer validation failures.

Triggered when an operation on the strategy registry receives an invalid, conflicting, or otherwise disallowed value.

Parameters:

Name Type Description Default
param str

Logical argument name that caused the fault.

required
value Any

Offending value (already normalised by the caller).

required
message str | None

Human-readable description. If None, a neutral default is autogenerated.

None
context dict[str, Any] | None

Arbitrary diagnostics—module, strategy ID, etc.

None

Attributes:

Name Type Description
param str

Same as the param constructor argument.

value Any

Same as the value constructor argument.

context dict[str, Any] | None

Same as the context constructor argument.

Example
if type_name in registry:
    raise FieldRegistryError(
        param="type_name",
        value=type_name,
        message=f"Strategy {type_name!r} already exists",
        context={"existing_cls": registry[type_name]},
    )

FieldServiceError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: InvalidValueError

Domain-root for service-layer validation failures.

Triggered when runtime data or configuration supplied to the Service component is missing, malformed, or otherwise unusable.

Parameters:

Name Type Description Default
param str

Logical argument name that caused the fault (e.g., "dataframe", "dtype", "fallback_strategy").

required
value Any

Offending value—typically the incoming DataFrame, a dtype string, or a strategy identifier.

required
message str | None

Human-readable description. If None, a neutral default is autogenerated.

None
context dict[str, Any] | None

Arbitrary diagnostics for logs or metrics (row/column counts, offending dtype, etc.).

None

Attributes:

Name Type Description
param str

Mirrors the param constructor argument.

value Any

Mirrors the value constructor argument.

context dict[str, Any] | None

Mirrors the context constructor argument.

Example
if df.empty:
    raise FieldServiceError(
        param="dataframe",
        value=df,
        message="Input DataFrame is empty",
        context={"rows": 0, "cols": 0},
    )

InvalidValueError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: MLSchemaError, ValueError

Standard signal for configuration or user-input violations.

Raised when a supplied argument, configuration value, or runtime artefact fails validation. Subclasses narrow the scope to specific domains (e.g., registry vs. service faults).

Parameters:

Name Type Description Default
param str

Logical argument name that triggered the failure ("dtype", "type_name", …).

required
value Any

Offending value, already normalised by the caller.

required
message str | None

Human-readable description. If None, a neutral default is auto-generated.

None
context dict[str, Any] | None

Arbitrary diagnostics for observability pipelines (e.g., {"strategy": "NumberStrategy"}).

None

Attributes:

Name Type Description
param str

Same as the param constructor argument.

value Any

Same as the value constructor argument.

context dict[str, Any] | None

Same as the context constructor argument.

Examples
if dtype_key in registry:
    raise InvalidValueError(
        param="dtype",
        value=dtype_key,
        message=f"dtype {dtype_key!r} already mapped",
        context={"registered_strategy": registry[dtype_key]},
    )

MLSchemaError(message: str | None = None, *, context: dict[str, Any] | None = None)

Bases: Exception

Project-root for every mlschema runtime failure.

All domain-specific exceptions ultimately derive from this class, enabling both narrow and broad interception patterns:

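# Assumes ms, df, and logger are already defined; HTTPException stands for a
# web-framework exception such as FastAPI's.
try:
    schema = ms.build(df)
except MLSchemaError as exc:  # catch-all
    logger.error("Schema failure: %s", exc, exc_info=True)
    raise HTTPException(422, detail=str(exc)) from exc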

Attributes:

Name Type Description
context dict[str, Any] | None

Optional, machine-friendly diagnostics (e.g., offending dtype, column name, strategy ID). Contents are stable only for the public leaf exceptions; treat additional keys as informational.
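To make the narrow versus broad interception pattern concrete, here is a minimal sketch that mirrors the documented inheritance chain with local stand-in classes, so it runs without mlschema installed. The class bodies and the default message wording are assumptions; only the base-class relationships and constructor signatures are taken from this page.

```python
class MLSchemaError(Exception):
    """Stand-in for the documented project-root exception."""

    def __init__(self, message=None, *, context=None):
        super().__init__(message or "mlschema failure")  # default text is assumed
        self.context = context


class InvalidValueError(MLSchemaError, ValueError):
    """Stand-in: also a ValueError, as documented under 'Bases'."""

    def __init__(self, param, value, message=None, *, context=None):
        default = f"invalid value for {param!r}: {value!r}"  # assumed wording
        super().__init__(message or default, context=context)
        self.param = param
        self.value = value


class FieldServiceError(InvalidValueError):
    """Stand-in for the service-layer root."""


class EmptyDataFrameError(FieldServiceError):
    def __init__(self, df):
        super().__init__("dataframe", df, "Input DataFrame is empty")


def handled_by(exc):
    """Report which except clause intercepts the given exception."""
    try:
        raise exc
    except FieldServiceError:  # narrow: service-layer faults only
        return "narrow"
    except MLSchemaError:      # broad: any mlschema failure
        return "broad"


print(handled_by(EmptyDataFrameError([])))
print(handled_by(MLSchemaError("generic failure")))
```

Because InvalidValueError also derives from ValueError, code that already catches ValueError would intercept these errors too, which is worth keeping in mind when ordering except clauses.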

Strategy(*, type_name: str, schema_cls: type[BaseField], dtypes: Sequence[str | Any])

Abstract base class for all MLSchema field strategies.

Each concrete strategy maps a single pandas dtype (or group of dtypes) to a validated JSON field specification. Strategies are opt-in: they influence schema generation only after being registered via MLSchema.register().

Usage contract
  • Do not mutate the incoming Series; treat it as read-only.
  • Subclasses should override attributes_from_series() to emit extra metadata, but must avoid the reserved keys: "label", "kind", "required", "description".
  • Registration is not idempotent: registering a duplicate type_name raises StrategyNameAlreadyRegisteredError. Replace an existing strategy via MLSchema.update().

Parameters:

Name Type Description Default
type_name str

Identifier for the strategy type.

required
schema_cls type[BaseField]

Pydantic class that models the field.

required
dtypes Sequence[str | Any]

Sequence of dtype (instances or names) to which the strategy applies.

required

dtypes: tuple[str, ...] property

Tuple of supported dtype names.

schema_cls: type[BaseField] property

Pydantic class used to serialize the schema.

type_name: str property

Identifier for the strategy type.

attributes_from_series(series: Series) -> dict

Calculate field-specific attributes.

This method can be overridden by subclasses to add implementation-specific metadata to the schema.

Parameters:

Name Type Description Default
series Series

DataFrame column to analyze.

required

Returns:

Type Description
dict

Dictionary with additional attributes; never includes the standard keys label, kind, required, description.

build_dict(series: Series) -> dict

Create the JSON representation of the schema.

Combines the standard attributes with those returned by attributes_from_series and serializes the result with the associated Pydantic class.

Parameters:

Name Type Description Default
series Series

DataFrame column to analyze.

required

Returns:

Type Description
dict

JSON with the field schema.

content_probe(series: Series) -> bool

Content-based detection hook.

Override to opt in to content-based strategy selection. When dtype lookup yields no match, the Service iterates registered strategies and selects the first whose content_probe returns True.

Parameters:

Name Type Description Default
series Series

DataFrame column to inspect.

required

Returns:

Type Description
bool

True if this strategy can handle the column based on its values.
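The selection order described for content_probe can be sketched as a small lookup function. This is a toy model, not the Service implementation: ToyStrategy and select_strategy are hypothetical names, columns are plain Python lists, and trying the "text" fallback only after every probe has declined is an assumption inferred from the error descriptions on this page.

```python
class ToyStrategy:
    """Hypothetical stand-in exposing only what selection needs."""

    def __init__(self, type_name, dtypes=(), probe=None):
        self.type_name = type_name
        self.dtypes = tuple(dtypes)
        self._probe = probe

    def content_probe(self, values):
        # Strategies without a probe never claim a column by content.
        return bool(self._probe and self._probe(values))


def select_strategy(dtype_name, values, strategies):
    # 1. Direct dtype lookup.
    for strategy in strategies:
        if dtype_name in strategy.dtypes:
            return strategy
    # 2. First strategy whose content probe accepts the values.
    for strategy in strategies:
        if strategy.content_probe(values):
            return strategy
    # 3. Assumed last resort: the "text" strategy, if registered.
    for strategy in strategies:
        if strategy.type_name == "text":
            return strategy
    raise LookupError(f"no strategy for dtype {dtype_name!r}")


strategies = [
    ToyStrategy("number", ["int64"]),
    ToyStrategy("pairs", probe=lambda vals: all(len(v) == 2 for v in vals)),
    ToyStrategy("text", ["string"]),
]

print(select_strategy("int64", [1, 2], strategies).type_name)
print(select_strategy("custom", [(1, 2)], strategies).type_name)
print(select_strategy("custom", [(1, 2, 3)], strategies).type_name)
```

In the real library the final failure is reported as FallbackStrategyMissingError; LookupError is used here only to keep the sketch free of dependencies.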

set_registry(registry: Any) -> None

Hook called by Service after registration.

Override in subclasses that need access to the registry at inference time (e.g. to route sub-column dtypes through registered strategies).

Parameters:

Name Type Description Default
registry Any

The Registry instance managing all active strategies.

required

StrategyDtypeAlreadyRegisteredError(dtype_key: str)

Bases: FieldRegistryError

Conflict: two strategies contend for the same dtype key.

StrategyNameAlreadyRegisteredError(type_name: str)

Bases: FieldRegistryError

Conflict: two strategies contend for the same type_name.

mlschema.strategies

Strategies sub-package for MLSchema.

This namespace aggregates the concrete Strategy implementations that map pandas dtypes to validated JSON field definitions. All classes inherit from mlschema.core.Strategy and are opt-in: they become active only after an explicit MLSchema.register() call.

Strategies Available
Class Description
BooleanStrategy Strategy for handling boolean data types.
CategoryStrategy Strategy for handling categorical data types.
DateStrategy Strategy for handling date and datetime data types.
NumberStrategy Strategy for handling numeric data types.
SeriesStrategy Strategy for handling two-axis series columns (compound cell values).
TextStrategy Strategy for handling text and string data types.

Design notes

Principle Description
Single-responsibility Each strategy handles one logical field type.
Pluggable New strategies register via MLSchema.register(), replace via MLSchema.update(), and deregister via MLSchema.unregister().
Declarative output Strategies emit validated BaseField subclasses, ensuring schema integrity from ingestion to UI rendering.

BooleanStrategy()

Bases: Strategy

Instance of Strategy for boolean fields.

Name

boolean

Dtypes
Name Type
bool BooleanDtype
boolean BooleanDtype
Model Attributes
Name Type Description
kind Literal["boolean"] Fixed type for the strategy.
defaultValue bool | None Initial value of the field.

CategoryStrategy()

Bases: Strategy

Instance of Strategy for category fields.

Name

category

Dtypes
Name Type
category CategoricalDtype
Model Attributes
Name Type Description
kind Literal["category"] Fixed type for the strategy.
options list[str] List of allowed categories.
defaultValue str | None Initial value of the field.
Model Restrictions
Description Error Type Error Message
defaultValue in options PydanticCustomError defaultValue must be in options

attributes_from_series(series: Series) -> dict

Derives the list of options from the series.

Parameters:

Name Type Description Default
series Series

Pandas series with categorical values.

required

Returns:

Type Description
dict

Dictionary with the options key and the list of unique values.
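As a rough illustration of how an options list and the "defaultValue must be in options" restriction fit together, the sketch below works on a plain Python list. The helpers derive_options and check_default are hypothetical; the real strategy operates on a pandas Series and validates through Pydantic, and the first-seen ordering used here is an assumption.

```python
def derive_options(values):
    """Unique non-null values as strings, in first-seen order (assumed order)."""
    # dict.fromkeys drops duplicates while preserving insertion order.
    return list(dict.fromkeys(str(v) for v in values if v is not None))


def check_default(default, options):
    """Mimic the documented restriction with a plain ValueError."""
    if default is not None and default not in options:
        raise ValueError("defaultValue must be in options")
    return default


options = derive_options(["red", "blue", "red", None])
print(options)
print(check_default("red", options))
```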

DateStrategy()

Bases: Strategy

Instance of Strategy for date fields.

Name

date

Dtypes
Name Type
datetime64[ns] DatetimeTZDtype
datetime64 DatetimeDtype
Model Attributes
Name Type Description
kind Literal["date"] Fixed type for the strategy.
defaultValue str | None Initial ISO date value of the field.
min str | None Minimum allowed ISO date.
max str | None Maximum allowed ISO date.
step PositiveInt Increment in days.
Model Restrictions
Description Error Type Error Message
min ≤ max PydanticCustomError min {min} must be ≤ max {max}
defaultValue ≥ min PydanticCustomError defaultValue must be ≥ min
defaultValue ≤ max PydanticCustomError defaultValue must be ≤ max

NumberStrategy()

Bases: Strategy

Instance of Strategy for number fields.

Name

number

Dtypes
Name Type
int64 Int64Dtype
float64 Float64Dtype
int32 Int32Dtype
float32 Float32Dtype
Model Attributes
Name Type Description
kind Literal["number"] Fixed type for the strategy.
defaultValue int | float | None Initial value of the field.
step float | int Increment for numeric values.
min int | float | None Minimum allowed value.
max int | float | None Maximum allowed value.
unit str | None Unit of measurement for the numeric value.
placeholder str | None Placeholder text for the field.
Model Restrictions
Description Error Type Error Message
min ≤ max PydanticCustomError min {min} must be ≤ max {max}
defaultValue ≥ min PydanticCustomError defaultValue {value} must be ≥ min {min}
defaultValue ≤ max PydanticCustomError defaultValue {value} must be ≤ max {max}
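The three range restrictions can be expressed as ordinary comparisons. The sketch below uses a standard-library dataclass and ValueError in place of the Pydantic model and PydanticCustomError, so ToyNumberField and validation_error are hypothetical names; only the message templates are taken from the table above.

```python
from dataclasses import dataclass
from typing import Optional, Union

Number = Union[int, float]


@dataclass
class ToyNumberField:
    """Hypothetical stand-in for the number field model."""

    min: Optional[Number] = None
    max: Optional[Number] = None
    defaultValue: Optional[Number] = None

    def __post_init__(self):
        lo, hi, value = self.min, self.max, self.defaultValue
        # Each check applies only when both operands are present.
        if lo is not None and hi is not None and lo > hi:
            raise ValueError(f"min {lo} must be ≤ max {hi}")
        if value is not None and lo is not None and value < lo:
            raise ValueError(f"defaultValue {value} must be ≥ min {lo}")
        if value is not None and hi is not None and value > hi:
            raise ValueError(f"defaultValue {value} must be ≤ max {hi}")


def validation_error(**kwargs):
    """Return the error message for the given settings, or None if valid."""
    try:
        ToyNumberField(**kwargs)
    except ValueError as exc:
        return str(exc)
    return None


print(validation_error(min=0, max=10, defaultValue=5))
print(validation_error(min=5, max=1))
```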

attributes_from_series(series: Series) -> dict

Derives the step attribute from the dtype.

Parameters:

Name Type Description Default
series Series

Pandas series with numeric values.

required

Returns:

Type Description
dict

Dictionary with the step key.

SeriesStrategy()

Bases: Strategy

Strategy for series (two-axis) fields.

Each cell must be a 2-element compound value:
  • tuple/list: (v1, v2) — positional, names default to "field1" / "field2"
  • dict: {"key1": v1, "key2": v2} — named, dict keys become sub-field labels

Field schemas are auto-inferred from sampled cell values via the injected registry. No dtypes are registered; the strategy is selected automatically via content_probe or applied manually.

Name

series

Dtypes

None — content-based detection only.

Model Attributes
Name Type Description
kind Literal["series"] Fixed type identifier.
field1 BaseField Schema of the first element of each cell.
field2 BaseField Schema of the second element of each cell.
Model Restrictions
Description Error Type Error Message
field1/field2 not series PydanticCustomError SeriesField cannot be nested
field1/field2 kind known PydanticCustomError Unknown sub-field type: '{kind_name}'. Register via add_series_sub_field()
Note

minPoints and maxPoints are constraints on SeriesField itself and must be set there directly; they are not strategy-level parameters.

attributes_from_series(series: Series) -> dict

Derive field1 and field2 sub-schemas from the series.

Extracts element sub-Series from compound cells, infers their dtypes via the injected registry, and delegates schema building to the matching strategy. Falls back to a bare text schema when no registry is available or the dtype is unrecognised.

Parameters:

Name Type Description Default
series Series

DataFrame column with 2-element compound values.

required

Returns:

Type Description
dict

Dictionary with field1 and field2 sub-schemas.
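The first step of this method, extracting two element sub-columns and their labels from compound cells, might look like the sketch below. It uses plain lists with None as the null marker, and split_cells is a hypothetical helper; how the library samples cells or reconciles dicts with differing keys is not documented here, so the last dict seen simply wins.

```python
def split_cells(cells):
    """Split 2-element cells into (names, first_values, second_values)."""
    names = ("field1", "field2")  # documented default for tuples and lists
    first, second = [], []
    for cell in cells:
        if cell is None:  # skip nulls
            continue
        if isinstance(cell, dict):
            names = tuple(cell)   # dict keys become sub-field labels
            a, b = cell.values()
        else:
            a, b = cell           # positional tuple or list
        first.append(a)
        second.append(b)
    return names, first, second


print(split_cells([(1, "a"), (2, "b")]))
print(split_cells([{"x": 1.0, "y": 2.0}, None]))
```

Each resulting sub-column could then be routed through whatever strategy matches its own dtype, which is the role the page assigns to the injected registry.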

content_probe(series: Series) -> bool

Return True if all non-null values are 2-element tuples, lists, or dicts.

Parameters:

Name Type Description Default
series Series

DataFrame column to inspect.

required

Returns:

Type Description
bool

True if the column contains only 2-element compound values.
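A probe of this kind can be approximated in a few lines. The sketch below checks a plain Python list rather than a pandas Series, treats only None as null, and returns False for a column with no non-null values; the function name looks_like_pairs and that empty-column behaviour are assumptions.

```python
def looks_like_pairs(values):
    """True if every non-null value is a 2-element tuple, list, or dict."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return False  # assumption: nothing to inspect means no match
    return all(
        isinstance(v, (tuple, list, dict)) and len(v) == 2
        for v in non_null
    )


print(looks_like_pairs([(1, 2), [3, 4], {"a": 1, "b": 2}, None]))
print(looks_like_pairs([(1, 2, 3)]))
```

The isinstance check matters: a two-character string also has length 2 but is not a compound value.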

set_registry(registry: Any) -> None

Store registry reference injected by Service after registration.

TextStrategy()

Bases: Strategy

Instance of Strategy for text fields.

Name

text

Dtypes
Name Type
object object
string StringDtype
Model Attributes
Name Type Description
kind Literal["text"] Fixed type for the strategy.
defaultValue str | None Initial value of the field.
placeholder str | None Placeholder text for the field.
min_length int | None Minimum length of the text.
max_length int | None Maximum length of the text.
pattern str | None Regular expression pattern for validation.
Model Restrictions
Description Error Type Error Message
min_length ≤ max_length PydanticCustomError minLength {minLength} must be ≤ maxLength {maxLength}
defaultValue length ≥ min_length PydanticCustomError defaultValue length {value_length} must be ≥ minLength {minLength}
defaultValue length ≤ max_length PydanticCustomError defaultValue length {value_length} must be ≤ maxLength {maxLength}