API Reference
mlschema
Production-grade orchestration for translating pandas DataFrames into validated JSON field schemas.
The package exports a single façade, mlschema.MLSchema
(alias mlform.MLForm), which wraps the internal Service/Registry
subsystem. Client code typically:
- Registers concrete field strategies.
- Builds a JSON-serialisable schema from a DataFrame.
Public surface
- MLSchema: canonical entry point
- mlschema.core.Strategy: extension contract (advanced)
- mlschema.core.BaseField: Pydantic base model (advanced)
- All runtime errors derive from mlschema.core.MLSchemaError.
Example
from mlschema import MLSchema
from mlschema.strategies import NumberStrategy
import pandas as pd
ms = MLSchema()
ms.register(NumberStrategy())
df = pd.DataFrame({"age": [22, 37, 29]})
schema = ms.build(df)
MLSchema()
Facade that orchestrates strategy registration and schema generation.
The class wraps an internal mlschema.core.app.Service instance and
surfaces a minimal, stable API for client code. It is therefore the
canonical entry point when integrating mlschema into an application or
pipeline.
Attributes:
| Name | Type | Description |
|---|---|---|
| field_service | | Internal service component that performs the heavy lifting (registry management and JSON payload generation). |
build(df: DataFrame) -> dict[str, list[dict[str, Any]]]
Translate a DataFrame into a JSON-serialisable form schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Source data whose columns will be analysed and mapped to field definitions. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, list[dict[str, Any]]] | Dictionary with the schema information, where keys are field names |
Raises:
| Type | Description |
|---|---|
| EmptyDataFrameError | If the DataFrame is empty. |
| FallbackStrategyMissingError | If no fallback strategy is available for the DataFrame. |
| PydanticCustomError | If there are validation errors in the schema. |
register(strategy: Strategy) -> None
Register a new strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | Strategy | Instance of a concrete Strategy subclass. | required |
Raises:
| Type | Description |
|---|---|
| StrategyNameAlreadyRegisteredError | If a strategy with the same name is already registered. |
| StrategyDtypeAlreadyRegisteredError | If a strategy with the same dtype is already registered. |
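The two conflict checks above can be pictured with a small stand-alone registry. This is only a sketch under stated assumptions: ToyRegistry, DuplicateNameError and DuplicateDtypeError are hypothetical stand-ins written for illustration, not mlschema's internal registry or its exception classes.

```python
class DuplicateNameError(ValueError):
    """Hypothetical stand-in for StrategyNameAlreadyRegisteredError."""


class DuplicateDtypeError(ValueError):
    """Hypothetical stand-in for StrategyDtypeAlreadyRegisteredError."""


class ToyRegistry:
    """Illustrative registry that enforces unique names and unique dtypes."""

    def __init__(self) -> None:
        self._by_name = {}   # type_name -> tuple of dtype names
        self._by_dtype = {}  # dtype name -> type_name

    def register(self, type_name: str, dtypes: tuple) -> None:
        if type_name in self._by_name:
            raise DuplicateNameError(type_name)
        for dtype in dtypes:
            if dtype in self._by_dtype:
                raise DuplicateDtypeError(dtype)
        # Mutate state only after every check has passed.
        self._by_name[type_name] = dtypes
        for dtype in dtypes:
            self._by_dtype[dtype] = type_name


registry = ToyRegistry()
registry.register("number", ("int64", "float64"))

try:
    registry.register("number", ("int32",))   # same name again
except DuplicateNameError as exc:
    name_conflict = str(exc)

try:
    registry.register("integer", ("int64",))  # dtype already claimed
except DuplicateDtypeError as exc:
    dtype_conflict = str(exc)
```

Running the checks before touching any state means a rejected registration leaves the registry unchanged.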
mlschema.core
Core abstractions and error contracts for MLSchema.
This module defines the extension surface on which all custom behaviour is built. Integrators subclass the abstractions below to introduce new data types or override default processing logic, and they trap the accompanying exceptions to maintain deterministic error handling across the pipeline.
BaseField
Bases: BaseModel
Standard metadata present in all fields.
Extend this class to define custom field types.
Attributes:
| Name | Type | Description |
|---|---|---|
| title | Annotated[str, Field(min_length=1, max_length=100)] | Human-readable field identifier (1-100 characters). |
| description | Annotated[str \| None, Field(max_length=500)] | Optional description (max. 500 characters). |
| required | bool | True if the original column contains no null values. |
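To make the three attributes concrete, here is a plain-Python sketch of the same constraints. It is an illustration only: ToyField and field_from_column are hypothetical names, a dataclass stands in for the Pydantic model, and a list with None entries stands in for a pandas column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyField:
    """Hypothetical stand-in for BaseField (the real class is a Pydantic model)."""

    title: str
    required: bool
    description: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror the documented length limits.
        if not 1 <= len(self.title) <= 100:
            raise ValueError("title must be 1-100 characters")
        if self.description is not None and len(self.description) > 500:
            raise ValueError("description must be at most 500 characters")


def field_from_column(name: str, values: list) -> ToyField:
    # required is True only when the column contains no null values.
    return ToyField(title=name, required=all(v is not None for v in values))


complete = field_from_column("age", [22, 37, 29])
sparse = field_from_column("height", [1.8, None, 1.7])
```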
EmptyDataFrameError(df)
FallbackStrategyMissingError(dtype_str: str)
Bases: FieldServiceError
Conflict: No registered strategy matches the column dtype, and the fallback "text" strategy is absent.
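The lookup order implied by this error can be sketched as follows. The code is a hedged illustration, not the library's dispatch logic: resolve_strategy, MissingFallbackError and the dict-based registry are assumptions made for the example.

```python
FALLBACK = "text"  # name of the fallback strategy described above


class MissingFallbackError(LookupError):
    """Hypothetical stand-in for FallbackStrategyMissingError."""


def resolve_strategy(dtype_str: str, dtype_to_name: dict) -> str:
    """Return the strategy name for a dtype, falling back to "text"."""
    if dtype_str in dtype_to_name:
        return dtype_to_name[dtype_str]
    if FALLBACK in dtype_to_name.values():
        return FALLBACK
    # Neither a direct match nor the fallback is available.
    raise MissingFallbackError(dtype_str)


with_text = {"int64": "number", "object": "text"}
without_text = {"int64": "number"}

matched = resolve_strategy("int64", with_text)
fallen_back = resolve_strategy("bool", with_text)

try:
    resolve_strategy("bool", without_text)
except MissingFallbackError as exc:
    missing_for = str(exc)
```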
FieldRegistryError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: InvalidValueError
Domain-root for registry-layer validation failures.
Triggered when an operation on the strategy registry receives an invalid, conflicting, or otherwise disallowed value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| param | str | Logical argument name that caused the fault. | required |
| value | Any | Offending value (already normalised by the caller). | required |
| message | str \| None | Human-readable description. If None, a neutral default is autogenerated. | None |
| context | dict[str, Any] \| None | Arbitrary diagnostics (module, strategy ID, etc.). | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| param | str | Same as the param constructor argument. |
| value | Any | Same as the value constructor argument. |
| context | dict[str, Any] \| None | Same as the context constructor argument. |
Example
if type_name in registry:
    raise FieldRegistryError(
        param="type_name",
        value=type_name,
        message=f"Strategy {type_name!r} already exists",
        context={"existing_cls": registry[type_name]},
    )
FieldServiceError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: InvalidValueError
Domain-root for service-layer validation failures.
Triggered when runtime data or configuration supplied to the Service component is missing, malformed, or otherwise unusable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| param | str | Logical argument name that caused the fault (e.g., …). | required |
| value | Any | Offending value, typically the incoming DataFrame, a dtype string, or a strategy identifier. | required |
| message | str \| None | Human-readable description. If None, a neutral default is autogenerated. | None |
| context | dict[str, Any] \| None | Arbitrary diagnostics for logs or metrics (row/column counts, offending dtype, etc.). | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| param | str | Mirrors the param constructor argument. |
| value | Any | Mirrors the value constructor argument. |
| context | dict[str, Any] \| None | Mirrors the context constructor argument. |
Example
if df.empty:
    raise FieldServiceError(
        param="dataframe",
        value=df,
        message="Input DataFrame is empty",
        context={"rows": 0, "cols": 0},
    )
InvalidValueError(param: str, value: Any, message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: MLSchemaError, ValueError
Standard signal for configuration or user-input violations.
Raised when a supplied argument, configuration value, or runtime artefact fails validation. Subclasses narrow the scope to specific domains (e.g., registry vs. service faults).
Args
param: Logical argument name that triggered the failure ("dtype", "type_name", …).
value: Offending value already normalised by the caller.
message: Human-readable description. If None, a neutral default is auto-generated.
context: Arbitrary diagnostics for observability pipelines (e.g., {"strategy": "NumberStrategy"}).
Attributes
param: Same as the param constructor argument.
value: Same as the value constructor argument.
context: Same as the context constructor argument.
Examples
if dtype_key in registry:
    raise InvalidValueError(
        param="dtype",
        value=dtype_key,
        message=f"dtype {dtype_key!r} already mapped",
        context={"registered_strategy": registry[dtype_key]},
    )
MLSchemaError(message: str | None = None, *, context: dict[str, Any] | None = None)
Bases: Exception
Project-root for every mlschema runtime failure.
All domain-specific exceptions ultimately derive from this class, enabling both narrow and broad interception patterns:
try:
    schema = ms.build(df)
except MLSchemaError as exc:  # catch-all
    logger.error("Schema failure: %s", exc, exc_info=True)
    raise HTTPException(422, detail=str(exc)) from exc
Attributes:
| Name | Type | Description |
|---|---|---|
| context | dict[str, Any] \| None | Optional, machine-friendly diagnostics (e.g., offending …). |
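The narrow and broad interception patterns follow directly from the inheritance chain documented in this module. The sketch below rebuilds that chain with hypothetical Toy* classes so it runs without mlschema installed. Only the base-class relationships and constructor signatures are taken from this page; the default messages are assumptions.

```python
class ToySchemaError(Exception):
    """Hypothetical stand-in for MLSchemaError."""

    def __init__(self, message=None, *, context=None):
        super().__init__(message or "schema failure")
        self.context = context


class ToyInvalidValueError(ToySchemaError, ValueError):
    """Hypothetical stand-in for InvalidValueError."""

    def __init__(self, param, value, message=None, *, context=None):
        super().__init__(message or f"invalid value for {param!r}", context=context)
        self.param = param
        self.value = value


class ToyRegistryError(ToyInvalidValueError):
    """Hypothetical stand-in for FieldRegistryError."""


def classify(exc: Exception) -> str:
    """Show which handler an exception reaches, most specific first."""
    try:
        raise exc
    except ToyRegistryError:
        return "registry"   # narrow interception
    except ToySchemaError:
        return "schema"     # broad, project-wide interception
    except Exception:
        return "other"


narrow = classify(ToyRegistryError("type_name", "number"))
broad = classify(ToyInvalidValueError("dtype", "int64", context={"strategy": "NumberStrategy"}))
unrelated = classify(KeyError("x"))
```

Because the invalid-value branch also inherits from ValueError, callers that only know the built-in hierarchy can still catch it with a plain except ValueError.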
Strategy(*, type_name: str, schema_cls: type[BaseField], dtypes: Sequence[str | Any])
Abstract base class for all MLSchema field strategies.
Each concrete strategy maps a single pandas dtype (or group of dtypes) to a
validated JSON field specification. Strategies are opt-in: they influence
schema generation only after being registered via MLSchema.register().
Usage contract
- Do not mutate the incoming Series; treat it as read-only.
- Subclasses should override attributes_from_series() to emit extra metadata, but must avoid the reserved keys: "title", "type", "required", "description".
- Registration rejects duplicates: a type_name that is already registered must be replaced via MLSchema.update().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| type_name | str | Identifier for the strategy type. | required |
| schema_cls | type[BaseField] | Pydantic class that models the field. | required |
| dtypes | Sequence[str \| Any] | Sequence of … | required |
dtypes: tuple[str, ...]
property
Tuple of supported dtype names.
schema_cls: type[BaseField]
property
Pydantic class used to serialize the schema.
type_name: str
property
Identifier for the strategy type.
attributes_from_series(series: Series) -> dict
Calculate field-specific attributes.
This method can be overridden by subclasses to add implementation-specific metadata to the schema.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| series | Series | DataFrame column to analyze. | required |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary with additional attributes; never includes the standard keys |
build_dict(series: Series) -> dict
Create the JSON representation of the schema.
Combines the standard attributes with those returned by attributes_from_series and serializes the result with the associated Pydantic class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| series | Series | DataFrame column to analyze. | required |
Returns:
| Type | Description |
|---|---|
| dict | JSON with the field schema. |
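The merge that build_dict performs can be approximated in plain Python. This is a rough sketch, not the library's code: build_field is a hypothetical helper, a list stands in for the Series, the Pydantic serialization step is skipped, and the {"step": 1} extra is only an illustrative value.

```python
RESERVED_KEYS = {"title", "type", "required", "description"}


def build_field(name: str, type_name: str, values: list, extras: dict) -> dict:
    """Merge the standard attributes with strategy-specific extras."""
    clash = RESERVED_KEYS.intersection(extras)
    if clash:
        # Extras must not redefine the reserved keys of the usage contract.
        raise ValueError(f"reserved keys in extras: {sorted(clash)}")
    standard = {
        "title": name,
        "type": type_name,
        "required": all(v is not None for v in values),
    }
    return {**standard, **extras}


field = build_field("age", "number", [22, 37, 29], {"step": 1})
```

Rejecting reserved keys up front keeps a subclass from silently overwriting the standard metadata.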
StrategyDtypeAlreadyRegisteredError(dtype_key: str)
StrategyNameAlreadyRegisteredError(type_name: str)
mlschema.strategies
Strategies sub-package for MLSchema.
This namespace aggregates the concrete Strategy implementations that map
pandas dtypes to validated JSON field definitions. All classes inherit from
mlschema.core.Strategy and are opt-in: they become active only after an
explicit MLSchema.register() call.
Strategies Available
| Class | Description |
|---|---|
| BooleanStrategy | Strategy for handling boolean data types. |
| CategoryStrategy | Strategy for handling categorical data types. |
| DateStrategy | Strategy for handling date and datetime data types. |
| NumberStrategy | Strategy for handling numeric data types. |
| TextStrategy | Strategy for handling text and string data types. |
Design notes
| Principle | Description |
|---|---|
| Single-responsibility | Each strategy handles one logical field type. |
| Pluggable | New strategies register via MLSchema.register(), replace via MLSchema.update(), and deregister via MLSchema.unregister(). |
| Declarative output | Strategies emit validated BaseField subclasses, ensuring schema integrity from ingestion to UI rendering. |
BooleanStrategy()
Bases: Strategy
Instance of Strategy for boolean fields.
Name
boolean
Dtypes
| Name | Type |
|---|---|
| bool | BooleanDtype |
| boolean | BooleanDtype |
Model Attributes
| Name | Type | Description |
|---|---|---|
| type | Literal["boolean"] | Fixed type for the strategy. |
| value | bool \| None | The current value of the field. |
CategoryStrategy()
Bases: Strategy
Instance of Strategy for category fields.
Name
category
Dtypes
| Name | Type |
|---|---|
| category | CategoricalDtype |
Model Attributes
| Name | Type | Description |
|---|---|---|
| type | Literal["category"] | Fixed type for the strategy. |
| options | list[str] | List of allowed categories. |
| value | str \| None | Current value of the field. |
Model Restrictions
| Description | Error Type | Error Message |
|---|---|---|
| value in options | PydanticCustomError | value {value} must be in options {options} |
attributes_from_series(series: Series) -> dict
Derives the list of options from the series.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| series | Series | Pandas series with categorical values. | required |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary with the … |
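As a rough illustration of deriving options and enforcing the "value in options" restriction, the sketch below works on a plain list. Both helper names are hypothetical, and the first-seen ordering of options is an assumption; the real strategy operates on a categorical Series and may order categories differently.

```python
def options_from_values(values: list) -> list:
    """Collect distinct non-null values as strings, in first-seen order."""
    seen = {}
    for v in values:
        if v is not None:
            seen.setdefault(str(v), None)  # dict keys preserve insertion order
    return list(seen)


def check_value(value, options: list):
    """Apply the documented restriction: value must be one of options."""
    if value is not None and value not in options:
        raise ValueError(f"value {value} must be in options {options}")
    return value


options = options_from_values(["red", "green", None, "red", "blue"])
accepted = check_value("green", options)
```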
DateStrategy()
Bases: Strategy
Instance of Strategy for date fields.
Name
date
Dtypes
| Name | Type |
|---|---|
| datetime64[ns] | DatetimeTZDtype |
| datetime64 | DatetimeDtype |
Model Attributes
| Name | Type | Description |
|---|---|---|
| type | Literal["date"] | Fixed type for the strategy. |
| value | date \| None | The current value of the field. |
| min | date \| None | Minimum allowed date. |
| max | date \| None | Maximum allowed date. |
| step | PositiveInt | Increment in days. |
Model Restrictions
| Description | Error Type | Error Message |
|---|---|---|
| min ≤ max | PydanticCustomError | min {min} must be ≤ max {max} |
| value ≥ min | PydanticCustomError | value {value} must be ≥ min {min} |
| value ≤ max | PydanticCustomError | value {value} must be ≤ max {max} |
NumberStrategy()
Bases: Strategy
Instance of Strategy for number fields.
Name
number
Dtypes
| Name | Type |
|---|---|
| int64 | Int64Dtype |
| float64 | Float64Dtype |
| int32 | Int32Dtype |
| float32 | Float32Dtype |
Model Attributes
| Name | Type | Description |
|---|---|---|
| type | Literal["number"] | Fixed type for the strategy. |
| value | int \| float \| None | The current value of the field. |
| step | float \| int | Increment for numeric values. |
| min | int \| float \| None | Minimum allowed value. |
| max | int \| float \| None | Maximum allowed value. |
| unit | str \| None | Unit of measurement for the numeric value. |
| placeholder | str \| None | Placeholder text for the field. |
Model Restrictions
| Description | Error Type | Error Message |
|---|---|---|
| min ≤ max | PydanticCustomError | min {min} must be ≤ max {max} |
| value ≥ min | PydanticCustomError | value {value} must be ≥ min {min} |
| value ≤ max | PydanticCustomError | value {value} must be ≤ max {max} |
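The three restrictions can be read as independent checks that apply only when the values involved are set. The sketch below is a plain-Python approximation: validate_number is a hypothetical helper that returns messages instead of raising PydanticCustomError, and skipping unset bounds is an assumption rather than confirmed library behaviour.

```python
def validate_number(value=None, minimum=None, maximum=None) -> list:
    """Return the message of every violated restriction (empty when valid)."""
    errors = []
    if minimum is not None and maximum is not None and minimum > maximum:
        errors.append(f"min {minimum} must be ≤ max {maximum}")
    if value is not None and minimum is not None and value < minimum:
        errors.append(f"value {value} must be ≥ min {minimum}")
    if value is not None and maximum is not None and value > maximum:
        errors.append(f"value {value} must be ≤ max {maximum}")
    return errors


valid = validate_number(5, 0, 10)
too_low = validate_number(-1, 0, 10)
inverted = validate_number(None, 10, 0)
```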
attributes_from_series(series: Series) -> dict
Derives the step attribute from the dtype.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| series | Series | Pandas series with numeric values. | required |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary with the … |
TextStrategy()
Bases: Strategy
Instance of Strategy for text fields.
Name
text
Dtypes
| Name | Type |
|---|---|
| object | object |
| string | StringDtype |
Model Attributes
| Name | Type | Description |
|---|---|---|
| type | Literal["text"] | Fixed type for the strategy. |
| value | str \| None | The current value of the field. |
| placeholder | str \| None | Placeholder text for the field. |
| min_length | int \| None | Minimum length of the text. |
| max_length | int \| None | Maximum length of the text. |
| pattern | str \| None | Regular expression pattern for validation. |
Model Restrictions
| Description | Error Type | Error Message |
|---|---|---|
| min_length ≤ max_length | PydanticCustomError | minLength {minLength} must be ≤ maxLength {maxLength} |
| value length ≥ min_length | PydanticCustomError | value length {value_length} must be ≥ minLength {minLength} |
| value length ≤ max_length | PydanticCustomError | value length {value_length} must be ≤ maxLength {maxLength} |