Changelog

All notable changes to MLSchema are documented in this file.

The format follows Keep a Changelog, and this project follows Semantic Versioning.

[Unreleased]

No unreleased changes documented yet.

[0.2.1] - 2026-06-18

Added

  • Added mandatory mappedTo output for normal fields.
  • Added onehot-category inference for named one-hot encoded feature columns.
  • Added options[].mappedTo targets for onehot-category options.
  • Added onehot_separator to infer_schema(); the default separator is "__" for feature__value columns.
  • Added positional-column fallback labels such as feature_0, feature_1, and integer mappedTo targets.

Changed

  • Changed named DataFrame columns to emit string mappedTo targets matching the column name.
  • Changed positional DataFrame columns to emit generated labels and zero-based integer mappedTo targets.
  • Changed one-hot grouping to require named encoded feature columns; positional binary columns now remain ordinary fields.
  • Updated README, usage docs, and schema-standard docs for the mappedTo and onehot-category contracts.
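The two mapping rules above (named columns map to their own name, positional columns map to a zero-based index with a generated label) can be sketched in a few lines. This is only an illustration: `label_and_target` is a hypothetical helper, not part of MLSchema, and it assumes that any non-string column label counts as positional and that a named column's label equals its name.

```python
def label_and_target(column: object, position: int) -> dict:
    """Hypothetical helper, not MLSchema's API: derive label and mappedTo."""
    if isinstance(column, str):
        # Named column: the backend target is the column name itself.
        # Assumption: the display label defaults to the same name.
        return {"label": column, "mappedTo": column}
    # Positional column: generated label plus zero-based integer target.
    return {"label": f"feature_{position}", "mappedTo": position}


named = label_and_target("age", 0)
positional = label_and_target(1, 1)
```

Here `named` carries a string target and `positional` carries an integer target, which is the distinction a consumer can use to tell the two cases apart.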

Fixed

  • Fixed schema mapping ambiguity between display labels and backend targets.
  • Fixed one-hot output so the parent field has no mappedTo; each option owns its backend target.
  • Fixed regression coverage for named feature mappings, positional mappings, custom one-hot separators, and positional binary columns.
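To make the onehot-category contract concrete, the sketch below groups column names of the form feature__value and emits one parent field per feature, with each option owning its own mappedTo. It is a rough illustration under stated assumptions: `group_onehot` is a hypothetical helper, the `value` key on each option is assumed, and the sketch looks only at column names, whereas real inference presumably also inspects the column contents.

```python
from collections import defaultdict


def group_onehot(columns: list[str], separator: str = "__") -> list[dict]:
    """Hypothetical helper, not MLSchema's API: group one-hot column names."""
    groups: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for column in columns:
        feature, sep, value = column.partition(separator)
        # Only names with a non-empty feature and value part are grouped.
        if sep and feature and value:
            groups[feature].append((value, column))
    return [
        {
            "kind": "onehot-category",
            "label": feature,
            # The parent field has no mappedTo; each option owns its target.
            # The "value" key is an assumption made for this illustration.
            "options": [{"value": v, "mappedTo": c} for v, c in members],
        }
        for feature, members in groups.items()
    ]


fields = group_onehot(["color__red", "color__blue", "age"])
```

A custom separator is passed the same way the changelog describes for onehot_separator, for example `group_onehot(["size-s", "size-m"], separator="-")`.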

[0.2.0] - 2026-06-01

0.2.0 is a breaking release. MLSchema moves from the previous class-and-registry API to a smaller function-first API centred on infer_schema(df). The schema output is now the field list itself, not a top-level object containing fields, reports, or explanations.

This release narrows the public surface, removes the report domain from MLSchema, and makes schema inference more direct for consumers that only need a validated field contract.

Added

  • Added the function-first public API:
      • infer_schema
      • kind
      • BaseField
      • FieldBuilder
      • FieldContext
      • FieldDict
      • FieldKind
  • Added callable extension points through builders=[...].
  • Added strict custom kind registration through kinds=[kind(model=..., infer=...)].
  • Added default builtin inference for:
      • series
      • boolean
      • category
      • date
      • number
      • text
  • Added explicit support for final field patches through overrides={...}.
  • Added stricter inference exceptions for invalid builders, duplicate kinds, unknown kinds, invalid kind models, and missing override targets.
  • Added pandas 3 dtype coverage for str, datetime64[us], and timedelta64[us].
  • Added and expanded Google-style public docstrings for MkDocs API rendering.
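The overrides={...} entry above describes patches applied to the final fields, and the stricter exceptions include one for missing override targets. A minimal sketch of that idea follows. It assumes overrides are keyed by field label and merged shallowly, neither of which this changelog confirms. `apply_overrides` is a hypothetical helper, and `KeyError` stands in for whatever exception type MLSchema actually raises.

```python
def apply_overrides(fields: list[dict], overrides: dict[str, dict]) -> list[dict]:
    """Hypothetical helper, not MLSchema's API: patch fields by label."""
    labels = {field["label"] for field in fields}
    missing = sorted(set(overrides) - labels)
    if missing:
        # Assumption: an override that targets no existing field is an error.
        raise KeyError(f"override targets not found: {missing}")
    # Shallow merge: keys in the patch win over inferred keys.
    return [{**field, **overrides.get(field["label"], {})} for field in fields]


inferred = [{"kind": "number", "label": "age", "required": True}]
patched = apply_overrides(inferred, {"age": {"min": 0}})
```

The input list is left untouched, so the inferred fields and the patched fields can be compared side by side.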

Changed

  • Replaced the public MLSchema() orchestration workflow with infer_schema(df).
  • Changed schema generation to return list[FieldDict] directly.
  • Changed builtin kinds to be enabled by default.
  • Changed extension from class-based strategies to callable builders and strict custom kinds.
  • Changed builder resolution to follow this order:
      • user builders
      • custom kind builders
      • builtin builders
  • Changed field serialisation to use JSON mode with exclude_none=True.
  • Changed documentation to describe the field-list contract as the canonical schema output.
  • Changed tests to assert the direct field-list return contract.
  • Split builtin builders and inference exceptions into focused modules.
  • Consolidated the package version source into pyproject.toml.
  • Consolidated Pyright configuration into pyproject.toml.
  • Updated runtime dependencies:
      • pandas >=3.0.3,<4.0.0
      • pydantic >=2.13.4,<3.0.0
  • Updated development tooling:
      • pytest >=9.0.3
      • pytest-cov >=7.1.0
      • ruff >=0.15.15
      • pyright >=1.1.409
      • pre-commit >=4.6.0
  • Updated README, usage docs, schema standard, security notes, and third-party dependency documentation for the new public API.
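The builder resolution order listed above (user builders, then custom kind builders, then builtin builders) can be pictured as a single ordered chain. The sketch below assumes the first builder that returns something other than None wins, which is consistent with the `dict | None` builder signature but is not confirmed by this changelog. `resolve_field` is a hypothetical name, not part of MLSchema.

```python
from typing import Any, Callable, Optional

# A builder receives a column and its context and may decline with None.
Builder = Callable[[Any, Any], Optional[dict]]


def resolve_field(
    series: Any,
    ctx: Any,
    user_builders: list[Builder],
    kind_builders: list[Builder],
    builtin_builders: list[Builder],
) -> Optional[dict]:
    """Hypothetical helper: try builders in priority order."""
    for builder in (*user_builders, *kind_builders, *builtin_builders):
        field = builder(series, ctx)
        if field is not None:
            # Assumption: the first non-None result is final.
            return field
    return None


def declining(series: Any, ctx: Any) -> Optional[dict]:
    return None


def as_text(series: Any, ctx: Any) -> Optional[dict]:
    return {"kind": "text"}


def as_number(series: Any, ctx: Any) -> Optional[dict]:
    return {"kind": "number"}


result = resolve_field(None, None, [declining], [as_number], [as_text])
```

In this arrangement the user builder declines, so the custom kind builder's result is used and the builtin builder is never consulted.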

Removed

  • Removed the public MLSchema facade.
  • Removed public Strategy, Registry, Service, and strategy-class registration APIs.
  • Removed public strategy classes from the supported documentation surface:
      • TextStrategy
      • NumberStrategy
      • CategoryStrategy
      • BooleanStrategy
      • DateStrategy
      • SeriesStrategy
  • Removed class-based registration operations:
      • register()
      • update()
      • unregister()
      • build()
  • Removed report-domain models from the public domain model:
      • BaseReport
      • ClassifierReport
      • RegressorReport
      • ReportTypes
  • Removed generated reports and explanations members from schema output.
  • Removed the top-level payload object used by earlier versions.
  • Removed the duplicate __version__ package constant.
  • Removed pyrightconfig.json.

Fixed

  • Fixed pandas 3 dtype handling in builtin inference and tests.
  • Fixed test expectations that still assumed the old top-level payload shape.
  • Fixed documentation drift around the old registry and strategy APIs.
  • Fixed version and type-checker configuration duplication.

Migration Notes

Replace the old orchestrator workflow:

from mlschema import MLSchema
from mlschema.strategies import TextStrategy

schema_builder = MLSchema()
schema_builder.register(TextStrategy())

schema = schema_builder.build(df)

with the new public API:

from mlschema import infer_schema

schema = infer_schema(df)

Replace custom strategy classes with callable builders when the target kind already exists:

from pandas import Series

from mlschema import FieldContext, infer_schema

def money_builder(series: Series, ctx: FieldContext) -> dict | None:
    if ctx.name != "amount_eur":
        return None

    return {
        "kind": "number",
        "label": "Amount",
        "required": ctx.required,
        "step": 0.01,
        "unit": "EUR",
        "min": 0,
    }

schema = infer_schema(df, builders=[money_builder])

Use kind() only when the schema needs a new field discriminator and a dedicated Pydantic model:

from typing import Literal

from pandas import Series

from mlschema import BaseField, FieldContext, infer_schema, kind

class DurationField(BaseField):
    kind: Literal["duration"] = "duration"
    unit: Literal["seconds"] = "seconds"
    minSeconds: int
    maxSeconds: int

def duration_builder(series: Series, ctx: FieldContext) -> dict | None:
    if ctx.dtype not in {"timedelta64[ns]", "timedelta64[us]"}:
        return None

    return {
        "kind": "duration",
        "label": ctx.name,
        "required": ctx.required,
        "unit": "seconds",
        "minSeconds": int(series.min().total_seconds()),
        "maxSeconds": int(series.max().total_seconds()),
    }

schema = infer_schema(
    df,
    kinds=[
        kind(model=DurationField, infer=duration_builder),
    ],
)

If previous consumers expected a top-level payload such as:

{
  "fields": [],
  "reports": [],
  "explanations": []
}

they must now consume the field list directly:

[
  {
    "kind": "text",
    "label": "name",
    "required": true
  }
]

Applications that still need an envelope should create it at the application boundary.
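As one possible shape for such a boundary, the sketch below wraps the field list in the earlier top-level payload layout. `wrap_schema` is a hypothetical application-side helper, not part of MLSchema, and the empty `reports` and `explanations` lists are placeholders the application would fill itself.

```python
import json


def wrap_schema(fields: list[dict]) -> dict:
    """Hypothetical application-side helper: rebuild the old envelope."""
    return {
        "fields": fields,
        # MLSchema no longer produces these; the application owns them now.
        "reports": [],
        "explanations": [],
    }


# In a real application this list would come from infer_schema(df).
field_list = [{"kind": "text", "label": "name", "required": True}]
envelope = wrap_schema(field_list)
payload = json.dumps(envelope)
```

Keeping the wrapper in application code means the envelope can evolve independently of the field-list contract.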

[0.1.6] - 2026-04-21

Added

  • Added explanations to the top-level schema payload.

Changed

  • Updated README and documentation to describe the fields, reports, and explanations payload shape.
  • Bumped package version to 0.1.6.

[0.1.5] - 2026-04-17

Changed

  • Renamed the top-level schema payload from inputs and outputs to fields and reports.
  • Updated README, schema standard, usage docs, service output, and integration tests for the renamed payload members.
  • Bumped package version to 0.1.5.

[0.1.4] - 2026-04-17

Added

  • Added report-domain models:
      • BaseReport
      • ClassifierReport
      • RegressorReport
      • ReportTypes

Changed

  • Refactored field schemas to use kind as the type discriminator.
  • Renamed field defaults to defaultValue.
  • Expanded BaseField with UI and state metadata.
  • Added inactive-field behaviour to the base field contract.
  • Updated builtin strategies, documentation, and tests for the revised field attribute contract.
  • Bumped package version to 0.1.4.

[0.1.3] - 2026-04-16

Added

  • Added SeriesStrategy for content-based detection of two-axis compound columns.
  • Added SeriesField with field1, field2, minPoints, and maxPoints.
  • Added add_series_sub_field() for custom series subfield registration.
  • Added Strategy.content_probe() for content-based strategy matching.
  • Added Strategy.set_registry() for registry-aware strategies.
  • Added Registry.strategy_for_content() for content-driven lookup.
  • Added the first schema-standard documentation page.

Changed

  • Updated service field generation to prefer content-probe matches before dtype lookup and text fallback.
  • Exported SeriesStrategy and add_series_sub_field() from the public strategies API.
  • Updated README and usage docs with series-column examples and constraints.
  • Bumped package version to 0.1.3.

[0.1.2] - 2025-10-29

Added

  • Added SPDX license headers across source and test files.

Changed

  • Reworked README content and project documentation.
  • Updated documentation after the 0.1.1 release.
  • Bumped package version to 0.1.2.

[0.1.1] - 2025-10-16

Added

  • Added the initial MLSchema facade with register, unregister, update, and build operations.
  • Added registry, service, and strategy application layers.
  • Added Pydantic field schemas for:
      • boolean fields
      • number fields
      • text fields
      • date fields
      • category fields
  • Added builtin strategies for common pandas dtypes.
  • Added typed domain, exception, and utility modules.
  • Added unit and integration tests.
  • Added MkDocs documentation.
  • Added README content, installation guide, usage guide, and issue templates.
  • Added GitHub Actions CI, publishing workflow, pre-commit hooks, and project metadata.

Changed

  • Changed schema generation to return dictionaries instead of JSON strings.
  • Refactored strategy classes around a unified Strategy base class.
  • Updated CI dependency installation to use uv.
  • Updated README and documentation for package usage and release workflow.
  • Bumped package version to 0.1.1.