MLSchema¶
Lightweight SDK for turning pandas dataframes into front-end-ready JSON schemas, designed to integrate seamlessly with the mlform library, yet fully usable on its own.
1. Executive Summary¶
MLSchema is a Python micro‑library that converts pandas dataframes into fully‑validated JSON Schemas. The goal: eliminate hand‑rolled form definitions, accelerate prototype‑to‑production cycles, and enforce data‑contract governance across your analytics stack.
| Metric | Outcome |
|---|---|
| Time‑to‑schema | < 150 ms on 10 k columns / 1 M rows (benchmarked on x86‑64, Python 3.14) |
| Boilerplate reduced | ≈ 90 % fewer lines of bespoke form code |
| Value Driver | Detail |
|---|---|
| Contract enforcement | Pydantic v2 validation guarantees column–to–field fidelity. |
| Extensible | Register or swap strategies at runtime—no consumer refactor. |
| Zero friction | Sensible defaults; no configuration required for common dtypes. |
2. Quick Installation¶
For green‑field projects or CI pipelines, a single command sets up MLSchema and its dependency graph using uv:
uv add mlschema
For other package managers, refer to the dedicated Installation guide.
3. 90‑Second Onboarding¶
import pandas as pd
from mlschema import MLSchema
from mlschema.strategies import TextStrategy
# 1️⃣ Source your data
df = pd.read_csv("data.csv")
# 2️⃣ Spin up the orchestrator and register baseline strategies
ms = MLSchema()
ms.register(TextStrategy())
# 3️⃣ Produces a JSON schema
schema = ms.build(df)
Pair the resulting JSON with mlform to render an HTML form instantly.
4. Architectural Building Blocks¶
| Component | Role | Extensibility Point |
|---|---|---|
mlschema.MLSchema |
Strategy registry, validation pipeline, JSON emitter | register(), update(), unregister() |
| Field Strategies | Map pandas dtypes => form controls | Implement Strategy subclasses |
BaseField (Pydantic) |
Canonical schema blueprint | Custom Pydantic models inherit from it |
Why a Strategy Pattern?¶
- Single‑responsibility: Each strategy owns one field type.
- Hot‑swap: Swap implementations without touching consumer code.
- Forward compatibility: Introduce domain‑specific controls as geospatial, IoT or custom widgets as needed.
5. Feature Highlights¶
- Automatic schema inference – text, numeric, categorical, boolean and date handled out of the box.
- Pydantic v2 validators – schema is fully typed and runtime-safe.
- No external services – all processing is in-process; suitable for air-gapped environments.
- Typed returns – JSON schema is delivered as a Pydantic model for IDE autocompletion.
6. Further Reading¶
Respecting proven patterns, built for tomorrow’s stack.