Coframe Analytics Data Platform — Design Document
Status: v2.0 (draft)
Author: reeeneeee
Compiled: May 2026
Spec authority: Coframe Core Manual (the Manual). Where this document and the Manual disagree, the Manual wins.
Changes from v1.0:
This document is a substantive revision of v1.0 reflecting a rethink of how AC authoring is organized. The key architectural decisions that flip:
- AC Authoring is unified, not per-backend. v1.0 had coframe_polars.author and coframe_duckdb.author as independent toolchains living inside each backend. v2.0 introduces a single coframe-author module that contains all authoring logic; backends contribute only a thin data-access API (extending the Backend protocol with operations like sample_rows, column_nunique, pair_distinct, attest_dna_edge).
- schema.init is retired as an input format. v1.0 framed schema.init as the AC declaration the human author writes and the framework consumes. v2.0 treats the workbench session as the source of truth; the AC's serialized form is an output of the workbench, not an input to the framework. Several serializations exist (a YAML AC declaration, a provenance bundle, a DQ deliverable, the quasi-metadata bundle) and all are workbench outputs consumed by the runtime.
- AC Authoring is interactive, not linear. v1.0's authoring CLI walked the engineer through proposals one at a time. v2.0's authoring is a UI-driven workbench: start from a blank canvas, ask the backend to enumerate tables, pick what's of interest, iterate at the user's pace. Verifications, quasi-metadata extraction, FD-DAG visualization, integrity-catalog inspection: each is an operation the user invokes when relevant, not a phase the workflow passes through.
- DQ is no longer a holistic phase. In v1.0 the DQ process was a coherent component executed in three phases. In v2.0 DQ becomes a collection of verification operations, each summonable from the workbench when the user wants it. The integrity catalog stays as the canonical enumeration of what can be verified; the workbench's UI surfaces which conditions have been verified and at what warrant level.
- A new coframe-sqlite backend is added as the file-based reference implementation. SQLite support ships in the Python standard library (no install cost), is embedded (no server), supports real SQL with DDL and types, and lets the platform's downstream pieces (especially the workbench UI) be built and exercised without waiting for the heavier polars/duckdb backends.
- Web UI is a v1.0 deliverable. The original "ship the engine; UI later" stance is replaced with a v1.0 commitment to a polished interactive UI for the workbench.
The v1.0 document remains valuable as a record of the prior architectural state; this v2.0 supersedes it on every section listed in the change-summary above.
0. Scope and intent
This document specifies the engineering design of the open-source v1.0 release of the Coframe basic platform: how the components are laid out as Python packages, how they depend on each other, what their public surfaces are, and in what order they get built.
Coframe is the Analytic Layer for analytical data — a peer category to the Semantic Layer, named for the analytical-correctness work it does (FD discovery, integrity attestation, constructive query resolution, A/AA/AAA verification). The v2.1 supplement §1 develops the category claim in full; this document is the engineering shape that instantiates it.
It is not a re-specification of the framework. The Manual specifies what the framework does. This document specifies how that gets organized into Python packages, modules, and tests.
Read alongside the v2.1 supplement. drafts/coframe_platform_design_v2_1_supplement.md extends this document with multi-AC at the installation level, the L1/L2/L3 metadata layering, AC-level filter as the fourth orthogonal customization control, and the frozen-scope phase. The 2026-05-23 amendments (supplement §10) additionally introduce AC Surfaces, the three-UI / four-package frontend restructure, and a vertical-slice-first re-phasing. v2.0 + v2.1 together are the current design.
0.1 v1.0 component set
Six packages, one repository:
- coframe-core: the grammar-layer engine. Includes Frame-QL parsing and semantic analysis (coframe.ql), AC name_map and operator/mapper customization, identifier translation during resolution, the natural-language query layer (coframe.dialogue), and the integrity machinery (data-free I0/I1/I2/I7/I8/I9 + the integrity catalog).
- coframe-connect: the Backend protocol package. Defines two protocol surfaces:
  - Execution (Frame-QL AST → query result), unchanged from v1.0.
  - Authoring data-API (operations the workbench calls into: table enumeration, column profiling, FD-candidate testing, lineage extraction support, per-DNA-edge attestation).

  Plus source-binding types and entry-point conventions for backend discovery.
- coframe-author: (new in v2.0) the unified AC Authoring workbench. Backend-independent. Houses the interactive UI, the workbench session-state model, all authoring operations (quasi-metadata, FD-discovery, lineage extraction from processing code, integrity-catalog management), and the serialization layer that emits the AC's various output formats (declaration YAML, provenance bundle, DQ deliverable). Talks to backends via the data-API in coframe-connect.
- coframe-sqlite: (new in v2.0) SQLite-backed execution and data-API. Embedded, stdlib-only (sqlite3), real SQL semantics. The platform's reference implementation for small ACs, demos, and CI; sufficient for the demo workflow without waiting on polars/duckdb.
- coframe-polars: execution backend using Polars. No .author submodule. Implements the execution Backend protocol + the data-API surface.
- coframe-duckdb: execution backend using DuckDB. Same shape as coframe-polars: execution + data-API.
- coframe-mcp: Model Context Protocol server. Backend-blind: depends only on coframe-core (and, optionally, on coframe-author for the workbench-driven authoring flow). Loads backends by name via entry-point discovery.
Note on counting: §0.1 lists seven packages (six core components plus coframe-mcp), which was the v2.0 set. The v2.1 supplement §10.3 amendment adds three more Python packages (coframe-runtime, coframe-management, coframe-frontend; see the supplement for the three-UI / four-package frontend restructure). The current v2.0 + v2.1 set is eleven packages total: the seven listed above, plus the four added in §10.3 of the supplement. The supplement's renumbering also moves coframe-author's scope into a narrower workbench-backend role, with coframe-runtime carrying the AC Surfaces serving. The dependency-graph diagram in §1.1 below shows the seven-package picture; for the eleven-package picture see CLAUDE.md or the supplement.
0.2 Decisions locked in this document
Decisions carried over from v1.0:
- Language: Python 3.11+.
- Repo: monorepo, uv workspace.
- AC catalog format: YAML on disk; Pydantic models in memory. (The catalog is still YAML; what changed is that the workbench produces it rather than the human author writing it.)
- Public API: fluent ac.query("...") style. AC must be backend-bound to be queryable.
- Backend execution contract: resolved Frame-QL AST with physical column names (translated from AC name_map) and logical system operator/mapper names (after AC customizations are expanded).
- Backend binding: one execution backend per AC. Multi-source = multiple ACs.
- AC name_map: single global mapping per AC. Integrity-checked.
- Local operator/mapper customization: declarative-only in v1. No Python in ACs. Code-bearing extensions deferred to Coframe Pro.
- NLQ (coframe.dialogue): logical-only, no data access, vendor-independent LLM client. Lives in coframe-core.
- MCP layering: coframe-mcp depends only on coframe-core. Backends discovered via entry points declared in coframe-connect.
- Per-DNA-edge value attestation: enabled by default per Manual §7.6.8. Now exposed as a workbench operation (the user invokes it; not run as part of a holistic "Phase 3" pipeline).
- License: Apache 2.0 for all Coframe Core packages.
- SCA, generalized functional grammar layer, recursive hierarchies: out of scope for v1.0. Deferred to Coframe Pro per Manual §1.5.
Decisions that flip from v1.0:
- AC Authoring lives in a unified coframe-author module, not per-backend. v1.0 had two independent toolchains (coframe_polars.author, coframe_duckdb.author) with the rationale "authoring is fundamentally backend-specific." That rationale doesn't hold up: the hard parts of authoring (FD discovery, quasi-metadata extraction, lineage inference, integrity validation) are about the shape of the data, not the backend. The backend is a data source. v2.0 puts the authoring reasoning in one place and asks each backend to implement a thin data-access API.
- The Backend protocol grows a data-access API for authoring. v1.0's Backend protocol covered only execution (execute(query) → result). v2.0's protocol adds methods the unified authoring module calls into: enumerate_tables, read_ddl, sample_rows, column_nunique, pair_distinct, column_profile, test_fd_edge, read_processing_code, attest_dna_edge. See §3 for the full surface.
- schema.init is retired as a user-authored input format. The framework still consumes a YAML AC declaration (call it the "AC catalog" to disambiguate), but that file is an output of the workbench, not an input the human writes by hand. There are also sibling outputs the workbench produces (provenance bundle, DQ deliverable, quasi-metadata bundle). All of these together constitute the "AC artifact" the runtime consumes. See §X for the serialization model.
- Authoring workflow is interactive, not linear. v1.0's <backend>-author CLI walked the engineer through a sequence (discover → profile → propose → review → emit). v2.0's workbench has no fixed sequence: the user starts with a blank canvas, asks the workbench to enumerate tables from the bound backend, picks tables of interest, asks the workbench to fetch DDL or processing code, requests verifications when relevant, inspects the FD-DAG and integrity-catalog status at any point, and iterates until they declare done. Every operation is on-demand. See §1.5 and §X.
- DQ is decomposed. v1.0 framed Data Quality as a three-phase pipeline (declaration integrity → structural data verification → metric-value attestation) executed as a coherent component. v2.0 keeps the integrity catalog (the canonical enumeration of conditions verified) but distributes the execution of those checks across workbench operations. The user invokes each verification when they want it; the workbench tracks which conditions are verified and at what warrant level. The verification level (A / AA / AAA) is still computed from the integrity-catalog results; it is a derived property the workbench surfaces, not a phase the workflow has to reach.
- Per-backend authoring CLIs are gone. No coframe-polars-author, no coframe-duckdb-author. There is one workbench, surfaced through a web UI (the primary surface) and a programmatic API (for scripting / CI). Backends are switched by binding to a different backend in the workbench.
- A web UI is a v1.0 deliverable. v1.0 deferred the UI ("ship the engine; UI later"). v2.0 commits to a polished interactive UI for the workbench in the v1.0 release. The UI is intrinsic to the workflow: interactive exploration of data is what makes the workbench fit how authoring actually happens.
- A new coframe-sqlite backend. Stdlib-only (sqlite3), embedded, real SQL semantics. The reference backend for small ACs, demos, CI, and unblocking workbench-UI development before the heavier polars/duckdb backends land.
0.3 Naming and identifier conventions
(Unchanged from v1.0 §0.3. Three registers: product names, package names, import paths. New entries for v2.0:)
| Register | Form | Example use |
|---|---|---|
| Product | Coframe Workbench | "Open the Coframe Workbench, bind it to SQLite." |
| Package | coframe-author | pip install coframe-author |
| Module | coframe.author | from coframe.author import Workbench |
| Product | Coframe SQLite | (rarely used in prose; refer to "the SQLite backend") |
| Package | coframe-sqlite | pip install coframe-sqlite |
| Module | coframe.sqlite | from coframe.sqlite import SQLiteBackend |
1. Architecture overview
1.1 Component dependency graph
┌──────────────────┐
│ coframe-mcp │
└────────┬─────────┘
│ (depends on core; optionally on
│ coframe-author for workbench-driven
│ AC creation surfaces)
▼
┌────────────────────────────────┐
│ coframe-core │
│ + coframe.ql │
│ + coframe.dialogue │
│ + integrity catalog (I0–I10) │
│ + quasi_metadata types │
│ (logical-only; no data) │
└─────────────┬──────────────────┘
▲
┌─────────────┴───────────────┐
│ coframe-connect │
│ Backend protocol — TWO │
│ surfaces: │
│ 1. Execution │
│ 2. Authoring data-API │
│ + entry-point discovery │
└─────────────┬───────────────┘
▲
┌──────────────────────┼──────────────────────┐
│ │ │
┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐
│ coframe-sqlite │ │ coframe-polars │ │ coframe-duckdb │
│ (execution + │ │ (execution + │ │ (execution + │
│ data-API) │ │ data-API) │ │ data-API) │
│ embedded sqlite │ │ pandas/polars │ │ duckdb engine │
└───────────────────┘ └───────────────────┘ └───────────────────┘
▲ ▲ ▲
│ │ │
└──────────────────────┼──────────────────────┘
│ (workbench binds to one
│ backend; talks via data-API)
┌─────────────┴───────────────┐
│ coframe-author │
│ the unified workbench │
│ + interactive web UI │
│ + session-state model │
│ + AC serialization layer │
└─────────────────────────────┘
Compared to v1.0's diagram (§1.1 of v1.0), the major shifts are:
- Backends are leaf nodes only: they have no authoring submodule. Each one is execution + data-API.
- coframe-connect carries two protocol surfaces. Execution is unchanged; the authoring data-API is new.
- coframe-author is its own package, sitting parallel to the backends rather than embedded inside them. It binds to a chosen backend at the workbench-session level.
- coframe-mcp's dependency on coframe-author is optional and limited: MCP exposes the runtime (Frame-QL execution, NL→Frame-QL, validation) without requiring the workbench; if a deployment wants to surface workbench-driven AC creation via MCP (likely a Coframe Pro feature), it depends on coframe-author.
v2.1 amendment: the eleven-package picture
The v2.1 supplement §10.3 amendment splits runtime-serving concerns out of coframe-author and adds a three-UI / four-package frontend architecture. The dependency graph above remains accurate for the seven-package v2.0 subset; the additional packages slot in like this:
┌──────────────────────────────────────────────────────────┐
│ Three UI apps (TypeScript / React), one frontend pkg: │
│ │
│ Workbench UI ─────────┐ │
│ AC Management UI ─────┼─→ coframe-frontend │
│ Query UI ─────────────┘ │
└─────────────────┬────────────────────────────────────────┘
│ HTTP / WS
▼
┌──────────────────────┬──────────────────────┬──────────────────┐
│ coframe-author │ coframe-management │ coframe-runtime │
│ (workbench backend) │ (installation + │ (AC Surfaces: │
│ │ AC lifecycle) │ Frame-QL, │
│ │ │ NL, MCP, HTTP) │
└─────────────┬────────┴──────────┬───────────┴────────┬─────────┘
│ │ │
└───────────────────┴────────────────────┘
│
▼
┌──────────────────────────────┐
│ coframe-core / -connect / │
│ -sqlite / -polars / -duckdb │
│ (the seven-package subset │
│ shown in the diagram above)│
└──────────────────────────────┘
+ coframe-mcp (long-running prototype) — superseded by
coframe-runtime's MCP host per §10.3.
The v2.0 diagram above shows the core + backends layer; this v2.1 addendum shows the serving + management + UI layer that sits on top. Together they're the eleven packages (per the §0.1 counting note). The CLAUDE.md disposition table is the canonical guide to which package owns what.
1.2 Conceptual roles
- coframe-core owns the AC's logical surface: catalog loading, validating data-free integrity (I0/I1/I2/I7/I8/I9), resolving Frame-QL queries to ASTs, producing resolved ASTs for backend execution. The integrity catalog (the canonical enumeration of every integrity condition the platform verifies) lives here, along with the per-condition warrant model (author-asserted / code-affirmed / data-attested / catalog). coframe.dialogue provides natural-language → Frame-QL translation as a separate, logical-only LLM surface, entirely independent of any backend.
- coframe-connect defines the Backend protocol on its two surfaces: execution (for runtime queries) and the authoring data-API (for the workbench). It also covers source-binding types and entry-point conventions. It is the boundary between core/author logic and backend specifics.
- coframe-author is the workbench. It owns:
  - The session-state model: every AC under construction is a session whose state evolves as the user interacts.
  - The set of authoring operations the UI exposes (table enumeration, schema inspection, column picking, quasi-metadata extraction, FD-discovery, lineage extraction, integrity-condition verification, AC declaration assembly, provenance recording).
  - The web UI (the primary user surface) and the programmatic API (for scripting / CI).
  - The serialization layer that emits the workbench's various output formats: the AC catalog YAML, the provenance bundle, the DQ deliverable, the quasi-metadata snapshot.
  - The integration with coframe.dialogue for optional LLM assistance during authoring (categorization, naming, propose-then-confirm patterns; always advisory, never automatic).
- coframe-sqlite / coframe-polars / coframe-duckdb are execution backends. Each ships:
  - An execution Backend implementation walking the resolved AST.
  - A data-API implementation supplying the operations the workbench calls into for authoring.
  - Per-backend translation tables (system operator/mapper → backend-native).
  - Entry-point registration so the platform discovers the backend by name.
- coframe-mcp wraps a bound AC behind an MCP server. Backend-blind. Exposes Frame-QL execution, nl_query (via coframe.dialogue), and validate_ac (which surfaces the integrity catalog's status + computed verification level). MCP is one of the AC's AC Surfaces (see immediately below); per the v2.1 supplement §10 amendments, this long-running prototype is superseded by the MCP host that ships inside coframe-runtime.
The access protocols an AC exposes — Frame-QL, NL Query, MCP, HTTP API, the Workbench, and the Validation surface — are collectively the AC's AC Surfaces. The v2.1 supplement §10.2 names this umbrella explicitly: each surface is an independently-documented conformance contract, and a deployment can offer some surfaces and not others. The components above (coframe-core, coframe-author, coframe-mcp, plus the v2.1-introduced coframe-runtime) are the packages that realize those surfaces.
1.3 The two AI roles, separated
The platform still has two distinct AI/LLM surfaces; v2.0 keeps them architecturally distinct.
| Role | Where it lives | Sees physical names? | Sees data? |
|---|---|---|---|
| NL → Frame-QL (query elicitation) | coframe.dialogue (in coframe-core) | No | No |
| Authoring assistance (naming, categorization, propose-then-confirm) | coframe-author (the workbench) | Yes | Yes |
- coframe.dialogue is the user-facing LLM surface at runtime. It speaks logical names (the AC's user-facing vocabulary), has no data access, and translates utterances to Frame-QL.
- The authoring AI surface lives inside coframe-author. It sees physical names (because authoring fundamentally involves looking at the data) and has data access via the backend's data-API. It is invoked only by the workbench user, on-demand, for specific advisory tasks (e.g., "suggest a logical name for this physical column," "propose a categorical hierarchy"). The user always confirms; the LLM never commits a declaration to the AC by itself. This is the propose-then-confirm discipline.
In v1.0 the authoring AI lived inside each backend's .author submodule. In v2.0 it lives inside the unified coframe-author module, but the separation between the two AI roles is preserved.
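The propose-then-confirm discipline can be pictured with a small sketch. Everything here is hypothetical (the `Proposal` and `NameMapDraft` classes, their fields, and the logical-to-physical direction of the mapping are assumptions made for illustration, not the coframe-author API); the only point is that a suggestion never reaches the declaration without an explicit user action.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """An advisory suggestion (e.g. from an LLM); it changes nothing by itself."""
    physical_column: str
    suggested_logical_name: str
    status: str = "pending"  # pending -> confirmed | rejected


@dataclass
class NameMapDraft:
    """Hypothetical stand-in for the part of a session holding the name_map."""
    committed: dict = field(default_factory=dict)  # logical name -> physical column

    def confirm(self, proposal: Proposal) -> None:
        # Only an explicit user confirmation writes into the declaration.
        proposal.status = "confirmed"
        self.committed[proposal.suggested_logical_name] = proposal.physical_column

    def reject(self, proposal: Proposal) -> None:
        # A rejected proposal is recorded but never committed.
        proposal.status = "rejected"


draft = NameMapDraft()
accepted = Proposal("str_id", "store_id")
declined = Proposal("cty_nm", "town")
committed_before_review = dict(draft.committed)  # still empty at this point

draft.confirm(accepted)
draft.reject(declined)
```

Under these assumptions the draft contains only the confirmed mapping; the rejected proposal keeps its status for the record but leaves the declaration untouched.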
1.4 What runs where, at runtime — query path
1. The AC is constructed (by the workbench, see §1.5) and bound to a backend at startup, carrying its name_map and any local operator/mapper customizations. The AC's frozen ac_filter (v2.1 §4) is loaded as part of construction.
2. The user submits one of:
   - A Frame-QL query string: the direct path.
   - A natural-language utterance: coframe.dialogue translates it to Frame-QL first; the rest of the path is identical.
3. coframe.ql parses to a raw AST.
4. coframe.resolution resolves: schema selection, identity matching, MTI, dubious-query detection, WITH-block desugaring, ratio expansion, customization expansion, column-name translation. The AC-level filter is AND-fused into the user's WHERE clause at this step (per v2.1 §4.3); the filter is invisible to the consumer but always applied. An AC with ac_filter: [] (full scope) is a no-op fuse.
5. Output: a resolved AST (physical column names, logical system operator/mapper names, AC-filter conjoined into the predicate).
6. The bound backend walks the AST and produces a result via its execution surface.
7. The result is returned in the requested format.
This path is mostly unchanged from v1.0; the v2.1 addition is the AC-filter fuse at step 4. Consumers see only the filtered slice of L1 the AC's scope defines.
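As a minimal sketch of the AND-fuse described in the resolution step, the following uses a toy predicate representation. The `Cmp` and `And` node types and the `fuse_ac_filter` function are hypothetical names invented for this illustration; the actual resolved-AST types are specified elsewhere, not here.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Cmp:
    """A single comparison, e.g. region = 'EU' (toy node type)."""
    column: str
    op: str
    value: object


@dataclass(frozen=True)
class And:
    """Conjunction of two predicates (toy node type)."""
    left: "Pred"
    right: "Pred"


Pred = Union[Cmp, And]


def fuse_ac_filter(where: Optional[Pred], ac_filter: list) -> Optional[Pred]:
    """AND every AC-filter condition onto the user's WHERE predicate.

    An empty ac_filter (full scope) returns the predicate unchanged.
    """
    fused = where
    for cond in ac_filter:
        fused = cond if fused is None else And(fused, cond)
    return fused


user_where = Cmp("year", "=", 2025)
scope = [Cmp("region", "=", "EU")]

unscoped = fuse_ac_filter(user_where, [])   # no-op fuse
scoped = fuse_ac_filter(user_where, scope)  # user predicate AND scope
scope_only = fuse_ac_filter(None, scope)    # query without a WHERE clause
```

The consumer only ever writes `user_where`; the scope conditions are conjoined during resolution, which is what makes the filter invisible but always applied.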
1.5 What runs where, at AC-authoring time — workbench session
The authoring path is completely rewritten from v1.0. The new shape:
- User opens the workbench (web UI) and starts a session. The session can be:
  - A new blank canvas (no AC yet).
  - A continuation of a prior session (state persisted in the user's .coframe/ workspace).
  - A clone of an existing AC (workbench loads the AC's catalog YAML as initial session state).
- User binds the session to a backend (e.g., coframe-sqlite pointed at a .db file, or coframe-polars pointed at a directory of Parquet files). This wires the data-API into the session.
- The user explores, interactively. No fixed sequence; some examples of what they can do, in any order:
  - Ask the workbench to enumerate tables in the bound backend.
  - Pick a table, inspect its DDL (the workbench calls Backend.read_ddl).
  - Pick columns of interest, ask for quasi-metadata (the workbench calls Backend.column_profile and renders the result).
  - Ask the workbench to fetch processing-code files (SQL scripts, Python ETL) the backend knows about; the workbench parses and shows the inferred lineage / grain / operator-type classifications.
  - Request an FD-DAG view over a set of columns (the workbench calls Backend.test_fd_edge for the candidate pairs; renders the resulting DAG).
  - Declare or revise AC commitments: dimension families, metric families, schema declarations, operator customizations, name maps.
  - Invoke specific integrity conditions from the catalog (e.g., "verify FD-edge attestation for store_id → city"; "run per-DNA-edge attestation for the revenue metric family"). Each verification runs as an isolated workbench operation and updates the session's per-condition warrant status.
  - View the AC's current verification level (A / AA / AAA) as a derived property of the catalog's per-condition status.
  - Ask the LLM (via coframe.author.assistance) for advisory naming or categorization suggestions. The workbench presents them as proposals; the user confirms or rejects.
  - Serialize the current session state to disk as a workbench checkpoint (this is mostly automatic, on every meaningful state change, into the user's .coframe/ workspace).
  - Export the AC: emit the catalog YAML + provenance bundle + DQ deliverable + quasi-metadata snapshot. This is what downstream Coframe components (the runtime, MCP) consume.
- The session ends when the user decides. "Done" isn't a workflow terminus; it's the user judging that the AC is in a state worth publishing. The export is what makes the AC available to the rest of the platform.
What's important about this shape:
- The user is in charge of order. The workbench is a tool that surfaces operations; it doesn't enforce a sequence.
- Every meaningful operation produces an artifact the user can inspect, save, or revise: a quasi-metadata report, an FD-DAG figure, an integrity-verification result, a proposed declaration. The workbench's state is the union of these.
- Verifications are user-invoked operations, not phases. What was "DQ Phase 1/2/3" in v1.0 is now a set of catalog conditions the user can verify individually. The verification level rolls up automatically as conditions get verified.
- The AC declaration is built up incrementally rather than written by hand. The user makes structural commitments through UI actions (e.g., "designate this column as the grain-role"; "add this FD-edge to the geography dimension family"); the workbench accumulates them into the AC declaration.
Sections §2A (coframe-author) and §11 (build phasing) say more about the workbench's structure and how it gets built.
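To make the "session state is the union of on-demand operations" idea concrete, here is a rough sketch of a session object that records per-condition warrant status and checkpoints itself. The class, its fields, and the JSON checkpoint format are assumptions for illustration only; the warrant names are the ones listed in §1.2, and the condition id "I1" is borrowed from the integrity catalog's numbering.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

# Warrant kinds named in the per-condition warrant model (§1.2).
WARRANTS = ("author-asserted", "code-affirmed", "data-attested", "catalog")


@dataclass
class SessionState:
    """Hypothetical shape of a workbench session; not the real model."""
    backend: Optional[str] = None
    selected_tables: list = field(default_factory=list)
    # condition id -> warrant recorded by the most recent verification
    condition_warrants: dict = field(default_factory=dict)

    def record_verification(self, condition_id: str, warrant: str) -> None:
        if warrant not in WARRANTS:
            raise ValueError(f"unknown warrant: {warrant!r}")
        self.condition_warrants[condition_id] = warrant

    def checkpoint(self) -> str:
        # Assumed format: one JSON document per checkpoint.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def restore(cls, payload: str) -> "SessionState":
        return cls(**json.loads(payload))


# Operations arrive in whatever order the user chooses.
session = SessionState()
session.record_verification("I1", "author-asserted")
session.backend = "sqlite"
session.selected_tables.append("stores")
session.record_verification("I1", "data-attested")  # a later, stronger check

restored = SessionState.restore(session.checkpoint())
```

Nothing in the sketch enforces an order between binding, table selection, and verification, which mirrors the absence of a fixed sequence in the workbench.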
2. coframe-core
(Mostly unchanged from v1.0 §2; this section retains its v1.0 responsibilities. The notable v2.0 changes:)
2.1 Responsibilities (v2.0 deltas)
In addition to v1.0's responsibilities, coframe-core now owns:
- The integrity catalog — the canonical enumeration of every integrity condition the platform verifies, with per-condition metadata (category, DQ phase association, layer, possible warrants, blocking semantics, cost class, verifiable-by-module reference). The workbench, the runtime, and MCP all read this catalog as the source-of-truth list of what gets checked.
- The quasi-metadata column types — the per-column profile shape (cardinality, kind hint, type-specific stats) that the workbench's data-API consumer expects. v1.0 had quasi-metadata too; v2.0 elevates it because the workbench leans on it as a first-class artifact.
- The verification-level computation — pure function over the integrity-catalog's per-condition warrant status that produces the AC's A / AA / AAA level. Unchanged in spirit from v1.0; relocated to a clean computational core that the workbench and the runtime both consume.
2.2 What's not in coframe-core (v2.0 deltas)
- Anything backend-specific. As before.
- The Frame-QL execution path. As before.
- The workbench session model, UI, or authoring operations. These now live in coframe-author. coframe-core still owns the types the workbench uses (column specs, integrity-catalog entries, quasi-metadata profiles, the AC catalog shape), but not the orchestration of authoring work.
3. coframe-connect — Backend protocol (with data-API)
3.1 Responsibilities
coframe-connect defines the protocol every backend implements. In v2.0 the protocol has two surfaces:
- Execution surface — for the runtime query path (unchanged in spirit from v1.0).
- Authoring data-API surface — for the workbench (NEW in v2.0).
Plus source-binding types and entry-point conventions for backend discovery.
The package is small. It carries no authoring logic — only the protocol the workbench calls into.
3.2 The Backend protocol — execution surface
(Unchanged from v1.0 §3.3. Walking a resolved Frame-QL AST, returning a result, supporting per-DNA-edge attestation via attest_dna_edge. See v1.0 §3.3 for the full method signatures; v2.0 inherits them.)
3.3 The Backend protocol — authoring data-API surface (NEW in v2.0)
The data-API is the set of operations the workbench calls into. It is small, well-defined, and uniform across backends:
```python
from __future__ import annotations

from typing import Optional, Protocol

# The data types named in the signatures (ResolvedAST, Result, DNAEdge,
# AttestationConfig, AttestationResult, TableMeta, ProcessingFile, RowSample,
# ColumnProfile, FDTestResult) are not defined in this section; postponed
# annotation evaluation lets the protocol be declared without them.


class Backend(Protocol):
    # ── execution surface (carried over from v1.0) ──
    def execute(self, ast: ResolvedAST) -> Result: ...

    def attest_dna_edge(self, edge: DNAEdge,
                        config: AttestationConfig) -> AttestationResult: ...

    # ── authoring data-API surface (NEW in v2.0) ──
    def enumerate_tables(self) -> list[TableMeta]:
        """List all tables/views available in the bound source."""

    def read_ddl(self, table: str) -> str:
        """Return the table's CREATE statement (or equivalent), as text."""

    def read_processing_code(self) -> list[ProcessingFile]:
        """If the backend knows about producing-code (e.g., a directory of
        SQL/Python jobs), return the files it can surface to the workbench."""

    def sample_rows(self, table: str, n: int = 10,
                    seed: Optional[int] = None) -> RowSample:
        """Pull a sample of rows from the table. Used in the UI for
        inspection; not sufficient for statistics."""

    def column_profile(self, table: str, column: str) -> ColumnProfile:
        """Return quasi-metadata for one column: dtype, n_rows, n_non_null,
        nunique, cardinality_class, type-specific stats. Caches at the
        workbench layer."""

    def pair_distinct(self, table: str, col_a: str, col_b: str) -> int:
        """Count of distinct (col_a, col_b) pairs. Used for FD candidate
        testing: X→Y iff pair_distinct(X,Y) == nunique(X)."""

    def test_fd_edge(self, table: str, antecedent: str,
                     consequent: str,
                     threshold: float = 1.0) -> FDTestResult:
        """Test whether antecedent → consequent holds in the data at the
        given compliance threshold (strict 1.0 by default)."""

    def attest_dna_edge_data(self, edge: DNAEdge) -> AttestationResult:
        """Per-DNA-edge attestation. Same as the execution-surface method;
        listed here too so the workbench can run it on-demand without going
        through query resolution."""
```
The interface is designed so backends with very different internals (SQLite, Polars, DuckDB) implement the same surface. The workbench is then truly backend-independent.
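The FD candidate rule quoted in the pair_distinct docstring (X→Y iff pair_distinct(X,Y) == nunique(X)) is easy to check on a handful of rows. The sketch below works on plain Python dictionaries rather than a real backend; the function names mirror the protocol methods, but the row-list signatures and the sample data are assumptions for illustration, and only the strict (threshold 1.0) case is shown.

```python
def column_nunique(rows, col):
    """Number of distinct values in one column."""
    return len({row[col] for row in rows})


def pair_distinct(rows, col_a, col_b):
    """Number of distinct (col_a, col_b) value pairs."""
    return len({(row[col_a], row[col_b]) for row in rows})


def fd_holds(rows, antecedent, consequent):
    """Strict FD test: every antecedent value maps to exactly one consequent.

    If some antecedent value appeared with two consequents, the pair count
    would exceed the antecedent's distinct count.
    """
    return pair_distinct(rows, antecedent, consequent) == column_nunique(rows, antecedent)


rows = [
    {"store_id": 1, "city": "Oslo"},
    {"store_id": 1, "city": "Oslo"},
    {"store_id": 2, "city": "Bergen"},
    {"store_id": 3, "city": "Oslo"},
]

store_determines_city = fd_holds(rows, "store_id", "city")  # 3 pairs, 3 stores
city_determines_store = fd_holds(rows, "city", "store_id")  # 3 pairs, 2 cities
```

In this sample the edge store_id → city holds while city → store_id does not, because Oslo appears with two different stores.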
3.4 Why a separate package
coframe-connect exists separately so:
- The Backend protocol can be a single, narrow dependency for backend authors.
- coframe-mcp and coframe-author discover backends without depending on any specific one.
- Conformance tests for the protocol can live with the protocol definition (a coframe-connect-conformance test suite that any backend must pass).
3.5 Source binding
(Unchanged from v1.0 §3.4. Each backend defines a SourceBinding type describing how it connects to its data source — file path for SQLite, directory of Parquet for Polars, connection string for DuckDB, etc.)
3.6 Backend discovery via entry points
(Unchanged from v1.0 §3.6. Backends register under the coframe.backends entry-point group; the workbench and the runtime discover them by name.)
4. coframe-sqlite — file-based reference backend (NEW in v2.0)
4.1 Responsibilities
coframe-sqlite is the platform's smallest reference backend. It exists to:
- Unblock workbench-UI development before the heavier polars/duckdb backends land.
- Serve as the demo backend (the retail demo's CSVs load trivially into SQLite).
- Provide a stdlib-only execution surface (no external dependencies; it uses Python's built-in sqlite3 module).
- Be useful in production for small ACs over file-based data: a legitimate use case, not just a placeholder.
4.2 Source binding
```
SQLiteBinding {
    db_path: Path                   # path to .db file
    processing_dir: Optional[Path]  # optional dir of SQL/Python jobs the
                                    # workbench can surface as
                                    # processing_code
    ddl_dir: Optional[Path]         # optional dir of *.sql files (CREATE
                                    # statements). If absent, DDL is
                                    # pulled from sqlite_master.
}
```
4.3 Execution surface
Implements Backend.execute by walking the resolved Frame-QL AST into SQLite SQL. Operator translation table is small (SQLite supports the common reducers + simple mappers natively); a few operators have to be polyfilled in Python for cases SQLite lacks (e.g., approximate-distinct sketches like HLL).
Per-DNA-edge attestation via attest_dna_edge is straightforward: compute the rolled-up predecessor query, compare row-by-row to the successor, return deltas. SQLite's EXCEPT operator makes the row-set comparison cheap.
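The EXCEPT-based comparison can be sketched with the standard sqlite3 module. The table names, columns, and values below are invented for the example (a store-grain predecessor rolled up to a city-grain successor); the real attestation queries are derived from the AC's declared edges, not written by hand like this.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_by_store (store_id INTEGER, city TEXT, revenue INTEGER);
    CREATE TABLE sales_by_city  (city TEXT, revenue INTEGER);
    INSERT INTO sales_by_store VALUES (1, 'Oslo', 100), (2, 'Oslo', 50), (3, 'Bergen', 70);
    INSERT INTO sales_by_city  VALUES ('Oslo', 150), ('Bergen', 75);
""")

# Predecessor rolled up to the successor's grain, and the successor as stored.
ROLLUP = "SELECT city, SUM(revenue) AS revenue FROM sales_by_store GROUP BY city"
STORED = "SELECT city, revenue FROM sales_by_city"

# Rows the roll-up yields that the stored table lacks, and the reverse.
# Two empty results would mean the edge attests cleanly.
missing_from_successor = con.execute(f"{ROLLUP} EXCEPT {STORED}").fetchall()
unexpected_in_successor = con.execute(f"{STORED} EXCEPT {ROLLUP}").fetchall()
```

Here the Oslo totals agree (100 + 50 = 150), so only the Bergen row shows up in either difference: the roll-up says 70 while the stored successor says 75.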
4.4 Authoring data-API surface
All operations from §3.3 are implemented:
- enumerate_tables: SELECT name FROM sqlite_master WHERE type IN ('table', 'view').
- read_ddl: SELECT sql FROM sqlite_master WHERE name = ? (or read from the optional ddl_dir).
- read_processing_code: scan the optional processing_dir.
- sample_rows: SELECT * FROM <table> ORDER BY RANDOM() LIMIT ?. SQLite has no TABLESAMPLE clause, so larger tables need a cheaper pattern (for example, sampling on rowid) to avoid sorting the whole table.
- column_profile: SELECT COUNT(*), COUNT(<col>), COUNT(DISTINCT <col>), MIN(<col>), MAX(<col>), AVG(<col>), ... plus a histogram for numeric columns.
- pair_distinct: SELECT COUNT(*) FROM (SELECT DISTINCT a, b FROM <table>).
- test_fd_edge: pair-distinct + nunique comparison; strict threshold yields compliant / violating_rows.
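A few of these operations, written against the standard sqlite3 module, might look like the sketch below. The free-function form, the dictionary returned by column_profile, and the demo table are assumptions for illustration; the actual backend returns the typed results named in §3.3.

```python
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an identifier; table and column names cannot be bound as ? parameters."""
    return '"' + name.replace('"', '""') + '"'


def enumerate_tables(con):
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
    )
    return [row[0] for row in rows]


def read_ddl(con, table):
    row = con.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (table,)
    ).fetchone()
    return row[0] if row else None


def column_profile(con, table, column):
    t, c = quote_ident(table), quote_ident(column)
    # COUNT(col) and COUNT(DISTINCT col) both skip NULLs.
    row = con.execute(
        f"SELECT COUNT(*), COUNT({c}), COUNT(DISTINCT {c}), MIN({c}), MAX({c}) FROM {t}"
    ).fetchone()
    return dict(zip(("n_rows", "n_non_null", "nunique", "min", "max"), row))


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stores (store_id INTEGER, city TEXT)")
con.executemany(
    "INSERT INTO stores VALUES (?, ?)", [(1, "Oslo"), (2, "Oslo"), (3, None)]
)

tables = enumerate_tables(con)
ddl = read_ddl(con, "stores")
profile = column_profile(con, "stores", "city")
```

Values are passed as bound parameters while identifiers are quoted explicitly, since SQLite does not allow a table or column name to be supplied through a ? placeholder.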
4.5 Loading data into SQLite¶
A helper script (coframe.sqlite.loaders) provides convenience functions to populate a SQLite DB from common file formats:
from coframe.sqlite.loaders import load_csv_dir, load_parquet_dir
load_csv_dir("drafts/data/retail_demo/", out_db="retail_demo.db")
# scans for *.csv files, creates one table per file, infers types,
# carries DDL comments from any sibling .sql file with the same stem
This is the path from "we have CSVs" to "we have a SQLite-backed AC the workbench can author against." Useful for the demo and for small-scale production use.
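As a rough picture of what such a loader does, the sketch below creates one table per CSV file using only the standard library. It is not the coframe.sqlite.loaders implementation: the function name is hypothetical, every column is stored as TEXT (no type inference), and sibling .sql comments are ignored.

```python
import csv
import sqlite3
from pathlib import Path


def load_csv_dir_sketch(csv_dir, out_db):
    """Create one TEXT-typed table per *.csv file; return the table names."""
    conn = sqlite3.connect(out_db)
    tables = []
    try:
        for path in sorted(Path(csv_dir).glob("*.csv")):
            with path.open(newline="") as fh:
                reader = csv.reader(fh)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                cols = ", ".join(f'"{c}" TEXT' for c in header)
                marks = ", ".join("?" for _ in header)
                conn.execute(f'CREATE TABLE "{path.stem}" ({cols})')
                # Remaining rows stream straight from the reader.
                conn.executemany(
                    f'INSERT INTO "{path.stem}" VALUES ({marks})', reader
                )
            tables.append(path.stem)
        conn.commit()
    finally:
        conn.close()
    return tables
```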
4.6 Entry-point registration¶
Registers under coframe.backends as sqlite. Workbench / runtime invoke via wb.bind_backend("sqlite", source=...).
4.7 Notable limits¶
- Single-writer SQLite: concurrent workbench sessions against the same .db file will serialize. Acceptable for v1.0; multi-session workbench support is Pro.
- Large datasets (>~10GB): SQLite handles them, but query performance degrades vs. DuckDB or Polars. Authoring works fine; production execution at scale benefits from migrating to a heavier backend.
- No native sketches (HLL, t-digest): polyfilled in Python; less efficient than DuckDB / Polars where they're native.
5. coframe-polars (execution + data-API, no .author submodule)¶
(Materially smaller than v1.0 §4. The execution surface is unchanged from v1.0; what's removed is the coframe_polars.author submodule, since authoring now lives in coframe-author. What's added is the data-API implementation.)
5.1 Responsibilities¶
- Execution Backend implementation using Polars LazyFrame (unchanged from v1.0).
- Data-API implementation: pulls samples and statistics from LazyFrames, including column_profile, pair_distinct, test_fd_edge, and attest_dna_edge_data.
- Per-DNA-edge attestation with full-attestation up to RAM-bounded sizes; sampling fallback per config.
- Entry-point registration as polars.
5.2 Internal module layout (v2.0)¶
coframe-polars/
├── coframe/
│ └── polars/
│ ├── backend.py # execution Backend (unchanged from v1.0)
│ ├── translation.py # operator/mapper translation tables
│ ├── walker.py # AST → LazyFrame walker
│ ├── data_api.py # NEW: data-API surface implementations
│ ├── attestation.py # attest_dna_edge implementation
│ └── source_binding.py
└── pyproject.toml
The .author directory is gone.
6. coframe-duckdb (execution + data-API, no .author submodule)¶
(Same shape as §5: execution Backend + data-API implementation; no .author submodule.)
6.1 Responsibilities (v2.0 deltas)¶
- Execution Backend (unchanged from v1.0).
- Data-API implementation: same surface as the other backends, using DuckDB's native SQL.
- Per-DNA-edge attestation with spill-to-disk for full attestation on tables exceeding RAM, and DuckDB-native stratified sampling for the size-budget fallback.
- Persistent-connection mode caches attested-edge status in a system table; differential re-attestation runs only against changed schemas.
- Entry-point registration as duckdb.
7. coframe-author — the workbench (NEW in v2.0)¶
7.1 Responsibilities¶
coframe-author is the unified AC Authoring workbench. It owns:
- Session-state model. Every AC under construction is a workbench session whose state evolves as the user interacts. Sessions persist between user visits in the user's local working directory.
- The catalog of authoring operations. Each operation is a small, well-defined action the user can invoke from the UI (enumerate-tables, read-DDL, profile-column, test-FD-edge, declare-dimension-family, run-per-DNA-edge-attestation, etc.). The catalog is enumerable: the UI lists what operations are available given the current session state.
- The web UI. The primary user-facing surface. Backend-agnostic (UI talks to a Python-side workbench server which talks to the bound backend's data-API).
- The programmatic API. A Workbench class plus operation functions for scripting and CI use cases. Both surfaces — UI and API — go through the same operation dispatch layer.
- The AC serialization layer. Emits the various output formats the rest of the platform consumes: the AC catalog YAML, the provenance bundle (per-condition warrant trail), the DQ deliverable, the quasi-metadata bundle, optional artifact exports (FD-DAG diagrams, dependency reports).
- The integration with coframe.dialogue for advisory LLM assistance during authoring. Always opt-in, always propose-then-confirm. The workbench surfaces LLM suggestions as proposals in the UI; the user confirms or rejects each.
- Workspace management. The user's .coframe/ directory, where session state and exported artifacts live; load / save / fork operations on workbench sessions.
7.2 Session-state model¶
A workbench session carries the following state, all of which is persistable and inspectable:
WorkbenchSession {
  session_id: str                            # opaque
  workspace_path: Path                       # .coframe/ on the user's filesystem
  backend_binding: Optional[BackendBinding]  # which backend, what source
  ac_state: ACState {
    catalog_name: str
    version: str
    dimension_families: list[DimensionFamily]  # declared so far
    metric_families: list[MetricFamily]
    schemas: list[SchemaDeclaration]
    name_map: dict[str, str]
    customizations: list[OperatorCustomization]
    attestation_config: AttestationConfig
  }
  artifacts: Artifacts {
    quasi_metadata: dict[table_name, QuasiMetadata]
    lineage_reports: list[LineageReport]
    fd_dag: FDDAG                            # accumulating
    integrity_status: dict[ConditionId, WarrantStatus]
  }
  provenance: ProvenanceLog {
    events: list[ProvenanceEvent]
    # each event captures: timestamp, operation_id, inputs, outputs,
    # user_decisions, llm_suggestions_offered, llm_suggestions_accepted
  }
  open_questions: list[OpenQuestion]         # elicitation queue
}
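A cut-down version of this model, written as Python dataclasses with a JSON round trip, illustrates what "persistable and inspectable" can mean in practice. Only a handful of the fields above are included, provenance events are flattened onto the session as `events`, and the to_json / from_json helpers are assumptions rather than the workbench's actual persistence API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ProvenanceEvent:
    timestamp: str
    operation_id: str
    inputs: dict
    outputs: dict


@dataclass
class ACState:
    catalog_name: str
    version: str
    name_map: dict = field(default_factory=dict)


@dataclass
class WorkbenchSession:
    session_id: str
    workspace_path: str
    ac_state: ACState
    backend_binding: Optional[dict] = None
    events: list = field(default_factory=list)

    def record(self, event: ProvenanceEvent) -> None:
        """Append one provenance event to the session log."""
        self.events.append(event)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, including list items.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "WorkbenchSession":
        raw = json.loads(text)
        raw["ac_state"] = ACState(**raw["ac_state"])
        raw["events"] = [ProvenanceEvent(**e) for e in raw["events"]]
        return cls(**raw)
```

Because dataclasses compare field by field, a checkpoint can be verified by asserting that the reloaded session equals the original.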
The session state IS the AC artifact under construction. Every workbench operation reads and writes parts of this state; the serialization layer projects it into the various output formats when the user exports.
7.3 The operation catalog¶
Every meaningful action the user can take is a workbench operation. The catalog is enumerable; the UI inspects the catalog at runtime to render the available operations given the current session state. Operations fall into these categories:
| Category | Examples | What the operation does |
|---|---|---|
| Discovery | enumerate_tables, read_ddl, read_processing_code, sample_rows | Pulls information from the bound backend |
| Profiling | column_profile, column_nunique, pair_distinct, column_distribution | Computes per-column / per-pair statistics |
| Structural inference | discover_fd_edges, extract_lineage_from_code, propose_dimension_family, infer_grain | Derives structural commitments from data + code |
| Declaration | declare_dimension_family, add_fd_edge, declare_metric_family, set_name_map, add_customization | Updates the AC's structural commitments |
| Verification | verify_grain_uniqueness, verify_fd_edge, verify_scope, attest_dna_edge, verify_cross_schema_mapping | Runs a specific integrity condition; updates per-condition warrant status |
| Visualization | render_fd_dag, render_lineage_graph, render_integrity_status, render_verification_level | Produces a view of the current session state |
| LLM assistance | propose_logical_name, propose_categorization, propose_operator | Calls into the authoring AI for advisory suggestions |
| Serialization | export_ac_catalog, export_provenance, export_dq_deliverable, checkpoint_session | Emits artifacts |
Operations are idempotent in intent — running an operation multiple times produces the same result if inputs haven't changed (with quasi-metadata and verification results cached in Artifacts). The user can re-run any operation to refresh.
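One way to picture an enumerable, state-aware catalog with cached results is the small registry below. The Catalog class, the requires_backend flag, and the refresh parameter are illustrative assumptions; the design only requires that operations be listable at runtime and repeatable with cached results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Operation:
    name: str
    category: str
    requires_backend: bool
    run: Callable[..., object]


class Catalog:
    def __init__(self):
        self._ops: Dict[str, Operation] = {}
        self._cache: Dict[tuple, object] = {}

    def register(self, name, category, requires_backend=True):
        def decorator(fn):
            self._ops[name] = Operation(name, category, requires_backend, fn)
            return fn
        return decorator

    def available(self, backend_bound: bool) -> List[str]:
        """Names of operations that can run in the current session state."""
        return sorted(
            n for n, op in self._ops.items()
            if backend_bound or not op.requires_backend
        )

    def invoke(self, op_name, refresh=False, **params):
        """Run an operation once per distinct (hashable) parameter set."""
        key = (op_name, tuple(sorted(params.items())))
        if refresh or key not in self._cache:
            self._cache[key] = self._ops[op_name].run(**params)
        return self._cache[key]


catalog = Catalog()
calls = []


@catalog.register("column_nunique", "Profiling")
def column_nunique(values):
    calls.append(values)  # lets the demo count real executions
    return len(set(values))


@catalog.register("declare_dimension_family", "Declaration", requires_backend=False)
def declare_dimension_family(family):
    return {"family": family}


print(catalog.available(backend_bound=False))
print(catalog.invoke("column_nunique", values=(1, 1, 2)))
```

Repeating an invocation with unchanged parameters returns the cached value; passing refresh=True forces a re-run, matching the "re-run to refresh" behaviour described above.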
7.4 UI surface¶
The web UI is the primary user surface. High-level structure (subject to design iteration; this is a starting shape):
- Left rail: session navigator. Tree view of the current AC structure (dimension families, metric families, schemas), the bound backend's available tables, and recent operations. The user navigates by clicking entries.
- Main canvas: contextual workspace. When a column is selected: shows the column's profile, quasi-metadata, current commitments (anchor, missingness, operator), and verification status. When a dimension family is selected: shows the family's FD-DAG, member columns, and verification status. When a metric family is selected: shows the lineage graph, ip_reducer, block sets, attestation status. When nothing is selected: shows the AC overview (verification level, catalog status, recent activity).
- Right rail: operations panel. Shows the available workbench operations given the current selection. Clicking an operation opens a parameter form (most operations are zero-parameter or have a small number of obvious parameters); operations execute on confirm and write their results back into the session state.
- Top bar: session controls. Backend binding indicator, verification-level chip, checkpoint button, export menu, LLM-assistance toggle.
- Bottom drawer: operation log and integrity-catalog status. Recent operations, their inputs/outputs, the user's decisions; collapsible per-condition status from the integrity catalog with one-click drill-in to inspect a condition's evidence.
The UI is not a wizard. There's no "next step." The user navigates and operates at their own pace.
7.5 Serialization model (replaces the v1.0 schema.init)¶
The workbench emits multiple artifacts on export. The AC's runtime presence is the union of these.
my_ac.coframe/
├── ac.yaml # the AC catalog (formerly schema.init)
├── provenance.yaml # per-condition warrant trail
├── dq_deliverable.yaml # integrity-catalog status snapshot
├── quasi_metadata/ # per-table column profiles
│ ├── stores.qm.json
│ ├── products.qm.json
│ └── ...
├── artifacts/ # workbench-produced figures + reports
│ ├── fd_dag.dot
│ ├── lineage_graph.dot
│ └── ...
└── workbench_session.json # session-state checkpoint (replayable)
- ac.yaml — the AC catalog. Same content as v1.0's schema.init but framed as workbench output rather than human-authored input. The runtime (DQ pipeline, query resolver, MCP) loads this. The format is unchanged from v1.0 §3.
- provenance.yaml — for every integrity condition in the catalog, records the warrant achieved (author-asserted / code-affirmed / data-attested / catalog / opted-out) and the evidence pointer (which workbench operation, what timestamp, what inputs). This is the audit trail.
- dq_deliverable.yaml — the structured DQ deliverable (Manual §7.7), aggregated from the per-condition verification operations the user ran in the workbench. Contains the computed verification level (A / AA / AAA) and the per-condition status.
- quasi_metadata/ — per-table column profiles. The runtime uses these for query planning (cardinality hints, missingness flags); the workbench keeps them as a baseline for future drift detection.
- artifacts/ — workbench-produced visualizations and reports. Not strictly needed by the runtime; useful for human reviewers and for documentation.
- workbench_session.json — the session state at export time. Loading this back into the workbench reproduces the exact AC for further refinement.
The runtime consumes the .coframe/ directory as a whole, not just ac.yaml. This is the "AC artifact" the platform's downstream pieces speak about.
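A consumer that treats the directory as a unit might first check that the expected entries are present. The sketch below encodes the layout shown above; which entries count as mandatory is an assumption here (the text notes that artifacts/ is not strictly needed by the runtime), and missing_entries is a hypothetical helper.

```python
from pathlib import Path

# Assumption: the mandatory/optional split is a guess for illustration.
EXPECTED_FILES = (
    "ac.yaml",
    "provenance.yaml",
    "dq_deliverable.yaml",
    "workbench_session.json",
)
EXPECTED_DIRS = ("quasi_metadata", "artifacts")


def missing_entries(workspace):
    """List expected files and directories absent from a workspace directory."""
    root = Path(workspace)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means the workspace has the full layout; a loader could downgrade a missing artifacts/ entry to a warning.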
7.6 Public API (programmatic)¶
The workbench is also addressable as a Python module — for scripting, CI, headless authoring, and for the demo/test infrastructure.
from coframe.author import Workbench
# Start a new session
wb = Workbench(workspace="./my_ac.coframe")
wb.bind_backend("sqlite", source="retail_demo.db")
# Run operations
tables = wb.enumerate_tables() # list[TableMeta]
profile = wb.column_profile("transactions", "revenue")
fds = wb.discover_fd_edges("stores", columns=["store_id", "city", "region"])
# ... etc
# Declare AC commitments
wb.declare_dimension_family("geography", base="store",
members=["store", "city", "region"])
wb.add_fd_edge("geography", from_="store", to="city")
wb.add_fd_edge("geography", from_="city", to="region")
# Run verifications
wb.verify_fd_edge("geography", from_="store", to="city")
wb.verify_grain_uniqueness(schema="transactions")
wb.attest_dna_edge(metric_family="revenue",
from_anchor=["store", "sku", "date"],
to_anchor=["region", "date"])
# Inspect state
print(wb.verification_level) # 'A' | 'AA' | 'AAA' | None
print(wb.integrity_status_summary()) # per-condition warrant counts
# Export
wb.export() # writes ./my_ac.coframe/{ac.yaml, ...}
The UI is a thin shell over this API. Anything the UI does is also doable programmatically — useful for CI/CD pipelines that author or refresh ACs on schedule.
7.7 Internal module layout¶
coframe-author/
├── coframe/
│ └── author/
│ ├── __init__.py # public API: Workbench
│ ├── session.py # session-state model + persistence
│ ├── operations/ # the operation catalog
│ │ ├── discovery.py # enumerate_tables, read_ddl, sample_rows, ...
│ │ ├── profiling.py # column_profile, pair_distinct, ...
│ │ ├── inference.py # discover_fd_edges, extract_lineage, ...
│ │ ├── declaration.py # declare_dimension_family, add_fd_edge, ...
│ │ ├── verification.py # verify_*, attest_dna_edge, ...
│ │ └── llm_assistance.py # propose_logical_name, propose_categorization, ...
│ ├── serialization/
│ │ ├── ac_catalog.py # emit ac.yaml
│ │ ├── provenance.py # emit provenance.yaml
│ │ ├── dq_deliverable.py # emit dq_deliverable.yaml
│ │ └── workspace.py # the .coframe/ directory contract
│ ├── ui/ # the web UI server (separate workspace)
│ │ ├── server.py # FastAPI app
│ │ ├── ws.py # workbench↔UI WS protocol
│ │ └── static/ # built React UI
│ └── cli.py # CLI entry point (`coframe-author serve`)
└── pyproject.toml
The UI's frontend lives in a separate workspace (e.g., coframe-author-ui/) and gets built into coframe/author/ui/static/ at package build time. The Python package ships the built static assets so end users don't need Node.js.
7.8 LLM-assistance integration¶
The authoring AI surface lives in coframe.author.operations.llm_assistance. It is propose-then-confirm:
- The user clicks "suggest logical name for this column" in the UI.
- The operation calls the LLM with: column physical name, a sample of values, the column's quasi-metadata, optionally a hint about the column's role (dimension / measure / identifier).
- The LLM returns one or more proposals.
- The proposals are surfaced in the UI as suggestions; the user clicks one to accept, or types their own.
- The chosen name (whether LLM-suggested or user-typed) is written to the AC declaration; the provenance log records both the LLM's suggestions and the user's decision.
LLM-assistance is opt-in: a workbench setting controls whether the assistance operations are surfaced at all, and (if enabled) the user manually invokes them. Nothing happens automatically. The LLM never writes to the AC; only the user does.
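The propose-then-confirm flow can be sketched as a function that takes the LLM call and the user's choice as injected callables. The signature and the provenance field names used here are illustrative (the latter borrowed from the event fields in §7.2); nothing below is the actual coframe.author API.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


def propose_logical_name(
    physical_name: str,
    sample_values: List[str],
    suggest: Callable[[str, List[str]], List[str]],  # wraps the LLM provider
    choose: Callable[[List[str]], str],              # the user's decision
    name_map: Dict[str, str],
    provenance: List[dict],
) -> str:
    proposals = suggest(physical_name, sample_values)
    decision = choose(proposals)  # may be a proposal or the user's own text
    # Only the user's decision is written to the AC state.
    name_map[physical_name] = decision
    provenance.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation_id": "propose_logical_name",
        "llm_suggestions_offered": proposals,
        "llm_suggestions_accepted": [p for p in proposals if p == decision],
        "user_decision": decision,
    })
    return decision


def fake_llm(physical_name, sample_values):
    """Stand-in for a real provider call; returns canned proposals."""
    return ["store_city", "city"]


name_map, log = {}, []
propose_logical_name("str_cty", ["Austin", "Boston"], fake_llm,
                     lambda proposals: "city_name", name_map, log)
print(name_map, log[0]["llm_suggestions_accepted"])
```

Because the provider sits behind a plain callable, swapping vendors does not touch the confirm step, and a typed-in name is logged alongside the rejected suggestions.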
Vendor independence (Anthropic / OpenAI / Groq / local) follows the same pattern as coframe.dialogue: a thin client abstraction over an LLM provider, with the provider selectable via config.
7.9 What's NOT in coframe-author¶
- Frame-QL query execution. That's the runtime path; not the workbench's concern.
- Real-time AC editing for production ACs. The workbench is for authoring; once an AC is exported and bound for runtime use, edits go back through a new workbench session that loads the AC. Hot-edit-while-running is out of scope.
- Multi-user concurrent editing of the same session. v1.0 of the workbench is single-user. Multi-user collaboration is a Pro feature.
- Built-in version control. The .coframe/ workspace is a directory the user can git-track themselves; the workbench doesn't ship git integration.
8. coframe-mcp — MCP server¶
(Mostly unchanged from v1.0 §6. The v2.0 deltas:)
- Dependency on coframe-author is optional. If a deployment wants to surface workbench-driven AC creation via MCP, it depends on coframe-author; otherwise it depends only on coframe-core. The default v1.0 deployment is core-only.
- MCP capabilities (execution, nl_query, validate_ac, coherence_posture propagation including the AC's verification level) remain as in v1.0 §6.2.
- MCP now loads the .coframe/ workspace directory (per §7.5) rather than a bare schema.init. The runtime uses ac.yaml as the primary declaration and can read the sibling files (provenance, DQ deliverable, quasi-metadata) to enrich its responses.
v2.1 supplement note. MCP is one of the AC Surfaces (§1.2; supplement §10.2). The long-running coframe-mcp prototype is superseded by the MCP host that ships inside coframe-runtime (supplement §10.3 — the four-package frontend restructure pulls runtime-serving concerns into a dedicated package). The capability set above remains correct; the package that owns it changes.
9. coframe.dialogue — natural-language query layer¶
Unchanged from v1.0 §7. Logical-only, no data access, vendor-independent LLM client. Lives in coframe-core. See v1.0 §7 for the full spec.
10. Repository layout¶
coframe/ # monorepo root
├── packages/
│ ├── coframe-core/ # types, ql, resolution, dialogue,
│ │ # integrity catalog, quasi_metadata types
│ ├── coframe-connect/ # Backend protocol (execution + data-API)
│ ├── coframe-author/ # NEW (workbench + UI)
│ ├── coframe-sqlite/ # NEW (file-based backend)
│ ├── coframe-polars/ # execution + data-API (no .author)
│ ├── coframe-duckdb/ # execution + data-API (no .author)
│ └── coframe-mcp/ # MCP server
├── workspaces/
│ └── coframe-author-ui/ # React UI source (built into
│ # coframe-author/coframe/author/ui/static/)
├── tests/
│ ├── reference_suite/ # the canonical query corpus
│ └── conformance/ # cross-backend protocol tests
├── docs/
│ ├── manual/
│ ├── design/ # this document, v1.0, etc.
│ └── tutorials/
└── pyproject.toml # workspace root (uv)
11. Testing strategy¶
Mostly carried over from v1.0 §9. v2.0 deltas:
- Workbench operation tests. Each operation in coframe-author's catalog gets unit tests (input → output, idempotency, error handling) plus integration tests that run against coframe-sqlite with the retail demo dataset.
- UI tests. The web UI gets a Playwright-based end-to-end test suite covering the canonical user journeys (new-AC, load-existing, run-verifications, export).
- Serialization round-trip tests. Author an AC in the workbench → export → load back → confirm session state matches.
- Cross-backend conformance tests. Same authoring operations + verifications run against coframe-sqlite, coframe-polars, coframe-duckdb; results must match (modulo backend-specific edge cases declared in the conformance spec).
- Demo workflow as integration test. The retail demo's flywheel script becomes a scripted end-to-end test invoking workbench operations in the demo's described order.
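The cross-backend conformance idea can be sketched as a helper that runs one operation on every backend and reports disagreements. The helper, the exemption format, and the fake backends are all hypothetical; the real conformance spec is not reproduced here.

```python
def conformance_mismatches(backends, operation, args=(), exempt=frozenset()):
    """Run `operation` on each backend; return results differing from the first."""
    results = {
        name: getattr(backend, operation)(*args)
        for name, backend in backends.items()
        if (name, operation) not in exempt  # declared backend-specific edge cases
    }
    if not results:
        return {}
    reference = next(iter(results.values()))
    return {name: value for name, value in results.items() if value != reference}


class FakeBackend:
    """Test double that returns a preset answer for column_nunique."""

    def __init__(self, nunique):
        self._nunique = nunique

    def column_nunique(self, table, column):
        return self._nunique


backends = {
    "sqlite": FakeBackend(3),
    "polars": FakeBackend(3),
    "duckdb": FakeBackend(4),
}
print(conformance_mismatches(backends, "column_nunique", ("stores", "city")))
```

The first backend in the mapping acts as the reference, so listing coframe-sqlite first makes it the baseline the heavier backends are compared against.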
12. Public API stability¶
(Largely carried over from v1.0 §10. New v2.0 surface to stabilize:)
- coframe.author.Workbench and the operation catalog. v1.0 fixes the programmatic API; new operations can be added in v1.x; existing operation signatures can't change.
- The .coframe/ workspace format. v1.0 fixes the directory layout, filenames, and the YAML/JSON schemas. Backward-compatible additions in v1.x; major-version bumps require migration tooling.
- The Backend protocol's data-API surface. v1.0 fixes the operation set; new operations in v1.x are additive.
13. Build phasing (REWRITTEN for v2.0)¶
Superseded by v2.1 supplement §10.4 (vertical-slice-first phasing). The component decomposition and ownership in this section remain correct; the timing and ordering in supplement §10.4 are canonical. The amendment re-orders the work to ship the front-end + SQLite back-end together as an alpha milestone at ~16-18 weeks, with Frame-QL, polars, duckdb, MCP, and dialogue landing after. Read the section below for component scope; consult §10.4 of the supplement for what gets built when.
The phasing changes substantively from v1.0. The most important shift: coframe-author with UI is the new biggest deliverable, and coframe-sqlite is added as an early backend to unblock workbench development.
Phase 0 — Skeleton (1 week)¶
Monorepo + seven package skeletons + CI + manual/design docs.
Exit: uv sync && pytest runs cleanly.
Phase 1 — coframe-core foundations (3-4 weeks)¶
coframe-core modules excluding ql/, resolution/, dialogue/:
- Type primitives, integrity catalog (data-free I0/I1/I2/I7/I8/I9), quasi_metadata column-profile types, fd_dag, column_spec, name_map, ratios, missing, attestation/config, attestation/plan, ac, catalog.
Exit: load a hand-authored retail AC catalog YAML; data-free integrity clean; the planner correctly classifies the retail AC's edges; the integrity catalog enumerates all 27 conditions per drafts/specs/integrity_catalog.yaml.
Phase 2 — coframe-connect + coframe-sqlite (3-4 weeks)¶
Backend protocol + first reference backend. SQLite chosen as the first backend because it's stdlib (no install cost), embedded, real SQL, and small. Lets Phase 3+ proceed without waiting on heavier backends.
- coframe-connect: Backend protocol (execution + authoring data-API surfaces), source-binding types, entry-point discovery, conformance suite.
- coframe-sqlite: execution backend, data-API implementation, attest_dna_edge, loaders for CSV/Parquet→SQLite, entry-point registration.
Exit: retail demo's CSVs load into SQLite via coframe.sqlite.loaders; all data-API operations work against the loaded DB; conformance suite green.
Phase 3 — Frame-QL parser & semantic analysis (2-3 weeks)¶
- coframe.ql.lexer, parser, ast, semantics, pretty.
- Conform to Manual Appendix A BNF.
Exit: every example query in Manual Chapter 8 parses, and pretty-printing each parse yields text equivalent to the input.
Phase 4 — Resolution + customization expansion + translation (4-5 weeks)¶
- All resolution/ modules including customization and translation passes.
- Resolved AST node classes finalized.
- plan/resolver.py orchestrator.
Exit: resolve all reference-suite queries into resolved ASTs against the retail AC; reject the dubious subset with correct diagnostics; execute the well-formed subset against coframe-sqlite.
Phase 5 — coframe-author (workbench + UI) (8-10 weeks)¶
The new biggest commitment. Two parallel tracks:
- Track A — Python workbench backend. Session model, operation catalog, serialization layer, LLM-assistance integration. Each operation is unit-tested + integration-tested against coframe-sqlite.
- Track B — Web UI. React app. Session navigator, contextual workspace, operations panel, integrity-catalog drawer. WebSocket protocol between UI and workbench backend.
Phase 5 also formalizes the .coframe/ workspace format and writes a worked retail AC end-to-end through the workbench (start blank → bind to retail demo SQLite → enumerate tables → declare structure → run verifications → export).
Exit:
- Full end-to-end demo: open the workbench, bind to retail SQLite, author the retail AC interactively, achieve AAA, export. ~30 minutes of user time.
- UI passes Playwright canonical-journey tests.
- Programmatic API can reproduce the same AC from a Python script (CI use case).
- LLM-assistance opt-in works against at least Claude.
Phase 6 — coframe-polars (execution + data-API) (3-4 weeks)¶
Mirror of Phase 2 but for Polars. No .author submodule (it lives in coframe-author).
- Execution backend (LazyFrame walker, translation tables).
- Data-API implementation: all operations from §3.3.
- attest_dna_edge with full-attestation + sampling fallback.
Exit: retail AC re-authored through the workbench against coframe-polars (instead of coframe-sqlite); results identical to the SQLite path on the reference dataset.
Phase 7 — coframe-duckdb (execution + data-API) (3-4 weeks)¶
Mirror of Phase 6 for DuckDB. Plus persistent-connection mode + cached attested-edge status per v1.0 design.
Exit: retail AC re-authored against coframe-duckdb; cross-backend results identical.
Phase 8 — coframe.dialogue (1-2 weeks)¶
NL → Frame-QL translation layer, prompt templates, validators, NLQ test suite.
Exit: translate the curated test suite against the retail AC with target accuracy (90%+ valid, 80%+ matching expected within tolerance) across Claude at minimum.
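The two accuracy targets can be scored with a few lines of Python. The case format, computing both rates over the full suite, and the relative tolerance value are assumptions for illustration; the exit criterion above only fixes the 90% and 80% thresholds.

```python
import math


def score_nlq_suite(cases, rel_tol=1e-6):
    """Return (valid_rate, match_rate), both measured over all cases."""
    total = len(cases)
    valid = [c for c in cases if c["valid"]]
    matching = [
        c for c in valid
        if math.isclose(c["actual"], c["expected"], rel_tol=rel_tol)
    ]
    return len(valid) / total, len(matching) / total


def meets_targets(valid_rate, match_rate):
    """Thresholds from the Phase 8 exit criterion."""
    return valid_rate >= 0.90 and match_rate >= 0.80


cases = (
    [{"valid": True, "actual": 100.0, "expected": 100.0}] * 8    # valid, matches
    + [{"valid": True, "actual": 90.0, "expected": 100.0}]       # valid, wrong value
    + [{"valid": False, "actual": None, "expected": 100.0}]      # invalid Frame-QL
)
rates = score_nlq_suite(cases)
print(rates, meets_targets(*rates))
```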
Phase 9 — coframe-mcp (2-3 weeks)¶
MCP server depending on coframe-core + optionally coframe-author. Execution capabilities, nl_query, validate_ac, coherence-posture propagation.
Exit: Claude Desktop connects via stdio, lists ACs, describes selves, executes queries, NL-translates+executes; same coframe-mcp source serves sqlite-backed, polars-backed, and duckdb-backed configurations.
Phase 10 — Hardening, docs, release (3-4 weeks)¶
Performance pass with quantitative targets (carried from v1.0 §11 Phase 8). Documentation: API docs, getting-started, workbench tutorial, MCP deployment guide, AC customization guide, NLQ usage guide, attestation operational guide, verification levels guide, time-varying-data guide, hierarchical-data guide.
3-5 worked-example ACs at varying verification levels.
Pre-release versioning, changelog, release process. Public v1.0 release.
Total estimate: ~32-40 weeks. Larger than v1.0's ~24-30 weeks because of Phase 5 (workbench + UI). Phases 6/7 are smaller than v1.0's polars/duckdb because authoring is no longer in those packages.
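For reference, the per-phase ranges given in the headings above can be totalled directly. The sketch assumes the phases run strictly back to back; any gap between its result and the headline figure would then reflect rounding or overlap between phases (for example Phase 5's parallel tracks), which this section does not spell out.

```python
# (low, high) weeks per phase, copied from the Phase 0-10 headings.
PHASE_WEEKS = {
    "Phase 0": (1, 1),
    "Phase 1": (3, 4),
    "Phase 2": (3, 4),
    "Phase 3": (2, 3),
    "Phase 4": (4, 5),
    "Phase 5": (8, 10),
    "Phase 6": (3, 4),
    "Phase 7": (3, 4),
    "Phase 8": (1, 2),
    "Phase 9": (2, 3),
    "Phase 10": (3, 4),
}

low = sum(lo for lo, _ in PHASE_WEEKS.values())
high = sum(hi for _, hi in PHASE_WEEKS.values())
print(f"sequential total: {low}-{high} weeks")  # sequential total: 33-44 weeks
```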
14. Open questions and deferred decisions¶
Resolved at v2.0 publication (carried from v1.0 §12):
- License: Apache 2.0.
- AC catalog format versioning: top-level coframe_version field.
- Diagnostic format: two-form structured + human-rendered.
Resolved newly in v2.0:
- UI tech. React + Vite, built into static assets shipped with the coframe-author Python package. WebSocket-based UI-to-workbench protocol. Rationale: matches the broader Coframe team's existing stack; a Python developer installs coframe-author and gets the UI without needing Node.
- Workbench session persistence. Local filesystem (.coframe/ directory) for v1.0. Multi-user / cloud-hosted sessions are a Pro feature.
Deferred to later phases:
- Quasi-metadata schema versioning. Settle Phase 1.
- Binding override file format. Settle Phase 1.
- I9 strictness. Settle Phase 1.
- The exact set of LLM-assistance operations to surface in v1.0 vs v1.x. Settle Phase 5.
- The set of UI canonical journeys covered by the Playwright suite. Settle Phase 5.
- Whether the workbench surfaces Frame-QL preview during authoring or defers to the runtime. Lean: defer.
Settled subsequently in the v2.1 supplement and its 2026-05-23 amendments:
- Multi-AC at installation level, L1/L2/L3 metadata layering, AC-level filter as fourth orthogonal customization control, frozen-scope phase, L2 stability filter, incremental update insulation — supplement §§2-7.
- Analytic Collection rename (was Analytics Collection; same acronym) — supplement §10.1.
- AC Surfaces as the umbrella term for the AC's access protocols — supplement §10.2.
- Three-UI / four-package frontend architecture (Workbench / AC Management / Query UIs; coframe-author / coframe-management / coframe-runtime / coframe-frontend packages) — supplement §10.3.
- Vertical-slice-first build phasing with the front-end + SQLite alpha milestone at ~16-18 weeks — supplement §10.4 (supersedes §13 above).
15. What this document is not¶
This is not the Manual. The Manual specifies what the framework does. This document specifies how that gets organized into Python packages, modules, and tests.
This is not a marketing document. The position article (drafts/coframe_position_v2_0.md) carries the framework's positioning.
This is not a tutorial. The Phase 10 documentation deliverables include the tutorials; this design document just names them.
16. Forward-looking — what this v1.0 enables for Coframe Pro¶
The v1.0 platform — and especially the workbench — is the foundation for Pro extensions:
- Multi-user collaboration on workbench sessions (real-time co-editing of an AC).
- Cloud-hosted workbench with workspace-as-a-service.
- Slowly-Changing Attributes (SCA).
- Generalized functional grammar layer.
- Recursive (self-referential) hierarchies.
- Audit-grade verification certificates.
- Per-operator-asserted family attestation.
The v2.0 architecture — unified workbench, thin backend protocol, structured .coframe/ artifact — is designed to accommodate these additively.