Retail Demo — Phase 8 + Phase 9 Walkthrough¶
A copy-paste-able walkthrough showing what the retail demo produces after Phase 8 (the Metric Engine) and Phase 9 (the DuckDB backend) landed. Every command here is real and runs against the canonical retail fixtures in drafts/data/retail_demo/.
If the older Demo page on the left covers what Coframe is, this page covers what the end-to-end demo bootstrap actually does today — with the engine in the loop and DuckDB as the backend.
Prerequisites¶
The [http] extra brings FastAPI + the Workbench HTTP app; [engine] brings coframe-metric-engine + Polars (the engine's internal substrate).
What the demo bundles¶
Five hand-authored retail tables under drafts/data/retail_demo/:
| Table | Grain | Rows |
|---|---|---|
stores |
store_id | 13 |
products |
sku | 2,231 |
transactions |
transaction_id | 523,023 |
store_monthly_inventory |
store × month | 195 |
region_daily_summary |
region × day | 1,329 |
Plus a committed AC at drafts/data/retail_demo/retail.coframe/ac.yaml and an installation manifest at drafts/data/retail_demo/installation.coframe/installation.yaml.
The installation declares:
backend:
type: duckdb
source: ./retail_demo.duckdb
metric_engine:
enabled: true
max_bytes_per_ac: 1073741824 # 1 GiB
Two metric families declare hot grains (Phase 8 F2 § 9.2 — push opt-in for the cache):
- name: revenue
family_root: {schema: transactions, column: revenue}
ip_reducers:
- {operator: SUM, a_block: []}
cache_hint:
materialize_at:
- [region]
- [region, day]
- name: cost
family_root: {schema: transactions, column: cost}
ip_reducers:
- {operator: SUM, a_block: []}
cache_hint:
materialize_at:
- [region]
Step 1 — Bootstrap the dev server¶
What it does, in order:
- Loads
drafts/data/retail_demo/*.csvinto.coframe-dev-data/retail.duckdb(skipped if it already exists). - Constructs a
BackendRegistry+EngineRegistrykeyed on"retail-demo". - Runs the warmup walker against
revenueandcost'scache_hintblocks → pre-materialises three entries:revenue@(region,),revenue@(region, day),cost@(region,). - Mounts the runtime app under
/runtime/*on the same uvicorn instance. - Listens on
127.0.0.1:8000.
Expected boot log:
coframe dev server (DuckDB + Metric Engine)
data dir: /…/.coframe-dev-data
→ loading /…/drafts/data/retail_demo/*.csv into …/retail.duckdb …
products: 2,231 rows
region_daily_summary: 1,329 rows
store_monthly_inventory: 195 rows
stores: 13 rows
transactions: 523,023 rows
✓ wrote …/installation.yaml
→ warming up engine for AC 'retail_full' …
✓ revenue @ ('region',)
✓ revenue @ ('day', 'region')
✓ cost @ ('region',)
✓ pre-materialised 3 (family, anchor) entries
Serving http://127.0.0.1:8000
workbench: http://127.0.0.1:8000/healthz
runtime: http://127.0.0.1:8000/runtime/healthz
installation_id: retail-demo
Override behaviour via env vars: COFRAME_DEV_DATA_DIR, COFRAME_DEV_PORT, COFRAME_DEV_RELOAD=1, COFRAME_DEV_SKIP_WARMUP=1.
Step 2 — The canonical query → engine cache hit¶
The flagship demo query:
curl -s -X POST http://127.0.0.1:8000/runtime/query \
-H 'content-type: application/json' \
-d '{"installation_id":"retail-demo",
"source":"SELECT region, SUM(revenue) AS total_revenue AT region"}'
Response (served_from is the headline — F2 + F4 in action):
{
"ac_name": "retail_full",
"columns": ["region", "total_revenue"],
"rows": [["Central", 3868636.36], ["East", ...], ["West", ...]],
"row_count": 3,
"served_from": "engine_cache"
}
served_from: "engine_cache" means the backend was not touched — the warmup walker pre-materialised revenue@(region,) so the engine answered from Parquet.
Run the same call again → identical served_from: "engine_cache". (If the engine were off, you'd see "backend".)
Step 3 — A cold grain → FD-DAG rollup¶
Try a grain that wasn't pre-materialised:
curl -s -X POST http://127.0.0.1:8000/runtime/query \
-H 'content-type: application/json' \
-d '{"installation_id":"retail-demo",
"source":"SELECT day, SUM(revenue) AS r AT day"}'
The engine doesn't have revenue@(day,), but it does have revenue@(region, day) (warmed). It recognises {day} as a subset of {region, day}, rolls up the finer cached entry by grouping on day, returns the result — and memoises the rollup at (day,). Response carries served_from: "engine_cache" (slice 5's serve() handled it before the per-metric loop even fell through to the backend path).
Run the query a second time → still "engine_cache", this time via the exact-match branch over the newly-memoised entry.
Step 4 — A genuine miss → engine_backend¶
A query at a grain that's neither warmed nor reachable via rollup from a warmed grain:
curl -s -X POST http://127.0.0.1:8000/runtime/query \
-H 'content-type: application/json' \
-d '{"installation_id":"retail-demo",
"source":"SELECT region, store, SUM(revenue) AS r FROM transactions AT (region, store)"}'
First call: served_from: "engine_backend". The engine ingested the result and memoised it.
Second identical call: served_from: "engine_cache". Backend skipped.
Step 5 — Cross-backend invariant (optional)¶
Same query, three backends, three identical Frames (per packages/coframe-duckdb/tests/test_cross_backend_invariants.py):
You'll see 16 tests pass: 6 canonical queries × 3 backend pairs = 18 pairwise Frame comparisons, plus FD attestations + discovery invariants.
Step 6 — Workbench Engine page (visual)¶
Open http://127.0.0.1:8000/healthz first to confirm the server is up. Then start the frontend dev server:
Create a workbench session, then navigate to Engine in the sidebar. You'll see:
- Manifest panel: entries / bytes used / budget (with %-of-budget hint); a table of the materialised entries showing family, anchor, operator, row count, byte size, access count, age.
- Promotion recommendations panel: empty initially under the default
min_hits=100 / min_age_days=7thresholds. Lower the thresholds in the UI (e.g., setmin_hits=0,min_age_days=0) and click Apply — the freshly-materialised entries appear as candidates with paste-able YAML stanzas extending the family'scache_hint.
You can also see the per-query served_from badge on the Query page next to the row count: green "Engine cache", amber "Engine + backend" / "Engine → backend", or grey "Backend" (engine disabled).
Step 7a — Derived metrics via compose()'s frame_expression hook¶
profit = revenue - cost is a post-aggregation derived metric — well-defined at any grain because both inputs are partition-invariant SUMs. (Naming note: per the convention profit = revenue − cost; gross profit = revenue − COGS; margin = profit / revenue; gross margin = gross profit / revenue. The retail dataset's cost column is the all-in per-transaction cost, so the subtraction is profit, not gross profit.) AC-level declaration of cross-family lineage (so a Frame-QL author can write SELECT region, profit AT region directly) is forward-compat resolver work. Today the engine exposes this via compose()'s frame_expression hook (slice 6); callers can derive metrics programmatically from Python:
import polars as pl
from coframe.metric_engine import Domain, MetricEngine
from coframe.duckdb import DuckDBBackend
backend = DuckDBBackend(database=".coframe-dev-data/retail.duckdb", read_only=True)
engine = MetricEngine(ac=ac, store_root=".coframe-dev-data/metric_engine/retail-demo")
# 1) Pull per-metric LazyFrames from the engine (warmup already
# populated both at (region,) per the AC's cache_hints).
revenue_lf = engine.serve(Domain.METRIC, "revenue", ("region",),
operator="SUM", partition_invariant=True)
cost_lf = engine.serve(Domain.METRIC, "cost", ("region",),
operator="SUM", partition_invariant=True)
# 2) Compose with a frame_expression deriving profit
# post-aggregation. The lambda is a (pl.LazyFrame → pl.LazyFrame)
# transformer applied after the per-metric merge.
def derive(lf: pl.LazyFrame) -> pl.LazyFrame:
return lf.with_columns(
(pl.col("revenue") - pl.col("cost")).alias("profit"),
)
frame = engine.compose([revenue_lf, cost_lf], ("region",), frame_expression=derive)
print(frame)
# shape: (3, 4)
# ┌─────────┬───────────┬───────────┬───────────┐
# │ region ┆ revenue ┆ cost ┆ profit │
# │ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ f64 ┆ f64 ┆ f64 │
# ╞═════════╪═══════════╪═══════════╪═══════════╡
# │ Central ┆ 3868636.4 ┆ 2710830.2 ┆ 1157806.2 │
# │ East ┆ … ┆ … ┆ … │
# │ West ┆ … ┆ … ┆ … │
# └─────────┴───────────┴───────────┴───────────┘
The frame_expression is arbitrary Polars: weighted ratios, percent-of-total, cohort math, anything Polars can express on a LazyFrame works. Coverage in packages/coframe-duckdb/tests/test_demo_smoke.py::test_profit_via_frame_expression_end_to_end.
Step 7b — Derived metrics from Frame-QL (Reading B)¶
The retail AC declares profit as a derived family:
- name: profit
description: "Derived: revenue - cost."
derived:
formula: "revenue - cost"
inputs: [revenue, cost]
A Frame-QL author can now reference profit directly — no programmatic compose() needed:
curl -s -X POST http://127.0.0.1:8000/runtime/query \
-H 'content-type: application/json' \
-d '{"installation_id":"retail-demo",
"source":"SELECT region, profit AT region"}'
Response:
{
"ac_name": "retail_full",
"columns": ["region", "profit"],
"rows": [["Central", 1157806.16], ["East", ...], ["West", ...]],
"row_count": 3,
"served_from": "engine_cache"
}
How it works (per drafts/coframe_derived_metrics_design_v0_1.md — Reading B):
| Layer | What it sees |
|---|---|
| User | A family called profit. No knowledge it's derived. |
| Planner | Resolves profit as a schema-virtual family — skips Rules 3 / 4. Passes a single ServingRequest to the engine. |
| Engine | Branch 0 fires: recursively serves revenue and cost (both hit engine_cache from the warmup), applies the AC-declared formula revenue - cost via compose()'s frame_expression, returns a LazyFrame projected to [region, profit]. |
| Backend | Sees only the primitive aggregate(revenue ...) and aggregate(cost ...) requests that landed revenue@(region,) and cost@(region,) into cache at warmup. Never sees the symbol profit. |
The engine never memoises the derived result — every call re-runs the subtraction over the cached components. Trivial cost, no invalidation problem. The substrate stays primitive.
Try a mixed query — primitive revenue and derived profit in the same SELECT:
curl -s -X POST http://127.0.0.1:8000/runtime/query \
-H 'content-type: application/json' \
-d '{"installation_id":"retail-demo",
"source":"SELECT region, revenue, profit AT region"}'
served_from: "engine_cache" — both metrics hit cache (revenue directly, profit via Branch 0's recursive serves which hit revenue's cache and cost's cache).
Step 7 — Promotion lifecycle (the F5 closer)¶
After enough queries, the engine's access stats accumulate. The promotion recommendation surface (Phase 8 F5) is exposed via:
curl -s "http://127.0.0.1:8000/sessions/<id>/engine/recommendations?min_hits=0&min_age_days=0&min_hits_per_day=0"
Each recommendation includes a paste-able YAML stanza:
# Add to metric_family 'revenue' in ac.yaml:
cache_hint:
materialize_at:
- [day] # Engine-recommended: 142 hits / 9 days = 15.8 hits/day
The intended workflow: paste back into ac.yaml, re-COMMIT, the next warmup pass pre-materialises it deterministically. The flywheel closes — ambient cache → declared cache_hint → next-COMMIT pre-materialisation.
What you saw¶
In ~20 commands you've exercised every capability of Phase 8 + Phase 9:
| Capability | Where it surfaced |
|---|---|
DuckDB as a v2 DataAPIBackend |
Step 1 boot log shows DuckDB load + connection |
| Cross-backend protocol equivalence | Step 5 — same Frame on SQLite + Polars + DuckDB |
| Engine pre-materialisation (F2) | Step 1 warmup log + Step 2 cache-hit on the warmed grain |
| FD-DAG rollup (F3, slice 5) | Step 3 — day served by rolling up (region, day) |
| Lazy memoisation (slice 8 / F4) | Step 4 — cold miss → backend → memoise → second call hits cache |
served_from reporting (D4) |
Steps 2-4 — every response carries the badge label |
| Promotion recommendations (F5) | Step 7 + Engine page in Step 6 |
| Engine UI surface (D5) | Step 6 — Manifest + Recommendations panels |
Derived-metric composition (slice 6 frame_expression) |
Step 7a — programmatic profit = revenue - cost |
The design doc framing (engine = "ambient working memory"; AC = "long-term declared memory"; promotion = the bridge) is what you've just demonstrated end-to-end.