
AC-Level Derived Dimensions — Design Document, v0.1

Status: v0.1 design (2026-05-25). Companion to coframe_derived_metrics_design_v0_1.md. Captures the Shape A approach (declarative mapping/alias dimensions) agreed on 2026-05-25, before implementation.

Author: reeeneeee
Audience: implementers of Coframe Core v2.3+; reviewers checking the architectural commitments before code lands.

Scope. This document specifies how Coframe Core represents and executes AC-level derived dimensions — dimensions that are not physically present in any backend table but are derived from existing dimensions via an AC-declared mapping (e.g., coarse_region = MAP({East: Atlantic, West: Pacific, Central: Plains})). It complements the existing function-derived dimension machinery (month = MONTH_OF(day)) by adding user-declared mappings as a first-class AC declaration.

Out of scope. Expression-derived dimensions over metric values (revenue_tier = CASE WHEN revenue > 1000 THEN 'high' ELSE 'low' END) — the Shape C case from the architecture discussion. Those break the dimension/metric trichotomy and are Pro-tier. Arbitrary user catalog functions (Shape B beyond what's already in Chapter 10) are a separate registry concern, also deferred.


0. Naming + framing

A derived dimension is an AC-dimension whose values are computed at query time from an existing AC-dimension via an AC-declared mapping. The mapping is a pure function source → derived with no data dependency beyond the source value.

This is structurally similar to function-derived dimensions (the existing MONTH_OF(day) machinery, Manual §3.4.6): both produce a per-row value from another dimension via a function. The differences:

                     Function-derived (existing)                Mapping-derived (this design)
Declared via         derived_by: MONTH_OF in a hierarchy path   mapping: {East: Atlantic, ...} in a derived_dimensions block
Function source      Chapter 10 operator catalog                The AC's own declaration
Authoring overhead   Catalog operator must exist                None (pure data in YAML)
Example              day → month                                region → coarse_region

Mapping-derived dimensions are the natural extension of function-derived ones: same structural slot in the FD-DAG, same per-row computation model, just with an AC-supplied lookup instead of a catalog operator.
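
A minimal sketch of the "same structural slot" claim, assuming both kinds of derivation can be modeled as one-argument Python callables. RowDerivation, month_of, mapping_derivation, and DERIVATIONS are hypothetical names for illustration only, not Coframe APIs; month_of merely approximates a MONTH_OF-style catalog operator for ISO date strings.

```python
from typing import Callable

# Hypothetical type: a per-row derivation maps one source value to one derived value.
RowDerivation = Callable[[str], str]


def month_of(day: str) -> str:
    """Stand-in for a catalog operator like MONTH_OF, assuming ISO 'YYYY-MM-DD' strings."""
    return day[:7]


def mapping_derivation(mapping: dict[str, str]) -> RowDerivation:
    """Wrap an AC-declared mapping so it fills the same slot as a catalog operator."""
    def derive(value: str) -> str:
        return mapping[value]  # strict lookup: unknown values raise KeyError
    return derive


# The engine can treat both derivations uniformly: one value in, one value out.
DERIVATIONS: dict[str, RowDerivation] = {
    "month": month_of,
    "coarse_region": mapping_derivation({"East": "Atlantic", "West": "Pacific", "Central": "Plains"}),
}

print(DERIVATIONS["month"]("2026-05-25"))    # 2026-05
print(DERIVATIONS["coarse_region"]("West"))  # Pacific
```

The only difference the caller sees is where the function comes from: the operator catalog versus the AC's own declaration.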


1. Motivation

Three common real-world needs Coframe Core doesn't directly support today:

  1. Coarsening: coarse_region = {East: Atlantic, West: Pacific, Central: Plains} for executive dashboards.
  2. Aliasing: country_code = {USA: us, Canada: ca, Mexico: mx} to bridge friendly country names to system codes.
  3. Bucketing fixed sets: priority_tier = {p1: critical, p2: critical, p3: standard, p4: standard, p5: low} for ops dashboards that need fewer categories than the source.

In all three cases, the author can currently get this only by:
  • Materialising the derived column in the warehouse (heavy; it has to be ETL'd),
  • Asking the user to write the mapping inline in every query (error-prone, not centralised), or
  • Registering a custom catalog operator (Pro-tier, overkill for a literal mapping).

The design ships a fourth option: declare the mapping on the AC, then query the derived dimension as if it were physical.


2. The Reading B principle (load-bearing)

The architectural commitment, consistent with derived metrics:

The user does not know coarse_region is derived. The backend doesn't know either (no SQL pushdown in v0.1; v0.2+ could push down a CASE). The metric engine knows and executes the mapping. The planner is partially aware — it has to substitute the derived dim's source when building the per-metric request, because grouping by the derived dim requires reading rows at the source-dim grain.

The asymmetry differs slightly from derived metrics: dimensions affect the grain of the query, so the planner can't be completely blind. But the awareness is contained. Rules 1, 2, and 4 see the derived dim as a first-class dim (via the auto-added FD-DAG edge); only Rule 3's schema search and the per-metric request construction in execution.py need to know about the substitution.

The principle is mapping, not data: a derived dimension is a declaration of how to map, not a new column to materialise. The substrate stays primitive — each metric is cached at the source dim's grain (e.g., revenue@(region,)); the derived view is computed on demand by mapping + re-aggregating.


3. AC schema for derived dimensions

The derived_dimensions block lives on DimensionFamily, because a derived dimension naturally belongs to one family (it coarsens or aliases values within that family's coordinate space).

dimension_families:
  - name: geography
    description: "Store geography."
    base_level: store
    members: [store, city, region]      # primitive members; derived ones auto-appended
    hierarchies:
      - name: administrative
        path: [store, city, region]
    derived_dimensions:
      - name: coarse_region
        description: "Coarsened region grouping for executive views."
        derived_from: region
        mapping:
          East:    Atlantic
          West:    Pacific
          Central: Plains
        default: null                   # optional; null = strict (reject unknowns)

After loading:
  • coarse_region is appended to members (so it shows up in members).
  • The FD-DAG gets a new edge, region → coarse_region, with source="mapping_derived" and a reference to the mapping.
  • The mapping is stored on the AC for runtime access via ac.derived_dimension(name) → DerivedDimensionSpec.

Pydantic shape

from pydantic import BaseModel, ConfigDict

class DerivedDimensionSpec(BaseModel):
    name: str                    # the derived dim's AC-dimension name
    description: str | None = None
    derived_from: str            # source dim — must be in this family's members
    mapping: dict[str, str]      # source-value → derived-value
    default: str | None = None   # value for unmapped source values; None = strict refuse

    model_config = ConfigDict(frozen=True, extra="forbid")

The mapping's key/value types are str for v0.1 — Coframe dimensions are almost always strings (region names, categories, codes). Numeric / date sources can be supported in a later iteration by widening the value type.


4. Validation rules

At AC load + cross-ref time:

Rule                              Detail
DD-100 Source dim exists          derived_from resolves to a declared member of the SAME family
DD-101 Derived name unique        name doesn't collide with any other AC-dimension (across all families)
DD-102 Source not itself derived  derived_from points to a primitive or function-derived dimension, not another mapping-derived one (v0.1 conservative; v0.2 may relax)
DD-103 Mapping non-empty          At least one entry in mapping; empty is forbidden because it would produce empty Frames
DD-104 Default optional           If default is None, strict mode: unknown source values produce an error at query time; if a string, that string is used for unmapped values

The "DD-" prefix parallels "D-" for derived metrics; both join the integrity catalog (Manual §2.10) under a new naming scheme.


5. FD-DAG integration

Mapping-derived dimensions extend the existing FD-DAG edge taxonomy:

from typing import Literal

FDEdgeSource = Literal[
    "hierarchy",         # existing — implied by path: [d1, d2, d3]
    "function_derived",  # existing — implied by {ac_dimension: m, derived_by: F}
    "data_attested",     # existing — verified at Phase 2
    "extra",             # existing — extra_fd_edges list
    "mapping_derived",   # NEW
]

A mapping-derived edge carries no derived_by (there is no catalog operator); instead, it refers to its mapping through the edge's tail, which is the derived dimension's name. Storage choice for v0.1: keep the FDEdge model lean (no payload) and look up the mapping on demand via ac.derived_dimension(tail_name).

Reachability behaviour is unchanged — derived dims become FD-reachable from their source, so all existing schema-selection / cross-schema-coherence logic flows through transparently.
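
To illustrate why reachability needs no special casing, here is a minimal sketch that treats FD edges as plain (head, tail) pairs and computes the transitive closure with a breadth-first search. fd_reachable is a hypothetical helper, not the Coframe reachability code; the point is that the mapping_derived edge is traversed exactly like a hierarchy edge.

```python
from collections import defaultdict, deque


def fd_reachable(edges: list[tuple[str, str]], start: str) -> set[str]:
    """Dimensions functionally determined by `start` (transitive closure via BFS)."""
    adjacency: dict[str, list[str]] = defaultdict(list)
    for head, tail in edges:
        adjacency[head].append(tail)
    reached: set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in adjacency[queue.popleft()]:
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return reached


edges = [
    ("store", "city"), ("city", "region"),  # hierarchy edges
    ("region", "coarse_region"),            # mapping_derived edge
]
print(sorted(fd_reachable(edges, "store")))  # ['city', 'coarse_region', 'region']
```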


6. Execution: engine-side mapping + re-aggregate

When the engine sees a request for metric@(<derived_dim>, ...):

serve(METRIC, "revenue", ("coarse_region",))
├── Branch 0: not derived (skip)
├── Branch 1: exact match on (coarse_region,)? Usually no.
├── Branch 2a (existing): subset rollup from a finer cached entry.
│   E.g., revenue@(region, day) → revenue@(coarse_region,) would need
│   substitution; not handled by current subset-only Branch 2.
├── Branch 2b (NEW for derived dims): FD-edge rollup via mapping.
│   - Detect: the requested anchor contains a derived dim
│   - Substitute: serve at the source-dim-substituted anchor instead
│     (recursive: serve(METRIC, "revenue", ("region",)))
│   - Apply mapping in Polars:
│     with_columns(pl.col("region").replace_strict(mapping).alias("coarse_region"))
│   - Re-aggregate: group_by("coarse_region").agg(pl.col("revenue").sum())
│   - Return as LazyFrame with [coarse_region, revenue] columns
└── Branch 3: backend fallback (only if Branch 2b's recursion couldn't
              serve the source anchor either).

The re-aggregation step uses the metric family's first ip_reducer (typically SUM) — same partition-invariance rule as the existing Branch 2.

Crucially: the engine never materialises an entry at the derived-dim grain. The substrate stays primitive (entries at source-dim grains only); the derived view is computed on demand by mapping + re-aggregation. Same principle as derived metrics (formula not data).

Multi-derived-dim anchors

If the requested anchor has multiple derived dims (e.g., (coarse_region, coarse_quarter)), the engine substitutes each independently → serves at (region, quarter) → applies each mapping → re-aggregates on the derived-dim columns. Conceptually clean; implementation iterates over the anchor.
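
A dependency-free sketch of Branch 2b, assuming rows are (dimension-value tuple, metric value) pairs, an in-memory dict stands in for the engine cache, and SUM is the metric's first ip_reducer. DERIVED, CACHE, Rows, and this serve function are hypothetical; the real engine works on Polars LazyFrames and also has Branches 0, 2a, and 3, which are not modeled here. Single and multi-derived-dim anchors go through the same path, and no derived-grain entry is ever written back to the cache.

```python
from collections import defaultdict

Rows = list[tuple[tuple[str, ...], float]]  # (dimension values in anchor order, metric value)

# Hypothetical AC lookup: derived dim -> (source dim, mapping).
DERIVED: dict[str, tuple[str, dict[str, str]]] = {
    "coarse_region": ("region", {"East": "Atlantic", "West": "Pacific", "Central": "Plains"}),
    "coarse_quarter": ("quarter", {"Q1": "H1", "Q2": "H1", "Q3": "H2", "Q4": "H2"}),
}

# Hypothetical engine cache keyed by (metric, anchor); only source-grain entries exist.
CACHE: dict[tuple[str, tuple[str, ...]], Rows] = {
    ("revenue", ("region",)): [(("East",), 100.0), (("West",), 40.0), (("Central",), 10.0)],
    ("revenue", ("region", "quarter")): [
        (("East", "Q1"), 60.0), (("East", "Q2"), 40.0), (("West", "Q3"), 40.0), (("Central", "Q4"), 10.0),
    ],
}


def serve(metric: str, anchor: tuple[str, ...]) -> Rows:
    if (metric, anchor) in CACHE:  # Branch 1: exact match
        return CACHE[(metric, anchor)]
    if any(dim in DERIVED for dim in anchor):  # Branch 2b: derived-dim rollup
        # Substitute every derived dim with its source dim, then serve recursively.
        source_anchor = tuple(DERIVED[d][0] if d in DERIVED else d for d in anchor)
        totals: dict[tuple[str, ...], float] = defaultdict(float)
        for key, value in serve(metric, source_anchor):
            # Map each derived column's value; other columns pass through unchanged.
            mapped = tuple(DERIVED[d][1][v] if d in DERIVED else v for d, v in zip(anchor, key))
            totals[mapped] += value  # re-aggregate with SUM, the assumed first ip_reducer
        return sorted(totals.items())  # computed on demand; nothing is written back to CACHE
    raise LookupError(f"Branch 3 (backend fallback) is not modeled: {metric}@{anchor}")


print(serve("revenue", ("coarse_region",)))
print(serve("revenue", ("coarse_region", "coarse_quarter")))
```

Because DD-102 forbids mapping-derived sources, the substituted anchor never contains a derived dim, so the recursion is at most one level deep.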


7. The planner's narrow involvement

Unlike derived metrics (where the planner stays fully unaware), derived dims require a small planner touch:

  • Rule 1 (family resolution): unchanged.
  • Rule 2 (anchor-set capability): the derived dim is just a member of its family; block-set checks treat it like any other dim.
  • Rule 3 (schema selection): the derived dim isn't a physical column, so the planner's "find sibling at this grain" search must skip it. Instead, the planner substitutes derived dims with their source dims when building the candidate schema search.
  • Rule 4 (cross-schema coherence): unchanged.

In _resolve_metric / apply_rule_3: before searching for a schema that hosts a sibling at target_grain, substitute every derived dim in target_grain with its source dim. Pass the substituted grain to the schema search. The schema selection finds a schema that has the source dim physically. The substitution metadata is stashed on the ResolvedMetric for the executor to apply.

The execution path is then:
  1. The backend computes aggregates at the source-dim grain.
  2. The engine maps source → derived values and re-aggregates.
  3. The returned Frame's columns carry the derived dim's name (not the source's).
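
A minimal sketch of the Rule 3 substitution and the metadata stashed for the executor. ResolvedMetric here is a hypothetical stand-in (the real class has more fields), substitute_derived and resolve_metric are illustrative names, and the schema search itself is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResolvedMetric:
    metric: str
    target_grain: tuple[str, ...]  # grain the query asked for (may contain derived dims)
    schema_grain: tuple[str, ...]  # grain used for schema search and the backend request
    substitutions: dict[str, str] = field(default_factory=dict)  # derived dim -> source dim


def substitute_derived(grain: tuple[str, ...], sources: dict[str, str]) -> tuple[tuple[str, ...], dict[str, str]]:
    """Swap each derived dim for its source dim and report which swaps were made."""
    swaps = {dim: sources[dim] for dim in grain if dim in sources}
    return tuple(swaps.get(dim, dim) for dim in grain), swaps


def resolve_metric(metric: str, target_grain: tuple[str, ...], sources: dict[str, str]) -> ResolvedMetric:
    schema_grain, swaps = substitute_derived(target_grain, sources)
    # A real Rule 3 would now search for a schema hosting `metric` at `schema_grain`;
    # that search is out of scope for this sketch.
    return ResolvedMetric(metric, target_grain, schema_grain, swaps)


resolved = resolve_metric("revenue", ("coarse_region", "day"), {"coarse_region": "region"})
print(resolved.schema_grain)   # ('region', 'day')
print(resolved.substitutions)  # {'coarse_region': 'region'}
```

The executor only needs substitutions to know which columns to map and rename after the engine returns.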


8. Trade-offs accepted

8.1 No SQL pushdown of mappings in v0.1

The backend always groups by the source dim; the engine applies the mapping post-read. Cost: one extra Python/Polars pass after each metric serve, over rows already aggregated at the source-dim grain. For typical AC sizes (a few dozen distinct values in a mapping), this should be sub-millisecond. Pro could add SQL pushdown by extending AggregateRequest.group_by to allow CASE expressions; this is forward-compatible but not needed for v0.1.

8.2 v0.1 doesn't support nested derivation

A mapping-derived dim cannot itself be the source of another mapping-derived dim (e.g., coarse_region → continent is rejected by DD-102). This keeps the FD-DAG cycle check simple; v0.2 could relax it.

8.3 Strict vs default mode for unmapped values

When the backend returns a source value that is not in the mapping:
  • Strict mode (default: null): the engine raises UnmappedDimensionValueError listing the offending value(s).
  • Default mode (e.g., default: "Other"): unmapped values are mapped to the default and grouped under it like any other derived value.

Strict is the v0.1 default: it forces the AC author to be explicit about coverage.

8.4 Re-aggregation requires a partition-invariant ip_reducer

If a metric is anchor-locked (no ip_reducer) at the source-dim grain, querying it at a derived-dim grain is refused: the mapping may collapse multiple source values into one derived value, which requires re-aggregation, and there is no reducer to perform it. This is the same rule as the existing Branch 2 rollup, and the failure surfaces as a clear AnchorLockedError.
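
A minimal sketch of the reducer rule, assuming ip_reducers is an ordered tuple of reducer names and that the names below map to binary Python functions. REDUCERS and reaggregate are hypothetical; AnchorLockedError is named in this design but its definition here is a stand-in. The first declared reducer is used, and an empty tuple refuses the rollup.

```python
from typing import Callable

# Hypothetical reducer registry: names as they might appear in ip_reducers.
REDUCERS: dict[str, Callable[[float, float], float]] = {
    "SUM": lambda a, b: a + b,
    "MAX": max,
    "MIN": min,
}


class AnchorLockedError(Exception):
    """The metric has no ip_reducer, so it cannot be re-aggregated to a coarser grain."""


def reaggregate(metric: str, ip_reducers: tuple[str, ...], rows: list[tuple[str, float]], mapping: dict[str, str]) -> dict[str, float]:
    if not ip_reducers:
        raise AnchorLockedError(f"{metric} is anchor-locked; cannot roll up via a mapping")
    combine = REDUCERS[ip_reducers[0]]  # first ip_reducer, as in the existing Branch 2
    out: dict[str, float] = {}
    for source_value, value in rows:
        key = mapping[source_value]
        out[key] = combine(out[key], value) if key in out else value
    return out


rows = [("East", 1.0), ("West", 2.0), ("North", 3.0)]
print(reaggregate("revenue", ("SUM",), rows, {"East": "A", "West": "A", "North": "B"}))  # {'A': 3.0, 'B': 3.0}
```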


9. Non-goals

Non-goal                                                 Note
Expression-derived dims over metrics (revenue_tier)      Shape C; Pro-tier; breaks the trichotomy
Catalog-function dims beyond mapping (QUARTER_OF)        Already supported via Manual §3.4.6 hierarchy syntax + Chapter 10 catalog
Nested mappings (mapping-derived from mapping-derived)   v0.2 affordance
Mappings with non-string source/target types             v0.2 affordance
User-supplied mapping override at query time             Would defeat the AC-as-source-of-truth posture

10. Implementation plan

Three slices, in dependency order:

Slice 1 — AC schema + validation + FD-DAG integration

  • Extend derived.py with DerivedDimensionSpec Pydantic model + a sentinel MAPPING_DERIVED_EDGE source.
  • Extend DimensionFamily with optional derived_dimensions: tuple[DerivedDimensionSpec, ...].
  • After-construction validator: appends derived names to members if not already there.
  • Extend FDEdgeSource literal with "mapping_derived".
  • Extend _edges_from_dimension_family to emit the new edges.
  • AC cross-ref: DD-100..DD-104 validation rules.
  • AC accessors: is_derived_dimension, derived_dimension(name).
  • Tests: declaration, FD-DAG reachability, validation rules.

Slice 2 — Engine execution

  • New _serve_via_derived_dim_rollup helper in engine.py (Branch 2b).
  • Detect derived dims in serve()'s requested anchor; substitute → recursive serve → mapping + re-aggregate.
  • Polars implementation: with_columns(pl.col(src).replace_strict(mapping).alias(derived)) + group_by + agg; default mode passes replace_strict(mapping, default=spec.default) instead.
  • Strict / default mode behaviour.
  • Tests: warm components, cold components, multi-derived-dim anchors, anchor-locked refusal.

Slice 3 — Planner + retail demo + end-to-end

  • Planner: substitute derived dims with source dims in schema search (apply_rule_3 + _resolve_metric).
  • Stash substitution metadata on ResolvedMetric.
  • Executor: pass substituted grain to backend; apply derived-dim transform after engine returns.
  • Retail demo: declare coarse_region (East: Atlantic, West: Pacific, Central: Plains).
  • Walkthrough Step 7c: SELECT coarse_region, SUM(revenue) AT coarse_region.
  • End-to-end test through Frame-QL.

Estimated effort: ~250–400 LOC + tests across the three slices.


11. Summary

AC-level derived dimensions are added as mappings declared on the AC, executed by the engine, with narrow planner support. The key commitments:

  1. Mapping, not data — derived dims never get a backend column or a cached entry at the derived grain. The mapping is applied on demand post-read.
  2. FD-DAG integration is structural — derived dims appear as ordinary nodes reachable from their source via a new mapping_derived edge type. Reachability + schema selection flow through transparently.
  3. Planner is mostly unaware — Rules 1, 2, 4 unchanged. Rule 3 gets one targeted substitution (derived → source) for schema search.
  4. Engine owns the execution — new Branch 2b: substitute, recursive-serve at source grain, apply mapping + re-aggregate.
  5. Backend never sees them — only primitive dim columns go over the protocol. The mapping is engine-internal.

The design follows the Reading B principle adapted for the dimension/metric asymmetry (dims must be applied pre-aggregation, so the planner can't be quite as blind). The cleanest principle-preserving extension: declared mappings, engine-side execution, no SQL pushdown in v0.1.

When implementation lands (per §10), the canonical retail AC will declare coarse_region directly, and Frame-QL authors will write SELECT coarse_region, SUM(revenue) AT coarse_region and have it just work — with served_from: engine_cache whenever revenue@(region,) is warmed.