docs: update design principles and repo map after test audit

- Add virtual category boundary rule: use regular_category_names() for
  user-facing logic, never expose _Index/_Dim
- Document formula tokenizer keyword-aware identifier breaking
- Update repo-map test counts (356 → 510) and add regular_category_names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Edward Langley
2026-04-08 23:44:25 -07:00
parent 96d783a1c5
commit 09df7bf181
2 changed files with 24 additions and 8 deletions

View File

@ -62,7 +62,10 @@ editing) is two commands composed at the binding level, not a monolithic handler
- `CategoryKind` is `Regular | VirtualIndex | VirtualDim | Label`. Business rules
(e.g., the 12-category limit counts only `Regular`) are enforced by matching
on the enum, not by checking name prefixes.
on the enum, not by checking name prefixes. Virtual categories (`_Index`,
`_Dim`) exist solely for drill-down mechanics and must never leak into
user-facing logic — use `Model::regular_category_names()` when selecting a
default category for formulas, prompts, or other user-visible choices.
### When You Add a Variant, the Compiler Finds Every Call Site
@ -112,6 +115,18 @@ Formulas are parsed into a typed AST (`Expr` enum) at entry time. If the syntax
is invalid, the user gets an error immediately. The evaluator only sees
well-formed trees — it does not need to handle malformed input.
### Formula Tokenizer: Multi-Word Identifiers and Keywords
The formula tokenizer supports multi-word identifiers (e.g., `Total Revenue`)
by allowing spaces within identifier tokens when followed by non-operator
characters. However, keywords (`WHERE`, `SUM`, `AVG`, `MIN`, `MAX`, `COUNT`,
`IF`) act as token boundaries — the tokenizer breaks an identifier when:
1. The identifier collected **so far** is a keyword (e.g., `WHERE ` stops at `WHERE`).
2. The **next word** after a space is a keyword (e.g., `Revenue WHERE` stops at `Revenue`).
This ensures `SUM(Revenue WHERE Region = "East")` tokenizes correctly as
separate tokens while `Total Revenue` remains a single identifier.
---
## 4. Separation of Concerns