docs: update design principles and repo map after test audit

- Add virtual category boundary rule: use regular_category_names() for
  user-facing logic, never expose _Index/_Dim
- Document formula tokenizer keyword-aware identifier breaking
- Update repo-map test counts (356 → 510) and add regular_category_names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -62,7 +62,10 @@ editing) is two commands composed at the binding level, not a monolithic handler

- `CategoryKind` is `Regular | VirtualIndex | VirtualDim | Label`. Business rules
(e.g., the 12-category limit counts only `Regular`) are enforced by matching
on the enum, not by checking name prefixes. Virtual categories (`_Index`,
`_Dim`) exist solely for drill-down mechanics and must never leak into
user-facing logic. Use `Model::regular_category_names()` when selecting a
default category for formulas, prompts, or other user-visible choices.
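
The boundary rule above can be sketched roughly as follows. Only `CategoryKind`, its four variants, and `regular_category_names()` come from the text; the `Category` and `Model` struct layouts and the `default_category` helper are assumptions made for illustration, and the real return type may differ.

```rust
// Hypothetical sketch: field names and struct layout are assumed, not taken
// from the repository.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CategoryKind {
    Regular,
    VirtualIndex,
    VirtualDim,
    Label,
}

struct Category {
    name: String,
    kind: CategoryKind,
}

struct Model {
    categories: Vec<Category>,
}

impl Model {
    /// Names of `Regular` categories only, in declaration order.
    /// Filtering matches on the enum, never on a name prefix.
    fn regular_category_names(&self) -> Vec<&str> {
        self.categories
            .iter()
            .filter(|c| matches!(c.kind, CategoryKind::Regular))
            .map(|c| c.name.as_str())
            .collect()
    }

    /// Assumed helper: the default category offered to the user.
    fn default_category(&self) -> Option<&str> {
        self.regular_category_names().into_iter().next()
    }
}
```

Because the filter matches on `CategoryKind`, a regular category whose name happens to start with an underscore is still offered to the user, while `_Index` and `_Dim` are excluded by their kind.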

### When You Add a Variant, the Compiler Finds Every Call Site

@@ -112,6 +115,18 @@ Formulas are parsed into a typed AST (`Expr` enum) at entry time. If the syntax
is invalid, the user gets an error immediately. The evaluator only sees
well-formed trees — it does not need to handle malformed input.
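
As a minimal sketch of this parse-at-entry idea, assume a toy grammar of numbers joined by `+`. The `Expr` name comes from the text; its variants, `ParseError`, `parse`, and `eval` are hypothetical and far simpler than a real formula language.

```rust
// Hypothetical toy grammar: `number (+ number)*`. Variant names are assumed.
#[derive(Debug, PartialEq)]
enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
struct ParseError(String);

/// Parse at entry time: invalid syntax is rejected here, with a message.
fn parse(input: &str) -> Result<Expr, ParseError> {
    let mut terms = input.split('+').map(|t| {
        t.trim()
            .parse::<f64>()
            .map(Expr::Number)
            .map_err(|_| ParseError(format!("expected a number, found {:?}", t.trim())))
    });
    // `split` always yields at least one item, even for an empty input.
    let mut expr = terms.next().expect("split yields at least one item")?;
    for term in terms {
        expr = Expr::Add(Box::new(expr), Box::new(term?));
    }
    Ok(expr)
}

/// The evaluator takes a well-formed tree, so it has no error path.
fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Add(lhs, rhs) => eval(lhs) + eval(rhs),
    }
}
```

The point of the split is visible in the signatures: `parse` returns a `Result`, while `eval` returns a plain `f64` because malformed input can never reach it.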

### Formula Tokenizer: Multi-Word Identifiers and Keywords

The formula tokenizer supports multi-word identifiers (e.g., `Total Revenue`)
by allowing spaces within identifier tokens when followed by non-operator
characters. However, keywords (`WHERE`, `SUM`, `AVG`, `MIN`, `MAX`, `COUNT`,
`IF`) act as token boundaries. The tokenizer breaks an identifier when:
1. The identifier collected **so far** is a keyword (e.g., `WHERE ` stops at `WHERE`).
2. The **next word** after a space is a keyword (e.g., `Revenue WHERE` stops at `Revenue`).

This ensures `SUM(Revenue WHERE Region = "East")` tokenizes correctly as
separate tokens while `Total Revenue` remains a single identifier.
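
The two break rules can be illustrated with a small identifier scanner. This is a sketch under assumptions, not the repository's tokenizer: `read_identifier`, `word_len`, and `is_ident_char` are hypothetical names, keywords are assumed to be case-sensitive, and only identifier scanning is covered (numbers, strings, and operators are left to the rest of a real tokenizer).

```rust
// Keyword list taken from the text; matching is assumed to be case-sensitive.
const KEYWORDS: [&str; 7] = ["WHERE", "SUM", "AVG", "MIN", "MAX", "COUNT", "IF"];

fn is_keyword(word: &str) -> bool {
    KEYWORDS.contains(&word)
}

// Assumption: identifier words consist of alphanumerics and underscores.
fn is_ident_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Byte length of the leading run of identifier characters in `s`.
fn word_len(s: &str) -> usize {
    s.find(|c: char| !is_ident_char(c)).unwrap_or(s.len())
}

/// Split `input` into (identifier, remainder). The identifier may span
/// several space-separated words unless a keyword forces a break.
fn read_identifier(input: &str) -> (&str, &str) {
    let mut end = word_len(input);
    if end == 0 {
        return ("", input); // input does not start with an identifier
    }
    loop {
        // Rule 1: the identifier collected so far is a keyword.
        if is_keyword(&input[..end]) {
            break;
        }
        let rest = &input[end..];
        let after_spaces = rest.trim_start_matches(' ');
        let spaces = rest.len() - after_spaces.len();
        let next = word_len(after_spaces);
        // Only continue across spaces that are followed by another word;
        // an operator, parenthesis, or end of input ends the identifier.
        if spaces == 0 || next == 0 {
            break;
        }
        // Rule 2: the next word after the space is a keyword.
        if is_keyword(&after_spaces[..next]) {
            break;
        }
        end += spaces + next;
    }
    (&input[..end], &input[end..])
}
```

With this scanner, `Total Revenue` comes back as one identifier, `Revenue WHERE Region` stops at `Revenue`, and `SUM(Revenue` stops at `SUM` because the parenthesis is not an identifier character.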

---

## 4. Separation of Concerns
