docs: update repo-map and design-principles for pest parser

- Document PEG grammar as single source of truth for .improv format
- Update file format section with v2025-04-09 syntax: version line,
  Initial View, pipe quoting, Views→Formulas→Categories→Data order
- Add pipe quoting convention and grammar-driven testing principles
- Update file inventory (persistence: 124+2291 lines, 83 tests)
- Add pest/pest_meta to dependency table
- Update persistence testing guidance for grammar-walking generator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: spot
Edward Langley
2026-04-09 02:58:38 -07:00
parent 70e7cfbef7
commit 07c8f4a40a
2 changed files with 100 additions and 37 deletions


@@ -38,8 +38,10 @@ editing) is two commands composed at the binding level, not a monolithic handler
 ### Dispatch Through Traits and Registries, Not Match Blocks
-- **Commands**: 40+ types each implement `Cmd`. A `CmdRegistry` maps names to
-constructor closures. Dispatching a key presses looks up the binding, resolves
+- **Commands**: 40+ types each implement `Cmd`, organized by concern across
+submodules in `command/cmd/` (navigation, cell, commit, grid, mode, panel,
+search, text_buffer, tile, effect_cmds). A `CmdRegistry` maps names to
+constructor closures. Dispatching a key press looks up the binding, resolves
 the command name through the registry, and calls `execute`. No central
 `match command_name { ... }` block.
@@ -117,6 +119,26 @@ Formulas are parsed into a typed AST (`Expr` enum) at entry time. If the syntax
 is invalid, the user gets an error immediately. The evaluator only sees
 well-formed trees — it does not need to handle malformed input.
+### Grammar-Defined File Format
+The `.improv` file format is defined by a PEG grammar (`persistence/improv.pest`)
+and parsed by pest. The grammar is the single source of truth — the parser is a
+tree-walker over the grammar's parse tree, not an ad-hoc line scanner. This means:
+- Adding a new format feature means updating the grammar first, then the walker.
+- The grammar can be read as a specification independent of the Rust code.
+- A grammar-walking test generator reads the grammar AST at test time (via
+`pest_meta`) and produces random valid files, ensuring the parser accepts
+everything the grammar describes.
+### CL-Style Pipe Quoting for Names
+Names in the `.improv` format use CL-style `|...|` pipe quoting. A name is bare
+if it matches `[A-Za-z_][A-Za-z0-9_-]*`; everything else must be pipe-quoted.
+Escapes inside pipes: `\|` (literal pipe), `\\` (backslash), `\n` (newline).
+This convention is shared between the `.improv` persistence format and the
+formula parser's identifier syntax.
 ### Formula Tokenizer: Identifiers and Quoting
 **Bare identifiers** support multi-word names (e.g., `Total Revenue`) by
@@ -244,9 +266,14 @@ milliseconds each are a sign something is wrong.
 - **Model, formula, view**: the core logic. Unit tests for each operation and
 edge case. Property tests for invariants. These are the highest-value tests.
 - **Commands**: build a `CmdContext`, call `execute`, assert on the returned
-effects. Pure functions — no terminal needed.
+effects. Pure functions — no terminal needed. Tests are colocated in each
+command submodule (`command/cmd/<module>.rs` → `mod tests`), with shared
+test helpers in `command/cmd/mod.rs::test_helpers`.
 - **Persistence**: round-trip tests (`save → load → save` produces identical
-output). Cover groups, formulas, views, hidden items, legacy JSON.
+output) plus grammar-driven property tests. The generator walks the pest
+grammar AST to produce random valid files; proptests verify
+`parse(generate())` succeeds and `parse(format(parse(generate())))` is
+stable. Cover groups, formulas, views, hidden items, pipe quoting edges.
 - **Format**: boundary cases for comma placement, rounding, negative numbers.
 - **Import**: field classification heuristics, CSV quoting, multi-file merge.


@@ -10,14 +10,14 @@ Crate `improvise` v0.1.0, Apache-2.0, edition 2021.
 | I need to... | Look in |
 |---------------------------------------|----------------------------------------------|
 | Add a new keybinding | `command/keymap.rs` → `default_keymaps()` |
-| Add a new user-facing command | `command/cmd.rs` → implement `Cmd`, register in `default_registry()` |
+| Add a new user-facing command | `command/cmd/` → implement `Cmd` in the relevant submodule, register in `registry.rs` |
 | Add a new state mutation | `ui/effect.rs` → implement `Effect` |
 | Change formula evaluation | `model/types.rs` → `eval_formula()`, `eval_expr()` |
 | Change how cells are stored/queried | `model/cell.rs` → `DataStore` |
 | Change category/item behavior | `model/category.rs` → `Category` |
 | Change view axis logic | `view/types.rs` → `View` |
 | Change grid layout computation | `view/layout.rs` → `GridLayout` |
-| Change .improv file format | `persistence/mod.rs` → `format_md()`, `parse_md()` |
+| Change .improv file format | `persistence/improv.pest` (grammar), `persistence/mod.rs` → `format_md()`, `parse_md()` |
 | Change number display formatting | `format.rs` → `format_f64()` |
 | Change CLI arguments | `main.rs` → clap structs |
 | Change import wizard logic | `import/wizard.rs` → `ImportPipeline` |
@@ -25,7 +25,7 @@ Crate `improvise` v0.1.0, Apache-2.0, edition 2021.
 | Change TUI frame layout | `draw.rs` → `draw()` |
 | Change app state / mode transitions | `ui/app.rs` → `App`, `AppMode` |
 | Write a test for model logic | `model/types.rs` → `mod tests` / `mod formula_tests` |
-| Write a test for a command | `command/cmd.rs` → `mod tests` |
+| Write a test for a command | `command/cmd/<module>.rs` → colocated `mod tests` |
 ---
@@ -39,7 +39,7 @@ User keypress → Keymap lookup → Cmd::execute(&CmdContext) → Vec<Box<dyn Ef
 ```
 ```rust
-// src/command/cmd.rs
+// src/command/cmd/core.rs
 pub trait Cmd: Debug + Send + Sync {
     fn name(&self) -> &'static str;
     fn execute(&self, ctx: &CmdContext) -> Vec<Box<dyn Effect>>;
@@ -69,7 +69,7 @@ pub trait Effect: Debug {
 }
 ```
-**To add a command**: implement `Cmd`, then in `default_registry()` call `r.register(...)` or use the `effect_cmd!` macro for simple cases. Bind it in `default_keymaps()`.
+**To add a command**: implement `Cmd` in the appropriate `command/cmd/` submodule, then register in `command/cmd/registry.rs`. Use the `effect_cmd!` macro (in `effect_cmds.rs`) for simple effect-wrapping commands. Bind it in `default_keymaps()`.
 **To add an effect**: implement `Effect` in `effect.rs`, add a constructor function.
@@ -273,40 +273,61 @@ pub enum ModeKey {
 ## File Format (.improv)
-Plain-text markdown-like. **Not JSON** (JSON is legacy, auto-detected by `{` prefix).
+Plain-text markdown-like, defined by a PEG grammar (`persistence/improv.pest`).
+Parsed by pest; the grammar is the single source of truth for both the parser
+and the grammar-walking test generator.
+**Not JSON** (JSON is legacy, auto-detected by `{` prefix).
 ```
+v2025-04-09
 # Model Name
+Initial View: Default
-## Category: Region
-- North
-- South
-- East [Coastal] ← item in group "Coastal"
-- West [Coastal]
-> Coastal ← group definition
-## Category: Measure
-- Revenue
-- Cost
-- Profit
+## View: Default
+Region: row
+Measure: column
+|Time Period|: page, Q1 ← pipe-quoted name, page with selection
+hidden: Region/Internal
+collapsed: |Time Period|/|2024|
+format: ,.2f
 ## Formulas
 - Profit = Revenue - Cost [Measure] ← [TargetCategory]
+## Category: Region
+- North, South, East, West ← bare items, comma-separated
+- Coastal_East[Coastal] ← grouped item (one per line)
+- Coastal_West[Coastal]
+> Coastal ← group definition
+## Category: Measure
+- Revenue, Cost, Profit
 ## Data
 Region=East, Measure=Revenue = 1200
 Region=East, Measure=Cost = 800
-Region=West, Measure=Revenue = "pending" ← text value in quotes
+Region=West, Measure=Revenue = |pending| ← pipe-quoted text value
-## View: Default (active)
-Region: row
-Measure: column
-Time: page, Q1 ← page axis with selected item
-hidden: Region/Internal
-collapsed: Time/2024
-format: ,.2f
 ```
+### Name quoting
+Bare names match `[A-Za-z_][A-Za-z0-9_-]*`. Everything else uses CL-style
+pipe quoting: `|Income, Gross|`, `|2025|`, `|Name with spaces|`.
+Escapes inside pipes: `\|` (literal pipe), `\\` (backslash), `\n` (newline).
+### Section order
+`format_md` writes Views → Formulas → Categories → Data (smallest to largest).
+The parser accepts sections in any order.
+### Key design choices
+- Version line (`v2025-04-09`) enables future format changes.
+- `Initial View:` is a top-level header, not embedded in view sections.
+- Text cell values are always pipe-quoted to distinguish from numbers.
+- Bare items are comma-separated on one line; grouped items get one line each.
 Gzip variant: `.improv.gz` (same content, gzipped). Persistence code: `persistence/mod.rs`.
 ---
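
The name-quoting rule added in the hunk above (bare names match `[A-Za-z_][A-Za-z0-9_-]*`, everything else is pipe-quoted with `\|`, `\\`, and `\n` escapes) could be implemented along these lines. This is a sketch under assumptions: `is_bare_name`, `quote_name`, and `unquote_name` are hypothetical helper names, not functions taken from `persistence/mod.rs`, and the handling of unknown escapes is a guess.

```rust
// Bare names match `[A-Za-z_][A-Za-z0-9_-]*`.
fn is_bare_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

// Emit a name bare when possible, otherwise pipe-quoted with the three
// documented escapes: `\|`, `\\`, and `\n`.
fn quote_name(name: &str) -> String {
    if is_bare_name(name) {
        return name.to_string();
    }
    let mut out = String::with_capacity(name.len() + 2);
    out.push('|');
    for c in name.chars() {
        match c {
            '|' => out.push_str("\\|"),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out.push('|');
    out
}

// Inverse for a pipe-quoted token. Returns `None` when the token is not
// wrapped in pipes, contains an unescaped pipe, or ends in a dangling
// backslash. Passing unknown escapes through unchanged is an assumption
// of this sketch.
fn unquote_name(token: &str) -> Option<String> {
    let inner = token.strip_prefix('|')?.strip_suffix('|')?;
    let mut out = String::new();
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next()? {
                'n' => out.push('\n'),
                other => out.push(other),
            },
            '|' => return None,
            other => out.push(other),
        }
    }
    Some(out)
}
```

For example, `quote_name("Income, Gross")` yields `|Income, Gross|`, while `quote_name("Region")` is returned unchanged because it is already a valid bare name.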
@@ -335,10 +356,11 @@ Import flags: `--category`, `--measure`, `--time`, `--skip`, `--extract`, `--axi
 | indexmap 2 | Ordered maps (categories, views) |
 | anyhow | Error handling |
 | chrono 0.4 | Date parsing in import |
+| pest + pest_derive | PEG parser for .improv format |
 | flate2 | Gzip for .improv.gz |
 | csv | CSV parsing |
 | enum_dispatch | CLI subcommand dispatch |
-| **dev:** proptest, tempfile | Property testing, temp dirs |
+| **dev:** proptest, tempfile, pest_meta | Property testing, temp dirs, grammar AST for test generator |
 ---
@@ -372,8 +394,21 @@ Lines / tests / path — grouped by layer.
 ### Command layer
 ```
-3373 / 74t command/cmd.rs Cmd trait, CmdContext, CmdRegistry, 40+ commands
-1068 / 22t command/keymap.rs KeyPattern, Binding, Keymap, ModeKey, 14 mode keymaps
+command/cmd/ Cmd trait, CmdContext, CmdRegistry, 40+ commands
+297 / 2t core.rs Cmd trait, CmdContext, CmdRegistry, parse helpers
+586 / 0t registry.rs default_registry() — all command registrations
+475 / 10t navigation.rs Move, EnterAdvance, PageNext/Prev
+198 / 6t cell.rs ClearCell, YankCell, PasteCell, TransposeAxes, SaveCmd
+330 / 7t commit.rs CommitFormula, CommitCategoryAdd/ItemAdd, CommitExport
+437 / 5t effect_cmds.rs effect_cmd! macro, 25+ parseable effect-wrapper commands
+409 / 7t grid.rs ToggleGroup, ViewNavigate, DrillIntoCell, TogglePruneEmpty
+308 / 8t mode.rs EnterMode, Quit, EditOrDrill, EnterTileSelect, etc.
+587 / 13t panel.rs Panel toggle/cycle/cursor, formula/category/view panel cmds
+202 / 4t search.rs SearchNavigate, SearchOrCategoryAdd, ExitSearchMode
+256 / 7t text_buffer.rs AppendChar, PopChar, CommandModeBackspace, ExecuteCommand
+160 / 5t tile.rs MoveTileCursor, TileAxisOp
+121 / 0t mod.rs Module declarations, re-exports, test helpers
+1066 / 22t command/keymap.rs KeyPattern, Binding, Keymap, ModeKey, 14 mode keymaps
 236 / 19t command/parse.rs Script/command-line parser (prefix syntax)
 12 / 0t command/mod.rs
 ```
@@ -408,7 +443,8 @@ Lines / tests / path — grouped by layer.
 400 / 0t draw.rs TUI event loop (run_tui), frame composition
 391 / 0t main.rs CLI entry (clap): open, import, cmd, script
 228 / 29t format.rs Number display formatting (view-only rounding)
-806 / 38t persistence/mod.rs .improv save/load (markdown format + gzip + legacy JSON)
+124 / 0t persistence/improv.pest PEG grammar — single source of truth for .improv format
+2291 / 83t persistence/mod.rs .improv save/load (pest parser + format + gzip + legacy JSON)
 ```
 ### Context docs
@@ -419,7 +455,7 @@ context/repo-map.md This file
 docs/design-notes.md Product vision & non-goals (salvaged from former SPEC.md)
 ```
-**Total: ~16,500 lines, 510 tests.**
+**Total: ~21,400 lines, 568 tests.**
 ---
@@ -440,7 +476,7 @@ widgets or write tests that just exercise trivial getters. Coverage should be ru
 | **Formula** (parser, eval) | Unit tests per operator/construct | Cover each BinOp, AggFunc, IF, WHERE, unary minus, chained formulas, error cases (div-by-zero, missing ref). Ensure eval uses full f64 precision — never display-rounded values. |
 | **View** (types, layout) | Unit tests + **proptest** | Property tests for axis assignment invariants (each category on exactly one axis, transpose is involutive, etc.). Unit tests for layout computation, records mode detection, drill. |
 | **Command** (cmd, keymap, parse) | Unit tests | Test command execution by building a `CmdContext` and asserting on returned effects. Test keymap lookup fallback chain. Test script parser with edge cases (quoting, comments, dots). |
-| **Persistence** | Round-trip tests | `save → load → save` must be identical. Cover groups, formulas, views, hidden items, legacy JSON detection. |
+| **Persistence** | Round-trip + grammar-generated | `save → load → save` must be identical. Grammar-walking generator produces random valid files from the pest AST; proptests verify `parse(generate())` and `parse(format(parse(generate())))`. Cover groups, formulas, views, hidden items, pipe quoting edge cases. |
 | **Format** | Unit tests | Boundary cases: comma placement at 3/4/7 digits, negative numbers, rounding half-away-from-zero (not banker's), zero, small fractions. |
 | **Import** (analyzer, csv, wizard) | Unit tests | Field classification heuristics, CSV quoting (RFC 4180), multi-file merge, date extraction. |
 | **UI rendering** (grid, panels, draw, help) | Generally skip | Ratatui widgets are hard to unit-test and change frequently. Test the *logic* they consume (layout, cat_tree, format) rather than the rendering itself. |
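
The **Format** row's boundary cases (comma placement at 3/4/7 digits, half-away-from-zero rounding, negatives, zero) can be made concrete with a small sketch. `format_commas` is a hypothetical function written for this illustration and is not the crate's `format_f64()`. It relies on `f64::round`, which rounds ties away from zero, and pre-rounds before formatting so that a tie such as 2.5 is resolved by `round` rather than by the formatter.

```rust
// Format `value` with `decimals` fractional digits and comma thousands
// separators, rounding half away from zero via `f64::round`.
// Note: ties are judged on the binary value, so decimal inputs that are not
// exactly representable (e.g. 1.005) may round differently than expected.
fn format_commas(value: f64, decimals: usize) -> String {
    let scale = 10f64.powi(decimals as i32);
    // Round the magnitude first; the sign is re-attached at the end.
    let rounded = (value.abs() * scale).round() / scale;
    let text = format!("{:.*}", decimals, rounded);
    let (int_part, frac_part) = match text.split_once('.') {
        Some((i, f)) => (i, Some(f)),
        None => (text.as_str(), None),
    };

    // Insert a comma before every group of three digits, counted from the right.
    let mut grouped = String::new();
    for (idx, ch) in int_part.chars().enumerate() {
        if idx > 0 && (int_part.len() - idx) % 3 == 0 {
            grouped.push(',');
        }
        grouped.push(ch);
    }

    let mut out = String::new();
    // Suppress the sign when the value rounds to zero (no "-0.00").
    if value < 0.0 && rounded != 0.0 {
        out.push('-');
    }
    out.push_str(&grouped);
    if let Some(f) = frac_part {
        out.push('.');
        out.push_str(f);
    }
    out
}
```

The inputs worth testing are the ones the table lists: 999 versus 1000 versus 1234567 for comma placement, 2.5 and 0.125 for ties, and small negative values that round to zero.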
@@ -488,6 +524,6 @@ examples).
 5b. **Formula evaluation is fixed-point.** `recompute_formulas(none_cats)` iterates formula evaluation until values stabilize, using a cache. `evaluate_aggregated` checks the cache for formula results. Circular refs produce `CellValue::Error("circular")`.
 6. **Keybindings are per-mode.** `ModeKey::from_app_mode()` resolves the current mode, then the corresponding `Keymap` is looked up. Normal + `search_mode=true` maps to `SearchMode`.
 7. **`effect_cmd!` macro** generates a command struct that just produces effects. Use for simple commands without complex logic.
-8. **`.improv` format is markdown-like**, not JSON. See `persistence/mod.rs`. JSON is legacy only.
+8. **`.improv` format is defined by a PEG grammar** (`persistence/improv.pest`). Parsed by pest. Names use CL-style `|...|` pipe quoting when they aren't valid bare identifiers. JSON is legacy only.
 9. **`IndexMap`** is used for categories and views to preserve insertion order.
 10. **`MAX_CATEGORIES = 12`** applies only to `CategoryKind::Regular`. Virtual/Label categories are exempt.