docs: update repo-map and design-principles for pest parser

- Document PEG grammar as single source of truth for .improv format
- Update file format section with v2025-04-09 syntax: version line,
  Initial View, pipe quoting, Views→Formulas→Categories→Data order
- Add pipe quoting convention and grammar-driven testing principles
- Update file inventory (persistence: 124+2291 lines, 83 tests)
- Add pest/pest_meta to dependency table
- Update persistence testing guidance for grammar-walking generator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Executed-By: spot
Edward Langley
2026-04-09 02:58:38 -07:00
parent 70e7cfbef7
commit 07c8f4a40a
2 changed files with 100 additions and 37 deletions


@@ -38,8 +38,10 @@ editing) is two commands composed at the binding level, not a monolithic handler
### Dispatch Through Traits and Registries, Not Match Blocks
- **Commands**: 40+ types each implement `Cmd`. A `CmdRegistry` maps names to
constructor closures. Dispatching a key presses looks up the binding, resolves
- **Commands**: 40+ types each implement `Cmd`, organized by concern across
submodules in `command/cmd/` (navigation, cell, commit, grid, mode, panel,
search, text_buffer, tile, effect_cmds). A `CmdRegistry` maps names to
constructor closures. Dispatching a key press looks up the binding, resolves
the command name through the registry, and calls `execute`. No central
`match command_name { ... }` block.
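A minimal sketch of the registry dispatch described above, under stated assumptions: `CmdContext`, `Effect`, `MoveDown`, the `bindings` map, the `dispatch` helper, and the exact `execute` signature are hypothetical stand-ins for illustration, not the project's actual definitions. Only the names `Cmd`, `CmdRegistry`, and `execute` come from the text.

```rust
use std::collections::HashMap;

// Hypothetical context and effect types; the real ones are not shown in this diff.
struct CmdContext {
    cursor: usize,
}

#[derive(Debug, PartialEq)]
enum Effect {
    MoveCursor(usize),
}

// Each command type implements this trait (signature assumed).
trait Cmd {
    fn execute(&self, ctx: &CmdContext) -> Vec<Effect>;
}

// Hypothetical example command.
struct MoveDown;

impl Cmd for MoveDown {
    fn execute(&self, ctx: &CmdContext) -> Vec<Effect> {
        vec![Effect::MoveCursor(ctx.cursor + 1)]
    }
}

type Ctor = Box<dyn Fn() -> Box<dyn Cmd>>;

// Maps command names to constructor closures.
struct CmdRegistry {
    ctors: HashMap<&'static str, Ctor>,
}

impl CmdRegistry {
    fn new() -> Self {
        Self { ctors: HashMap::new() }
    }

    fn register(&mut self, name: &'static str, ctor: impl Fn() -> Box<dyn Cmd> + 'static) {
        self.ctors.insert(name, Box::new(ctor));
    }

    fn build(&self, name: &str) -> Option<Box<dyn Cmd>> {
        self.ctors.get(name).map(|ctor| ctor())
    }
}

// Key press -> binding -> command name -> registry -> execute.
// There is no central `match` over command names anywhere in this path.
fn dispatch(
    key: char,
    bindings: &HashMap<char, &'static str>,
    registry: &CmdRegistry,
    ctx: &CmdContext,
) -> Vec<Effect> {
    let Some(name) = bindings.get(&key) else {
        return Vec::new(); // unbound key: nothing to do
    };
    match registry.build(name) {
        Some(cmd) => cmd.execute(ctx),
        None => Vec::new(), // binding refers to an unregistered command
    }
}
```

In this sketch, adding a command means one new type plus one `register` call; the dispatch path itself does not change.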
@@ -117,6 +119,26 @@ Formulas are parsed into a typed AST (`Expr` enum) at entry time. If the syntax
is invalid, the user gets an error immediately. The evaluator only sees
well-formed trees — it does not need to handle malformed input.
### Grammar-Defined File Format
The `.improv` file format is defined by a PEG grammar (`persistence/improv.pest`)
and parsed by pest. The grammar is the single source of truth — the parser is a
tree-walker over the grammar's parse tree, not an ad-hoc line scanner. This means:
- Adding a new format feature means updating the grammar first, then the walker.
- The grammar can be read as a specification independent of the Rust code.
- A grammar-walking test generator reads the grammar AST at test time (via
`pest_meta`) and produces random valid files, ensuring the parser accepts
everything the grammar describes.
### CL-Style Pipe Quoting for Names
Names in the `.improv` format use CL-style `|...|` pipe quoting. A name is bare
if it matches `[A-Za-z_][A-Za-z0-9_-]*`; everything else must be pipe-quoted.
Escapes inside pipes: `\|` (literal pipe), `\\` (backslash), `\n` (newline).
This convention is shared between the `.improv` persistence format and the
formula parser's identifier syntax.
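The quoting rule above can be sketched in a few lines of standard-library Rust. The function names `is_bare`, `quote_name`, and `unquote_name` are hypothetical and chosen for illustration; the project's real implementation lives in the pest grammar and its walker, which this diff does not show. Rejecting unknown escapes and unescaped inner pipes is also an assumption of this sketch.

```rust
/// True when `name` matches `[A-Za-z_][A-Za-z0-9_-]*`.
fn is_bare(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Writes a name bare when possible, otherwise wraps it in pipes and
/// escapes `|`, `\`, and newline.
fn quote_name(name: &str) -> String {
    if is_bare(name) {
        return name.to_string();
    }
    let mut out = String::with_capacity(name.len() + 2);
    out.push('|');
    for c in name.chars() {
        match c {
            '|' => out.push_str("\\|"),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out.push('|');
    out
}

/// Reverses `quote_name`. Returns `None` for malformed input: missing
/// delimiters, an unescaped inner pipe, or an unknown or dangling escape.
fn unquote_name(text: &str) -> Option<String> {
    if is_bare(text) {
        return Some(text.to_string());
    }
    let inner = text.strip_prefix('|')?.strip_suffix('|')?;
    let mut out = String::with_capacity(inner.len());
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next()? {
                '|' => out.push('|'),
                '\\' => out.push('\\'),
                'n' => out.push('\n'),
                _ => return None,
            },
            '|' => return None,
            other => out.push(other),
        }
    }
    Some(out)
}
```

For example, `quote_name("Total Revenue")` yields `|Total Revenue|` because the space falls outside the bare-name pattern, while `Total_Revenue` is written unchanged.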
### Formula Tokenizer: Identifiers and Quoting
**Bare identifiers** support multi-word names (e.g., `Total Revenue`) by
@@ -244,9 +266,14 @@ milliseconds each are a sign something is wrong.
- **Model, formula, view**: the core logic. Unit tests for each operation and
edge case. Property tests for invariants. These are the highest-value tests.
- **Commands**: build a `CmdContext`, call `execute`, assert on the returned
effects. Pure functions — no terminal needed.
effects. Pure functions — no terminal needed. Tests are colocated in each
command submodule (`mod tests` in `command/cmd/<module>.rs`), with shared
test helpers in `command/cmd/mod.rs::test_helpers`.
- **Persistence**: round-trip tests (`save → load → save` produces identical
output). Cover groups, formulas, views, hidden items, legacy JSON.
output) plus grammar-driven property tests. The generator walks the pest
grammar AST to produce random valid files; proptests verify
`parse(generate())` succeeds and `parse(format(parse(generate())))` is
stable. Cover groups, formulas, views, hidden items, pipe quoting edges.
- **Format**: boundary cases for comma placement, rounding, negative numbers.
- **Import**: field classification heuristics, CSV quoting, multi-file merge.
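
The persistence stability property (`parse(format(parse(x)))` is stable) can be sketched as a generic check. Everything here is illustrative: `check_round_trip`, `toy_parse`, and `toy_format` are hypothetical names, and the toy `key=value` format stands in for `.improv`, whose real parser and generator are not shown in this diff. In the real suite the input would come from the grammar-walking generator under proptest rather than a hand-written string.

```rust
use std::fmt::Debug;

/// Checks that an input parses, that re-parsing the formatted output gives
/// the same model, and that formatting is a fixed point after one pass.
fn check_round_trip<T, E>(
    input: &str,
    parse: impl Fn(&str) -> Result<T, E>,
    format: impl Fn(&T) -> String,
) -> Result<(), String>
where
    T: PartialEq + Debug,
    E: Debug,
{
    let first = parse(input).map_err(|e| format!("input rejected: {e:?}"))?;
    let saved = format(&first);
    let second = parse(&saved).map_err(|e| format!("formatted output rejected: {e:?}"))?;
    if first != second {
        return Err(format!("model changed: {first:?} vs {second:?}"));
    }
    let resaved = format(&second);
    if saved != resaved {
        return Err(format!("output not stable: {saved:?} vs {resaved:?}"));
    }
    Ok(())
}

// Toy stand-in format: one `key=value` pair per line, blank lines ignored.
fn toy_parse(text: &str) -> Result<Vec<(String, String)>, String> {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| -> Result<(String, String), String> {
            let (key, value) = line
                .split_once('=')
                .ok_or_else(|| format!("missing '=' in {line:?}"))?;
            Ok((key.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}

fn toy_format(pairs: &Vec<(String, String)>) -> String {
    pairs.iter().map(|(k, v)| format!("{k}={v}\n")).collect()
}
```

Note that the check compares the first formatted output with the second, not the raw input with the output: the input may contain whitespace or ordering that the formatter normalizes, so only the canonical form is expected to be stable.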