feat(formula): support pipe-quoted identifiers |...|

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Edward Langley
2026-04-08 23:55:06 -07:00
parent 35ed6a13bf
commit fb8b6ca053
2 changed files with 125 additions and 13 deletions


@@ -115,17 +115,28 @@ Formulas are parsed into a typed AST (`Expr` enum) at entry time. If the syntax
is invalid, the user gets an error immediately. The evaluator only sees
well-formed trees — it does not need to handle malformed input.
### Formula Tokenizer: Multi-Word Identifiers and Keywords
### Formula Tokenizer: Identifiers and Quoting
The formula tokenizer supports multi-word identifiers (e.g., `Total Revenue`)
by allowing spaces within identifier tokens when followed by non-operator
characters. However, keywords (`WHERE`, `SUM`, `AVG`, `MIN`, `MAX`, `COUNT`,
`IF`) act as token boundaries — the tokenizer breaks an identifier when:
1. The identifier collected **so far** is a keyword (e.g., `WHERE ` stops at `WHERE`).
2. The **next word** after a space is a keyword (e.g., `Revenue WHERE` stops at `Revenue`).
**Bare identifiers** support multi-word names (e.g., `Total Revenue`): a space
stays inside the identifier when what follows is neither an operator character
nor a keyword. Keywords (`WHERE`, `SUM`, `AVG`, `MIN`, `MAX`, `COUNT`, `IF`)
act as token boundaries.
This ensures `SUM(Revenue WHERE Region = "East")` tokenizes correctly as
separate tokens while `Total Revenue` remains a single identifier.
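The keyword-boundary rule for bare identifiers can be sketched roughly as follows. This is an illustration only, not the tokenizer in this repository: `read_bare_ident`, `is_keyword`, and `is_word_char` are hypothetical names, keyword matching is assumed to be case-sensitive, and any character that is not alphanumeric or `_` is assumed to end a word.

```rust
// Hypothetical sketch of bare-identifier scanning; names and details are assumptions.
const KEYWORDS: [&str; 7] = ["WHERE", "SUM", "AVG", "MIN", "MAX", "COUNT", "IF"];

// Assumption: keywords match case-sensitively.
fn is_keyword(word: &str) -> bool {
    KEYWORDS.iter().any(|k| *k == word)
}

// Assumption: words consist of alphanumerics and underscores only.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Splits `input` into (identifier, remaining input).
/// The identifier grows one word at a time and stops when
///   1. the text collected so far is itself a keyword, or
///   2. the next word is a keyword, or
///   3. the next non-space character cannot start a word (e.g. an operator).
fn read_bare_ident(input: &str) -> (&str, &str) {
    let mut end = 0; // byte offset where the identifier currently ends
    loop {
        let rest = &input[end..];
        // Spaces are only skipped between words, never before the first one.
        let spaces = if end == 0 {
            0
        } else {
            rest.len() - rest.trim_start_matches(' ').len()
        };
        let after_spaces = &rest[spaces..];
        let word_len = after_spaces
            .find(|c: char| !is_word_char(c))
            .unwrap_or(after_spaces.len());
        let word = &after_spaces[..word_len];
        if word.is_empty() {
            break; // operator, paren, or end of input
        }
        if end > 0 && is_keyword(word) {
            break; // rule 2: next word is a keyword
        }
        end += spaces + word_len;
        if is_keyword(&input[..end]) {
            break; // rule 1: collected text is a keyword
        }
    }
    (&input[..end], &input[end..])
}
```

Under these assumptions, `read_bare_ident("Revenue WHERE Region")` yields `"Revenue"` and leaves `" WHERE Region"` for the next token, while `read_bare_ident("Total Revenue")` consumes the whole input as one identifier.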
**Pipe-quoted identifiers** (`|...|`) allow any characters — including spaces,
keywords, and operators — inside the delimiters. Use pipes when a category or
item name collides with a keyword or contains special characters:
```
|WHERE| — category named "WHERE"
|Revenue (USD)| — name with parens
|Cost + Tax| — name with operator chars
SUM(|Net Revenue| WHERE |Region Name| = |East Coast|)
```
Pipes produce `Token::Ident` (same as bare identifiers), so they work
everywhere an identifier is expected: expressions, aggregate arguments, WHERE
clause category names and filter values. Double-quoted strings (`"..."`)
remain `Token::Str` and are used only for WHERE filter values in the
`split_where` pre-parse step.
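The difference between the two quoting forms can be illustrated with a small sketch. It is not the project's tokenizer: `read_quoted` is a hypothetical helper, the `Token` enum here is a cut-down stand-in with only the two variants named above, and the sketch assumes there is no escape syntax for a literal delimiter inside the quotes and that an unterminated quote is simply rejected.

```rust
// Hypothetical, cut-down stand-in for the real token type.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String), // bare or pipe-quoted identifier
    Str(String),   // double-quoted string
}

/// Reads one quoted token from the start of `input`.
/// `|...|` becomes `Token::Ident`, `"..."` becomes `Token::Str`.
/// Returns the token and the remaining input, or `None` if `input`
/// does not start with a quote or the closing delimiter is missing.
/// Assumption: no escape sequences; the first matching delimiter closes the quote.
fn read_quoted(input: &str) -> Option<(Token, &str)> {
    let mut chars = input.chars();
    let delim = chars.next()?;
    if delim != '|' && delim != '"' {
        return None;
    }
    let body = chars.as_str(); // everything after the opening delimiter
    let close = body.find(delim)?; // byte offset of the closing delimiter
    let text = body[..close].to_string();
    let rest = &body[close + delim.len_utf8()..];
    let token = if delim == '|' {
        Token::Ident(text)
    } else {
        Token::Str(text)
    };
    Some((token, rest))
}
```

Because everything between the delimiters is copied verbatim, `|WHERE|` and `|Cost + Tax|` come out as ordinary `Token::Ident` values in this sketch, with no keyword or operator checks applied to their contents.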
---