The tokenizer was greedily consuming spaces and potentially merging
identifiers with subsequent keywords. This change improves the tokenizer
by:
- Peeking ahead past spaces to find the next word/token.
- Breaking the identifier if the next word is a known keyword (WHERE, SUM,
AVG, MIN, MAX, COUNT, IF).
- Adding support for more delimiter characters (<, >, =, !, ").
This fixes a regression where "Revenue WHERE" was treated as a single
identifier instead of an identifier followed by a WHERE clause.
Includes a new regression test for inline WHERE filters in aggregate
functions.
Co-Authored-By: fiddlerwoaroof/git-smart-commit (unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q5_K_XL)