Grammar Tools

This page is intended as user documentation for a group of grammar tools developed by the author. As such, it should be of little or no interest to others. The tools are a collection of awk, sh, and Perl programs that are usable but not very robust. The tools described below are available under a Creative Commons License as a tar file.

  1. gramxref
  2. gramterm
  3. lineno
  4. Token Generators
  5. bnf2wirth
  6. derive


gramxref is a shell script that reads a Wirth BNF grammar from standard input and writes a grammar cross-reference to standard output. Comment lines and blank lines are ignored; all production lines are numbered. The input itself must not contain line numbers.


gramxref  < example.grm  > example.xref


gramterm is a shell script which reads a Wirth BNF grammar from standard input and writes a list of terminal symbols to standard output. Terminal symbols are sorted into Pargen standard order, namely: reserved words (alphabetically), punctuation, other terminal symbols. The input may contain comment lines and blank lines.

Strictly speaking, gramterm is a wrapper that post-processes the output of getterm, which is an awk script.


gramterm  < example.grm  > terminalList
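
The ordering described above (reserved words alphabetically, then punctuation, then other terminal symbols) can be sketched in Python. This is not the gramterm or getterm source, which is sh and awk. The classification rule used here is an assumption made for illustration: all-lowercase alphabetic tokens are treated as reserved words, and tokens containing no letters or digits are treated as punctuation. The names terminal_sort_key and order_terminals are hypothetical, and the order within the punctuation and "other" groups is simply character-code order.

```python
def terminal_sort_key(token):
    """Return a sort key: reserved words, then punctuation, then the rest."""
    if token.isalpha() and token.islower():
        return (0, token)   # assumed reserved word, sorted alphabetically
    if not any(ch.isalnum() for ch in token):
        return (1, token)   # assumed punctuation, sorted by character code
    return (2, token)       # any other terminal, e.g. Identifier


def order_terminals(tokens):
    """Return the distinct tokens in the three-group order."""
    return sorted(set(tokens), key=terminal_sort_key)
```

For example, order_terminals(["Identifier", ";", "while", "if", "("]) places if and while first, then the two punctuation symbols, then Identifier.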


lineno is an awk script which reads a Wirth BNF grammar from standard input and writes a copy of the input to standard output with the grammar productions numbered. lineno uses the same basic line numbering scheme as gramxref.


lineno < example.grm > example.lnum
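
The numbering scheme can be illustrated with a short Python sketch. It is not the lineno awk script itself. It assumes that comment lines begin with #, that each production occupies one line, and that the number is printed in a four-character field; the function name number_productions and the output layout are hypothetical.

```python
def number_productions(lines, comment_prefix="#"):
    """Yield each input line, prefixing production lines with a running number."""
    count = 0
    for line in lines:
        text = line.rstrip("\n")
        stripped = text.strip()
        if not stripped or stripped.startswith(comment_prefix):
            # Blank lines and comment lines are copied through unnumbered.
            yield text
        else:
            count += 1
            # Assumed layout: right-aligned number, two spaces, the production.
            yield f"{count:4d}  {text}"
```

Passing sys.stdin to number_productions and printing each result would approximate the command shown above.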

Token Generators

To aid in generating lexical tokens, three distinct programs are provided: tokenstring, tokenenum, and tokenint. Each is an awk script that reads from standard input a dictionary (not used by tokenstring) followed by a list of tokens (one per line), as generated by gramterm, and writes C++ code to standard output.

Typical usage:

tokenstring < terminalList >


bnf2wirth is an awk script for converting Pargen BNF grammars to Wirth BNF grammars. Only the most basic conversion is done: nonterminals have their angle brackets removed, comments are converted from a leading * to a leading #, and the "is defined to be" symbol in the output is =.

Many Pargen conventions are not currently handled and must not be present in the input.

bnf2wirth < example.bnf > example.grm
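
The basic conversions listed above can be sketched line by line in Python. This is an illustration, not the bnf2wirth awk script. It assumes the input writes "is defined to be" as ::= (the page does not say which symbol Pargen BNF uses), and that nonterminal names are identifier-like, so that a bare < or <= terminal is left alone. The function name bnf_line_to_wirth is hypothetical.

```python
import re


def bnf_line_to_wirth(line, define_symbol="::="):
    """Convert one BNF line using only the basic rules described above."""
    text = line.rstrip("\n")
    if text.startswith("*"):
        # Comment line: the leading * becomes #.
        return "#" + text[1:]
    # Remove angle brackets around identifier-like nonterminals, e.g. <expr>.
    text = re.sub(r"<([A-Za-z][A-Za-z0-9_]*)>", r"\1", text)
    # Replace the first define symbol (assumed to be ::=) with =.
    return text.replace(define_symbol, "=", 1)
```

For example, the line "<expr> ::= <expr> + <term>" becomes "expr = expr + term".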


derive is a Perl program for deriving strings from a grammar, which is read from standard input. Output is to standard output.

The grammar should be in standard BNF without comments or blank lines. Each line should contain a single production. Lines must not contain tabs or trailing spaces. Each grammar symbol must be separated from every other symbol by at least one space. Nonterminals need not be enclosed in angle brackets (<...>).

By default, derive generates strings in order of length, basically using a leftmost derivation (but generating all substitutions of the leftmost nonterminal). If the same output string appears more than once, then the grammar is ambiguous. By default, derive produces all strings which are of length 5 or less.

derive still needs work!


derive  < example.grm  > stringList
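
The default strategy described above (expand the leftmost nonterminal in every possible way and list the resulting strings in order of length) can be sketched in Python. This is not the Perl source of derive. It assumes that the start symbol is the left-hand side of the first production, that the second token on each line is the "is defined to be" symbol, that length is counted in grammar symbols, and that no production has an empty right-hand side. The names parse_grammar and derive_strings and the max_steps safety cap are hypothetical.

```python
from collections import deque


def parse_grammar(lines):
    """Parse lines of the form 'lhs DEFINE sym sym ...' into (start, rules)."""
    rules = {}
    start = None
    for line in lines:
        symbols = line.split()
        if len(symbols) < 3:
            continue  # skip blank or malformed lines
        # symbols[1] is the "is defined to be" symbol and is ignored.
        lhs, rhs = symbols[0], tuple(symbols[2:])
        rules.setdefault(lhs, []).append(rhs)
        if start is None:
            start = lhs  # assumed: first left-hand side is the start symbol
    return start, rules


def derive_strings(start, rules, max_len=5, max_steps=20):
    """Return derived strings of at most max_len symbols, shortest first.

    A string appears once per distinct leftmost derivation found, so a
    repeated string indicates an ambiguous grammar.
    """
    results = []
    queue = deque([((start,), 0)])
    while queue:
        form, steps = queue.popleft()
        # Locate the leftmost nonterminal, if any.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            results.append(" ".join(form))
            continue
        if steps == max_steps:
            continue  # safety cap for grammars with cyclic unit productions
        for rhs in rules[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # Without empty productions a form never shrinks, so prune here.
            if len(new_form) <= max_len:
                queue.append((new_form, steps + 1))
    return sorted(results, key=lambda s: len(s.split()))
```

With the two productions "E ::= E + E" and "E ::= x", the string "x + x + x" is returned twice, which reflects the ambiguity of that grammar.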


Robert Noonan,
May 24, 2004